I.I - The Imperative for Assurance: Criticality and Confidence Levels

The deployment of digital twins in capital-intensive industries has moved beyond visualisation and basic monitoring to become integral to high-consequence decision-making. As these virtual representations inform decisions affecting safety, environmental integrity, and financial performance, the need for a structured assurance framework becomes paramount. The DNV-RP-A204 "Assurance of digital twins" standard provides such a framework, establishing a systematic process for developing and assuring trustworthy digital twin outputs.4

At the core of the DNV methodology is a risk-based approach that links the potential impact of a decision to the level of assurance required for the digital twin component supporting it. This component, termed a "Functional Element" (FE), is a discrete module of a digital twin designed to support a specific "key decision".4 The standard defines a "Criticality" level for each key decision, determined by evaluating the potential consequence of a wrong decision and the likelihood of making it. This Criticality, in turn, dictates the required "Confidence Level" (CL) for the FE, with three levels defined:4
- Confidence Level 1 (CL1): For decisions with low consequences.
- Confidence Level 2 (CL2): For decisions with moderate consequences, where the FE is one of several sources of information and decision-making is not time-constrained.
- Confidence Level 3 (CL3): For decisions with high potential consequences that could cause major failures, accidents, or environmental impact, or where the FE is the primary source of information or the decision is time-constrained.
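Read literally, the three definitions above reduce to a simple decision rule. A minimal sketch in Python, assuming an illustrative three-category consequence scale (`low`/`moderate`/`high`) and boolean flags for the two escalation conditions; the function name, categories, and flat rule are ours, not the standard's (the standard's criticality evaluation is richer):

```python
from enum import IntEnum


class ConfidenceLevel(IntEnum):
    CL1 = 1
    CL2 = 2
    CL3 = 3


def required_confidence_level(consequence: str,
                              fe_is_primary_source: bool,
                              time_constrained: bool) -> ConfidenceLevel:
    """Map decision criticality to a required Confidence Level.

    `consequence` is one of "low", "moderate", "high" (illustrative
    categories). CL3 is triggered by high consequence, by the FE being
    the primary information source, or by a time-constrained decision.
    """
    if consequence == "high" or fe_is_primary_source or time_constrained:
        return ConfidenceLevel.CL3
    if consequence == "moderate":
        return ConfidenceLevel.CL2
    return ConfidenceLevel.CL1
```

The use of an `IntEnum` lets CL values be compared directly (e.g. `cl >= ConfidenceLevel.CL2`) when deciding whether stricter data-assurance requirements apply.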
For applications with high criticality, corresponding to CL2 and particularly CL3, the requirements for data assurance become significantly more stringent. A simple validation that data is present or syntactically correct is insufficient. DNV-RP-A204 mandates a continuous, evidence-based assurance process that demonstrates the data is trustworthy, accurate, and representative of the physical asset throughout its entire lifecycle.4 This requirement for continuous assurance is the primary driver for the technical capabilities assessed in this report.
I.II - The Quality Indicator (QI): A Cornerstone of Operational Trust

A central concept within the DNV-RP-A204 framework is the "Quality Indicator" (QI). The QI is defined as a diagnostic indicator that reports the trustworthiness of the results provided by a Functional Element.4 It is not merely a backend data quality score; it is a crucial, user-facing tool designed to be presented alongside the FE's output, typically in a user interface or dashboard. Its purpose is to provide the end-user—the operator or engineer making the key decision—with immediate and unambiguous insight into the reliability of the information they are consuming.4 The standard recommends a "traffic light" visualisation (Green, Yellow, Red) where the criteria for each state are clearly defined, allowing a user to understand not only the current quality status but also the underlying reasons for any degradation.4 This concept is built upon two distinct but integrated assessment processes: a continuous automated assessment and a periodic manual assessment.4

The combination of these two assessment types reveals a foundational principle of the DNV framework. The Quality Indicator is not purely a technical data validation metric; it is a socio-technical construct designed to build and maintain human trust in a complex digital system. The automated component provides constant, real-time vigilance over the data streams and computational models, ensuring the system's internal health. The manual component provides the essential layer of governance, accountability, and verification against physical reality. It assures the user that the digital twin is not an unmanaged "black box" but is actively maintained, calibrated, and kept in sync with the asset it represents. This principle dictates that any platform aiming for DNV compliance must provide robust mechanisms not only for automated data processing but also for formalising and integrating these critical human-in-the-loop workflows.
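One way to operationalise this principle is that the user-facing QI can never be better than the weaker of its two inputs. A minimal sketch, assuming a worst-of combination of the automated and periodic assessment states; the combination policy is our assumption, as the standard only requires that both assessments feed the QI:

```python
from enum import Enum


class QIState(Enum):
    """Traffic-light states recommended for the Quality Indicator."""
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


# Illustrative severity ordering used to take the "worst of" two states.
_SEVERITY = {QIState.GREEN: 0, QIState.YELLOW: 1, QIState.RED: 2}


def combined_qi(automated: QIState, periodic: QIState) -> QIState:
    """The displayed QI is the worse of the continuous automated
    assessment and the periodic manual assessment."""
    return max(automated, periodic, key=_SEVERITY.get)
```

With this policy, a healthy real-time data stream (`GREEN`) still yields a degraded QI if, for example, a scheduled calibration is overdue (`YELLOW`).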
I.II.I - Continuous Automated Assessment

The continuous assessment component of the QI is an automated process, typically executed by an algorithm, that monitors in real time the factors that can degrade the trustworthiness of an FE's output.4 According to DNV-RP-A204 requirements 5.2.2-1 and 5.2.2-8, this automated monitoring must address a reduction in quality across several domains:
- Input Data Quality: This involves monitoring incoming data streams for a wide range of issues, including missing data, timeliness violations (latency), syntactic errors, and semantic inconsistencies such as values falling outside of expected physical ranges. The system must be able to detect these issues as they occur.4
- Computation Model Quality: Many FEs rely on computation models (from simple physics-based equations to complex machine learning models) to transform input data into decision-support information. The continuous assessment must monitor the quality of these models, such as detecting higher uncertainty in their output, instability, or performance degradation.4
- Digital Twin Infrastructure Health: The assessment must also cover the health of the underlying infrastructure. This includes detecting fault states in sensor systems, communication networks, or the digital twin platform itself that could compromise the integrity of the data or the FE's results.4
I.II.II - Periodic Manual Assessment

The periodic assessment component addresses critical factors that cannot be reliably or completely automated. It constitutes a formal, scheduled process of manual or semi-manual checks to ensure the digital twin remains a faithful representation of its physical counterpart over time.4 DNV-RP-A204 requirements 5.2.2-10 and 5.2.2-11 specify that this process should include, but is not limited to, the assessment of:
- Physical and Digital Asset Modifications: Any change, planned or unplanned, to the physical asset (e.g., equipment replacement, process modification) or the digital asset (e.g., software updates) must be assessed for its impact on the FE. The periodic assessment verifies that these changes are correctly reflected and that the digital representation remains valid.4
- Data Quality Not Continuously Checkable: This category includes crucial maintenance activities that directly impact semantic data quality. A primary example is sensor calibration, which cannot be verified automatically. Other examples include changes to master data systems (e.g., updating equipment tags in an ERP system) that provide essential context to raw data streams.4
- Physical Inspections: Data from physical inspections of the asset can provide new information or detect failure modes (e.g., corrosion) that are not captured by online sensors. The results of these inspections must be incorporated into the quality assessment.4
- Computation Model Performance: The performance of computation models must be periodically re-validated against real-world data to ensure they continue to represent reality accurately and have not drifted over time.4
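These scheduled checks lend themselves to a simple record structure with due dates. A sketch, assuming each check carries a last-performed date, a re-assessment interval, and a pass/fail outcome; the reduction to a traffic-light contribution is our illustrative policy, not prescribed by the standard:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PeriodicCheck:
    """One scheduled manual assessment, e.g. a sensor calibration,
    a physical inspection, or a model re-validation."""
    name: str
    last_performed: date
    interval_days: int
    passed: bool = True

    def overdue(self, today: date) -> bool:
        return today > self.last_performed + timedelta(days=self.interval_days)


def periodic_status(checks: list[PeriodicCheck], today: date) -> str:
    """Reduce the periodic-assessment record to a QI contribution:
    'red' if any check failed, 'yellow' if any is overdue, else 'green'."""
    if any(not c.passed for c in checks):
        return "red"
    if any(c.overdue(today) for c in checks):
        return "yellow"
    return "green"
```

Recording results this way also yields the documentation trail the standard asks for: each check's date and outcome is preserved alongside the state it produced.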
The results of these periodic assessments must be documented and used to update the state of the Quality Indicator, ensuring that the user-facing trustworthiness score reflects both the real-time data health and the long-term governance status.4

I.III - Foundational Requirements: Data Profiling, Cleansing, and Governance

Underpinning the Quality Indicator are foundational data management capabilities that DNV-RP-A204, Section 10, identifies as prerequisites for building a trustworthy digital twin system.4

- Data Quality Profiling: The standard requires the capability to perform data profiling, which is the process of analyzing datasets to understand their statistical characteristics, structure, and overall quality. This is a proactive measure to identify potential data quality issues before the data is consumed by critical applications. The goal is to understand the syntactic, semantic, and pragmatic quality of a dataset to determine its fitness for a specific purpose.4
- Data Cleansing: For FEs with high confidence levels, the system must possess robust and automated mechanisms for data cleansing. This is the process of detecting and correcting or removing defects and errors from data streams in real time. The objective is to improve the data quality to the required level of "pragmatic quality"—the degree to which the data is suitable and useful for its intended purpose.4 This is not an optional post-processing step but a required capability for robust, operational FEs.4
- Data Governance: A formal data governance framework is required to exercise authority and control over data assets. This includes defining clear policies, processes, and roles (such as data owners and stewards) to manage data quality throughout its lifecycle. Key aspects include ensuring accountability for data quality, transparency in how quality is managed and documented, and robust processes for controlling changes to data and applications.4
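The profiling and cleansing capabilities described above can be illustrated on a numeric stream. A minimal sketch, assuming missing values are represented as `None` or NaN and that clipping to the physical range is an acceptable cleansing policy (real policies, and the governance rules that authorise them, vary by application):

```python
import math
from statistics import mean


def profile(values: list) -> dict:
    """Minimal data-profiling summary: completeness plus basic
    statistics, used to judge a dataset's fitness for purpose."""
    valid = [v for v in values
             if v is not None and not math.isnan(v)]
    return {
        "count": len(values),
        "missing_fraction": (1 - len(valid) / len(values)) if values else 0.0,
        "min": min(valid) if valid else None,
        "max": max(valid) if valid else None,
        "mean": mean(valid) if valid else None,
    }


def cleanse(values: list, valid_range: tuple) -> list:
    """One simple cleansing policy among many: drop missing values and
    clip out-of-range readings to the physical range."""
    lo, hi = valid_range
    return [min(max(v, lo), hi)
            for v in values
            if v is not None and not math.isnan(v)]
```

Profiling before ingestion and cleansing during operation are complementary: the first determines whether a dataset is fit for the FE at all, the second keeps the live stream at the required pragmatic quality.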