Insights into Data Quality: Navigating Rule Occurrences in Informatica CDGC

Introduction:

A data quality rule occurrence is the business representation of a data quality rule that is run on one or more data elements.

In Informatica CDGC, rule occurrences refer to the instances where data quality rules are applied to datasets. They help track how often and where these rules are triggered, providing insights into data quality and governance effectiveness.
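
To make the idea concrete, a rule occurrence can be pictured as a record that binds one technical rule to one data element. The sketch below is purely illustrative and is not an Informatica API: the class names, field names, and sample values are hypothetical, chosen to mirror the fields filled in during the creation steps in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataElementRef:
    """Hypothetical pointer to one column in a scanned catalog source."""
    catalog_source: str
    dataset: str
    data_element: str


@dataclass
class RuleOccurrence:
    """Illustrative model only; not an Informatica CDGC class."""
    name: str
    dimension: str                  # hypothetical example: "Completeness"
    measuring_method: str           # the post uses "Informatica Cloud Data Quality"
    primary_data_element: DataElementRef
    technical_rule_reference: str   # name of a single-column rule specification
    stakeholders: list[str] = field(default_factory=list)

    def can_run_now(self) -> bool:
        # The post notes that Run Now is not shown unless stakeholders are added.
        return len(self.stakeholders) > 0


# Made-up values for illustration.
occurrence = RuleOccurrence(
    name="Customer email is populated",
    dimension="Completeness",
    measuring_method="Informatica Cloud Data Quality",
    primary_data_element=DataElementRef("oracle_native", "CUSTOMERS", "EMAIL"),
    technical_rule_reference="not_null_check",
)
print(occurrence.can_run_now())  # prints: False
```

Keeping the data element as its own small type reflects the point that an occurrence targets exactly one column of one dataset in one catalog source.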

How to create a Data Quality Rule Occurrence in CDGC:

Any built-in (native) catalog source can be used, depending on the business requirements. This walkthrough uses an Oracle native catalog source.

Steps to create a Data Quality Rule Occurrence in CDGC:

Oracle Native Catalog:

1. In the Metadata Command Center service, first create the native (built-in) catalog source that brings the metadata into CDGC.

2. Provide a name of your choice, configure the Oracle connection, and select that connection under Connection Information. Test the connection; a message confirms that the test connection was successful.

3. Go to step 2 of the configuration, under Metadata Extraction. Select a runtime environment that is up and running, set the Metadata Change option to Retain, and apply a filter condition if necessary.

4. Go to the Data Profiling and Quality tab and enable Data Quality so that the data quality rule can be processed.

5. After enabling Data Quality, provide the required connection and parameters. Because this walkthrough uses a Data Quality Rule Occurrence, set Data Quality Automation to No.

6. Save the Oracle native catalog source and run it with Metadata Extraction. The first run must extract the metadata into CDGC; enable Data Profiling and Quality only after that.

7. Confirm that the job has completed successfully in Metadata Command Center.

8. Verify the metadata extraction details and results in the job logs.

9. Open the Data Governance and Catalog service and open the catalog source that was created.

10. Check the Overview tab for the database, schema, table, and column details.

11. Once metadata extraction is complete, create the Data Quality Rule Occurrence.

12. Provide a name of your choice for the Data Quality Rule Occurrence. The Reference ID is generated automatically.

13. Define the Dimension according to your business rule, and set the Measuring Method to Informatica Cloud Data Quality.

14. Select the Primary Data Element by choosing the Catalog Source, Dataset, and Data Element.

15. Select the Technical Rule Reference that matches the business logic. Choose a rule specification that takes a single column, because a rule specification that spans multiple columns can’t be applied.

16. Add the stakeholders of the Data Quality Rule Occurrence; otherwise the Run Now option will not be shown.

17. Click Create to create the Data Quality Rule Occurrence.

18. After the rule occurrence has been created successfully, validate all the necessary details.

19. Any number of Data Quality Rule Occurrences can be created and then run together in one job through the native catalog source.

20. Open the Metadata Command Center service, open the Oracle native catalog source created earlier, and run Data Profiling and Quality.

21. Once the job is completed, check the data quality details on the Results tab.

22. The rule specification is processed against the data in the catalog source.

23. In the Data Governance and Catalog service, open the Data Quality Rule Occurrence created earlier and refresh the Score page.

24. The Score tab shows the total number of rows in the source and the number of failed rows.

25. The failed rows can be viewed on the Data Preview tab.

26. The same results can be checked on the Data Quality tab of the Oracle native catalog source.

27. Check the other two rule occurrences in CDGC in the same way.
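
The two figures on the Score tab (total rows and failed rows) can be related to each other with a simple calculation. The sketch below is an illustration in plain Python, not CDGC's actual scoring code: the sample rows, the `not_null` rule, the function names, and the percentage formula (passed rows divided by total rows) are all assumptions standing in for a single-column rule specification.

```python
from typing import Callable, Optional


def not_null(value: Optional[str]) -> bool:
    """Hypothetical single-column rule: the value must be present and non-blank."""
    return value is not None and value.strip() != ""


def score_column(rows: list[dict], column: str,
                 rule: Callable[[Optional[str]], bool]) -> dict:
    """Apply one rule to one column and summarize the outcome."""
    failed = [row for row in rows if not rule(row.get(column))]
    total = len(rows)
    passed = total - len(failed)
    # Assumed formula: share of rows that pass the rule, as a percentage.
    score = 100.0 * passed / total if total else 0.0
    return {
        "total_rows": total,
        "failed_rows": len(failed),
        "score": score,
        "failed_preview": failed,  # comparable to the failed rows in a data preview
    }


# Made-up sample data standing in for a table in the Oracle source.
rows = [
    {"ID": 1, "EMAIL": "a@example.com"},
    {"ID": 2, "EMAIL": None},
    {"ID": 3, "EMAIL": "  "},
    {"ID": 4, "EMAIL": "d@example.com"},
]

result = score_column(rows, "EMAIL", not_null)
print(result["total_rows"], result["failed_rows"], result["score"])
# prints: 4 2 50.0
```

Because the rule takes a single value, it matches the single-column restriction noted in step 15; a rule that needed two columns at once would not fit this shape.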

Benefits of Data Quality Rule Occurrence in CDGC:

The benefits of tracking rule occurrences in Informatica CDGC include:

1. Improved Data Quality: Identifies and addresses data quality issues by applying rules systematically.
2. Enhanced Monitoring: Provides visibility into how often and where data quality rules are triggered, helping to track the effectiveness of data governance.
3. Actionable Insights: Helps in analyzing trends and patterns in data quality, enabling better decision-making and targeted improvements.
4. Compliance Assurance: Ensures data meets regulatory and organizational standards by continuously monitoring rule adherence.
5. Efficient Issue Resolution: Facilitates quicker identification and resolution of recurring data quality problems.

Limitations of Data Quality Rule Occurrence in CDGC:

1. Performance Overhead: Monitoring and applying rules can introduce performance overhead, potentially impacting data processing speeds and system resources.
2. Complexity in Configuration: Setting up and managing multiple data quality rules can become complex, requiring careful configuration to avoid conflicts or redundant checks.
3. Limited Scope: Rule occurrences may not capture all types of data quality issues, especially those that require more nuanced or context-specific evaluations.
4. Data Volume Impact: Large volumes of data or frequent rule executions can lead to a high number of occurrences, making it challenging to analyze and prioritize issues effectively.
5. False Positives/Negatives: Rules might generate false positives or miss issues if not properly defined, leading to potential inaccuracies in data quality assessments.
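
One way to cope with a high number of occurrences (limitation 4) is to rank them by failure rate so that the worst offenders are reviewed first. The following is a minimal sketch under stated assumptions: the occurrence names and row counts are invented, the function name is hypothetical, and the figures are assumed to have been noted by hand from each occurrence's Score tab rather than fetched through any API.

```python
def rank_by_failure_rate(results: list[dict]) -> list[tuple[str, float]]:
    """Return (name, failure_rate) pairs, highest failure rate first."""
    ranked = []
    for r in results:
        total = r["total_rows"]
        rate = r["failed_rows"] / total if total else 0.0
        ranked.append((r["name"], rate))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Invented figures for three rule occurrences.
results = [
    {"name": "Email not null", "total_rows": 1000, "failed_rows": 50},
    {"name": "Phone format", "total_rows": 1000, "failed_rows": 300},
    {"name": "Country code valid", "total_rows": 500, "failed_rows": 5},
]

ranking = rank_by_failure_rate(results)
for name, rate in ranking:
    print(f"{name}: {rate:.1%}")
# prints:
# Phone format: 30.0%
# Email not null: 5.0%
# Country code valid: 1.0%
```

Ranking by rate rather than by raw failed-row count keeps small tables with severe problems from being hidden behind large tables with minor ones.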

Conclusion:

Tracking rule occurrences in Informatica CDGC is crucial for maintaining high data quality and effective data governance. It enables organizations to monitor how data quality rules are applied, provides actionable insights into data issues, and helps ensure compliance with standards. By analyzing rule occurrences, organizations can improve their data practices, resolve issues more efficiently, and enhance overall data integrity.

Please reach out to us for your Informatica solution needs. We are an Informatica Platinum Partner with extensive experience with Informatica implementations and data integration.


