In this blog, we will discuss how to implement lineage, insights (reporting), and monitoring capabilities in Microsoft Purview. First, we will look at what lineage is and why it is important. Then, we will cover Purview's insights capabilities and how Purview reports on Assets, Scans, Glossary, Classification, and Sensitivity Labels.
Finally, we will learn why it is important to monitor the Purview environment and how to monitor it based on best practices.
What is Lineage?
Lineage is the ability to show how data moves over time. It lets you see how data is used and what changes have been made to it. It helps you to:
- Better understand the data.
- Trace back to the source data and correct it.
- Improve data quality.
Usually, lineage is captured from the tools that perform the ETL, for example Azure Data Factory, SSIS, or Informatica. Purview also supports custom lineage, which is created by uploading metadata through the Apache Atlas hooks or REST API that Purview exposes.
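To make custom lineage more concrete, here is a minimal sketch of the kind of metadata that gets uploaded. In the Apache Atlas model, a lineage link is typically a Process entity whose inputs and outputs reference existing assets by qualified name. The process name, the qualified names, the storage account, and the `azure_blob_path` type below are illustrative assumptions, and the sketch only builds the JSON payload; it does not call Purview.

```python
import json


def build_lineage_process(name, qualified_name, inputs, outputs):
    """Build an Atlas-style 'Process' entity linking input assets to output assets.

    inputs/outputs are lists of (type_name, qualified_name) tuples that
    identify assets assumed to exist in the catalog already.
    """
    def ref(type_name, asset_qualified_name):
        # Reference an existing asset by its unique qualifiedName attribute.
        return {
            "typeName": type_name,
            "uniqueAttributes": {"qualifiedName": asset_qualified_name},
        }

    return {
        "entities": [
            {
                "typeName": "Process",
                "attributes": {
                    "name": name,
                    "qualifiedName": qualified_name,
                    "inputs": [ref(t, q) for t, q in inputs],
                    "outputs": [ref(t, q) for t, q in outputs],
                },
            }
        ]
    }


# Hypothetical assets: a raw CSV file transformed into a Parquet file.
payload = build_lineage_process(
    name="copy_sales_to_parquet",
    qualified_name="custom://lineage/copy_sales_to_parquet",
    inputs=[("azure_blob_path", "https://examplestorage.blob.core.windows.net/raw/sales.csv")],
    outputs=[("azure_blob_path", "https://examplestorage.blob.core.windows.net/curated/sales.parquet")],
)
print(json.dumps(payload, indent=2))
```

A payload of this shape would then be sent to the account's Atlas entity endpoint with an authenticated request, which is left out here because it depends on your tenant and credentials.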
In the video, I showcase a lineage demo by creating an Azure Data Factory copy pipeline that copies the data and transforms it into a Parquet file. When you run the pipeline, it creates the lineage in Purview, which shows the source of the data and what it looks like after the transformation. Here are the steps performed for this demo:
- Create a Data Factory Connection for the existing data factory in Azure Purview.
- Copy data using an Azure Data Factory pipeline.
- Verify the Lineage in Purview.
Purview Insights (Reporting)
Microsoft Purview insights provide unique reporting capabilities about Assets, Scans, Glossary, Classification, and Sensitivity Labels.
Asset Reports: These provide a summarized view of the data estate and its distribution by classification and source type.
Glossary Insights: It shows the distribution of glossary terms by status, showing how many terms are attached to a particular asset. This helps to understand the completeness of the glossary terms. This is what a typical Glossary report looks like:
Classification Insights: This report shows where classified data is located and lets you drill down to the classified files. This helps to improve data quality and security. This is what a typical classification report looks like:
Sensitivity Label Insights: Sensitivity labels help to tag sensitive data in the enterprise. For example, Social Security numbers (SSNs) can be a sensitive data element that cannot be shared outside the organization. This is what a sensitivity label report looks like:
Scan Insights: Scan insights help you understand various metrics related to the scanning jobs running against your assets, for example how many scans have run so far, how many assets were scanned, and how many assets were classified. This is what a typical scan report looks like:
Azure Monitor can be used to track the operational state of Purview. For example, it can help us find the:
- Number of scans completed
- Number of scans canceled
This monitoring capability helps to track potential problems and then troubleshoot them, which in turn improves the reliability of Purview.
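As a rough illustration of how scan counts like these could be pulled programmatically, the sketch below builds a request URL for the Azure Monitor metrics REST API. The metric names `ScanCompleted` and `ScanCancelled`, the API version, and the resource ID are assumptions to verify against your own Purview account; the code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode


def build_metrics_url(resource_id, metric_names, timespan,
                      interval="PT1H", aggregation="Total"):
    """Build an Azure Monitor metrics query URL for a single resource.

    resource_id is the full ARM resource ID of the Purview account,
    starting with '/subscriptions/'.
    """
    query = urlencode({
        "api-version": "2018-01-01",            # assumed API version
        "metricnames": ",".join(metric_names),  # comma-separated metric list
        "timespan": timespan,                   # ISO 8601 start/end pair
        "interval": interval,                   # ISO 8601 duration per data point
        "aggregation": aggregation,
    })
    return (
        "https://management.azure.com"
        f"{resource_id}/providers/Microsoft.Insights/metrics?{query}"
    )


# Hypothetical Purview account and a one-day window.
url = build_metrics_url(
    resource_id=(
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/example-rg/providers/Microsoft.Purview/accounts/example-purview"
    ),
    metric_names=["ScanCompleted", "ScanCancelled"],
    timespan="2024-01-01T00:00:00Z/2024-01-02T00:00:00Z",
)
print(url)
```

The resulting URL would be requested with a bearer token for a user or service principal that has at least read access to the metrics.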
In the live demo, we create a scan rule and send the diagnostic logs to Azure Storage. Once we run the scan, we can see the logs collected in the storage account. This is what the logs look like:
Here are the steps used in the demo:
- Grant user access to Purview Metrics.
- Visualize Purview Metrics.
- Send Diagnostic Logs to Azure Storage
To let a user visualize the built-in Purview metrics, we need to grant them the Monitoring Reader role on the Purview account. Once this role is granted, the user can go to the Metrics tab in the Purview blade and visualize the metrics. By default, the user who deployed the Purview instance already has this access.
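The last step in the list above, sending diagnostic logs to Azure Storage, comes down to creating a diagnostic setting on the Purview account. Here is a minimal sketch of the request body such a setting might use. The log category `ScanStatusLogEvent`, the `AllMetrics` category, and the storage account ID are assumptions to check against what the portal offers for your account; the code only builds the body.

```python
import json


def build_diagnostic_setting(storage_account_id,
                             log_categories=("ScanStatusLogEvent",)):
    """Build a diagnostic-setting body that routes logs and metrics to storage."""
    return {
        "properties": {
            # ARM resource ID of the destination storage account.
            "storageAccountId": storage_account_id,
            # One entry per log category to collect (category names are assumed).
            "logs": [{"category": c, "enabled": True} for c in log_categories],
            # Also export platform metrics alongside the logs.
            "metrics": [{"category": "AllMetrics", "enabled": True}],
        }
    }


# Hypothetical storage account in the same subscription.
setting = build_diagnostic_setting(
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg/providers/Microsoft.Storage/storageAccounts/examplelogs"
)
print(json.dumps(setting, indent=2))
```

In the demo the same configuration is done through the Diagnostic settings page in the Azure portal, so this is simply a scripted equivalent under the assumptions above.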