- April 25, 2024
- Posted by: Yuvasri Ravi
- Category: Snowflake
Introduction:
Snowflake delivers unparalleled features aimed at maintaining the utmost governance standards for your account, users, and the entirety of your data within the platform.
In the first of the two-part blog on Data Governance in Snowflake, we had explained about Row Level Security and column level security. In this blog will be discussing about object tagging and data classification.
Prerequisites:
- Snowflake trial account
- Knowledge in SQL Scripting
Overview:
Object Tagging:
Object Tagging enables governance operations such as data classification, tracking sensitive data, resource usage monitoring, and protecting the data. Object tagging allows us to assign metadata to Snowflake objects to categorize a group of objects or describe the type of data stored in a table or column.
Data Classification:
Classification is a multi-step process that associates Snowflake-defined system tags to columns by analysing the fields and metadata for personal data; this data can be tracked by a data engineer using SQL and Snowsight.
Use case for Object Tagging:
A tag is a schema-level object that can be assigned to another Snowflake object. Snowflake stores the tag and its string value as a key-value pair. The tag must be unique for your schema, and the tag value is always a string. We can assign multiple tags to an object.
For Example,
PII= ‘Phone’
Here, PII is the tag and Phone is a Tag value.
As tags can be applied on various levels of database objects like tables, views, or columns, it becomes easier to identify sensitive information present across the account. This helps in restricting the access to sensitive data using the Row Access Policies or using Masking Policies in conjunction with tags.
In this scenario, we are going to perform tag-based masking.
Follow the below steps to create a tag and assigning it to snowflake object
- Tag Creation
Tags in snowflake can be created using CREATE TAG with allowed values as shown below. A role with CREATE TAG privilege is required to create tags on the respective schema.
The Allowed_Values tag property enables specifying the possible string values that can be assigned to the tag when the tag is set on an object. We can specify these values when creating or replacing a tag with a Create Tag statement, or while modifying an existing tag key with an Alter Tag statement.
2. Retrieve Tag Details
To obtain the list of allowed string values for a given tag, call either the GET_DDL function or the SYSTEM$GET_TAG_ALLOWED_VALUES function.
3. Assigning tag to snowflake object
Tag can be applied to as many different Snowflake objects with the same or different string values. If the allowed values attribute is defined, the tag value must be one of the possible string values listed in the tag allowed values. Tags can be applied to all supported objects while creating the objects or the objects can be altered to apply the tag.
A role with APPLY TAG privilege is required to apply tags on Snowflake objects.
Tag Lineage:
A tag is inherited based on the Snowflake securable object hierarchy. The child objects in Snowflake’s securable objects hierarchy inherit all the tags applied to the parent object. This is called Tag Lineage.
Example: Assigning a HR_DATA tag to a schema will inherit the tag to all tables and columns under the schema.
4. Discover tags:
- Identify tags in your account:
Use the TAGS view in the Account Usage schema of the shared SNOWFLAKE database. This view can be thought of as a catalog for all tags in your Snowflake account that provides information on current and deleted tags.
- Database-level query for all the tags on every column in a table
Use the Information Schema table function TAG_REFERENCES_ALL_COLUMNS to obtain all of the tags that are set on every column in a given table or view.
Using the below comments, we can list of all tag definitions within an account, database, and schema, respectively.
Use Case: Tag-Based Masking Policies
The below example demonstrates how to mask data based on a tag value. That is, mask data based on tag assignment PII_Data= ‘Contact Details’
- Create Masking policy
2. Apply the masking policy to the tag: ‘PII_DATA’
3. Assigning the tag PII_DATA= ‘Contact Details’ to a column will mask the column data automatically
4. Viewing from unauthorized role
When we try to view the data from unauthorized role, Phone column will be masked based on the masking policy
Use case for Data classification:
- Data classification in Snowflake allows organizations to organize, manage, and secure their data effectively.
- Data classification in Snowflake involves labelling data based on its sensitivity, regulatory requirements, or other relevant factors.
- By classifying data, organizations can implement appropriate access controls, encryption, and auditing measures to protect sensitive information and comply with regulations such as GDPR, HIPAA, or PCI DSS.
Types:
- System Tags
- Semantic Category
- Private Category
- Custom Classification Tags
- System Tags:
The Procedure is used to classify a single table and set system tags on the columns in the tables. There are two Classification system tags, both of which exist in the SNOWFLAKE.CORE schema:
- SNOWFLAKE.CORE.SEMANTIC_CATEGORY
- SNOWFLAKE.CORE.PRIVACY_CATEGORY
Syntax:
CALL SYSTEM$CLASSIFY (‘hr.tables.empl_info’, {‘auto_tag’: true});
Example:
- Semantic Category :
- The semantic category identifies personal attributes such as name, address, zip code, phone number, age, and gender.
- Data classification, also known as entity recognition or PII detection, is the process of labeling data with its semantic type after inferring the meaning of the data.
Example:
To classify the column in a table,
To classify a single table, use the below code,
b. Privacy Category:
Specifies the privacy category tag value. The privacy category has three values:
- Identifier (Direct Identifiers) – These attributes uniquely identify an individual. Example: Name, Phone, and social security number.
- Quasi-identifier (Indirect Identifiers) – These attributes can uniquely identify an individual when two or more of these attributes are in combination. Example: Age, Gender.
- Sensitive – Information that the individual would rather not disclose for privacy reasons. Example: Salary.
Before Classification of the ‘Customers’ table.
After classification applied to the Customers table, the classification for Salary as ‘Sensitive’, gender as ‘Quasi_Identifier’, phone as ‘Identifier’, etc
2. Custom classification
- The custom classification algorithm uses a scoring rule to determine which semantic category system tag to recommend and which semantic category tags, if any, to suggest as alternatives.
- It provides the CUSTOM_CLASSIFIER class in the SNOWFLAKE.DATA_PRIVACY.
- Define industry- and domain-specific tags for columns containing sensitive data.
- Leverage Snowflake to have more control over your efforts to track PII data.
- Custom classification uses the CUSTOM_CLASSIFIER class. You can create an instance of the CUSTOM_CLASSIFIER class and call instance methods to manage regular expressions associated with the instance.
Methods to call the custom classifier:
- custom_classifier!ADD_REGEX
- custom_classifier!DELETE_CATEGORY
- custom_classifier!LIST
Example:
1.Creation of Custom Classifier:
2. Creation of ADD_REGEX method:
3.Assign the Created custom classifier to the column
4. Result:
In Part-1 of the Snowflake Data Governance Series, we delved into Row Level Security and Column Level Security specifics, followed by a brief examination of object tagging and data Classification in Part 2. As observed in both these blogs, Snowflake’s strategy for data governance offers organizations comprehensive tools to uphold data integrity and security across their operations effectively.
Cittabase is a select partner with Snowflake. Please feel free to contact us regarding your Snowflake solution needs. Our snowflake solutions encompass a suite of services for your data integration and migration needs. We are committed to providing personalized assistance and support customized to your requirements