- September 17, 2024
- Posted by: Shibu P
- Category: Databricks
With Unity calalog you can create calalogs, schemas, tables, view and user-defined functions providing a robust framework to utilize the data assets. We are going to control access these objects with permission by ensuring that authorized user can only view or modify sensitive information. Using dynamic views unity calalog can provide protection in row-level and column-level in tables. exploring grants and revoke on various objects with unity calalog. We are going to use EV Global sales data set.
Create a new catalog and select it as default catalog:
First, we need to create a new catalog in our metastore. Run the create statement. the USE statement to select a default schema.
In the above provided SQL code, we are creating a new catalog named sales_calalog
if it doesn’t already exist. A catalog in Unity Catalog is a top-level container that holds databases and tables. The USE CATALOG sales_calalog;
command then sets this catalog as the current working catalog, meaning any subsequent operations will apply to it.
SQL Scripts:
CREATE CATALOG IF NOT EXISTS sales_calalog;
USE CATALOG sales_calalog;
Create and use the new schema:
Next, let’s create a schema in this new catalog. Because we’re now using a unique catalog that is isolated from the rest of the metastore. also set this as the default schema.
Here, the SQL code CREATE SCHEMA VEHICAL_SALES;
command creates a new schema named VEHICAL_SALES
within the current catalog. A schema is used to logically group related tables for better organization. The subsequent USE SCHEMA VEHICAL_SALES;
command sets this schema as the active one, so any further operations will take place within this schema.
SQL Script:
CREATE SCHEMA VEHICAL_SALES;
USE SCHEMA VEHICAL_SALES;
Set up tables and views:
Under the schema let create a table name as EV_SALES inject data from azure storage account.
In SQL Editor let run the script to check if the data is ingested into table:
The hierarchical structure of data objects in Unity Catalog. Here’s the breakdown of the query components:
1. Catalog: The first part of the query, sales_calalog, represents the catalog. A catalog is a top-level container in Unity Catalog that organizes schemas and data objects. It is isolated from other calalogs, providing a separate namespace for your data.
2. Schema: The second part, vehical_sales, represents the schema within the catalog. A schema is a logical grouping of tables and views within a catalog, helping to organize and manage data objects more effectively.
3. Table: The third part, ev_sales, represents the table within the schema. A table is a data object where actual data is stored and can be queried
Grant access to data objects:
By default, unity catalog provides an explicit permission model. No permission is provided or inherited from containing elements. In order to access the data object lets us USAGE statement for permission to the user on all container elements which are catalog and schema. SELECT on the data objects such as tables and view.
SQL Script:
GRANT USAGE ON CATALOG sales_calalog TO `account users`;
GRANT USAGE ON SCHEMA VEHICAL_SALES TO `account users`;
GRANT SELECT ON TABLE EV_SALES TO `account users`;
Create and grant access to a user-defined function:
User-defined functions can be created within schema in unity catalog. Here we will create a function to mask the data except the last two character in the string.
The above SQL code is how user-defined function (UDF) in Databricks is designed to mask sensitive information. The CREATE OR REPLACE FUNCTION mask_function(x STRING)
defines a function that takes a string as input and returns a masked version of the string. It works by replacing all characters except the last two with asterisks (*
). The REPEAT("*", LENGTH(x) - 2)
replaces the characters, while RIGHT(x, 2)
keeps the last two characters visible. as shown in the SELECT output, leaving the last two characters unmasked.
SQL Script:
CREATE OR REPLACE FUNCTION mask_function(x STRING)
RETURNS STRING
RETURN CONCAT(REPEAT(“*”, LENGTH(x) – 2), RIGHT(x, 2)
);
SELECT mask_function(‘sensitive data’) AS data;
Use EXECUTE statement to allow member to account user to run the function.
SQL Script:
GRANT EXECUTE ON FUNCTION mask_function TO `account users`;
Using dynamic views to protect table column and rows:
Dynamic views provide the ability to fine-grained access control of columns and rows within table. With partially obscure we can provide column values or redact them entirely. Also, we can omit the rows based on specific criteria.
For access control we can use the functions below:
- current_user(): returns the email address of the user querying the view.
- is_account_group_member(): returns TRUE if the user querying the view is a member of the specified group.
1. Redact columns:
Let’s create a view name ev_sales_view use is_account_group_member() function.
The view checks whether the user belongs to the 'account users'
group using the is_account_group_member('account users')
function. If the user is a member of this group, the region
and parameter
fields are replaced with the string 'REDACTED'
to protect sensitive information. If the user is not in the group, the actual region
and parameter
values are displayed.
SQL Script:
CREATE OR REPLACE VIEW ev_sales_view AS
SELECT
CASE WHEN
is_account_group_member(‘account users’) THEN ‘REDACTED’
ELSE region
END AS REGION,
CASE WHEN
is_account_group_member(‘account users’) THEN ‘REDACTED’
ELSE parameter
END AS PARAMETER,
mode,
powertrain,
year,
unit,
value
FROM ev_sales;
View the data in SQL Editor:
2. Restrict rows:
If you want to restrict few rows, we can simply filter out the row based on the condition. Using WHERE clause we can apply the function is_account_group_member().
In this View, the key part is the WHERE
condition, which uses the is_account_group_member('account users')
function to apply a filter based on the user’s group membership. If the user belongs to the 'account users'
group, the view filters the data to include only rows where the year
is less than 2018. If the user is not part of this group, all rows are included, as the condition defaults to TRUE
.
SQL Script:
CREATE OR REPLACE VIEW ev_sales_view AS
SELECT
region,
parameter,
mode,
powertrain,
year,
unit,
value
FROM ev_sales
where
CASE WHEN
is_account_group_member(‘account users’) THEN year < 2018
ELSE TRUE
END;
View the data in SQL Editor:
Before:
After:
3. Data masking:
Dynamic views final use case is data masking. Masking is similar to redacted columns. Redacted replace the entire column, Masking displays some of the data rather than replacing the entirely. Here we will use mask() function to manipulate the data for the column region.
As shown in the SQL code, we are using the conditional masking based on user group membership. The critical part is the use of the is_account_group_member('account users')
function, which checks if the current user belongs to the 'account users'
group. If the user is a member of this group, the region
column is masked using the mask(region)
function to hide sensitive information. If the user is not part of the group, the original region
value is displayed without any masking.
SQL Script:
CREATE OR REPLACE VIEW ev_sales_view AS
SELECT
CASE WHEN
is_account_group_member(‘account users’) THEN mask(region)
ELSE region
END AS REGION,
parameter,
mode,
powertrain,
year,
unit,
value
FROM ev_sales;
View the data in SQL Editor:
Explore permissions:
Using SHOW GRANTS we can view the permissions for the user for the catalog, schemas, table, views or functions.
SHOW GRANTS ON VIEW
: This command displays the permissions that have been granted on the specific view. It helps identify who has access to view or modify the view’s data.
SHOW GRANTS ON TABLE
: This command reveals the permissions associated with the table. It shows which users or roles have specific access rights to the table, such as reading or writing data.
SHOW GRANTS ON SCHEMA
: This command lists the permissions granted on the schema. It provides insights into the access rights granted at the schema level, affecting all objects contained within it.
SHOW GRANTS ON CATALOG
: This command shows the permissions granted on the catalog. It outlines access rights at the catalog level, which can affect all schemas and objects within the catalog.
SHOW GRANTS ON FUNCTION
: This command displays the permissions for the function mask_function
. It identifies who has the rights to execute or modify this function.
SQL Script:
SHOW GRANTS ON VIEW ev_sales_view;
SHOW GRANTS ON TABLE ev_sales;
SHOW GRANTS ON SCHEMA vehical_sales;
SHOW GRANTS ON FUNCTION mask_function;
SHOW GRANTS ON CATALOG sales_calalog;
Revoke access:
No data governance platform would be complete without the ability to revoke previously issued grants.
SQL Script:
SHOW GRANTS ON FUNCTION mask_function;
REVOKE USAGE ON CATALOG sales_calalog FROM `account users`;
Conclusion
Unity Catalog offers a comprehensive and structured approach to managing and governing data objects within your data lakehouse. By creating distinct catalogs, schemas, tables, views, and user-defined functions, you can organize your data assets efficiently. The explicit permission model ensures robust access control, allowing you to grant specific permissions such as USAGE and SELECT to maintain data security and integrity. Utilizing dynamic views further enhances data protection by implementing row- and column-level security.
With Unity Catalog, you not only streamline the creation and management of data objects but also ensure that data governance policies are adhered to, providing a secure and compliant data environment. Embrace the power of Unity Catalog to achieve effective data governance, enabling your organization to harness the full potential of its data while maintaining control and security.
Please contact us for your Databricks solution requirements. Our solutions include a comprehensive range of services tailored to address your data integration and migration requirements.