How Masking is Enforced in Databricks

Data masking in Databricks applies differently depending on how the data is accessed and which cluster is used.

The table below explains how masking works for user tables, jobs and exports, and historical (Type-2) data.

| Access scenario | Non-USHC clusters | USHC cluster |
| --- | --- | --- |
| User (customer) tables | Masking is enforced through `read_api` tables. PSI-marked fields return masked values (`*****`). Base tables contain unmasked data. Admin users can query `admin_read_api` tables to access unmasked data. Non-admin users always see masked data. | Masking is enforced through Databricks access groups. PSI-marked fields return masked values (`*****`) unless the user belongs to a PSI access group. Admin privileges alone do not grant access. Users must be added to a Databricks PSI access group via a ticket-based process. |
| Jobs and exports | Jobs can access unmasked data only when run using admin access. Personal user accounts should not be used. | Jobs must run using service principals. PSI-enabled service principals can access unmasked data; non-PSI service principals see only masked data. |
| Type-2 (historical) user data | Masking applies to current user data only. Type-2 handling is not applicable. | Customer identifiers and sensitive fields in Type-2 (history-tracking) user dimension tables are masked. |
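
The access rules in the table can be sketched as a small decision function. This is an illustrative model only, not how Databricks or the `read_api` tables enforce masking. The names `Caller`, `sees_unmasked`, and `read_row`, along with the sample row and field names, are hypothetical and introduced purely for the example.

```python
from dataclasses import dataclass

MASK = "*****"  # masked value returned for PSI-marked fields, per the table


@dataclass(frozen=True)
class Caller:
    """Hypothetical description of the user or job identity reading the data."""
    is_admin: bool = False      # non-USHC: can use admin access (admin_read_api)
    in_psi_group: bool = False  # USHC: PSI access group member or PSI-enabled service principal


def sees_unmasked(cluster: str, caller: Caller) -> bool:
    """Return True if the caller would see unmasked PSI fields on this cluster."""
    if cluster == "USHC":
        # Admin privileges alone do not grant access on USHC.
        return caller.in_psi_group
    # Non-USHC clusters: only admin access returns unmasked data.
    return caller.is_admin


def read_row(row: dict, psi_fields: set, cluster: str, caller: Caller) -> dict:
    """Return a copy of the row with PSI-marked fields masked where required."""
    if sees_unmasked(cluster, caller):
        return dict(row)
    return {key: (MASK if key in psi_fields else value) for key, value in row.items()}


if __name__ == "__main__":
    row = {"user_id": 42, "email": "a@example.com"}  # made-up sample data
    # An admin who is not in a PSI access group still sees masked data on USHC.
    print(read_row(row, {"email"}, "USHC", Caller(is_admin=True)))
    # The same admin sees unmasked data on a non-USHC cluster.
    print(read_row(row, {"email"}, "non-USHC", Caller(is_admin=True)))
```

The sketch keeps the cluster check separate from the masking step so that the two rule sets (admin access on non-USHC clusters, PSI group membership on USHC) stay easy to compare against the table.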

Enabling masking

PSI masking is enabled at the brand (organization) level.

For the USHC cluster, raise a JIRA ticket with the Capillary Product Support team to enable group-based access.
For non-USHC clusters, masking can be configured through the Razor UI.
