How masking is enforced in Databricks
Data masking in Databricks applies differently depending on how the data is accessed and which cluster is used.
The following table explains how masking works for:
- user tables,
- jobs and exports, and
- historical (Type-2) data.
| Access scenario | Non-USHC clusters | USHC cluster |
|---|---|---|
| User (customer) tables | Masking is enforced through read_api tables. PSI-marked fields return masked values (`*****`). Base tables contain unmasked data. Admin users can query admin_read_api tables to access unmasked data. Non-admin users always see masked data. | Masking is enforced through Databricks access groups. PSI-marked fields return masked values (`*****`) unless the user belongs to a PSI access group. Admin privileges alone do not grant access. Users must be added to a Databricks PSI access group through a ticket-based process. |
| Jobs and exports | Jobs can access unmasked data only when they run with admin access. Do not run jobs under personal user accounts. | Jobs must run as service principals. PSI-enabled service principals can access unmasked data; service principals without PSI access see only masked data. |
| Type-2 (historical) user data | Masking applies to current user data only. Type-2 handling is not applicable. | Customer identifiers and sensitive fields in Type-2 (history-tracking) user dimension tables are masked. |
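
The rules for user tables and jobs can be summarized as a small decision function. The following is a minimal, hypothetical Python sketch of that logic, not a Databricks or Capillary API: `Cluster`, `Principal`, `can_see_unmasked`, and `read_field` are illustrative names, and the model assumes a single PSI-marked field per call. It does not model Type-2 tables, base tables, or how group membership is granted.

```python
from dataclasses import dataclass
from enum import Enum

# Placeholder shown for PSI-marked fields when masking applies.
MASK = "*****"


class Cluster(Enum):
    NON_USHC = "non-ushc"
    USHC = "ushc"


@dataclass(frozen=True)
class Principal:
    """Hypothetical model of the identity reading the data (a user or a job)."""
    is_admin: bool = False              # admin user, or a job run with admin access
    in_psi_group: bool = False          # USHC: member of a Databricks PSI access group
    is_service_principal: bool = False  # USHC: jobs run as service principals
    psi_enabled: bool = False           # USHC: service principal has PSI access


def can_see_unmasked(cluster: Cluster, who: Principal, table: str = "read_api") -> bool:
    """Return True if a PSI-marked field is shown unmasked under this model."""
    if cluster is Cluster.NON_USHC:
        # Enforced by the table queried: read_api tables always mask PSI fields;
        # admin_read_api tables return unmasked values, but only to admins.
        return table == "admin_read_api" and who.is_admin
    # USHC: enforced by access groups, so the table argument is ignored here.
    # Admin privileges alone are not enough.
    if who.is_service_principal:
        return who.psi_enabled
    return who.in_psi_group


def read_field(value: str, psi_marked: bool, cluster: Cluster,
               who: Principal, table: str = "read_api") -> str:
    """Return the value a query would show for one field under this model."""
    if psi_marked and not can_see_unmasked(cluster, who, table):
        return MASK
    return value


if __name__ == "__main__":
    admin = Principal(is_admin=True)
    print(read_field("jane@example.com", True, Cluster.NON_USHC, admin, "admin_read_api"))
    print(read_field("jane@example.com", True, Cluster.USHC, admin))
```

Under this model, an admin querying an admin_read_api table on a non-USHC cluster sees the raw value, while the same admin on USHC sees `*****` until added to a PSI access group.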
Enabling masking
PSI masking is enabled at the brand (organization) level.
- In USHC, raise a JIRA ticket with the Capillary Product Support team to enable group-based access.
- In non-USHC clusters, configure masking through the Razor UI.
