How masking is enforced in Databricks

Data masking in Databricks applies differently depending on how the data is accessed and which cluster is used.

The table below summarizes how masking works for:

  • user tables,
  • jobs and exports, and
  • historical (Type-2) data.

| Access scenario | Non-USHC clusters | USHC cluster |
|---|---|---|
| User (customer) tables | Masking is enforced through read_api tables. PSI-marked fields return masked values (*****). Base tables contain unmasked data. Admin users can query admin_read_api tables to access unmasked data. Non-admin users always see masked data. | Masking is enforced through Databricks access groups. PSI-marked fields return masked values (*****) unless the user belongs to a PSI access group. Admin privileges alone do not grant access. Users must be added to a Databricks PSI access group via a ticket-based process. |
| Jobs and exports | Jobs can access unmasked data only when run using admin access. Personal user accounts should not be used. | Jobs must run using service principals. PSI-enabled service principals can access unmasked data; non-PSI service principals see only masked data. |
| Type-2 (historical) user data | Masking applies to current user data only. Type-2 handling is not applicable. | Customer identifiers and sensitive fields in Type-2 (history-tracking) user dimension tables are masked. |
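
The access rules above can be read as a small decision function. The following Python sketch is illustrative only and is not how Databricks or the platform implements masking. The names Caller, sees_unmasked, and render are hypothetical, and the two boolean flags are simplifying assumptions that stand in for admin access and for PSI access group membership (or a PSI-enabled service principal).

```python
from dataclasses import dataclass

MASK = "*****"  # masked value returned for PSI-marked fields


@dataclass(frozen=True)
class Caller:
    """Hypothetical model of a user or job identity (assumption, not a real API)."""
    is_admin: bool = False      # has admin access (relevant on non-USHC clusters)
    in_psi_group: bool = False  # PSI access group member or PSI-enabled service principal


def sees_unmasked(caller: Caller, ushc_cluster: bool) -> bool:
    """Return True if the caller would see unmasked PSI-marked fields."""
    if ushc_cluster:
        # USHC: only PSI access grants unmasked data; admin alone is not enough.
        return caller.in_psi_group
    # Non-USHC: only admin access (e.g. admin_read_api tables) returns unmasked data.
    return caller.is_admin


def render(value: str, caller: Caller, ushc_cluster: bool) -> str:
    """Return the value as the caller would see it for a PSI-marked field."""
    return value if sees_unmasked(caller, ushc_cluster) else MASK
```

For example, under these assumptions an admin without PSI access group membership sees the real value on a non-USHC cluster but ***** on the USHC cluster.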

Enabling masking

PSI masking is enabled at the brand (organization) level.