Data Validation (Databricks)
Configuration reference for the Data Validation block — runs a Databricks notebook to clean, validate, and transform source data before ingestion.
The Data Validation block is an optional middle step that applies custom data quality rules using a Databricks notebook before data reaches the Transform Data block. It is available in both DIY and standard templates.
Block type: Databricks integration
Runs a Databricks job to clean, validate, and transform source data. Reads the input file into a temporary Databricks table, applies notebook-defined logic, and writes results to an output table consumed by downstream blocks.
Configuration fields
| Field | Description |
|---|---|
| Databricks Job ID * | The ID of the Databricks job (notebook) to execute. Contact the Databricks team for this value |
| Additional Parameters | Optional key-value parameters passed to the Databricks job at runtime |
* Required field
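To make the two fields concrete, the sketch below shows how they would map onto a Databricks Jobs "run now" request. This is an assumption about the mechanism (the block's internals are not documented here), and the Job ID and parameter values are placeholders — the real Job ID comes from the Databricks team.

```python
import json

# Hypothetical values for illustration only.
job_id = 123456
additional_params = {"min_date": "2024-01-01", "region": "EMEA"}

# Shape of a Databricks Jobs run-now request body: the Job ID selects
# the notebook job, and the Additional Parameters are passed through
# to the notebook as key-value notebook parameters.
payload = {
    "job_id": job_id,
    "notebook_params": additional_params,
}

print(json.dumps(payload, indent=2))
```

Parameters supplied under Additional Parameters arrive in the notebook as strings, so the notebook logic should parse dates, numbers, and flags itself.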
Key capabilities
- Filter records based on conditions (e.g., retain only records where `status = 'active'`)
- Remove duplicate rows
- Fill missing values with defaults or randomly generated placeholders
- Enrich records by joining with reference data from other Databricks tables
- Rename column headers to match downstream format requirements
- Apply custom validation rules (format checks, uniqueness enforcement, etc.)
- Transform and standardize field values
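In the real notebook these rules would typically be written in PySpark against the temporary input table; the pure-Python sketch below (sample rows and field names are hypothetical) illustrates three of the capabilities — the status filter, de-duplication, and default-filling of missing values — on a small in-memory sample.

```python
# Hypothetical sample rows; in the actual notebook these would be read
# from the temporary input table created by the block.
rows = [
    {"id": 1, "status": "active", "region": "EMEA"},
    {"id": 2, "status": "inactive", "region": "APAC"},
    {"id": 1, "status": "active", "region": "EMEA"},  # duplicate row
    {"id": 3, "status": "active", "region": None},    # missing value
]

# 1. Filter: retain only records where status = 'active'.
active = [r for r in rows if r["status"] == "active"]

# 2. Remove duplicate rows (keyed on the full record).
seen, deduped = set(), []
for r in active:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 3. Fill missing values with a default placeholder.
for r in deduped:
    if r["region"] is None:
        r["region"] = "UNKNOWN"

print(deduped)
```

The cleaned result would then be written to the block's output table for the Transform Data block to consume.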
Temporary input and output tables created by the Data Validation block are automatically deleted after 10 days. The block is designed for data preparation, not persistent storage.
