Data Management

Templates for data reconciliation, PII cleanup, customer status updates, and triggering Databricks jobs.

Data management templates handle data quality, compliance, reconciliation, and administrative setup within the Capillary platform. Each template provides a pre-configured block sequence for a specific use case. Select a template, configure the block details for your environment, and deploy the dataflow.

The following templates are available for data management.

Data reconciliation (SFTP)

Compares source data from an FTP or SFTP location against the data in the Insights+ backend to identify events that were missed during ingestion into Capillary. The template marks missing records in the output file and stores the results in a configured directory for further action.
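The comparison itself can be thought of as a set membership check per source record. As a minimal sketch (not the platform's implementation — the `transaction_id` key and the shape of the backend ID set are assumptions for illustration):

```python
def mark_missing(source_rows, backend_ids, id_field="transaction_id"):
    """Tag each source row with CAP_API_STATUS: 1 if the record's ID
    exists in the backend, 0 if it was missed during ingestion.

    `id_field` is a hypothetical column name; in practice it is
    whatever key uniquely identifies an event in your source file.
    """
    for row in source_rows:
        row["CAP_API_STATUS"] = "1" if row[id_field] in backend_ids else "0"
        yield row

# Hypothetical usage: backend_ids would come from an Insights+ export.
source = [{"transaction_id": "T1"}, {"transaction_id": "T2"}]
backend_ids = {"T1"}
marked = list(mark_missing(source, backend_ids))
```

Here `T2` would be marked `0` because it is absent from the backend set.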

In the output file, the template adds a column called CAP_API_STATUS. Missing records are marked with 0 and existing records are marked with 1. You can use the output file with a transaction add or customer add template to re-ingest the missing records.
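Before re-ingesting, you would typically filter the output file down to the rows marked `0`. A small sketch of that step, assuming a CSV layout with a `transaction_id` column (the column names other than `CAP_API_STATUS` are illustrative):

```python
import csv
import io

# Inline stand-in for the reconciliation output file.
RECON_OUTPUT = """transaction_id,amount,CAP_API_STATUS
T1,100,1
T2,250,0
T3,75,0
"""

reader = csv.DictReader(io.StringIO(RECON_OUTPUT))
# Keep only the records the reconciliation marked as missing.
missing = [row for row in reader if row["CAP_API_STATUS"] == "0"]
missing_ids = [row["transaction_id"] for row in missing]  # ['T2', 'T3']
```

The `missing` rows are what you would feed into a transaction add or customer add template.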

📘

Note:

Data in the Insights+ backend has a one-day delay because the ETL process runs once every night. Schedule the reconciliation trigger 12 to 24 hours after the ETL completes so that newly arrived data is not incorrectly marked as missing.

Use case

On January 1st, 2022, 100 transaction events were expected to be recorded in the Capillary platform. Due to an integration issue, only 90 events were successfully ingested, leaving 10 missing. Using this template, the brand compares the source data file against the Insights+ backend, identifies the 10 missing events, and uses the reconciliation output to re-ingest them without requiring a full data export from the platform.

Block configuration

The following table lists the blocks in the Data reconciliation (SFTP) template, describes what each block does, and provides the configured values for each field.

| Block Name | Configuration Field | Configured Value |
| --- | --- | --- |
| Connect-to-source | Type | credential_aware_ftp_listing |
| | Host | data.capillarydata.com |
| | Username | null |
| | Password | Redacted |
| | Source path | /tmp/ |
| | Filename pattern | .*.csv (matches all CSV files) |
| | Processed path | / |
| | Port | 21 |
| Ok-file | Type | ok_file_3 |
| Map-fields-for-reconciliation | Type | diff_tool |
| Filter-data | Type | filter_on_condition |
| | Filter condition | ${header_value:notNull()} |
| Reconciliation-job | Type | databricks_job_trigger_and_status |
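The `${header_value:notNull()}` condition in the Filter-data block keeps only records whose header value is present. The exact semantics of `filter_on_condition` are platform-internal; a rough Python analogue of a not-null filter, with entirely illustrative record fields:

```python
# Hypothetical records flowing into the Filter-data block.
records = [
    {"header_value": "batch-01", "payload": "row data"},
    {"header_value": None, "payload": "row with missing header"},
]

# Analogue of ${header_value:notNull()}: drop records whose
# header_value is absent before triggering the Databricks job.
kept = [r for r in records if r.get("header_value") is not None]
```

Only the first record would pass through to the Reconciliation-job block in this sketch.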