Databricks adapter

contractforge-databricks is the first full ContractForge adapter. It depends on contractforge-core and keeps Databricks-specific code outside the semantic core.

Install

pip install contractforge-core contractforge-databricks

On Databricks, install both wheels on the job, cluster, notebook environment or workspace path. Do not install Spark separately for the core; Databricks Runtime already provides Spark and Delta.

Public entry points

from contractforge_databricks import (
    deploy_databricks_project,
    render_databricks_contract,
    ingest_databricks_bundle,
)

Use render_databricks_contract to produce native artifacts and review output. Use ingest_databricks_bundle when the Databricks workspace should execute the contract directly. Use deploy_databricks_project or the CLI when a repository project already contains a Databricks Asset Bundle and should be validated, deployed and optionally run through Databricks native deployment.

Native capabilities

| ContractForge area | Databricks implementation |
| --- | --- |
| Tables | Delta tables in Unity Catalog or configured metastore. |
| Incremental files | Auto Loader / cloudFiles and available-now streaming where configured. |
| Writes | Delta append, overwrite, merge/upsert, hash-diff current-state, historical and snapshot soft-delete modes. |
| Governance | Unity Catalog comments, tags, grants, row filters and column masks where supported. |
| Evidence | Delta control tables following the core evidence model. |
| Rendering | SQL, Python, Databricks Asset Bundles, Lakeflow planning and Markdown review reports. |
| Runtime | Databricks jobs, notebooks, serverless or classic clusters depending on environment configuration. |

Adapter extensions

Platform-specific settings go under extensions.databricks or the environment contract. They do not become top-level core fields.

extensions:
  databricks:
    delta_properties:
      delta.enableChangeDataFeed: "true"
    cluster_columns: [customer_id]
    partition_columns: [ingestion_date]

Use adapter extensions only for Databricks behavior. Portable concepts such as target identity, write mode, quality, schema policy and source intent stay in the core contract.
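As an illustration of how an adapter could consume the extension block, the sketch below turns `extensions.databricks.delta_properties` into a Databricks SQL `ALTER TABLE ... SET TBLPROPERTIES` statement. The function name `render_tblproperties` is hypothetical and is not part of the contractforge-databricks API; the adapter's actual rendering may differ.

```python
from typing import Optional


def render_tblproperties(table_name: str, extensions: dict) -> Optional[str]:
    """Hypothetical helper: render delta_properties as a SET TBLPROPERTIES statement."""
    props = extensions.get("databricks", {}).get("delta_properties", {})
    if not props:
        return None  # nothing Databricks-specific to apply

    pairs = []
    for key, value in sorted(props.items()):
        text = str(value)
        # Refuse quotes and backslashes rather than guess at SQL escaping rules.
        if any(ch in key + text for ch in ("'", "\\")):
            raise ValueError(f"unsupported character in property {key!r}")
        pairs.append(f"'{key}' = '{text}'")

    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({', '.join(pairs)})"


extensions = {
    "databricks": {
        "delta_properties": {"delta.enableChangeDataFeed": "true"},
    }
}
print(render_tblproperties("main.ops.s_customers", extensions))
```

Because the helper reads only the `databricks` key, a contract that carries extensions for other adapters would pass through it unchanged, which is the separation the section describes.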

Logical table refs

When a Databricks contract reads a table produced by another ContractForge contract, prefer the portable ref instead of hard-coding the Unity Catalog name:

source:
  type: table
  ref: bronze.b_products_jdbc

For SQL:

FROM {{ table_ref:silver.s_product_tags }}

The Databricks adapter resolves these to the target catalog/schema naming used by the project. AWS resolves the same refs to Glue Catalog/Iceberg names.
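The resolution step can be pictured as a placeholder substitution. The sketch below is a minimal illustration, not the adapter's implementation: the function `resolve_table_refs`, the `schema_map` argument and the idea that the first part of a ref (`silver`) maps to a project-configured schema are all assumptions made for the example.

```python
import re

# Matches placeholders such as {{ table_ref:silver.s_product_tags }}
TABLE_REF = re.compile(r"\{\{\s*table_ref:([A-Za-z_]\w*)\.([A-Za-z_]\w*)\s*\}\}")


def resolve_table_refs(sql: str, catalog: str, schema_map: dict) -> str:
    """Hypothetical resolver: replace logical refs with catalog.schema.table names."""

    def replace(match: re.Match) -> str:
        layer, table = match.group(1), match.group(2)
        if layer not in schema_map:
            # Fail loudly instead of emitting a half-resolved query.
            raise KeyError(f"no schema configured for ref prefix {layer!r}")
        return f"{catalog}.{schema_map[layer]}.{table}"

    return TABLE_REF.sub(replace, sql)


query = "SELECT * FROM {{ table_ref:silver.s_product_tags }}"
print(resolve_table_refs(query, "main", {"silver": "silver_dev"}))
```

The same logical ref could be handed to a different resolver for another platform, which is why the contract keeps the ref rather than the physical name.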

Runtime example

from contractforge_databricks import ingest_databricks_bundle

result = ingest_databricks_bundle(
    "/Workspace/Shared/contracts/silver/s_customers",
    options={
        "catalog": "main",
        "schema": "ops",
        "notebook_name": "jobs/silver/s_customers",
    },
)

The adapter creates control tables when they do not exist, writes the target table, applies available annotations/operations/access behavior and records evidence before returning the result.

Deployment

Databricks deployment is adapter-owned and DAB-based. The core does not deploy jobs and does not import Databricks SDKs.

contractforge-databricks deploy-project examples/real-world/supabase-jdbc-medallion/project.yaml \
  --profile dbc-dev \
  --target dev

Add --run to execute the deployed bundle job:

contractforge-databricks deploy-project examples/real-world/supabase-jdbc-medallion/project.yaml \
  --profile dbc-dev \
  --target dev \
  --run

For projects without project.yaml, point directly at the DAB directory or databricks.yml:

contractforge-databricks deploy-bundle ./databricks.yml --profile dbc-dev --target dev --run

The project command reads validation.databricks.bundle from project.yaml. If that field is absent, it defaults to databricks.yml beside the project file. The adapter calls the Databricks CLI with explicit arguments; it never builds shell command strings.
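The two behaviors described above, the default bundle path and argument-list invocation, can be sketched as follows. This is an illustration under stated assumptions, not the adapter's code: the helper names are hypothetical, and treating a configured bundle path as relative to the project file is an assumption. The `databricks bundle deploy` command and its `--profile` / `--target` flags belong to the Databricks CLI.

```python
import subprocess
from pathlib import Path


def resolve_bundle_path(project_file: str, project: dict) -> Path:
    """Hypothetical: use validation.databricks.bundle, else databricks.yml beside the project file."""
    configured = project.get("validation", {}).get("databricks", {}).get("bundle")
    return Path(project_file).parent / (configured or "databricks.yml")


def bundle_deploy_argv(profile: str, target: str) -> list:
    """Build an explicit argument list; no shell string is ever assembled."""
    return ["databricks", "bundle", "deploy", "--profile", profile, "--target", target]


def deploy(project_file: str, project: dict, profile: str, target: str) -> None:
    bundle = resolve_bundle_path(project_file, project)
    # shell=False (the default) passes each argument verbatim to the CLI process.
    subprocess.run(bundle_deploy_argv(profile, target), cwd=bundle.parent, check=True)
```

Passing a list keeps values such as profile or target names from being interpreted by a shell, so a value containing spaces or `;` reaches the CLI as a single argument.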

Project scheduling

Scheduling belongs to project.yaml, not to the ingestion contract. The contract says what a dataset means; the project says how platform jobs are connected.

schedule:
  cron: "0 6 * * *"
  timezone: America/Sao_Paulo
  enabled: false
  max_concurrent_runs: 1
  queue: true
  adapters:
    databricks:
      pause_status: PAUSED
      tasks:
        bronze_orders:
          task_key: bronze_orders

execution_order:
  - name: bronze_orders
    contracts:
      databricks: contracts/databricks/bronze_orders.ingestion.yaml
  - name: silver_orders
    depends_on: [bronze_orders]
    contracts:
      databricks: contracts/databricks/silver_orders.ingestion.yaml

Render a Databricks Asset Bundle from that metadata:

contractforge-databricks render-project-bundle project.yaml \
  --output databricks.yml \
  --force

Or render and deploy in one adapter-owned flow:

contractforge-databricks deploy-project project.yaml \
  --render-bundle \
  --force-render \
  --target dev

The renderer maps depends_on to Databricks task dependencies and maps top-level schedule.cron / schedule.timezone to the Databricks Jobs schedule block. AWS maps the same project schedule to EventBridge Scheduler without changing ingestion contracts.
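A rough sketch of that mapping, assuming the project metadata has already been loaded into a dictionary. The function names are hypothetical and the real renderer may behave differently. Two further assumptions are made: the five-field cron expression is translated to the Quartz syntax that the Databricks Jobs schedule expects, and `enabled: false` implies `PAUSED` unless the Databricks adapter block overrides it. Per-task `task_key` overrides are ignored here for brevity.

```python
def to_quartz(cron: str) -> str:
    """Translate a simple five-field cron expression to Quartz (assumption of this sketch)."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    minute, hour, day_of_month, month, day_of_week = fields
    if day_of_week != "*":
        # Quartz numbers weekdays differently; out of scope for this sketch.
        raise ValueError("day-of-week translation is not handled here")
    return f"0 {minute} {hour} {day_of_month} {month} ?"


def render_job(project: dict) -> dict:
    """Hypothetical renderer: project metadata -> Databricks job settings."""
    tasks = []
    for step in project.get("execution_order", []):
        task = {"task_key": step["name"]}
        if step.get("depends_on"):
            task["depends_on"] = [{"task_key": name} for name in step["depends_on"]]
        tasks.append(task)

    job = {"tasks": tasks}
    schedule = project.get("schedule")
    if schedule:
        overrides = schedule.get("adapters", {}).get("databricks", {})
        default_status = "UNPAUSED" if schedule.get("enabled", True) else "PAUSED"
        job["schedule"] = {
            "quartz_cron_expression": to_quartz(schedule["cron"]),
            "timezone_id": schedule["timezone"],
            "pause_status": overrides.get("pause_status", default_status),
        }
    return job
```

The ingestion contracts are never read or modified by this step; only `project.yaml` metadata feeds the job definition, which is what lets another adapter map the same schedule to its own scheduler.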

Review-required examples

Databricks can support many semantics natively, but some requests still need explicit review:

| Contract request | Why review may be required |
| --- | --- |
| historical across non-Delta source snapshots | Completeness and delete semantics must be proven. |
| Row filters/masks | Privileges and Unity Catalog feature availability vary by workspace. |
| Available-now streaming | Checkpoint and schema locations must be governed and durable. |
| Native Lakeflow connectors | Source-specific behavior is Databricks-owned and may not be portable. |

The adapter should return review diagnostics rather than hiding those decisions in generated code.
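One way to picture review diagnostics is as structured records returned next to the rendered output. The sketch below is purely illustrative: `ReviewDiagnostic`, `review_diagnostics` and every contract key it reads (`write_mode`, `source.format`, `source.trigger`, `source.checkpoint_location`) are hypothetical names chosen for the example, not fields confirmed by ContractForge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewDiagnostic:
    code: str
    message: str


def review_diagnostics(contract: dict) -> list:
    """Hypothetical check: surface decisions that need a human instead of guessing."""
    found = []
    source = contract.get("source", {})

    # Historical mode over non-Delta snapshots: completeness cannot be assumed.
    if contract.get("write_mode") == "historical" and source.get("format") != "delta":
        found.append(ReviewDiagnostic(
            "HISTORICAL_NON_DELTA",
            "Completeness and delete semantics must be proven for non-Delta snapshots.",
        ))

    # Available-now streaming without a declared checkpoint location.
    if source.get("trigger") == "available_now" and not source.get("checkpoint_location"):
        found.append(ReviewDiagnostic(
            "CHECKPOINT_NOT_GOVERNED",
            "Checkpoint and schema locations must be governed and durable.",
        ))

    return found


for diagnostic in review_diagnostics({"write_mode": "historical", "source": {"format": "parquet"}}):
    print(diagnostic.code, "-", diagnostic.message)
```

Returning records like these keeps the review decision visible to the caller, rather than encoding a silent default in generated SQL or Python.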