Databricks adapter
contractforge-databricks is the first full ContractForge adapter. It depends on contractforge-core and keeps Databricks-specific code outside the semantic core.
Install
pip install contractforge-core contractforge-databricks
On Databricks, install both wheels on the job, cluster, notebook environment or workspace path. Do not install Spark alongside the core package; Databricks Runtime already provides Spark and Delta.
Public entry points
from contractforge_databricks import (
    deploy_databricks_project,
    render_databricks_contract,
    ingest_databricks_bundle,
)
Use render_databricks_contract to produce native artifacts and review output. Use ingest_databricks_bundle when the Databricks workspace should execute the contract directly.
Use deploy_databricks_project or the CLI when a repository project already
contains a Databricks Asset Bundle and should be validated, deployed and
optionally run through Databricks native deployment.
Native capabilities
| ContractForge area | Databricks implementation |
|---|---|
| Tables | Delta tables in Unity Catalog or configured metastore. |
| Incremental files | Auto Loader / cloudFiles and available-now streaming where configured. |
| Writes | Delta append, overwrite, merge/upsert, hash-diff current-state, historical and snapshot soft-delete modes. |
| Governance | Unity Catalog comments, tags, grants, row filters and column masks where supported. |
| Evidence | Delta control tables following the core evidence model. |
| Rendering | SQL, Python, Databricks Asset Bundles, Lakeflow planning and Markdown review reports. |
| Runtime | Databricks jobs, notebooks, serverless or classic clusters depending on environment configuration. |
Adapter extensions
Platform-specific settings go under extensions.databricks or the environment contract. They do not become top-level core fields.
extensions:
  databricks:
    delta_properties:
      delta.enableChangeDataFeed: "true"
    cluster_columns: [customer_id]
    partition_columns: [ingestion_date]
Use adapter extensions only for Databricks behavior. Portable concepts such as target identity, write mode, quality, schema policy and source intent stay in the core contract.
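As an illustration of how an adapter might consume the `delta_properties` extension, here is a minimal sketch that renders an `ALTER TABLE ... SET TBLPROPERTIES` statement from the extension block. The function name `render_tblproperties_sql` and the table name `main.silver.s_customers` are hypothetical; the real adapter may render properties differently.

```python
def render_tblproperties_sql(table_name: str, extensions: dict) -> str:
    """Render a SET TBLPROPERTIES statement from extensions.databricks.delta_properties.

    Hypothetical helper for illustration; not part of the adapter's public API.
    """
    # Only the Databricks namespace is read; core contract fields are untouched.
    props = extensions.get("databricks", {}).get("delta_properties", {})
    if not props:
        return ""
    rendered = []
    for key, value in sorted(props.items()):
        key, value = str(key), str(value)
        # Refuse quoting characters instead of guessing an escaping rule.
        if any(ch in key + value for ch in ("'", "\\")):
            raise ValueError(f"unsupported character in property {key!r}")
        rendered.append(f"'{key}' = '{value}'")
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({', '.join(rendered)})"


extensions = {
    "databricks": {
        "delta_properties": {"delta.enableChangeDataFeed": "true"},
        "cluster_columns": ["customer_id"],
    }
}
sql = render_tblproperties_sql("main.silver.s_customers", extensions)
print(sql)
```

Because the helper reads only `extensions.databricks`, a contract without that block yields an empty string and stays portable.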
Logical table refs
When a Databricks contract reads a table produced by another ContractForge contract, prefer the portable ref instead of hard-coding the Unity Catalog name:
source:
  type: table
  ref: bronze.b_products_jdbc
For SQL:
FROM {{ table_ref:silver.s_product_tags }}
The Databricks adapter resolves these refs to the target catalog/schema naming used by the project. The AWS adapter resolves the same refs to Glue Catalog/Iceberg names.
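The resolution step can be pictured as a placeholder substitution. The sketch below is not the adapter's implementation: the function `resolve_table_refs`, the `schema_map` argument and the `silver_dev` schema name are assumptions chosen for illustration.

```python
import re

# Matches placeholders such as {{ table_ref:silver.s_product_tags }}.
TABLE_REF = re.compile(r"\{\{\s*table_ref:([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)\s*\}\}")


def resolve_table_refs(sql: str, catalog: str, schema_map: dict) -> str:
    """Replace logical refs with catalog.schema.table names (hypothetical helper)."""

    def _replace(match):
        layer, table = match.group(1), match.group(2)
        if layer not in schema_map:
            # Fail loudly instead of emitting a half-resolved query.
            raise KeyError(f"no schema configured for layer {layer!r}")
        return f"{catalog}.{schema_map[layer]}.{table}"

    return TABLE_REF.sub(_replace, sql)


resolved = resolve_table_refs(
    "SELECT * FROM {{ table_ref:silver.s_product_tags }}",
    catalog="main",
    schema_map={"silver": "silver_dev"},  # assumed project naming
)
print(resolved)
```

The same logical ref can then be mapped to a different physical name per platform by swapping the mapping, which is what keeps the contract itself portable.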
Runtime example
from contractforge_databricks import ingest_databricks_bundle

result = ingest_databricks_bundle(
    "/Workspace/Shared/contracts/silver/s_customers",
    options={
        "catalog": "main",
        "schema": "ops",
        "notebook_name": "jobs/silver/s_customers",
    },
)
The adapter creates the control tables if they do not exist, writes the target table, applies whichever annotations, operations and access behavior are available, and records evidence before returning the result.
Deployment
Databricks deployment is adapter-owned and DAB-based. The core does not deploy jobs and does not import Databricks SDKs.
contractforge-databricks deploy-project examples/real-world/supabase-jdbc-medallion/project.yaml \
  --profile dbc-dev \
  --target dev
Add --run to execute the deployed bundle job:
contractforge-databricks deploy-project examples/real-world/supabase-jdbc-medallion/project.yaml \
  --profile dbc-dev \
  --target dev \
  --run
For projects without project.yaml, point directly at the DAB directory or
databricks.yml:
contractforge-databricks deploy-bundle ./databricks.yml --profile dbc-dev --target dev --run
The project command reads validation.databricks.bundle from project.yaml.
If that field is absent, it defaults to databricks.yml beside the project
file. The adapter calls the Databricks CLI with explicit arguments; it never
builds shell command strings.
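The two behaviors described above, the bundle path default and the argument-list invocation, can be sketched as follows. The helper names are hypothetical, and resolving a configured bundle path relative to the project file is an assumption; `databricks bundle deploy`, `--target` and `--profile` are existing Databricks CLI commands and flags.

```python
from pathlib import Path
from typing import List, Optional


def resolve_bundle_path(project_file: str, project: dict) -> Path:
    """Return validation.databricks.bundle, or databricks.yml beside the project file."""
    project_dir = Path(project_file).parent
    configured = project.get("validation", {}).get("databricks", {}).get("bundle")
    # Assumption: a configured path is relative to the project file's directory.
    return project_dir / (configured or "databricks.yml")


def build_deploy_argv(target: str, profile: Optional[str] = None) -> List[str]:
    """Build the CLI call as a list of arguments, never as a shell string."""
    argv = ["databricks", "bundle", "deploy", "--target", target]
    if profile:
        argv += ["--profile", profile]
    return argv


bundle = resolve_bundle_path("examples/demo/project.yaml", {})
argv = build_deploy_argv("dev", profile="dbc-dev")
print(bundle, argv)

# To execute, pass the list directly so no shell parsing is involved:
# subprocess.run(argv, cwd=bundle.parent, check=True)
```

Passing a list to `subprocess.run` without `shell=True` means values such as the profile name are handed to the CLI as single arguments and are never interpreted by a shell.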
Project scheduling
Scheduling belongs to project.yaml, not to the ingestion contract. The
contract says what a dataset means; the project says how platform jobs are
connected.
schedule:
  cron: "0 6 * * *"
  timezone: America/Sao_Paulo
  enabled: false
  max_concurrent_runs: 1
  queue: true
  adapters:
    databricks:
      pause_status: PAUSED
      tasks:
        bronze_orders:
          task_key: bronze_orders
execution_order:
  - name: bronze_orders
    contracts:
      databricks: contracts/databricks/bronze_orders.ingestion.yaml
  - name: silver_orders
    depends_on: [bronze_orders]
    contracts:
      databricks: contracts/databricks/silver_orders.ingestion.yaml
Render a Databricks Asset Bundle from that metadata:
contractforge-databricks render-project-bundle project.yaml \
  --output databricks.yml \
  --force
Or render and deploy in one adapter-owned flow:
contractforge-databricks deploy-project project.yaml \
  --render-bundle \
  --force-render \
  --target dev
The renderer maps depends_on to Databricks task dependencies and maps
top-level schedule.cron / schedule.timezone to the Databricks Jobs schedule
block. The AWS adapter maps the same project schedule to EventBridge Scheduler
without changing ingestion contracts.
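A rough sketch of that mapping is shown below. The output field names (`task_key`, `depends_on`, `quartz_cron_expression`, `timezone_id`, `pause_status`) are Databricks Jobs fields; the function names, the derivation of `pause_status` from `enabled`, and the cron conversion are assumptions, since this page does not say how the renderer translates the five-field expression into the Quartz syntax that Databricks schedules use.

```python
def unix_cron_to_quartz(cron: str) -> str:
    """Convert a simple five-field cron expression to Quartz syntax.

    Illustrative only: handles expressions whose day-of-week field is '*'.
    """
    minute, hour, day_of_month, month, day_of_week = cron.split()
    if day_of_week != "*":
        raise ValueError("this sketch only converts expressions with '*' as day-of-week")
    # Quartz adds a leading seconds field and needs '?' in one of the day fields.
    return f"0 {minute} {hour} {day_of_month} {month} ?"


def render_job(project: dict) -> dict:
    """Map project metadata to a Jobs-style dict (hypothetical helper)."""
    tasks = []
    for step in project["execution_order"]:
        task = {"task_key": step["name"]}
        if step.get("depends_on"):
            task["depends_on"] = [{"task_key": name} for name in step["depends_on"]]
        tasks.append(task)
    schedule = project["schedule"]
    return {
        "schedule": {
            "quartz_cron_expression": unix_cron_to_quartz(schedule["cron"]),
            "timezone_id": schedule["timezone"],
            # Assumption: a disabled project schedule renders as a paused job.
            "pause_status": "UNPAUSED" if schedule.get("enabled") else "PAUSED",
        },
        "tasks": tasks,
    }


project = {
    "schedule": {"cron": "0 6 * * *", "timezone": "America/Sao_Paulo", "enabled": False},
    "execution_order": [
        {"name": "bronze_orders"},
        {"name": "silver_orders", "depends_on": ["bronze_orders"]},
    ],
}
job = render_job(project)
print(job)
```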
Review-required examples
Databricks can support many semantics natively, but some requests still need explicit review:
| Contract request | Why review may be required |
|---|---|
| Historical write mode across non-Delta source snapshots | Completeness and delete semantics must be proven. |
| Row filters and column masks | Privileges and Unity Catalog feature availability vary by workspace. |
| Available-now streaming | Checkpoint and schema locations must be governed and durable. |
| Native Lakeflow connectors | Source-specific behavior is Databricks-owned and may not be portable. |
The adapter should return review diagnostics rather than hiding those decisions in generated code.
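One way to surface such decisions is to return structured diagnostics alongside the rendered output. The sketch below is purely illustrative: the `ReviewDiagnostic` class, the diagnostic codes and the contract keys (`write_mode`, `source.format`, `access.row_filters`, `access.column_masks`) are hypothetical and do not describe the adapter's actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReviewDiagnostic:
    """A decision that a human should confirm before deployment (hypothetical type)."""

    code: str
    message: str


def review_diagnostics(contract: dict) -> List[ReviewDiagnostic]:
    """Collect review-required findings instead of silently generating code."""
    found = []
    source_format = contract.get("source", {}).get("format")
    if contract.get("write_mode") == "historical" and source_format != "delta":
        found.append(ReviewDiagnostic(
            "HISTORICAL_NON_DELTA_SOURCE",
            "Completeness and delete semantics must be proven.",
        ))
    access = contract.get("access", {})
    if access.get("row_filters") or access.get("column_masks"):
        found.append(ReviewDiagnostic(
            "ROW_FILTER_OR_MASK",
            "Privileges and Unity Catalog feature availability vary by workspace.",
        ))
    return found


contract = {
    "write_mode": "historical",
    "source": {"format": "parquet"},
    "access": {"row_filters": ["region_filter"]},
}
codes = [d.code for d in review_diagnostics(contract)]
print(codes)
```

Returning the findings as data lets a caller decide whether to block a deployment, print a report or ask for sign-off.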