Databricks · Delta Lake · Declarative ingestion

Data ingestion defined as contracts, executed with operational evidence.

ContractForge turns recurring Lakehouse ingestion patterns into reviewed YAML or Python contracts: connectors, write modes, quality rules, schema evolution, transformations, governance metadata and control-table observability.

Current focus: Production-grade internal data platforms

Standardize how teams ingest data without forcing every notebook to reimplement Spark, Delta MERGE, quality gates and audit logging.

Python 3.10+ · Databricks · Delta Lake · MIT license

Why it exists

Contract-first ingestion without hiding Spark

ContractForge is not a DAG scheduler and it is not a black box. It is a focused ingestion framework that keeps Spark and Delta semantics visible while making them repeatable, validated and observable.

Declarative contracts

Describe sources, write modes, keys, watermarks, quality gates, schema policy and transformation intent in YAML or Python.
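As a rough illustration of what a Python-form contract could carry, the sketch below models a few of the fields listed above as a plain dictionary and checks them. The field names (`source`, `target`, `mode`, `keys`, `watermark`) and the `validate_contract` helper are assumptions made for this example, not ContractForge's actual schema or API.

```python
# Hypothetical contract shape; field names are illustrative, not ContractForge's schema.
ALLOWED_MODES = {"append", "merge", "scd2", "snapshot"}

def validate_contract(contract: dict) -> list[str]:
    """Return a list of readable problems; an empty list means the checks passed."""
    problems = []
    for field in ("source", "target", "mode"):
        if field not in contract:
            problems.append(f"missing required field: {field}")
    mode = contract.get("mode")
    if mode is not None and mode not in ALLOWED_MODES:
        problems.append(f"unknown mode: {mode}")
    # Key-based modes need at least one key to match rows on.
    if mode in {"merge", "scd2"} and not contract.get("keys"):
        problems.append(f"mode '{mode}' requires at least one key")
    return problems

contract = {
    "source": {"type": "table", "name": "raw.orders"},
    "target": "silver.orders",
    "mode": "merge",
    "keys": ["order_id"],
    "watermark": "updated_at",
}

print(validate_contract(contract))
print(validate_contract({"source": {}, "mode": "merge"}))
```

The point of the sketch is only that intent (mode, keys, watermark) is stated as data and can be reviewed and validated before any Spark code runs.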

Operational control tables

Runs, errors, quality, quarantine, schema changes, streams, lineage, locks and operational metadata are persisted as Delta tables.

Real connectors

Read from tables, SQL, files, object storage, S3, Azure Blob, JDBC, REST APIs, HTTP files, Auto Loader and external Spark connectors.

Governance by design

Split ingestion, annotations, operations and access contracts so engineering, stewardship and security can evolve independently.

Execution model

One predictable path for every table

Each run follows the same control flow: load a source, normalize the DataFrame, validate the contract, apply quality gates, write through an explicit mode and record evidence.

  • Use dry_run to validate intent without writing data.
  • Use transform.shape for JSON, structs, arrays and declarative projections.
  • Use source connectors to avoid ad-hoc download code in notebooks.
  • Use control tables as the foundation for dashboards and alerts.

flowchart TB
  A["Contract<br/>YAML or Python"] --> B["Plan validation<br/>presets + defaults"]
  B --> C["Source connector<br/>table, files, API, JDBC"]
  C --> D["Transform<br/>mapping, shape, dedup"]
  D --> E["Quality + schema<br/>fail, warn, quarantine"]
  E --> F["Writer mode<br/>append, merge, SCD2, snapshot"]
  F --> G["Delta target"]
  F --> H["Control tables<br/>runs, errors, lineage, quality"]
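The fixed order of stages in this execution model can be sketched in plain Python. This is a simplified stand-in, not ContractForge's implementation: it uses lists of dictionaries instead of Spark DataFrames, skips contract and schema validation, and the `run_contract` function and its parameters are hypothetical names chosen for the example. Only the `dry_run` idea comes from the text above.

```python
from typing import Callable

Rows = list[dict]

def run_contract(
    load: Callable[[], Rows],
    normalize: Callable[[Rows], Rows],
    quality_check: Callable[[dict], bool],
    write: Callable[[Rows], None],
    dry_run: bool = False,
) -> dict:
    """Run the stages in a fixed order and return a small evidence record."""
    rows = normalize(load())
    passed = [r for r in rows if quality_check(r)]
    rejected = [r for r in rows if not quality_check(r)]
    if not dry_run:
        write(passed)  # a dry run validates everything but writes nothing
    return {
        "rows_read": len(rows),
        "rows_written": 0 if dry_run else len(passed),
        "rows_rejected": len(rejected),
        "dry_run": dry_run,
    }

target: Rows = []
evidence = run_contract(
    load=lambda: [{"ID": 1, "AMOUNT": 10}, {"ID": 2, "AMOUNT": -5}],
    normalize=lambda rows: [{k.lower(): v for k, v in r.items()} for r in rows],
    quality_check=lambda r: r["amount"] >= 0,
    write=target.extend,
)
print(evidence)
```

Because every table goes through the same function, the evidence record has the same shape for every run, which is what makes dashboards and alerts on top of it practical.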

Mental model

Contracts are separated by responsibility

A table is not only an ingestion script. It also has catalog metadata, operational ownership and access rules. ContractForge keeps those responsibilities explicit.

ingestion.yaml

Data engineering

Source, target, mode, keys, quality, schema, transformations, performance options and idempotency.

annotations.yaml

Catalog context

Table and column descriptions, tags, aliases, PII classification and deprecation metadata.

operations.yaml

Run ownership

Business owner, technical owner, support group, criticality, SLA, runbook and alerting intent.

access.yaml

Security workflow

Grants, masks and row filters are validated and applied by dedicated commands, not by normal ingestion.

Common paths

Start with the connector that matches the source

Recommended next step

Run a small ingestion, then inspect the control tables.

The fastest way to understand ContractForge is to execute one minimal contract and query `ctrl_ingestion_runs`, `ctrl_ingestion_quality` and `ctrl_ingestion_errors`.
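A minimal way to look at those three tables after a run is sketched below. The table names come from the text above; the `ops` schema name and the `inspection_queries` helper are placeholders for this example, and no column names are assumed.

```python
CONTROL_TABLES = (
    "ctrl_ingestion_runs",
    "ctrl_ingestion_quality",
    "ctrl_ingestion_errors",
)

def inspection_queries(schema: str, limit: int = 20) -> dict[str, str]:
    """Build one simple SELECT per control table in the given schema."""
    return {t: f"SELECT * FROM {schema}.{t} LIMIT {limit}" for t in CONTROL_TABLES}

queries = inspection_queries("ops")  # "ops" is a placeholder schema name
for table, sql in queries.items():
    print(sql)
    # In a Databricks notebook, where `spark` is predefined:
    # spark.sql(sql).show(truncate=False)
```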

Practical patterns

Documentation focuses on source patterns users actually need.

The examples cover public HTTP CSV, raw REST JSON, Azure Blob/ADLS, S3, JDBC/Postgres, RDS IAM, Auto Loader and many-file folders. They are written to show when behavior belongs in the connector, in transform.shape, or in downstream project logic.

APIs

Raw payloads stay raw

REST connectors retrieve payloads; transform.shape parses and explodes documents declaratively.
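The split between retrieving a payload and shaping it can be illustrated with plain Python. This is not `transform.shape` syntax, which is not shown here; the `explode_documents` helper and the sample payload are invented for the example and only mimic the parse-then-explode step on a single JSON document.

```python
import json

def explode_documents(raw_payload: str, array_field: str) -> list[dict]:
    """Parse a raw JSON payload and emit one flat row per element of `array_field`."""
    document = json.loads(raw_payload)
    # Top-level fields other than the array are repeated on every output row.
    parent = {k: v for k, v in document.items() if k != array_field}
    return [{**parent, **item} for item in document.get(array_field, [])]

# The connector's job ends here: it hands over the payload untouched.
raw = '{"page": 1, "items": [{"id": "a", "qty": 2}, {"id": "b", "qty": 5}]}'

for row in explode_documents(raw, "items"):
    print(row)
```

Keeping the raw string intact until the shaping step means the original payload can still be stored or replayed if the parsing rules change later.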

Storage

Runtime-aware access

Serverless favors External Locations and Volumes; classic clusters can use direct credentials when allowed.

Operations

Failed runs raise by default

Control tables are written first; ContractForgeExecutionError is then raised so the calling job or notebook fails instead of silently succeeding.
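The record-then-raise ordering can be sketched as follows. The exception class here is a local stand-in that only shares the name mentioned above, and `execute_with_evidence`, the in-memory `control_rows` list and the row fields are assumptions for the example rather than ContractForge internals.

```python
class ContractForgeExecutionError(Exception):
    """Local stand-in for the exception named above; the real class may differ."""

def execute_with_evidence(run_id: str, step, control_rows: list) -> None:
    """Run `step`; on failure, record the error row first, then raise to the caller."""
    try:
        step()
    except Exception as exc:
        control_rows.append({"run_id": run_id, "status": "FAILED", "error": str(exc)})
        raise ContractForgeExecutionError(f"run {run_id} failed: {exc}") from exc
    control_rows.append({"run_id": run_id, "status": "SUCCEEDED", "error": None})

def failing_step():
    raise ValueError("quality gate rejected 12 rows")

control_rows: list = []
try:
    execute_with_evidence("run-001", failing_step, control_rows)
except ContractForgeExecutionError as err:
    print(err)
print(control_rows)
```

With this ordering, an orchestrator sees a failed task while the failure details are already persisted for later inspection.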

Templates

Reusable starts

Templates provide concise starting points for common connector, transform and write-mode combinations.