Databricks · Delta Lake · Declarative ingestion
ContractForge turns recurring Lakehouse ingestion patterns into reviewed YAML or Python contracts: connectors, write modes, quality rules, schema evolution, transformations, governance metadata and control-table observability.
Standardize how teams ingest data without forcing every notebook to reimplement Spark plumbing, Delta MERGE logic, quality gates and audit logging.
Why it exists
ContractForge is not a DAG scheduler and it is not a black box. It is a focused ingestion framework that keeps Spark and Delta semantics visible while making them repeatable, validated and observable.
Describe sources, write modes, keys, watermarks, quality gates, schema policy and transformation intent in YAML or Python.
Runs, errors, quality, quarantine, schema changes, streams, lineage, locks and operational metadata are persisted as Delta tables.
Read from tables, SQL, files, object storage, S3, Azure Blob, JDBC, REST APIs, HTTP files, Auto Loader and external Spark connectors.
Split ingestion, annotations, operations and access contracts so engineering, stewardship and security can evolve independently.
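As a rough illustration of what a declarative contract might capture, the sketch below models one as a plain Python dataclass. Every field name and allowed value here (`source`, `target`, `mode`, `keys`, `watermark`, and the three mode names) is a hypothetical stand-in chosen for the example, not ContractForge's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical write modes; the framework's real mode names may differ.
ALLOWED_MODES = {"append", "overwrite", "merge"}


@dataclass
class IngestionContract:
    """Illustrative contract shape only; not ContractForge's real schema."""
    source: str
    target: str
    mode: str
    keys: List[str] = field(default_factory=list)
    watermark: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the contract is valid."""
        problems = []
        if self.mode not in ALLOWED_MODES:
            problems.append(f"unknown mode: {self.mode}")
        if self.mode == "merge" and not self.keys:
            problems.append("merge mode requires at least one key")
        return problems


contract = IngestionContract(
    source="raw.orders",
    target="bronze.orders",
    mode="merge",
    keys=["order_id"],
    watermark="updated_at",
)
print(contract.validate())
```

The point of the sketch is that intent such as "merge on `order_id`" becomes data that can be reviewed and checked before any Spark code runs.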
Execution model
Each run follows the same control flow: load a source, normalize the DataFrame, validate the contract, apply quality gates, write through an explicit mode and record evidence.
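The six steps above can be sketched in plain Python, using lists of dicts as a stand-in for Spark DataFrames. The function name `run_contract`, the contract keys, the lower-casing used as "normalization" and the evidence fields are all assumptions made for this example; the framework's real implementation is not shown here.

```python
def run_contract(contract, load, write, evidence, dry_run=False):
    """Hypothetical run loop that mirrors the control flow described above."""
    # 1. Load a source.
    rows = load(contract["source"])
    # 2. Normalize (assumed here to mean lower-casing column names).
    rows = [{name.lower(): value for name, value in row.items()} for row in rows]
    # 3. Validate the contract against the data: every key column must exist.
    missing = [k for k in contract["keys"] if any(k not in row for row in rows)]
    if missing:
        raise ValueError(f"key columns missing from source: {missing}")
    # 4. Apply quality gates: keep only rows that pass every rule.
    passed = [row for row in rows if all(rule(row) for rule in contract["quality"])]
    # 5. Write through an explicit mode, unless this is a dry run.
    if not dry_run:
        write(contract["target"], contract["mode"], passed)
    # 6. Record evidence of what happened.
    evidence.append({
        "target": contract["target"],
        "read": len(rows),
        "rejected": len(rows) - len(passed),
        "written": 0 if dry_run else len(passed),
    })
    return passed


written, evidence = {}, []
contract = {
    "source": "orders.csv",
    "target": "bronze.orders",
    "mode": "append",
    "keys": ["id"],
    "quality": [lambda row: row["amount"] >= 0],
}
run_contract(
    contract,
    load=lambda _source: [{"ID": 1, "Amount": 10}, {"ID": 2, "Amount": -5}],
    write=lambda target, mode, rows: written.update({target: rows}),
    evidence=evidence,
)
```

Passing `dry_run=True` exercises the load, normalize, validate and quality steps while skipping the write.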
Use `dry_run` to validate intent without writing data.
Use `transform.shape` for JSON, structs, arrays and declarative projections.
Mental model
A table is not only an ingestion script. It also has catalog metadata, operational ownership and access rules. ContractForge keeps those responsibilities explicit.
Source, target, mode, keys, quality, schema, transformations, performance options and idempotency.
Table and column descriptions, tags, aliases, PII classification and deprecation metadata.
Business owner, technical owner, support group, criticality, SLA, runbook and alerting intent.
Grants, masks and row filters are validated and applied by dedicated commands, not by normal ingestion.
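One way to picture this separation is as four independent documents for the same table, each read only by the command responsible for it. The sketch below is an assumption-heavy illustration: the contract contents, the command names `ingest` and `access`, and the `select_contracts` helper are hypothetical and do not describe the real CLI.

```python
# Four hypothetical contract documents for one table, kept separate on purpose.
table_contracts = {
    "ingestion": {"source": "raw.orders", "target": "bronze.orders", "mode": "merge"},
    "annotations": {"description": "Customer orders", "tags": ["sales"]},
    "operations": {"technical_owner": "data-platform", "criticality": "high"},
    "access": {"grants": [{"principal": "analysts", "privilege": "SELECT"}]},
}

# Assumed mapping from a command to the contract kinds it may act on.
COMMAND_SCOPE = {
    "ingest": ("ingestion",),
    "access": ("access",),
}


def select_contracts(contracts, command):
    """Return only the contract kinds that the given command is scoped to."""
    return {kind: contracts[kind] for kind in COMMAND_SCOPE[command] if kind in contracts}


print(sorted(select_contracts(table_contracts, "ingest")))
print(sorted(select_contracts(table_contracts, "access")))
```

Because a normal ingestion run never receives the access document in this sketch, a change to grants cannot alter ingestion behavior, and the reverse also holds.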
Recommended next step
The fastest way to understand ContractForge is to execute one minimal contract and query `ctrl_ingestion_runs`, `ctrl_ingestion_quality` and `ctrl_ingestion_errors`.
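A minimal sketch of that inspection step follows. It only builds the preview queries; the schema name `main.ops` is a placeholder, since where the control tables live depends on your configuration, and no column names are assumed.

```python
# The three control tables named above.
CONTROL_TABLES = (
    "ctrl_ingestion_runs",
    "ctrl_ingestion_quality",
    "ctrl_ingestion_errors",
)


def control_table_queries(schema, limit=20):
    """Build one preview query per control table in the given schema."""
    return {
        table: f"SELECT * FROM {schema}.{table} LIMIT {int(limit)}"
        for table in CONTROL_TABLES
    }


queries = control_table_queries("main.ops")  # "main.ops" is a placeholder schema

# In a Databricks notebook, where `spark` is predefined, the queries can be run with:
# for name, sql in queries.items():
#     spark.sql(sql).show(truncate=False)
```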
Practical patterns
The examples cover public HTTP CSV, raw REST JSON, Azure Blob/ADLS, S3, JDBC/Postgres, RDS IAM, Auto Loader and many-file folders. They are written to show when behavior belongs in the connector, in `transform.shape`, or in downstream project logic.
REST connectors retrieve payloads; `transform.shape` parses and explodes documents declaratively.
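To make "parse and explode" concrete, here is a pure-Python stand-in for the idea: a raw JSON payload is parsed, and a nested array becomes one row per element. This is not the `transform.shape` syntax, and the helper name and sample payload are invented for the example; in Spark the equivalent operations would typically be `from_json` followed by `explode`.

```python
import json


def parse_and_explode(payload, array_field):
    """Parse a JSON document and emit one row per element of `array_field`."""
    doc = json.loads(payload)
    # Top-level fields other than the array are repeated on every output row.
    parent = {key: value for key, value in doc.items() if key != array_field}
    return [{**parent, **item} for item in doc.get(array_field, [])]


payload = '{"page": 1, "items": [{"id": 10, "name": "a"}, {"id": 11, "name": "b"}]}'
rows = parse_and_explode(payload, "items")
for row in rows:
    print(row)
```

The split of responsibilities is the same as in the text: the connector only fetches `payload`, and the shaping step decides how the document becomes rows.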
Serverless favors External Locations and Volumes; classic clusters can use direct credentials when allowed.
Control tables are written first; `ContractForgeExecutionError` is then raised, so the calling job or notebook fails in the normal way.
Templates provide concise starting points for common connector, transform and write-mode combinations.
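The failure-handling pattern mentioned above (record evidence in the control tables first, then raise) can be sketched as follows. The exception class defined here is a local stand-in with the same name as the framework's, and the `run_with_evidence` helper, the list used in place of a Delta control table, and the status values are all assumptions for illustration.

```python
class ContractForgeExecutionError(Exception):
    """Local stand-in for the framework's exception, defined so the sketch runs."""


def run_with_evidence(step, control_rows):
    """Run `step`; record the outcome before any error reaches the caller."""
    try:
        result = step()
    except Exception as exc:
        # Evidence is persisted first, so it survives the failure.
        control_rows.append({"status": "FAILED", "error": str(exc)})
        # Only then is the error surfaced, which fails the calling job.
        raise ContractForgeExecutionError(str(exc)) from exc
    control_rows.append({"status": "SUCCEEDED", "error": None})
    return result


def failing_step():
    raise ValueError("bad batch")


control_rows = []  # stands in for a Delta control table
caught = None
try:
    run_with_evidence(failing_step, control_rows)
except ContractForgeExecutionError as err:
    caught = err
```

Ordering matters here: if the exception were raised before the append, an orchestrator would see the failure but the control table would have no record explaining it.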