Connector overview
ContractForge models source intent in the core and lets adapters choose the native implementation. This keeps common source concepts portable without pretending that every platform has the same connector runtime.
Source portability classes
| Class | Examples | ContractForge behavior |
|---|---|---|
| Portable source intent | table, view, sql, csv, json, parquet, orc, text, avro, jdbc, http_file, object_storage, incremental_files | The core can validate the intent and adapters map it to native readers. |
| Bounded streaming | kafka_bounded, eventhubs_bounded | Catch-up or bounded replay semantics, not an implicit continuous streaming promise. |
| Engine-specific | Databricks autoloader, AWS glue_bookmark, Fabric dataflow_gen2_* | Valid only for the matching adapter. Other adapters return diagnostics. |
| Native passthrough | Salesforce, Workday, SAP, SharePoint, managed SaaS connectors | The contract records intent, while the adapter delegates to native platform/vendor systems. |
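
The engine-specific row can be read as a small ownership lookup: each engine-specific source type belongs to one adapter, and any other adapter reports a diagnostic instead of attempting a read. The sketch below is illustrative only. `ENGINE_SPECIFIC`, `ENGINE_SPECIFIC_PREFIXES`, `Diagnostic`, `check_source_type`, the adapter names and the diagnostic code are all assumptions made for this example, not ContractForge's actual API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ownership maps, built from the examples in the table above.
ENGINE_SPECIFIC = {
    "autoloader": "databricks",
    "glue_bookmark": "aws",
}
ENGINE_SPECIFIC_PREFIXES = {
    "dataflow_gen2_": "fabric",
}


@dataclass(frozen=True)
class Diagnostic:
    code: str
    message: str


def owning_adapter(source_type: str) -> Optional[str]:
    """Return the adapter that owns an engine-specific type, or None if portable."""
    if source_type in ENGINE_SPECIFIC:
        return ENGINE_SPECIFIC[source_type]
    for prefix, adapter in ENGINE_SPECIFIC_PREFIXES.items():
        if source_type.startswith(prefix):
            return adapter
    return None


def check_source_type(source_type: str, adapter: str) -> Optional[Diagnostic]:
    """Return a diagnostic when an engine-specific type meets the wrong adapter."""
    owner = owning_adapter(source_type)
    if owner is None or owner == adapter:
        return None  # portable type, or engine-specific type on its own adapter
    return Diagnostic(
        code="unsupported_source_type",  # assumed code, for illustration
        message=f"source type '{source_type}' is only valid for the {owner} adapter",
    )
```

Returning a diagnostic rather than raising keeps the check usable during validation, before any adapter starts a read.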
Reusable connection YAML
Many ingestion contracts share the same connection details. Use source.type: connection to inherit a centralized connection file and keep each dataset contract focused on table-specific intent.
See Connection YAML for the full reference, including merge order, override examples and path safety rules.
Dataset contract:
source:
  type: connection
  connection_path: project://connections/supabase_postgres.connection.yaml
  table: public.products
target:
  catalog: main
  schema: bronze
  table: b_products
mode: hash_diff_upsert
merge_keys: [product_id]
hash_strategy: all_columns_except
hash_exclude_columns: [updated_at]
Connection file:
source:
  type: postgres
  system: supabase_postgres
  options:
    url: "{{ secret:supabase/jdbc_url }}"
    driver: org.postgresql.Driver
  auth:
    type: username_password
    username: "{{ secret:supabase/user }}"
    password: "{{ secret:supabase/password }}"
  read:
    fetchsize: 10000
The loader resolves project://connections/... from the nearest project.yaml
root, loads the connection file first, then deep-merges the ingestion source
fields on top. Ingestion fields win, including nested values such as
read.fetchsize or options.driver. Absolute paths and .. traversal are
rejected. The adapter receives one resolved source specification and records the
effective source metadata in evidence.
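
The resolution and merge rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the loader's real code: `resolve_connection_path` and `deep_merge` are hypothetical names, the project root is passed in rather than discovered from `project.yaml`, and only the `project://` form is handled. How the loader treats `type: connection` and `connection_path` after resolution is not shown.

```python
from pathlib import Path, PurePosixPath
from typing import Any, Dict

SCHEME = "project://"


def resolve_connection_path(ref: str, project_root: Path) -> Path:
    """Map a project:// reference to a path under project_root, rejecting unsafe input."""
    if not ref.startswith(SCHEME):
        raise ValueError(f"expected a {SCHEME} reference: {ref!r}")
    rel = PurePosixPath(ref[len(SCHEME):])
    if not rel.parts or rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"absolute paths and '..' traversal are rejected: {ref!r}")
    return project_root.joinpath(*rel.parts)


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where override wins; nested dicts are merged key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Connection fields load first; ingestion fields are merged on top.
connection = {
    "type": "postgres",
    "options": {"driver": "org.postgresql.Driver"},
    "read": {"fetchsize": 10000},
}
ingestion = {"table": "public.products", "read": {"fetchsize": 500}}
resolved = deep_merge(connection, ingestion)
```

In this example the ingestion contract overrides only `read.fetchsize`, so `options.driver` from the connection file survives in `resolved` while the fetch size comes from the dataset contract.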
Standard source fields
| Field | Purpose |
|---|---|
| type | Source type or portability class. |
| connection_path | project://connections/... or a same-bundle relative path to a reusable connection YAML when type: connection. |
| connector | Optional connector identifier for type: connector compatibility. |
| system | Source system label recorded in evidence. |
| ref, table_ref | Logical layer.table reference to a table produced by another ContractForge contract. |
| table, query, path, format | Source identity. |
| options | Reader options or non-secret connector settings. |
| auth | Secret-backed authentication settings. Values are redacted in evidence. |
| read | Framework-aware read controls such as partitioning, fetch size, schema, checkpoint or completeness markers. |
| request, pagination, response, limits | HTTP/REST source controls. |
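
To make the `auth` row concrete, the following sketch shows one way values under `auth` could be masked before a source specification is written to evidence. It is an assumption-laden illustration: `redact_auth`, the `REDACTED` placeholder text, and the choice to mask every leaf value under `auth` (including `type`) are hypothetical and may differ from what ContractForge actually records.

```python
import copy
from typing import Any, Dict

REDACTED = "***REDACTED***"  # placeholder text is an assumption


def _mask(value: Any) -> Any:
    """Replace every leaf value with the placeholder, keeping keys and structure."""
    if isinstance(value, dict):
        return {key: _mask(inner) for key, inner in value.items()}
    if isinstance(value, list):
        return [_mask(inner) for inner in value]
    return REDACTED


def redact_auth(source: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of the source spec with all values under auth masked."""
    evidence = copy.deepcopy(source)
    if "auth" in evidence:
        evidence["auth"] = _mask(evidence["auth"])
    return evidence


source = {
    "type": "postgres",
    "system": "supabase_postgres",
    "auth": {"type": "username_password", "username": "u", "password": "p"},
    "read": {"fetchsize": 10000},
}
evidence_source = redact_auth(source)
```

Working on a deep copy leaves the resolved specification intact for the adapter, while the evidence copy keeps the `auth` keys visible so the first-run checks can confirm that redaction happened.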
Adapter mapping examples
| Contract source | Databricks adapter | AWS adapter target | Fabric adapter target |
|---|---|---|---|
| incremental_files | Auto Loader / cloudFiles | Glue bookmarks or streaming job | Dataflow/Pipeline incremental pattern |
| jdbc / postgres | Spark JDBC with configured driver and optional RDS IAM token | Glue JDBC / EMR Spark JDBC | Dataflow Gen2 or Spark JDBC where available |
| table / sql | Spark SQL / Unity Catalog table | Glue Catalog, Athena/Iceberg or EMR SQL | Lakehouse table or SQL endpoint |
| native_passthrough Salesforce | Lakeflow Connect Salesforce | AppFlow Salesforce | Dataflow Gen2 connector |
These mappings are adapter responsibilities. The core only preserves the source intent and validates portability boundaries.
Logical downstream refs
For medallion projects, prefer logical references when one contract reads the output of another:
source:
  type: table
  ref: bronze.b_products_jdbc
SQL sources can use placeholders:
FROM {{ table_ref:silver.s_product_tags }}
Databricks resolves the reference to its catalog/schema/table name. AWS resolves it to Glue Catalog/Iceberg. The core does not know either platform's qualifier.
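
A rough sketch of how such placeholders might be rendered: the core hands the SQL text to the adapter, and the adapter supplies a resolver that turns `layer.table` into its own qualified name. Everything here is hypothetical, including `render_sql`, the two resolvers, and the naming schemes they return (the `main` catalog is borrowed from the earlier target example, and the Glue database naming is invented for illustration).

```python
import re
from typing import Callable

# Matches {{ table_ref:layer.table }} with optional inner whitespace.
TABLE_REF = re.compile(r"\{\{\s*table_ref:([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)\s*\}\}")


def render_sql(sql: str, resolve: Callable[[str, str], str]) -> str:
    """Replace each table_ref placeholder with the name returned by the resolver."""
    return TABLE_REF.sub(lambda m: resolve(m.group(1), m.group(2)), sql)


def databricks_resolver(layer: str, table: str) -> str:
    # Assumed mapping: catalog.schema.table with a fixed 'main' catalog.
    return f"main.{layer}.{table}"


def glue_resolver(layer: str, table: str) -> str:
    # Assumed mapping: one Glue database per layer; the naming is invented.
    return f"{layer}_db.{table}"


sql = "SELECT tag FROM {{ table_ref:silver.s_product_tags }}"
databricks_sql = render_sql(sql, databricks_resolver)
glue_sql = render_sql(sql, glue_resolver)
```

Because the resolver is passed in, the rendering function itself carries no platform-specific qualifier, which mirrors the separation between core and adapters described above.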
Production checklist
- Keep secrets in secret references, not literal YAML values.
- Use reusable connection YAML for shared endpoints.
- Mark source completeness when using overwrite, snapshot or delete-aware modes.
- Declare bounded streaming explicitly; do not imply continuous streaming unless the adapter supports it.
- Use adapter-specific source types only when portability is not required.
- Check control/evidence tables after the first run to confirm source metadata, connector, row counts and redaction.
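
The first checklist item lends itself to an automated check. The sketch below flags `auth` values that are not written as `{{ secret:... }}` references, using the reference syntax shown in the connection file example. `find_literal_secrets` and the decision to skip the `type` key are assumptions made for illustration; this is not a built-in ContractForge command.

```python
import re
from typing import Any, Dict, List, Tuple

# A value counts as a secret reference only if the whole string is {{ secret:... }}.
SECRET_REF = re.compile(r"^\{\{\s*secret:[^{}]+\}\}$")


def find_literal_secrets(auth: Dict[str, Any], skip: Tuple[str, ...] = ("type",)) -> List[str]:
    """Return the auth keys whose values are not secret references."""
    offenders = []
    for key, value in auth.items():
        if key in skip:
            continue  # non-secret selector such as the auth type
        if not (isinstance(value, str) and SECRET_REF.match(value)):
            offenders.append(key)
    return offenders


good_auth = {
    "type": "username_password",
    "username": "{{ secret:supabase/user }}",
    "password": "{{ secret:supabase/password }}",
}
bad_auth = {"type": "username_password", "username": "admin", "password": "hunter2"}

good_offenders = find_literal_secrets(good_auth)
bad_offenders = find_literal_secrets(bad_auth)
```

A check like this could run in CI before deployment, so a literal credential is caught before it reaches a committed contract.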