Connector overview

ContractForge models source intent in the core and lets adapters choose the native implementation. This keeps common source concepts portable without pretending that every platform has the same connector runtime.

Source portability classes

Class | Examples | ContractForge behavior
--- | --- | ---
Portable source intent | table, view, sql, csv, json, parquet, orc, text, avro, jdbc, http_file, object_storage, incremental_files | The core can validate the intent and adapters map it to native readers.
Bounded streaming | kafka_bounded, eventhubs_bounded | Catch-up or bounded replay semantics, not an implicit continuous streaming promise.
Engine-specific | Databricks autoloader, AWS glue_bookmark, Fabric dataflow_gen2_* | Valid only for the matching adapter. Other adapters return diagnostics.
Native passthrough | Salesforce, Workday, SAP, SharePoint, managed SaaS connectors | The contract records intent, while the adapter delegates to native platform/vendor systems.
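
The engine-specific rule can be pictured as a lookup that returns diagnostics when a source type is used with the wrong adapter. The following is a minimal sketch, not ContractForge's actual validator: the registry, the adapter names (`databricks`, `aws`, `fabric`), the function name and the diagnostic wording are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical registry: engine-specific source type pattern -> owning adapter.
# The patterns mirror the examples in the table; the adapter keys are assumed.
ENGINE_SPECIFIC = {
    "autoloader": "databricks",
    "glue_bookmark": "aws",
    "dataflow_gen2_*": "fabric",
}


def check_source_type(source_type: str, adapter: str) -> list[str]:
    """Return diagnostics; an empty list means the type is acceptable here."""
    for pattern, owner in ENGINE_SPECIFIC.items():
        if fnmatchcase(source_type, pattern) and owner != adapter:
            return [
                f"source type '{source_type}' is specific to the {owner} "
                f"adapter and is not supported by the {adapter} adapter"
            ]
    # Portable types, and engine-specific types on their own adapter, pass.
    return []
```

Returning a list of messages rather than raising keeps the sketch in line with the table's wording, where non-matching adapters "return diagnostics".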

Reusable connection YAML

Many ingestion contracts share the same connection details. Use source.type: connection to inherit a centralized connection file and keep each dataset contract focused on table-specific intent.

See Connection YAML for the full reference, including merge order, override examples and path safety rules.

Dataset contract:

source:
  type: connection
  connection_path: project://connections/supabase_postgres.connection.yaml
  table: public.products

target:
  catalog: main
  schema: bronze
  table: b_products

mode: hash_diff_upsert
merge_keys: [product_id]
hash_strategy: all_columns_except
hash_exclude_columns: [updated_at]

Connection file:

source:
  type: postgres
  system: supabase_postgres
  options:
    url: "{{ secret:supabase/jdbc_url }}"
    driver: org.postgresql.Driver
  auth:
    type: username_password
    username: "{{ secret:supabase/user }}"
    password: "{{ secret:supabase/password }}"
  read:
    fetchsize: 10000

The loader resolves project://connections/... from the nearest project.yaml root, loads the connection file first, then deep-merges the ingestion source fields on top. Ingestion fields win, including nested values such as read.fetchsize or options.driver. Absolute paths and .. traversal are rejected. The adapter receives one resolved source specification and records the effective source metadata in evidence.
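
The resolution and merge steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the loader's real code: the function names are hypothetical, only `project://` references are handled, and the reference fields (`type: connection`, `connection_path`) are assumed to be consumed by the loader before merging rather than merged themselves.

```python
from pathlib import Path, PurePosixPath

PREFIX = "project://"


def find_project_root(start: Path) -> Path:
    """Walk upward from `start` to the nearest directory holding project.yaml."""
    for candidate in [start, *start.parents]:
        if (candidate / "project.yaml").is_file():
            return candidate
    raise FileNotFoundError(f"no project.yaml found above {start}")


def resolve_connection_path(connection_path: str, contract_dir: Path) -> Path:
    """Resolve a project:// reference, rejecting absolute paths and '..'."""
    if not connection_path.startswith(PREFIX):
        raise ValueError("this sketch only handles project:// references")
    relative = PurePosixPath(connection_path[len(PREFIX):])
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"unsafe connection path: {connection_path}")
    return find_project_root(contract_dir).joinpath(*relative.parts)


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` wins; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Connection file contents (loaded first) and ingestion overrides (applied on top).
connection = {
    "type": "postgres",
    "options": {"driver": "org.postgresql.Driver"},
    "read": {"fetchsize": 10000},
}
ingestion = {"table": "public.products", "read": {"fetchsize": 500}}

resolved = deep_merge(connection, ingestion)
# resolved["read"]["fetchsize"] is 500: the ingestion value wins.
# resolved["options"]["driver"] is inherited unchanged from the connection file.
```

Merging into a fresh dictionary leaves the loaded connection file untouched, so the same connection can be reused across several dataset contracts within one run.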

Standard source fields

Field | Purpose
--- | ---
type | Source type or portability class.
connection_path | project://connections/... or a same-bundle relative path to a reusable connection YAML when type: connection.
connector | Optional connector identifier for type: connector compatibility.
system | Source system label recorded in evidence.
ref, table_ref | Logical layer.table reference to a table produced by another ContractForge contract.
table, query, path, format | Source identity.
options | Reader options or non-secret connector settings.
auth | Secret-backed authentication settings. Values are redacted in evidence.
read | Framework-aware read controls such as partitioning, fetch size, schema, checkpoint or completeness markers.
request, pagination, response, limits | HTTP/REST source controls.
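
To make the `auth` redaction concrete, here is a minimal sketch of how a resolved source could be masked before it is written to evidence. The function name, the mask string and the choice to mask every value under `auth` (including `auth.type`) are assumptions; the actual redaction rules may be narrower.

```python
import copy

REDACTED = "***REDACTED***"  # hypothetical mask string


def redact_for_evidence(source: dict) -> dict:
    """Return a copy of the resolved source with every value under `auth` masked."""
    evidence = copy.deepcopy(source)

    def mask(node):
        # Keep the shape of nested dicts and lists, replace every leaf value.
        if isinstance(node, dict):
            return {key: mask(value) for key, value in node.items()}
        if isinstance(node, list):
            return [mask(item) for item in node]
        return REDACTED

    if "auth" in evidence:
        evidence["auth"] = mask(evidence["auth"])
    return evidence


source = {
    "type": "postgres",
    "system": "supabase_postgres",
    "auth": {"type": "username_password", "username": "{{ secret:supabase/user }}"},
}
evidence = redact_for_evidence(source)
```

Because the function works on a deep copy, the adapter still receives the unredacted specification while evidence only sees the masked one.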

Adapter mapping examples

Contract source | Databricks adapter | AWS adapter target | Fabric adapter target
--- | --- | --- | ---
incremental_files | Auto Loader / cloudFiles | Glue bookmarks or streaming job | Dataflow/Pipeline incremental pattern
jdbc / postgres | Spark JDBC with configured driver and optional RDS IAM token | Glue JDBC / EMR Spark JDBC | Dataflow Gen2 or Spark JDBC where available
table / sql | Spark SQL / Unity Catalog table | Glue Catalog, Athena/Iceberg or EMR SQL | Lakehouse table or SQL endpoint
native_passthrough Salesforce | Lakeflow Connect Salesforce | AppFlow Salesforce | Dataflow Gen2 connector

These mappings are adapter responsibilities. The core only preserves the source intent and validates portability boundaries.

Logical downstream refs

For medallion projects, prefer logical references when one contract reads the output of another:

source:
  type: table
  ref: bronze.b_products_jdbc

SQL sources can use placeholders:

FROM {{ table_ref:silver.s_product_tags }}

Databricks resolves the reference to its catalog/schema/table name. AWS resolves it to Glue Catalog/Iceberg. The core does not know either platform's qualifier.
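
One way to picture this split is a core that only finds placeholders and an adapter-supplied function that turns `layer.table` into a physical name. The sketch below assumes that shape; the resolver functions and the catalog and database names they produce (`main`, `glue_catalog`, the `_db` suffix) are invented for illustration and are not the adapters' real naming rules.

```python
import re

# Matches {{ table_ref:layer.table }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(
    r"\{\{\s*table_ref:([A-Za-z_]\w*)\.([A-Za-z_]\w*)\s*\}\}"
)


def render_sql(sql: str, resolve) -> str:
    """Replace every table_ref placeholder using the adapter's resolver."""
    return PLACEHOLDER.sub(lambda m: resolve(m.group(1), m.group(2)), sql)


# Hypothetical adapter resolvers: the core never sees these naming schemes.
def databricks_resolver(layer: str, table: str) -> str:
    return f"main.{layer}.{table}"


def glue_resolver(layer: str, table: str) -> str:
    return f"glue_catalog.{layer}_db.{table}"


sql = "SELECT * FROM {{ table_ref:silver.s_product_tags }}"
databricks_sql = render_sql(sql, databricks_resolver)
glue_sql = render_sql(sql, glue_resolver)
```

Passing the resolver in as an argument keeps platform qualifiers out of the shared code path, which matches the statement that the core does not know either platform's qualifier.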

Production checklist

  • Keep secrets in secret references, not literal YAML values.
  • Use reusable connection YAML for shared endpoints.
  • Mark source completeness when using overwrite, snapshot or delete-aware modes.
  • Declare bounded streaming explicitly; do not imply continuous streaming unless the adapter supports it.
  • Use adapter-specific source types only when portability is not required.
  • Check control/evidence tables after the first run to confirm source metadata, connector, row counts and redaction.