Connection YAML

Connection YAML files let several ingestion contracts share connector defaults without hiding dataset-specific intent inside a global configuration file.

Use them for source settings that are genuinely shared:

  • endpoint or JDBC URL;
  • connector type and driver;
  • authentication shape and secret references;
  • common read defaults such as fetchsize;
  • source system labels used in evidence.

Do not use them for dataset semantics such as table name, query, watermark, partition bounds, quality rules, write mode, target table or access policy. Those belong in the ingestion contract.
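This split can be enforced with a simple lint. Below is a minimal sketch that assumes the connection file has already been parsed into a Python dict. The function `find_dataset_keys` and both key sets are hypothetical. The key names come from the fields listed above and from the ingestion example further down; they are not a documented ContractForge rule set.

```python
# Hypothetical lint: flag dataset-level keys that belong in the ingestion contract.
# Both key sets are illustrative assumptions, not ContractForge's actual rules.
DATASET_KEYS = {"table", "query", "watermark", "quality", "mode", "merge_keys", "target", "access"}
DATASET_READ_KEYS = {"partition_column", "lower_bound", "upper_bound", "num_partitions"}


def find_dataset_keys(connection: dict) -> list:
    """Return dotted paths of dataset-specific keys found in a connection payload."""
    # Accept both the flat layout and the optional `source:` wrapper.
    payload = connection.get("source", connection)
    problems = sorted(key for key in payload if key in DATASET_KEYS)
    read = payload.get("read", {})
    problems += sorted(f"read.{key}" for key in read if key in DATASET_READ_KEYS)
    return problems


conn = {
    "type": "connector",
    "connector": "postgres",
    "table": "public.products",  # dataset semantics: belongs in the ingestion contract
    "read": {"fetchsize": 10000, "partition_column": "product_id"},
}
print(find_dataset_keys(conn))  # ['table', 'read.partition_column']
```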

Project inventory

project.yaml may list reusable connection files so humans and tooling can see which connections belong to the project:

connections:
  supabase_postgres: connections/supabase.yaml

This inventory does not apply a connection by itself. An ingestion contract opts into a connection by setting source.type: connection and naming the file in source.connection_path.
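To make the distinction concrete, here is a sketch of a tooling check that reads the inventory only for reporting. It assumes project.yaml and the contract have been parsed into dicts. `unlisted_connection` is a hypothetical helper, and ignoring bundle-relative paths is a simplification of this sketch.

```python
from typing import Optional

PROJECT_SCHEME = "project://"


def unlisted_connection(project: dict, contract: dict) -> Optional[str]:
    """Hypothetical check: report a connection file that the inventory does not list.

    The inventory is only consulted for reporting; the contract's own
    source.connection_path is what actually selects the file.
    """
    source = contract.get("source", {})
    if source.get("type") != "connection":
        return None  # this contract does not use a connection file
    ref = source["connection_path"]
    if not ref.startswith(PROJECT_SCHEME):
        return None  # simplification: bundle-relative files are not compared
    listed = set(project.get("connections", {}).values())
    return None if ref[len(PROJECT_SCHEME):] in listed else ref


project = {"connections": {"supabase_postgres": "connections/supabase.yaml"}}
contract = {"source": {"type": "connection", "connection_path": "project://connections/legacy.yaml"}}
print(unlisted_connection(project, contract))  # project://connections/legacy.yaml
```

Removing an entry from connections: would change this report, but not which file any contract loads.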

Connection file

# connections/supabase.yaml
type: connector
connector: postgres
system: supabase_inventory_demo
options:
  url: "{{ secret:supabase/jdbc_url }}"
  driver: org.postgresql.Driver
auth:
  type: username_password
  username: "{{ secret:supabase/user }}"
  password: "{{ secret:supabase/password }}"
read:
  fetchsize: 10000

A connection file may instead wrap the same payload under a top-level source: key, if a repository prefers that style:

source:
  type: connector
  connector: postgres
  system: supabase_inventory_demo
  options:
    url: "{{ secret:supabase/jdbc_url }}"
    driver: org.postgresql.Driver
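Loading code then has to accept both layouts. The following minimal sketch assumes the file has already been parsed into a dict and that a top-level source mapping marks the wrapped style. `unwrap_connection` is a hypothetical helper, not a ContractForge function.

```python
def unwrap_connection(document: dict) -> dict:
    """Return the connection payload from either the flat or the wrapped layout."""
    # Assumption: a top-level `source` mapping marks the wrapped style.
    inner = document.get("source")
    return inner if isinstance(inner, dict) else document


flat = {"type": "connector", "connector": "postgres", "system": "supabase_inventory_demo"}
wrapped = {"source": flat}
print(unwrap_connection(wrapped) == unwrap_connection(flat))  # True
```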

Ingestion override

The ingestion contract points to the connection file and declares the dataset-specific fields:

source:
  type: connection
  connection_path: project://connections/supabase.yaml
  table: public.products
  read:
    fetchsize: 20000
    partition_column: product_id
    lower_bound: 1
    upper_bound: 100000
    num_partitions: 8

target:
  catalog: contractforge
  schema: bronze
  table: b_products

mode: hash_diff_upsert
merge_keys: [product_id]

ContractForge loads the connection first, then deep-merges the ingestion source overrides on top.

Merge rule:

Location           Role
Connection YAML    Shared defaults.
Ingestion source   Dataset-specific overrides. These values win.

For the example above, the resolved source received by the adapter is equivalent to:

type: connector
connector: postgres
system: supabase_inventory_demo
options:
  url: "{{ secret:supabase/jdbc_url }}"
  driver: org.postgresql.Driver
auth:
  type: username_password
  username: "{{ secret:supabase/user }}"
  password: "{{ secret:supabase/password }}"
read:
  fetchsize: 20000
  partition_column: product_id
  lower_bound: 1
  upper_bound: 100000
  num_partitions: 8
table: public.products
connection: project://connections/supabase.yaml

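read.fetchsize changed from 10000 to 20000 because the ingestion contract overrode the connection default. The other read fields (partition_column, lower_bound, upper_bound, num_partitions) exist only in the ingestion contract, so they were merged in alongside it. type remains connector from the connection file, and the connection path is recorded under connection rather than connection_path. The adapter does not need to know where the YAML lived; it receives this resolved source.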
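The resolution step can be sketched in Python. This sketch assumes that both YAML documents have already been parsed into dicts and that any source: wrapper has been removed. `deep_merge` and `resolve_source` are hypothetical names, not ContractForge's API. The handling of type, connection_path and connection mirrors the resolved example above. Replacing lists wholesale is an assumption that this page does not state.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override merged in.

    Nested mappings merge key by key; any other value (scalar or list) is
    replaced by the override, so the ingestion contract wins.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_source(connection: dict, ingestion_source: dict) -> dict:
    """Hypothetical helper: combine a connection payload with ingestion overrides."""
    overrides = dict(ingestion_source)
    overrides.pop("type", None)              # "connection" only instructs the loader
    path = overrides.pop("connection_path")  # consumed here, recorded as provenance below
    resolved = deep_merge(connection, overrides)
    resolved["connection"] = path
    return resolved


connection = {
    "type": "connector",
    "connector": "postgres",
    "system": "supabase_inventory_demo",
    "options": {"url": "{{ secret:supabase/jdbc_url }}", "driver": "org.postgresql.Driver"},
    "read": {"fetchsize": 10000},
}
ingestion_source = {
    "type": "connection",
    "connection_path": "project://connections/supabase.yaml",
    "table": "public.products",
    "read": {"fetchsize": 20000, "partition_column": "product_id",
             "lower_bound": 1, "upper_bound": 100000, "num_partitions": 8},
}
resolved = resolve_source(connection, ingestion_source)
print(resolved["type"], resolved["read"]["fetchsize"])  # connector 20000
```

Copying instead of mutating keeps the parsed connection reusable across many ingestion contracts.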

Path rules

Prefer project-rooted references:

source:
  type: connection
  connection_path: project://connections/supabase.yaml

Same-bundle relative paths are allowed only when the connection file is inside the ingestion bundle directory. Absolute paths and .. traversal are rejected.

These rules prevent a contract from reading arbitrary files from the machine or from another tenant/project.
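A sketch of these checks follows. It assumes connection_path is a POSIX-style string and that project:// maps to the project root directory. The function name and the root and bundle arguments are hypothetical. The sketch works on pure paths and never touches the filesystem, so a real loader would probably also need to handle symlinks that point outside the allowed directory.

```python
from pathlib import PurePosixPath

PROJECT_SCHEME = "project://"


def resolve_connection_path(ref: str, project_root: str, bundle_dir: str) -> PurePosixPath:
    """Map a connection_path value to a file path, rejecting unsafe references."""
    if ref.startswith(PROJECT_SCHEME):
        base, relative = PurePosixPath(project_root), PurePosixPath(ref[len(PROJECT_SCHEME):])
    else:
        # Anything without the scheme is treated as relative to the ingestion bundle.
        base, relative = PurePosixPath(bundle_dir), PurePosixPath(ref)
    if relative.is_absolute():
        raise ValueError(f"absolute connection path rejected: {ref}")
    if ".." in relative.parts:  # pure paths keep '..' as a literal part
        raise ValueError(f"'..' traversal rejected: {ref}")
    return base / relative


print(resolve_connection_path("project://connections/supabase.yaml", "/repo", "/repo/ingestion/products"))
# /repo/connections/supabase.yaml
```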

Adapter behavior

Connection YAML is a core loading concern, not a platform runtime feature.

Databricks, AWS and future adapters all receive the resolved source payload. They should record the effective connector metadata in evidence, but they should not re-read the connection YAML or implement their own merge behavior.
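The following sketch shows the adapter side. It derives evidence entirely from the resolved source and never reads the connection YAML. The evidence field set and `connector_evidence` are hypothetical, because this page does not define the evidence schema.

```python
# Hypothetical evidence shape: the real schema is not defined on this page.
EVIDENCE_FIELDS = ("connector", "system", "table", "connection")


def connector_evidence(resolved_source: dict) -> dict:
    """Collect effective connector metadata from the resolved source only."""
    evidence = {key: resolved_source[key] for key in EVIDENCE_FIELDS if key in resolved_source}
    # Record the read settings actually used, e.g. the overridden fetchsize.
    evidence["read"] = dict(resolved_source.get("read", {}))
    # auth and options are left out in this sketch because they carry secret references.
    return evidence


resolved = {
    "type": "connector",
    "connector": "postgres",
    "system": "supabase_inventory_demo",
    "options": {"url": "{{ secret:supabase/jdbc_url }}", "driver": "org.postgresql.Driver"},
    "auth": {"type": "username_password", "password": "{{ secret:supabase/password }}"},
    "read": {"fetchsize": 20000},
    "table": "public.products",
    "connection": "project://connections/supabase.yaml",
}
print(connector_evidence(resolved)["read"])  # {'fetchsize': 20000}
```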

Checklist

  • Put endpoint, auth, driver and common read defaults in the connection file.
  • Put table, query, partition bounds, watermark and dataset-specific reads in the ingestion contract.
  • Let the ingestion contract override a connection default when a dataset needs a different value.
  • Keep secrets as secret references.
  • Use project://connections/... for centralized project connection files.
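The secrets item on this checklist can be checked mechanically. The following minimal sketch assumes that secret references always use the {{ secret:... }} form shown in the examples and that every auth value other than type should be a secret reference. `literal_secrets` is a hypothetical name. This page does not describe how ContractForge itself validates or resolves these references.

```python
import re

# Reference syntax as used on this page, e.g. "{{ secret:supabase/user }}".
SECRET_REF = re.compile(r"\{\{\s*secret:[^\s{}]+\s*\}\}")


def literal_secrets(payload: dict) -> list:
    """Return auth fields that hold literal values instead of secret references."""
    auth = payload.get("auth", {})
    return sorted(
        f"auth.{key}"
        for key, value in auth.items()
        # `type` names the auth scheme; every other auth value is assumed to be secret.
        if key != "type" and not SECRET_REF.fullmatch(str(value))
    )


leaky = {"auth": {"type": "username_password", "username": "admin",
                  "password": "{{ secret:supabase/password }}"}}
print(literal_secrets(leaky))  # ['auth.username']
```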