Write modes | ContractForge

scd0_append

Append-only

Best for immutable landing data, event logs and raw captures. Use idempotency if batches may be replayed.

scd0_overwrite

Full replace

Best for small rebuildable tables or deterministic aggregates where replacing the full target is acceptable.

scd1_upsert

Current state

MERGE by keys. Requires clean `merge_keys` and usually `not_null` + `unique_key` quality rules.

scd1_hash_diff

Append changed states

Writes only rows whose hash changed compared with the latest known target state. Use deterministic ordering.

scd2_historical

Type 2 history

Expires current versions and inserts new versions when business columns change.

snapshot_soft_delete

Complete snapshot

Synchronizes full current state and marks missing active rows inactive. Requires a complete source.

Decision guide

Need	Recommended mode	Design note
Capture everything as received	`scd0_append`	Keep Bronze simple and auditable.
Rebuild a small aggregate	`scd0_overwrite`	Use when full recompute is cheaper than incremental logic.
Keep one row per key	`scd1_upsert`	Validate null and duplicate keys before MERGE.
Track current-state changes cheaply	`scd1_hash_diff`	Needs stable dedup ordering for the latest state.
Audit historical versions	`scd2_historical`	Restrict change columns to meaningful business attributes.
Deactivate records absent from source	`snapshot_soft_delete`	Do not combine with watermark or partial filters.

Mode requirements

The table below is the practical contract checklist. If a field is listed as required, validation should fail before the framework performs a dangerous or ambiguous write.

Mode	Required fields	Recommended quality rules	Do not use when
`scd0_append`	None beyond source and target.	`required_columns`, `not_null` for business keys, optional expressions.	The same batch can be replayed without an idempotency key.
`scd0_overwrite`	None beyond source and target.	`min_rows` and required columns to avoid replacing a table with an empty or malformed source.	The source is partial or not deterministic.
`scd1_upsert`	`merge_keys`.	`not_null` and `unique_key` on merge keys; deduplicate with deterministic order.	Multiple source rows can update the same target key.
`scd1_hash_diff`	`hash_keys` and deterministic latest-state order when history exists.	`not_null`, `unique_key` after dedup, accepted values for low-cardinality dimensions.	The target has multiple versions per key and no reliable ordering column.
`scd2_historical`	`merge_keys` or mode-specific keys, plus change columns when business comparison should be scoped.	Required keys, uniqueness after dedup, non-null effective date when used.	You only need current state or the source cannot produce stable business keys.
`snapshot_soft_delete`	`merge_keys` and source completeness evidence.	`not_null`, `unique_key`, `min_rows` for complete snapshots.	The source is filtered, incremental, watermarked or represents only a slice of current state.

Optimization and table layout

Write semantics and physical layout are separate choices. Keep the mode focused on business state and use layout settings to improve query performance or maintenance behavior.

Field	Use	Guidance
`partition_column`, `partition_value`	Scope overwrite/replace patterns to a physical partition.	Only use `partition_value` with `partition_column`. Do not confuse partition filtering with source completeness.
`merge_strategy`	Select the MERGE strategy used by merge modes.	Use the default unless the target is partitioned and the contract explicitly declares the partition strategy.
`merge_partition_column`	Partition key for partition-aware merge strategies.	Required for partition merge strategies.
`replace_partitions_source_complete`	Declares that the source contains the complete partition slice being replaced.	Use for partition replacement only when the source truly covers the full affected partition.
`cluster_columns`	Liquid clustering columns for Databricks tables.	Prefer for new Databricks-managed Delta tables when query patterns are known.
`zorder_columns`	Columns for explicit ZORDER after write.	Use with `optimize_after_write=true`; do not declare it as a passive hint.
`delta_properties`	Delta table properties at creation/update time.	Use for CDF, retention, auto optimize and table-level behavior that belongs with the target contract.

Snapshot soft delete

This mode is intentionally defensive. The source must represent the complete current state, otherwise the framework cannot tell if a target row disappeared or was simply outside the loaded slice.

mode: snapshot_soft_delete
merge_keys: [device_id]
source:
  type: connector
  connector: table
  table: main.raw.devices_snapshot
  read:
    source_complete: true

Operational note

Use incremental modes for partial sources. A snapshot with watermark is a contradiction and should fail before data is touched.

Merge safety

MERGE-based modes fail before writing when the source contains null or duplicate keys that would make the target update ambiguous. The guardrail is generic and protects every merge target from non-deterministic updates.

mode: scd1_upsert
merge_keys: [customer_id]
quality_rules:
  not_null: [customer_id]
  unique_key: [customer_id]
transform:
  deduplicate:
    keys: [customer_id]
    order_by: updated_at DESC NULLS LAST

Metrics by mode

Each mode returns logical metrics even when Delta history does not expose the same operation counters across runtimes. Use metrics_source to understand whether values came from logical calculations, Delta operation metrics or a mix.

Mode	Important metrics	Dashboard interpretation
`scd0_append`	`rows_inserted`, `rows_written`	Batch size and ingestion throughput.
`scd1_upsert`	`rows_inserted`, `rows_updated`	Change ratio in current-state tables.
`scd1_hash_diff`	`rows_written`, hash-change counts when available	Whether source batches are actually changing target state.
`scd2_historical`	expired versions and inserted versions	Historical churn and slowly changing dimension behavior.
`snapshot_soft_delete`	inserted, updated, soft-deleted rows	Completeness and unexpected disappearance monitoring.

Mode examples

# SCD1 current-state table
source:
  type: connector
  connector: jdbc
  options:
    url: "{{ secret:crm/jdbc_url }}"
    dbtable: public.customers
    driver: org.postgresql.Driver
target:
  catalog: main
  schema: crm_curated
  table: s_customers
layer: silver
mode: scd1_upsert
merge_keys: [customer_id]
transform:
  deduplicate:
    keys: [customer_id]
    order_by: updated_at DESC NULLS LAST
quality_rules:
  not_null: [customer_id]
  unique_key: [customer_id]

from contractforge import ingest

result = ingest(
    source={
        "type": "connector",
        "connector": "jdbc",
        "options": {
            "url": "{{ secret:crm/jdbc_url }}",
            "dbtable": "public.customers",
            "driver": "org.postgresql.Driver",
        },
    },
    target_catalog="main",
    target_schema="crm_curated",
    target_table="s_customers",
    layer="silver",
    mode="scd1_upsert",
    merge_keys=["customer_id"],
    transform={
        "deduplicate": {
            "keys": ["customer_id"],
            "order_by": "updated_at DESC NULLS LAST",
        },
    },
    quality_rules={
        "not_null": ["customer_id"],
        "unique_key": ["customer_id"],
    },
)

Raise-on-failure behavior

Failed writes are not silent payloads by default. The API writes run evidence and then raises ContractForgeExecutionError, so Databricks Jobs and notebooks fail naturally.

from contractforge import ContractForgeExecutionError, ingest

try:
    result = ingest(...)
except ContractForgeExecutionError as exc:
    display(exc.result)
    raise