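Merge strategies
The upsert mode supports three merge strategies for current-state tables: delta, delta_by_partition, and replace_partitions. Choose the one that matches how complete the source is and whether the target partitions can be restricted safely.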
Strategy comparison
| Strategy | Use when | Risk |
|---|---|---|
| delta | Generic key-based merge. | Can scan more target data on large tables. |
| delta_by_partition | The merge can safely restrict target partitions. | Requires correct merge_partition_column. |
| replace_partitions | The source is complete for the partitions being replaced. | Data loss if source completeness is wrong. |
replace_partitions is implemented with Databricks SQL INSERT INTO ... REPLACE WHERE ... SELECT.
It is not a simulated delete-and-insert workflow and it is not a generic Spark compatibility path.
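To make the shape of that statement concrete, here is a minimal sketch that assembles an INSERT INTO ... REPLACE WHERE ... SELECT string. The function name `build_replace_where_sql` and the table names are hypothetical, chosen only for illustration. This is not ContractForge's internal code, and the SQL it actually generates may differ.

```python
from __future__ import annotations


def build_replace_where_sql(
    target: str,
    source: str,
    partition_column: str,
    partition_values: list[str],
) -> str:
    """Assemble a Databricks SQL INSERT INTO ... REPLACE WHERE ... SELECT statement.

    Hypothetical helper for illustration only. Identifiers are interpolated
    as-is, so they must come from trusted configuration, never from user input.
    """
    if not partition_values:
        raise ValueError("at least one partition value is required")

    # De-duplicate and sort so the predicate is deterministic, then quote each
    # value as a SQL string literal (embedded single quotes are doubled).
    literals = ", ".join(
        "'" + value.replace("'", "''") + "'"
        for value in sorted(set(partition_values))
    )
    predicate = f"{partition_column} IN ({literals})"

    return (
        f"INSERT INTO {target} "
        f"REPLACE WHERE {predicate} "
        f"SELECT * FROM {source}"
    )


sql = build_replace_where_sql(
    target="sales.orders",
    source="staging.orders_batch",
    partition_column="order_date",
    partition_values=["2024-01-02", "2024-01-01"],
)
print(sql)
```

Every target row matching the predicate is replaced by the rows selected from the source, which is why the source must be complete for those partitions.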
Examples
YAML:
mode: upsert
merge_strategy: delta
merge_keys: [order_id]
Python:
result = ingest(..., mode="upsert", merge_strategy="delta", merge_keys=["order_id"])
YAML:
mode: upsert
merge_strategy: delta_by_partition
merge_keys: [order_id]
merge_partition_column: order_date
Python:
result = ingest(
    ...,
    mode="upsert",
    merge_strategy="delta_by_partition",
    merge_keys=["order_id"],
    merge_partition_column="order_date",
)
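The difference between delta and delta_by_partition can be pictured as an extra partition predicate in the merge condition. The sketch below shows the general Delta Lake technique of restricting a MERGE to the partitions present in the batch. The helper `build_merge_condition` and the `t`/`s` aliases are assumptions made for illustration, and the condition ContractForge actually generates is not documented here.

```python
from __future__ import annotations


def build_merge_condition(
    merge_keys: list[str],
    partition_column: str | None = None,
    partition_values: list[str] | None = None,
) -> str:
    """Build a MERGE ON condition between target alias t and source alias s.

    Hypothetical helper: with no partition column it yields a plain key match
    (the "delta" shape); with one, it also restricts the target partitions
    (the "delta_by_partition" shape).
    """
    if not merge_keys:
        raise ValueError("at least one merge key is required")

    # Key equality between target and source rows.
    clauses = [f"t.{key} = s.{key}" for key in merge_keys]

    if partition_column is not None:
        if not partition_values:
            raise ValueError("partition values are required with a partition column")
        literals = ", ".join(
            "'" + value.replace("'", "''") + "'"
            for value in sorted(set(partition_values))
        )
        # Limit the target scan to the partitions present in the batch.
        clauses.append(f"t.{partition_column} IN ({literals})")

    return " AND ".join(clauses)


print(build_merge_condition(["order_id"]))
print(build_merge_condition(["order_id"], "order_date", ["2024-01-01"]))
```

Under this sketch, a wrong partition column is risky because an existing row stored in a partition outside the predicate would not match, so the incoming row could be inserted as a duplicate key.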
YAML:
mode: upsert
merge_strategy: replace_partitions
merge_partition_column: order_date
replace_partitions_source_complete: true
Python:
result = ingest(
    ...,
    mode="upsert",
    merge_strategy="replace_partitions",
    merge_partition_column="order_date",
    replace_partitions_source_complete=True,
)
Partition replacement requires evidence
Use replace_partitions only when the source is known to be complete for every partition in the batch. If that cannot be proven, use a merge strategy instead.
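One way to apply this rule in calling code is a small pre-flight check that downgrades the strategy when completeness has not been established. The function `resolve_strategy` and its two boolean inputs are hypothetical and are not part of ContractForge. How completeness or partition safety is proven (a manifest, an upstream contract, a row-count check) is left to the caller.

```python
VALID_STRATEGIES = {"delta", "delta_by_partition", "replace_partitions"}


def resolve_strategy(requested: str, source_complete: bool, partition_safe: bool) -> str:
    """Return a strategy that is safe given what the caller can prove.

    Hypothetical pre-flight helper:
    - replace_partitions is kept only when the source is known to be complete;
    - delta_by_partition is kept only when restricting partitions is safe;
    - otherwise the generic key-based delta merge is used.
    """
    if requested not in VALID_STRATEGIES:
        raise ValueError(f"unknown merge strategy: {requested!r}")

    if requested == "replace_partitions" and not source_complete:
        # Completeness is unproven: fall back to a merge strategy.
        return "delta_by_partition" if partition_safe else "delta"

    if requested == "delta_by_partition" and not partition_safe:
        return "delta"

    return requested


print(resolve_strategy("replace_partitions", source_complete=False, partition_safe=True))
```

Falling back silently is a design choice of this sketch. Raising an error instead would be equally reasonable when an unexpected downgrade should stop the pipeline.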
ContractForge fails when the list of affected partitions would exceed the configured predicate limit. It does not truncate the list, because a truncated list would leave it ambiguous which partitions a destructive replacement covers.
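The fail-instead-of-truncate behavior can be sketched as a guard over the distinct partition values in a batch. The names `affected_partitions`, `max_predicates`, and `PartitionLimitExceeded` are hypothetical. The real ContractForge setting and exception are not named on this page.

```python
from __future__ import annotations


class PartitionLimitExceeded(ValueError):
    """Raised when a batch touches more partitions than the predicate limit allows."""


def affected_partitions(batch_partition_values: list[str], max_predicates: int) -> list[str]:
    """Return the sorted distinct partition values, or fail if there are too many.

    Hypothetical guard: the list is never truncated, because replacing only
    some of the partitions present in the batch would silently skip the rest.
    """
    distinct = sorted(set(batch_partition_values))
    if len(distinct) > max_predicates:
        raise PartitionLimitExceeded(
            f"batch touches {len(distinct)} partitions, limit is {max_predicates}"
        )
    return distinct


# Three rows spread over two distinct partitions fit within a limit of 2.
print(affected_partitions(["2024-01-02", "2024-01-01", "2024-01-02"], max_predicates=2))
```

With a limit of 1, the same batch would raise `PartitionLimitExceeded` rather than replace a single partition and drop the other from the predicate.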