Quality gates

Quality rules decide whether a batch can be written. They are pipeline safety gates, not a replacement for a full data quality platform.

Built-in rules

| Rule             | Scope                               | Quarantine           |
|------------------|-------------------------------------|----------------------|
| required_columns | Schema-level presence check.        | No, abort-only.      |
| not_null         | Row-level null check.               | Yes.                 |
| unique_key       | Dataset-level uniqueness check.     | No, abort-only.      |
| accepted_values  | Row-level domain check.             | Yes.                 |
| min_rows         | Dataset-level minimum volume check. | No, abort-only.      |
| max_null_ratio   | Column-level aggregate check.       | No, abort-only.      |
| expressions      | SQL expression rule.                | Depends on severity. |
| custom           | Registered custom rule.             | Rule-defined.        |

Example

quality_rules:
  required_columns: [order_id, customer_id, updated_at]
  not_null: [order_id, updated_at]
  unique_key: [order_id]
  min_rows: 1
  accepted_values:
    status: [open, closed, cancelled]
  max_null_ratio:
    discount_amount: 0.20
  expressions:
    - name: non_negative_total
      expression: "order_total >= 0"
      severity: quarantine
      message: "Order total must not be negative."

on_quality_fail: quarantine
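
To make the row-level part of this example concrete, here is a minimal plain-Python sketch of how `not_null`, `accepted_values` and the `non_negative_total` expression could split a batch into valid and quarantined rows. It is an illustration only, not ContractForge's implementation: `row_failures`, `split_batch` and the `_failures` field are hypothetical names, and the treatment of null values in `accepted_values` and in the expression is an assumption.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical in-memory mirror of the row-level rules in the example config.
NOT_NULL = ["order_id", "updated_at"]
ACCEPTED_VALUES = {"status": {"open", "closed", "cancelled"}}


def row_failures(row: Row) -> list[str]:
    """Return the names of the row-level rules that this row violates."""
    failures: list[str] = []
    for column in NOT_NULL:
        if row.get(column) is None:
            failures.append(f"not_null:{column}")
    for column, allowed in ACCEPTED_VALUES.items():
        value = row.get(column)
        # Assumption: nulls are left to not_null and do not fail this rule.
        if value is not None and value not in allowed:
            failures.append(f"accepted_values:{column}")
    # Stand-in for the SQL expression "order_total >= 0".
    # Assumption: a null total does not fail the expression.
    total = row.get("order_total")
    if total is not None and total < 0:
        failures.append("expression:non_negative_total")
    return failures


def split_batch(rows: list[Row]) -> tuple[list[Row], list[Row]]:
    """Separate rows that pass every row-level rule from rows to quarantine."""
    valid: list[Row] = []
    quarantined: list[Row] = []
    for row in rows:
        failures = row_failures(row)
        if failures:
            # Keep the reasons next to the rejected row (hypothetical field).
            quarantined.append({**row, "_failures": failures})
        else:
            valid.append(row)
    return valid, quarantined
```

With `on_quality_fail: quarantine`, the `valid` list corresponds to what would be written and the `quarantined` list to what would be set aside.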

Failure actions

fail

Abort the run when a rule fails. Use for curated tables and strict contracts.

warn

Record the failure but continue writing the batch. Use only when the business accepts that data failing the rule will still be written.

quarantine

Write invalid rows to quarantine and continue with the valid rows. This applies only when the failing rule can isolate individual rows.

Abort-only escalation

Dataset-level failures such as missing columns, duplicate keys, or too few rows abort the run even when quarantine is requested.
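
The escalation can be sketched as a small resolution step that maps the requested action to the action actually taken. This is an illustrative sketch, not the library's API: `Action`, `ABORT_ONLY_RULES` and `effective_action` are hypothetical names, the rule set is copied from the abort-only entries in the built-in rules table, and leaving `warn` unchanged for those rules is an assumption.

```python
from enum import Enum


class Action(Enum):
    FAIL = "fail"
    WARN = "warn"
    QUARANTINE = "quarantine"


# Rules listed as abort-only: they cannot isolate individual rows.
ABORT_ONLY_RULES = {"required_columns", "unique_key", "min_rows", "max_null_ratio"}


def effective_action(rule: str, requested: Action) -> Action:
    """Escalate quarantine to fail for rules that cannot isolate rows."""
    if requested is Action.QUARANTINE and rule in ABORT_ONLY_RULES:
        return Action.FAIL
    # Assumption: fail and warn are applied as requested.
    return requested
```

For example, `effective_action("unique_key", Action.QUARANTINE)` resolves to `Action.FAIL`, while `effective_action("not_null", Action.QUARANTINE)` stays `Action.QUARANTINE`.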

Performance model

ContractForge consolidates quality aggregates to avoid one Spark action per rule. Quarantine row extraction still requires an action because rejected rows must be materialized.
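
The idea of consolidating aggregates can be illustrated with a plain-Python analog: every dataset-level metric is collected in one pass over the rows, and the rules are then evaluated against the collected numbers. In Spark, the comparable pattern would be a single `DataFrame.agg` call carrying several aggregate expressions. The function names and the shape of the result are hypothetical, and this is not how ContractForge is implemented internally.

```python
from typing import Any, Iterable

Row = dict[str, Any]


def collect_aggregates(
    rows: Iterable[Row], key: list[str], null_ratio_columns: list[str]
) -> dict[str, Any]:
    """Compute every dataset-level metric in a single pass over the rows."""
    row_count = 0
    duplicate_keys = 0
    seen_keys: set[tuple] = set()
    null_counts = {column: 0 for column in null_ratio_columns}
    for row in rows:
        row_count += 1
        key_value = tuple(row.get(column) for column in key)
        if key_value in seen_keys:
            duplicate_keys += 1
        else:
            seen_keys.add(key_value)
        for column in null_ratio_columns:
            if row.get(column) is None:
                null_counts[column] += 1
    null_ratios = {
        column: (count / row_count if row_count else 0.0)
        for column, count in null_counts.items()
    }
    return {
        "row_count": row_count,
        "duplicate_keys": duplicate_keys,
        "null_ratios": null_ratios,
    }


def evaluate(
    aggregates: dict[str, Any], min_rows: int, max_null_ratio: dict[str, float]
) -> list[str]:
    """Check the aggregate rules against metrics that were already collected."""
    failures: list[str] = []
    if aggregates["row_count"] < min_rows:
        failures.append("min_rows")
    if aggregates["duplicate_keys"] > 0:
        failures.append("unique_key")
    for column, limit in max_null_ratio.items():
        if aggregates["null_ratios"][column] > limit:
            failures.append(f"max_null_ratio:{column}")
    return failures
```

Adding another aggregate rule extends the single pass instead of adding another scan, which is the property the performance model relies on.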

Custom rules

Use the custom quality registry when a rule is reusable across projects. Keep custom rules deterministic and return explicit metadata; avoid hidden writes or external side effects.