Built-in rules

RuleScopeQuarantine
required_columnsSchema-level presence check.No, abort-only.
not_nullRow-level null check.Yes.
unique_keyDataset-level uniqueness check.No, abort-only.
accepted_valuesRow-level domain check.Yes.
min_rowsDataset-level minimum volume check.No, abort-only.
max_null_ratioColumn-level aggregate check.No, abort-only.
expressionsSQL expression rule.Depends on severity.
customRegistered custom rule.Rule-defined.

Example

quality_rules:
  required_columns: [order_id, customer_id, updated_at]
  not_null: [order_id, updated_at]
  unique_key: [order_id]
  min_rows: 1
  accepted_values:
    status: [open, closed, cancelled]
  max_null_ratio:
    discount_amount: 0.20
  expressions:
    - name: non_negative_total
      expression: "order_total >= 0"
      severity: quarantine
      message: "Order total must not be negative."

on_quality_fail: quarantine

Failure actions

fail

Abort the run when a rule fails. Use for curated tables and strict contracts.

warn

Record the failure but continue writing. Use only when the business accepts weak gates.

quarantine

Write invalid rows to quarantine and continue with valid rows when rules can isolate rows.

Abort-only escalation

Dataset-level failures such as missing columns, duplicate keys or too few rows fail even when quarantine is requested.

Performance model

ContractForge consolidates quality aggregates to avoid one Spark action per rule. Quarantine row extraction still requires an action because rejected rows must be materialized.

Custom rules

Use the custom quality registry when a rule is reusable across projects. Keep custom rules deterministic and return explicit metadata; avoid hidden writes or external side effects.