fail
Abort the run when a rule fails. Use for curated tables and strict contracts.
Core behavior
Quality rules decide whether a batch can be written. They are pipeline safety gates, not a replacement for a full data quality platform.
| Rule | Scope | Quarantine |
|---|---|---|
required_columns | Schema-level presence check. | No, abort-only. |
not_null | Row-level null check. | Yes. |
unique_key | Dataset-level uniqueness check. | No, abort-only. |
accepted_values | Row-level domain check. | Yes. |
min_rows | Dataset-level minimum volume check. | No, abort-only. |
max_null_ratio | Column-level aggregate check. | No, abort-only. |
expressions | SQL expression rule. | Depends on severity. |
custom | Registered custom rule. | Rule-defined. |
quality_rules:
required_columns: [order_id, customer_id, updated_at]
not_null: [order_id, updated_at]
unique_key: [order_id]
min_rows: 1
accepted_values:
status: [open, closed, cancelled]
max_null_ratio:
discount_amount: 0.20
expressions:
- name: non_negative_total
expression: "order_total >= 0"
severity: quarantine
message: "Order total must not be negative."
on_quality_fail: quarantineresult = ingest(
...,
quality_rules={
"required_columns": ["order_id", "customer_id", "updated_at"],
"not_null": ["order_id", "updated_at"],
"unique_key": ["order_id"],
"min_rows": 1,
"accepted_values": {"status": ["open", "closed", "cancelled"]},
"max_null_ratio": {"discount_amount": 0.20},
"expressions": [
{
"name": "non_negative_total",
"expression": "order_total >= 0",
"severity": "quarantine",
"message": "Order total must not be negative.",
}
],
},
on_quality_fail="quarantine",
)failAbort the run when a rule fails. Use for curated tables and strict contracts.
warnRecord the failure but continue writing. Use only when the business accepts weak gates.
quarantineWrite invalid rows to quarantine and continue with valid rows when rules can isolate rows.
Dataset-level failures such as missing columns, duplicate keys or too few rows fail even when quarantine is requested.
ContractForge consolidates quality aggregates to avoid one Spark action per rule. Quarantine row extraction still requires an action because rejected rows must be materialized.
Use the custom quality registry when a rule is reusable across projects. Keep custom rules deterministic and return explicit metadata; avoid hidden writes or external side effects.