Quality gates.
Quality rules decide whether a batch can be written. They are pipeline safety gates, not a replacement for a full data quality platform.
Built-in rules
| Rule | Scope | Quarantine |
|---|---|---|
required_columns | Schema-level presence check. | No, abort-only. |
not_null | Row-level null check. | Yes. |
unique_key | Dataset-level uniqueness check. | No, abort-only. |
accepted_values | Row-level domain check. | Yes. |
min_rows | Dataset-level minimum volume check. | No, abort-only. |
max_null_ratio | Column-level aggregate check. | No, abort-only. |
expressions | SQL expression rule. | Depends on severity. |
custom | Registered custom rule. | Rule-defined. |
Example
- YAML
- Python
quality_rules:
required_columns: [order_id, customer_id, updated_at]
not_null: [order_id, updated_at]
unique_key: [order_id]
min_rows: 1
accepted_values:
status: [open, closed, cancelled]
max_null_ratio:
discount_amount: 0.20
expressions:
- name: non_negative_total
expression: "order_total >= 0"
severity: quarantine
message: "Order total must not be negative."
on_quality_fail: quarantine
result = ingest(
...,
quality_rules={
"required_columns": ["order_id", "customer_id", "updated_at"],
"not_null": ["order_id", "updated_at"],
"unique_key": ["order_id"],
"min_rows": 1,
"accepted_values": {"status": ["open", "closed", "cancelled"]},
"max_null_ratio": {"discount_amount": 0.20},
"expressions": [
{
"name": "non_negative_total",
"expression": "order_total >= 0",
"severity": "quarantine",
"message": "Order total must not be negative.",
}
],
},
on_quality_fail="quarantine",
)
Failure actions
fail
Abort the run when a rule fails. Use for curated tables and strict contracts.
warn
Record the failure but continue writing. Use only when the business accepts weak gates.
quarantine
Write invalid rows to quarantine and continue with valid rows when rules can isolate rows.
Abort-only escalation
Dataset-level failures such as missing columns, duplicate keys or too few rows fail even when quarantine is requested.
Performance model
ContractForge consolidates quality aggregates to avoid one Spark action per rule. Quarantine row extraction still requires an action because rejected rows must be materialized.
Custom rules
Use the custom quality registry when a rule is reusable across projects. Keep custom rules deterministic and return explicit metadata; avoid hidden writes or external side effects.