Contextual Flux
Architecture v2
Architecture for context-oriented semantic resolution, governed planning, validated execution, explicit environmental state management, and adaptive evolution based on invariants and auditable observability.
1. Introduction
The Contextual Flux Architecture (CFA) is an architecture for AI-oriented systems that replaces the traditional paradigm of Agents + static Tools/Skills with a model in which intent resolution is treated as a first-class architectural entity.
CFA treats execution not as an immediate act, but as a grounded decision conditioned by understood intent, governance policies, cost constraints (FinOps), data contracts, and the current state of the environment.
CFA is an architecture of context-oriented semantic resolution, governed planning, validated execution, explicit environmental state management, and adaptive evolution based on invariants and auditable observability.
Formal Definition of Intent
The term intent is used with technical precision in CFA, not as a synonym for instruction or command:
The fundamental CFA flow is:
2. The Problem CFA Solves
Traditional agent architectures built around static tools present four structural limitations that CFA was designed to solve:
Dependence on embedding matching. Failure under complex compositions. Apparent understanding without real understanding.
Uncontrolled growth of tools. Difficult versioning. Low adaptability to novel scenarios.
Execution before deep validation. Risk of excessive cost, PII violations, and broken data contracts.
Systems do not know the state of the environment after execution. Structured auditability is missing. Context is lost between intents.
3. Foundational Principles
Every execution begins with the formalization of intent. The system never starts from code; it starts from a structured understanding of what must happen.
The intent is converted into a structured, typed signature before any planning occurs. Raw text is not a valid planner input.
No plan is executed without validation against governance policies, FinOps constraints, and data contracts. The cost of blocking early is always lower than the cost of reversing later.
Initial executions are ephemeral (JIT). Only executions that prove repeatable value and compliance are promoted to persistent artifacts.
The system maintains and consults a continuous model of data state. No intent is processed without that context.
Each decision is recorded as a typed, auditable event. Governance without explainability is not real governance.
4. General Architecture
Natural language. System entry point.
Converts natural language into a formal structure. Consults the Context Registry before generating the signature.
Confidence score, ambiguity, and confirmation modes.
auto / soft / hard / human_escalation. PII + gold_write + high cost -> escalation. Timeout -> block + notify.
Typed signature plus execution_context. Immutable formal contract of the intent.
Applies governance, FinOps, and contract rules. Max 3 replans. approve / replan / block.
Estimates cost by signature_hash (30 days). Forces replan if above the ceiling. Automatic feedback loop.
Generates a governed DAG plus idempotence. Supports Composite Intent.
Analyzes code before execution. Detects forbidden tokens and contract violations.
Acquires an exclusive lock over the target_scope. Holds it until after State Projection. Conflict -> queue or reject.
Revalidates version_id, policy_bundle_version, and catalog_snapshot_version. Closes TOCTOU. Mismatch -> replan.
Isolated and monitored execution. Immediate interruption on a forbidden operation or Environmental Fault.
Validates cardinality, real cost, final schema, and null ratio.
Sequence: retry -> partial_failure_policy -> project_state. Granular by consistency_unit.
Updates the Context Registry. Atomic, versioned. Lock released after confirmed projection.
Current state of the environment. Snapshot by latest_committed_before_intent_resolution.
veto_absolute -> ordered_authority. Conflict -> strictest_outcome.
IFo + IFs + IFg + IDI by intent_signature_hash. Window by domain. generation_metadata.
append_only, causal order. I8 — verifiable reproducibility.
5.1 Intent Normalizer + Semantic Resolution
The Intent Normalizer is the most critical component in the pipeline. An incorrect signature generated here contaminates every downstream component — the Policy Engine will govern the wrong problem, the Planner will build the wrong DAG, and the generated code will be deterministically incorrect without any apparent hallucination.
Component Input
The Normalizer must consult the Context Registry before generating the signature. This is an architectural input, not an optional one:
intent_normalizer:
  inputs:
    - user_intent                          # natural language
    - context_registry.environment_state   # current state of the environment
    - data_catalog                         # metadata and classifications
  outputs:
    - semantic_resolution
Semantic Resolution
semantic_resolution:
  signature: {...}                    # generated State Signature
  confidence_score: 0.82              # [0.0 - 1.0]
  ambiguity_level: medium             # low | medium | high
  competing_interpretations:
    - join_nfe_clientes_silver
    - enrich_nfe_with_master_data
  confirmation_mode: hard_required    # auto | soft | hard_required
  environment_constraints_injected:   # constraints injected from the Context Registry
    - silver_documentos.state = partially_committed
    - silver_documentos.publish_allowed = false
Confirmation Modes
| Mode | Criterion | Behavior |
|---|---|---|
| auto | High confidence, low risk, no PII | Proceeds without interruption |
| soft | Medium confidence, read intent | Displays the signature and proceeds if there is no conflict |
| hard_required | PII detected, write to Silver/Gold, high estimated cost, relevant semantic ambiguity, multiple compatible datasets | Explicit confirmation required before proceeding |
Integration with the Context Registry
If the Context Registry indicates silver_documentos.state = partially_committed, the Normalizer must adjust the interpretation of the intent, inject the constraint into the plan, or block intents that depend on publication of the affected dataset. Without that consultation, the system proposes plans over invalid assumptions about the state of the world.
5.1b Confirmation Orchestrator
The Intent Normalizer is the most dangerous single point of failure in the system — a wrong signature contaminates the entire pipeline with deterministic perfection. The Confirmation Orchestrator is the structural mitigation: it inserts an escalation layer between semantic resolution and the Policy Engine, activated selectively by risk.
It does not add friction in 90% of cases. In the 10% where risk is real — PII, ambiguity, writes to critical layers — it is mandatory.
confirmation_orchestrator:
  modes:
    - auto              # high confidence, low risk; proceeds without interruption
    - soft              # displays the signature and proceeds if there is no conflict
    - hard              # explicit confirmation required
    - human_escalation  # escalation to human review
  escalation_triggers:
    - confidence_score < 0.65 AND contains_pii == true
    - competing_interpretations > 1
    - silver_write AND context_registry.publish_allowed == false
    - gold_write                                 # always escalated
    - estimated_cost > cost_ceiling_dbu * 0.8    # pre-cost above 80% of the ceiling
  human_timeout_seconds: 300
  fallback_on_timeout:
    action: block
    reason: human_confirmation_timeout
    notify: governance_team
    audit_event: HUMAN_ESCALATION_TIMEOUT
human_escalation mode is not a weakness of the system: it is explicit recognition of its limits.
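The escalation triggers above can be read as a set of independent predicates over the semantic resolution. The following Python sketch illustrates that reading. The `Resolution` dataclass, the function name, and the trigger labels it returns are hypothetical; only the conditions and the 0.65 and 0.8 constants come from the configuration.

```python
from dataclasses import dataclass


@dataclass
class Resolution:
    # Hypothetical container for the fields the triggers refer to.
    confidence_score: float
    contains_pii: bool
    competing_interpretations: int
    silver_write: bool
    gold_write: bool
    publish_allowed: bool      # context_registry.publish_allowed
    estimated_cost: float      # pre-cost estimate, in DBU


def fired_escalation_triggers(r: Resolution, cost_ceiling_dbu: float) -> list[str]:
    """Return the labels of every trigger that fired; an empty list means no escalation."""
    fired = []
    if r.confidence_score < 0.65 and r.contains_pii:
        fired.append("low_confidence_with_pii")
    if r.competing_interpretations > 1:
        fired.append("competing_interpretations")
    if r.silver_write and not r.publish_allowed:
        fired.append("silver_write_publish_blocked")
    if r.gold_write:
        fired.append("gold_write")
    if r.estimated_cost > cost_ceiling_dbu * 0.8:
        fired.append("pre_cost_above_80_percent")
    return fired


example = Resolution(
    confidence_score=0.82, contains_pii=True, competing_interpretations=2,
    silver_write=True, gold_write=False, publish_allowed=False, estimated_cost=10.0,
)
print(fired_escalation_triggers(example, cost_ceiling_dbu=50))
```

Returning the list of fired triggers, rather than a single boolean, keeps the reason for escalation available for the audit event.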
5.2 State Signature
Typed and enriched representation of the intent. It is the formal contract that governs the entire remainder of the pipeline. Once generated and confirmed, the Signature is immutable — any replanning generates a new Signature.
signature:
  domain: fiscal_data_processing
  intent: reconciliation_and_persist
  source_type: nested_json
  target_layer: bronze_to_silver
  datasets:
    - name: nfe
      size: 4TB
      classification: high_volume
    - name: clientes
      size: 500MB
      classification: sensitive
      pii: [cpf, email]
  constraints:
    no_pii_raw: true
    partition_by: [processing_date]
    enforce_types: true
    merge_key_required: true
  execution_context:                                  # normative context of the execution
    policy_bundle_version: "v4.2"                     # policies in force at that moment
    catalog_snapshot_version: "catalog_2026_03_22"    # state of the catalog
Why execution_context belongs to the Signature
The execution_context is not audit metadata — it is part of the execution contract. With it embedded in the Signature, the system can reproduce any historical execution with the policies and catalog that were in force at that moment, not merely explain what happened. Without it, the Audit Trail records the past but does not make it reproducible. The three elements — formalized intent, state of the environment, and normative context — form the complete contract of an execution.
5.3 Policy Engine
Applies all governance, safety, and FinOps rules before execution. It is the layer that guarantees that the system never executes what it should not.
| Policy Type | Detects | Possible Action |
|---|---|---|
| Privacy Wall | Untreated PII usage, joins with sensitive data | Requires anonymization or blocks |
| FinOps Guard | Full scans on large volumes, joins without a temporal filter | Requires a filter or blocks |
| Contract Enforcement | Append to Silver without a merge key, write outside the proper layer | Converts the operation or blocks |
| Execution Safety | Forbidden imports in the Signature, declared dangerous code patterns | Blocks immediately |
Possible Policy Engine outcomes: approve, replan (with declared mandatory interventions), or block.
Declarative Rule Contract
Policy Engine rules are expressed declaratively — condition plus action — and versioned with the policy_bundle_version. The whitepaper does not specify the implementation DSL, but it defines the minimum contract that any rule must satisfy:
policy_rule:
  name: forbid_raw_pii_in_silver
  condition: target_layer == "silver" AND contains_pii_raw == true
  action: block               # approve | replan | block
  severity: critical
  fault_code: GOVERNANCE_RAW_PII_IN_SILVER
Replanning Limit
The replanning cycle cannot be unlimited. A loop of replans is as problematic in production as a stuck execution, and it has no terminal state without this declared limit:
replan_policy:
  max_attempts: 3
  on_max_exceeded:
    action: block
    decision_state: blocked
    reason: max_replan_attempts_exceeded
    audit_required: true
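A bounded replan cycle can be sketched in Python as a loop with an explicit terminal state. The function name and the `evaluate` and `revise` callbacks are hypothetical stand-ins for the Policy Engine and the replanning step, and the `policy_block` reason is an illustrative label. The limit and the `max_replan_attempts_exceeded` result mirror the policy above.

```python
from typing import Callable

MAX_REPLAN_ATTEMPTS = 3  # replan_policy.max_attempts


def run_policy_cycle(
    signature: dict,
    evaluate: Callable[[dict], str],   # returns "approve", "replan", or "block"
    revise: Callable[[dict], dict],    # produces a new signature for the next attempt
    max_attempts: int = MAX_REPLAN_ATTEMPTS,
) -> dict:
    replans = 0
    while True:
        outcome = evaluate(signature)
        if outcome == "approve":
            return {"decision_state": "approved", "replans": replans}
        if outcome == "block":
            return {"decision_state": "blocked", "reason": "policy_block", "replans": replans}
        if outcome != "replan":
            raise ValueError(f"unknown policy outcome: {outcome}")
        if replans >= max_attempts:
            # Terminal state: the loop cannot continue past the declared limit.
            return {
                "decision_state": "blocked",
                "reason": "max_replan_attempts_exceeded",
                "audit_required": True,
                "replans": replans,
            }
        signature = revise(signature)  # a replan always yields a new signature
        replans += 1
```

Because `revise` returns a new dictionary instead of mutating the old one, the sketch also respects the rule that a confirmed Signature is immutable.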
5.4 Composite Intent Model
Composite intents are decomposed into sub-intents with explicit dependencies. In the conceptual model this is represented as an Intent Graph; in implementation, this capability may be absorbed internally by the Execution Planner.
intent_graph:
  root_intent: fiscal_reconciliation_publish
  sub_intents:
    - id: ingest_nfe
    - id: anonymize_client_master
    - id: reconcile_documents
    - id: aggregate_by_state
    - id: publish_audit_view
  dependencies:
    - ingest_nfe -> reconcile_documents
    - anonymize_client_master -> reconcile_documents
    - reconcile_documents -> aggregate_by_state
    - aggregate_by_state -> publish_audit_view
  shared_context:
    consistency_unit: partition
    key: processing_date
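The dependency list above defines a DAG, so a valid execution order is any topological order of the sub-intents. As an illustration only, the sketch below uses Python's standard `graphlib.TopologicalSorter` to group the sub-intents into waves that could run in parallel. The `execution_waves` helper is hypothetical and is not part of CFA.

```python
from graphlib import TopologicalSorter

# (upstream, downstream) pairs copied from intent_graph.dependencies
DEPENDENCIES = [
    ("ingest_nfe", "reconcile_documents"),
    ("anonymize_client_master", "reconcile_documents"),
    ("reconcile_documents", "aggregate_by_state"),
    ("aggregate_by_state", "publish_audit_view"),
]


def execution_waves(edges):
    """Group sub-intents into waves; every wave depends only on earlier waves."""
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        sorter.add(downstream, upstream)   # downstream has upstream as predecessor
    sorter.prepare()                       # raises graphlib.CycleError on a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in execution_waves(DEPENDENCIES):
    print(wave)
```

The first wave contains `ingest_nfe` and `anonymize_client_master`, which have no dependency on each other; the remaining sub-intents form a strict chain.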
5.5 Execution Planner
Generates the execution DAG from the approved Signature. The Planner is not free — it fills templates, follows the plan approved by the Policy Engine, and respects every constraint declared in the Signature.
execution_plan:
  - step: extract
    source: nfe
    filter: processing_date >= '2026-01-01'   # required by FinOps
  - step: anonymize
    source: clientes
    transform:
      - sha256(cpf) as cpf_hash
      - drop(email)
  - step: join
    type: broadcast
    condition: nfe.cpf = clientes.cpf_hash
  - step: load
    type: merge
    target: silver_documentos
    key: nfe_id                               # required by Contract
Pre-cost Estimator
The FinOps Guard blocks excessive cost after the Policy Engine evaluates the Signature. The Pre-cost Estimator anticipates that evaluation — it estimates cost before the Signature is approved, using execution history for the same intent_signature_hash. This closes the gap between "estimated cost" and "real cost" with an automatic feedback loop.
pre_cost_estimator:
  inputs:
    - signature
    - context_registry.environment_state
    - execution_history_by_signature_hash   # last 30 days
  outputs:
    estimated_dbu: float
    estimated_shuffle_gb: float
    cost_risk_score: float                  # [0.0 - 1.0]
    confidence: float                       # based on N historical executions
  threshold_policy:
    - if estimated_dbu > cost_ceiling_dbu:
        action: force_replan_with_filter
        reason: PRE_COST_CEILING_EXCEEDED
    - if estimated_dbu > cost_ceiling_dbu * 0.8:
        action: escalate_to_confirmation_orchestrator
        mode: human_escalation
  feedback_loop:
    update_on: post_execution
    key: intent_signature_hash
    window: last_30_days
    model: rolling_weighted_average
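A minimal Python sketch of the estimator's two moving parts: the rolling weighted average and the threshold policy. The whitepaper does not specify the weighting scheme, so the linear recency weights (1 for the oldest run, N for the newest) are an assumption, as are the function names and the `proceed` result.

```python
def estimate_dbu(history: list[float]) -> float:
    """Recency-weighted average of past DBU costs for one intent_signature_hash.

    `history` is ordered oldest first. Linear weights 1..N are an assumption;
    the source only names the model as rolling_weighted_average.
    """
    if not history:
        raise ValueError("no execution history for this signature hash")
    weights = range(1, len(history) + 1)
    return sum(w * cost for w, cost in zip(weights, history)) / sum(weights)


def threshold_action(estimated_dbu: float, cost_ceiling_dbu: float) -> str:
    """Apply threshold_policy in declaration order: ceiling first, then the 80% band."""
    if estimated_dbu > cost_ceiling_dbu:
        return "force_replan_with_filter"
    if estimated_dbu > cost_ceiling_dbu * 0.8:
        return "escalate_to_confirmation_orchestrator"
    return "proceed"  # hypothetical label for "no threshold crossed"


estimate = estimate_dbu([30.0, 40.0, 50.0])
print(round(estimate, 2), threshold_action(estimate, cost_ceiling_dbu=50))
```

Checking the ceiling before the 80% band matters: an estimate above the ceiling also satisfies the second condition, and the stricter action has to win.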
Every execution generated by the Planner must be idempotent. On retry or re-execution after partial failure, the system must not duplicate data, create duplicate records, or produce a result different from what it would have produced on the first successful execution.
execution_semantics:
  idempotency:
    required: true
    strategy:
      - merge_with_deterministic_key   # merge uses the merge_key from the Signature
      - partition_overwrite_safe       # overwriting an entire partition is idempotent
    forbidden:
      - append_without_dedup           # a plain append creates duplicates on retry
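The difference between the allowed and forbidden strategies can be shown with in-memory collections standing in for a table. This is a toy Python illustration, not a storage-engine implementation; the function names are hypothetical.

```python
def merge_with_deterministic_key(target: dict, batch: list[dict], key: str) -> dict:
    """Upsert each row under its merge key; replaying the batch changes nothing."""
    for row in batch:
        target[row[key]] = row
    return target


def append_without_dedup(target: list, batch: list[dict]) -> list:
    """Plain append; replaying the batch duplicates every row."""
    target.extend(batch)
    return target


batch = [{"nfe_id": 1, "cpf_hash": "a1"}, {"nfe_id": 2, "cpf_hash": "b2"}]

merged: dict = {}
merge_with_deterministic_key(merged, batch, key="nfe_id")
merge_with_deterministic_key(merged, batch, key="nfe_id")   # simulated retry

appended: list = []
append_without_dedup(appended, batch)
append_without_dedup(appended, batch)                        # simulated retry

print(len(merged), len(appended))
```

After the simulated retry the merged target still holds two rows, while the appended target holds four.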
Consistency Unit: Closed Enum
The consistency_unit type is not free-form — it is selected by rule based on the execution context:
consistency_unit:
  allowed_types:
    - partition      # standard for partitioned batch
    - dataset        # for atomic cross-dataset operations
    - dag_branch     # for composite intents with independent branches
    - time_window    # for micro-batch with a time window
  selection_rules:
    - if execution_mode == batch: partition
    - if cross_dataset_atomic == true: dataset
    - if composite_intent == true: dag_branch
    - if execution_mode == micro_batch: time_window
5.6 Static Validation
Analysis of the generated code before execution. Detects violations that can be identified without running the job. Belongs to the Static Safety Faults family.
static_analysis:
  forbidden_tokens:
    - collect()
    - toPandas()
    - crossJoin()
    - cpf        # raw PII
    - email      # raw PII
  required_patterns:
    - filter(    # mandatory temporal predicate
    - merge(     # direct append is forbidden in Silver
  schema_contract:
    expected_columns: [nfe_id, cpf_hash, processing_date]
    forbidden_columns: [cpf, email]
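As a rough illustration of the token rules above, the following Python sketch scans generated code as plain text with regular expressions. It is deliberately naive: a production analyzer would more likely work on a parsed syntax tree. The function names and the fault label format are hypothetical.

```python
import re

FORBIDDEN_CALLS = ("collect", "toPandas", "crossJoin")
FORBIDDEN_IDENTIFIERS = ("cpf", "email")      # raw PII column names
REQUIRED_CALLS = ("filter", "merge")


def _has_call(name: str, code: str) -> bool:
    # Matches `name(` as a whole word, so `collect_list(` does not count as `collect(`.
    return re.search(rf"\b{re.escape(name)}\s*\(", code) is not None


def analyze(code: str) -> list[str]:
    """Return one label per violation; an empty list means the code passed."""
    faults = []
    for name in FORBIDDEN_CALLS:
        if _has_call(name, code):
            faults.append(f"forbidden_token:{name}()")
    for ident in FORBIDDEN_IDENTIFIERS:
        # Word boundaries keep `cpf_hash` from matching the raw `cpf` identifier.
        if re.search(rf"\b{re.escape(ident)}\b", code):
            faults.append(f"forbidden_token:{ident}")
    for name in REQUIRED_CALLS:
        if not _has_call(name, code):
            faults.append(f"missing_required_pattern:{name}(")
    return faults


print(analyze('rows = df.select("cpf").collect()'))
```

The word-boundary check is what lets the hashed column `cpf_hash` pass while the raw `cpf` column is rejected.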
5.7 Runtime Validation
Validates the actual runtime behavior of the execution in the Sandbox. Detects violations that are only visible with real data. Belongs to the Runtime Behavioral Faults family.
runtime_metrics:
  max_rows_output: 10_000_000
  max_shuffle_size_mb: 500
  max_null_ratio: 0.05          # 5%
  expected_output_range:
    min: 1_000
    max: 10_000_000
  schema_drift_detection: true
  cost_ceiling_dbu: 50
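The thresholds above translate directly into post-execution checks. In this Python sketch, the shape of the `metrics` dictionary and every fault code except `RUNTIME_SHUFFLE_LIMIT_EXCEEDED` (which appears elsewhere in this document) are assumptions made for illustration.

```python
LIMITS = {
    "max_rows_output": 10_000_000,
    "min_rows_output": 1_000,        # expected_output_range.min
    "max_shuffle_size_mb": 500,
    "max_null_ratio": 0.05,
    "cost_ceiling_dbu": 50,
}


def validate_runtime(metrics: dict, limits: dict = LIMITS) -> list[str]:
    """Compare observed metrics with the declared limits and return fault codes."""
    faults = []
    if metrics["rows_output"] > limits["max_rows_output"]:
        faults.append("RUNTIME_ROW_LIMIT_EXCEEDED")
    if metrics["rows_output"] < limits["min_rows_output"]:
        faults.append("RUNTIME_OUTPUT_BELOW_EXPECTED_RANGE")
    if metrics["shuffle_size_mb"] > limits["max_shuffle_size_mb"]:
        faults.append("RUNTIME_SHUFFLE_LIMIT_EXCEEDED")
    if metrics["null_ratio"] > limits["max_null_ratio"]:
        faults.append("RUNTIME_NULL_RATIO_EXCEEDED")
    if metrics["cost_dbu"] > limits["cost_ceiling_dbu"]:
        faults.append("RUNTIME_COST_CEILING_EXCEEDED")
    if metrics["schema_drift"]:
        faults.append("RUNTIME_SCHEMA_DRIFT")
    return faults


observed = {
    "rows_output": 2_500_000, "shuffle_size_mb": 730,
    "null_ratio": 0.01, "cost_dbu": 41, "schema_drift": False,
}
print(validate_runtime(observed))
```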
5.8 Partial Execution State
In real data systems, total and clean failure is rare. The common case is partial failure: 80% of partitions processed, one branch of the DAG materialized, three out of four datasets conformant. CFA has explicit semantics for this.
Consistency Unit
The system must declare the minimum granularity of governance. Without that, "partial" is a vague word.
partial_execution_state:
  overall_state: partially_committed
  consistency_unit:
    type: partition             # partition | dataset | dag_branch | time_window
    key: processing_date
  successful_units:
    - dataset: nfe
      partitions: ["2026-01-01", "2026-01-02"]
    - dataset: clientes_anon
      partitions: ["2026-01-01", "2026-01-02"]
  failed_units:
    - dataset: nfe
      partitions: ["2026-01-03"]
      error_code: RUNTIME_SHUFFLE_LIMIT_EXCEEDED
      fault_family: runtime_behavioral
  consistency_policy:
    mode: selective_quarantine
    rollback_required: false
    publish_allowed: false
Partial Failure Policy
The internal sequence of Partial Execution State is deterministic and cannot be reordered:
partial_execution_flow:
  sequence:
    - retry_failed_units             # attempts recovery of failed units before any decision
    - apply_partial_failure_policy   # determines the mode (rollback / quarantine / commit / degraded)
    - project_state                  # projects into the Context Registry only after policy is applied
partial_failure_policy:
  mode: selective_quarantine
  allowed_partial_commit: true
  publish_on_partial_success: false
  rollback_scope: failed_consistency_unit_only
  promotion_eligible: false
| Mode | Behavior | When to Use |
|---|---|---|
| full_rollback | Any failure invalidates everything | Atomic pipelines, critical data |
| selective_quarantine | Isolates failed units, preserves valid ones | Partitioned batch processing |
| partial_commit_no_publish | Writes partially, does not publish the layer | Tolerant ingestion pipelines |
| degraded_publish | Allows restricted use with degraded status | Cases with declared relaxed SLA |
Retry Policy
Failures of individual consistency units are eligible for automatic retry within the same execution. Retry operates on the failed_units, not on the entire execution — preserving what has already been successfully committed.
retry_policy:
  enabled: true
  max_attempts: 3
  retry_scope: failed_consistency_units_only   # never re-executes successful units
  backoff:
    strategy: exponential
    base_seconds: 30
  on_max_exceeded:
    action: quarantine_failed_units
    state: quarantined
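The retry policy above can be sketched in Python as a loop over only the failed units. The doubling factor is an assumption: the policy says `exponential` with a 30-second base but does not state the multiplier. The `run_unit` callback and the injectable `sleep` parameter are hypothetical conveniences so the example can run without waiting.

```python
import time
from typing import Callable


def backoff_delays(base_seconds: int = 30, max_attempts: int = 3) -> list[int]:
    """Delay before each retry, assuming the delay doubles on every attempt."""
    return [base_seconds * 2 ** attempt for attempt in range(max_attempts)]


def retry_failed_units(
    failed_units: list[str],
    run_unit: Callable[[str], bool],          # returns True when the unit succeeds
    max_attempts: int = 3,
    base_seconds: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    pending = list(failed_units)              # successful units are never touched
    for delay in backoff_delays(base_seconds, max_attempts):
        if not pending:
            break
        sleep(delay)
        pending = [unit for unit in pending if not run_unit(unit)]
    return {
        "recovered": [unit for unit in failed_units if unit not in pending],
        "quarantined": pending,               # on_max_exceeded: quarantine_failed_units
    }


print(backoff_delays())
```

Under the doubling assumption the three retries wait 30, 60, and 120 seconds; any unit still failing after the last one is reported as quarantined.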
Publish Semantics
The publish_allowed flag has precise semantics. "Publishing" is not synonymous with "writing" — it means making the dataset available for downstream consumption as trusted data.
publish_semantics:
  definition: dataset_visible_and_trusted_for_downstream_consumption
  states:
    - committed_not_published   # data written, not available downstream
    - published                 # data available and trusted
    - degraded                  # available with declared restrictions (degraded_publish)
  write_vs_publish:
    write: data written to storage
    publish: data declared trustworthy for consumption
    distinction: it is possible to have write without publish; never publish without write
Panic Rollback — Environment Change During Execution
Partial Execution State handles failure of the execution. There is a distinct case: a change in the external environment during execution — a cluster losing nodes, IAM permission revoked, storage quota exceeded. These events are not execution errors — they are invalidations of the context in which execution was authorized. They require their own protocol.
panic_rollback:
  triggers:
    - external_permission_revoked      # IAM, ACL, credential expired at runtime
    - compute_resource_unavailable     # cluster loses nodes beyond the threshold
    - storage_quota_exceeded           # storage unavailable in the middle of the write
    - policy_bundle_changed_mid_exec   # policy changed while the job was running
  action: immediate_interrupt_and_isolate
  scope: all_in_flight_consistency_units
  post_panic:
    state: quarantined
    projection_allowed: false          # Context Registry is not updated
    audit_event: PANIC_ROLLBACK_TRIGGERED
    reason_required: true
  distinction_from_partial_execution_state:
    - "Partial Execution State: execution failed for an internal reason"
    - "Panic Rollback: the external environment invalidated the premises of the execution"
5.9 Context Registry
The Context Registry is the cognitive infrastructure of the system. It is not an execution log — it is a living model of the state of the environment. It represents not only "what was done" but also "what state what was done is now in".
It is consulted by the Intent Normalizer before each new intent, and updated by the State Projection Protocol after each execution.
context_registry:
  environment_state:
    datasets:
      - name: silver_documentos
        state: partially_committed
        last_successful_partition: "2026-01-02"
        pending_partitions: ["2026-01-03"]
        publish_allowed: false
        quarantined_units: ["2026-01-03"]
      - name: clientes_anon
        state: committed
        publish_allowed: true
  execution_history:
    - intent_id: fiscal_reconciliation_001
      outcome: partially_committed
      consistency_unit: partition
      timestamp: "2026-03-22T14:30:00Z"
      policy_bundle_version: "v4.2"
Registry Versioning
The Context Registry requires explicit versioning. Without it, there is no temporal consistency — and the reproducibility guaranteed by the execution_context in the Signature becomes illusory if the state of the environment at execution time cannot be recovered.
context_registry:
  versioning:
    model: snapshot
    version_id: uuid                # generated at each atomic write
    timestamp: iso8601
  read_mode:
    type: snapshot_consistent       # always reads from a stable snapshot
  write_mode:
    type: atomic_commit             # atomic write through State Projection
Pre-execution Revalidation (TOCTOU)
There is a window between the query to the Context Registry by the Intent Normalizer (step 2) and the actual execution (step 9). During that interval, another execution may project new state onto the same dataset. Without revalidation, the system executes on a stale premise — silently violating Invariant 3.
pre_execution_revalidation:
  trigger: immediately_before_execution   # after the lock is acquired, before the Sandbox
  checks:
    - context_registry.version_id         # is the consulted snapshot still the latest_committed?
    - policy_bundle_version               # has policy not changed since the Signature?
    - catalog_snapshot_version            # has the catalog not changed since the Signature?
  on_mismatch:
    action: replan_or_abort
    reason: stale_environment_state
    counts_as_replan_attempt: true        # counts toward the Policy Engine max_replan_attempts
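The revalidation step amounts to comparing the three versions captured at resolution time with the versions in force immediately before execution. A minimal Python sketch follows, in which the dictionary shapes, the function name, the `mismatched` field, and the `execute` action label are assumptions.

```python
CHECKED_FIELDS = ("version_id", "policy_bundle_version", "catalog_snapshot_version")


def revalidate(captured: dict, current: dict) -> dict:
    """Compare what the plan was built on with what is in force right now."""
    mismatched = [field for field in CHECKED_FIELDS if captured[field] != current[field]]
    if mismatched:
        return {
            "action": "replan_or_abort",
            "reason": "stale_environment_state",
            "counts_as_replan_attempt": True,
            "mismatched": mismatched,       # illustrative extra detail for the audit event
        }
    return {"action": "execute"}            # hypothetical label for "premises still hold"


captured = {
    "version_id": "snap-001",
    "policy_bundle_version": "v4.2",
    "catalog_snapshot_version": "catalog_2026_03_22",
}
current = dict(captured, version_id="snap-002")   # another execution projected state
print(revalidate(captured, current))
```

The check only closes the TOCTOU window if it runs while the exclusive lock on the target scope is held, as the trigger comment in the configuration requires.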
TTL and Obsolete State Policy
The Context Registry may accumulate obsolete state indefinitely. A dataset marked as partially_committed for weeks, with no new execution over it, continues to block publication without ever being resolved — silently poisoning future intents. An explicit expiration and stale-state transition policy is required.
ttl_policy:
  partially_committed:
    ttl_days: 7                     # after 7 days without resolution
    on_expiry:
      action: transition_to_stale
      notify: true                  # generates an event in the Audit Trail
  quarantined:
    ttl_days: 30                    # long quarantine; requires manual review
    on_expiry:
      action: escalate_to_manual_review
      block_dependent_intents: true
  stale:
    definition: state not updated beyond the TTL without declared resolution
    behavior:
      new_intent_on_stale_dataset: hard_confirmation_required
      publish_blocked: true
      audit_event: CONTEXT_STATE_STALE
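A small Python sketch of how the TTL policy above could be evaluated for one dataset. The function name and its arguments are hypothetical, and treating the TTL as reached when the age is greater than or equal to the limit is an assumption about the boundary case.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TTL_DAYS = {"partially_committed": 7, "quarantined": 30}
ON_EXPIRY = {
    "partially_committed": "transition_to_stale",
    "quarantined": "escalate_to_manual_review",
}


def expiry_action(state: str, last_updated: datetime, now: datetime) -> Optional[str]:
    """Return the on_expiry action when the state has outlived its TTL, else None."""
    ttl = TTL_DAYS.get(state)
    if ttl is None:
        return None                       # states without a TTL never expire
    if now - last_updated >= timedelta(days=ttl):
        return ON_EXPIRY[state]
    return None


now = datetime(2026, 3, 30, tzinfo=timezone.utc)
print(expiry_action("partially_committed", now - timedelta(days=8), now))
```

Passing `now` explicitly instead of reading the clock inside the function keeps the evaluation deterministic, which fits the document's reproducibility requirement.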
5.10 State Projection Protocol
The bridge between execution and context. It guarantees that the result of each execution becomes system knowledge before any new intent is processed.
state_projection:
  trigger: post_runtime_validation   # occurs after validation, before the next intent
  inputs:
    - partial_execution_state
    - execution_metadata
  scope:
    - dataset_state
    - partition_availability
    - publish_allowed
    - quarantined_units
    - pending_partitions
  rules:
    - if overall_state == "partially_committed":
        dataset.state = "partially_committed"
        publish_allowed = false
    - if failed_units exist:
        dataset.pending_units = failed_units
    - if policy.mode == "selective_quarantine":
        dataset.quarantined_units = failed_units
  atomicity: required
  on_projection_failure:
    action: block_next_intent
    reason: environment_state_uncertain
Mandatory Properties
5.11 Evaluation Indices
Three distinct indices capture different dimensions of execution quality. Each is independent and measurable.
IFo
- normalized_latency ∈ [0,1]: compared to the historical baseline by workload type
- normalized_cost ∈ [0,1]: DBUs, scanned GB, consumed shuffle
- execution_stability ∈ [0,1]: success rate, retries, absence of critical failures
IFs
- output_contract_adherence: delivered schema versus expected schema, types, required columns
- semantic_drift_absence: expected cardinality, consistent distribution
- domain_invariant_preservation: key uniqueness, business rules preserved
IFg
- policy_compliance: no active policy violation detected post-execution
- absence_of_pii_exposure: PII handled according to rule throughout the output
- layer_adherence: writes respect Bronze/Silver/Gold as declared
Promotion Criterion
IDI — Intent Drift Index
Fourth lifecycle signal, complementary to the three execution indices. IDI detects when the domain has changed and the skill has gone stale before IFs detects degradation in the result. It measures how often executions of the same hash need to be replanned:
- IDI close to 1.0 → stable intent, domain consistent with the catalog
- IDI below 0.75 → drift detected → skill goes directly to the watchlist
- IDI below 0.50 → severe drift → immediate demotion with no observation window
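The three IDI bands above map to a simple classification. In this Python sketch the function name and the returned labels are hypothetical. How IDI itself is computed is not covered here, so the function takes an already computed value.

```python
def classify_idi(idi: float) -> str:
    """Map an Intent Drift Index value to the lifecycle consequence described above."""
    if idi < 0.50:
        return "immediate_demotion"   # severe drift, no observation window
    if idi < 0.75:
        return "watchlist"            # drift detected
    return "stable"                   # intent consistent with the catalog


for value in (0.92, 0.70, 0.41):
    print(value, classify_idi(value))
```

The severe band is tested first so that a value such as 0.41 is not caught by the broader watchlist condition.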
Configurable Evaluation Window
The promotion evaluation window is not fixed. High-volatility domains such as tax and regulatory need shorter windows. Stable domains such as master data and registration can use longer windows:
evaluation_window:
  default: last_7_days
  by_domain:
    fiscal: last_3_days         # high regulatory volatility
    master_data: last_30_days   # stable domain
    operational: last_7_days    # default
5.12 Skill Lifecycle Management
A system that only promotes skills eventually creates a new graveyard of artifacts. Without a demotion cycle, CFA reproduces the static-catalog problem with architectural pedigree.
States
The transition persisted_active → demoted may occur directly, bypassing the watchlist, in the event of a critical contract or governance violation.
Promotion Criterion
Promotion requires evidence accumulated within a time window. A single successful execution is not enough for industrialization:
promotion_policy:
  min_executions: 3                 # minimum number of successful executions
  evaluation_window: last_7_days    # within this window
  thresholds:
    IFo: 0.75
    IFs: 0.90
    IFg: 1.0                        # binary, no exception
  gate: IFo >= T1 AND IFs >= T2 AND IFg = 1 AND executions >= min_executions
  promotion_unit:
    type: intent_signature_hash     # hash of the State Signature; avoids promoting lookalikes
    # two intents with different signatures = different skills, even if semantically close
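The promotion gate above is a plain conjunction, sketched here in Python with T1 and T2 bound to the IFo and IFs thresholds from the policy. The function signature is hypothetical.

```python
THRESHOLDS = {"IFo": 0.75, "IFs": 0.90, "IFg": 1.0}
MIN_EXECUTIONS = 3


def passes_promotion_gate(scores: dict, successful_executions: int) -> bool:
    """gate: IFo >= T1 AND IFs >= T2 AND IFg = 1 AND executions >= min_executions."""
    return (
        scores["IFo"] >= THRESHOLDS["IFo"]
        and scores["IFs"] >= THRESHOLDS["IFs"]
        and scores["IFg"] == THRESHOLDS["IFg"]     # binary: anything below 1.0 fails
        and successful_executions >= MIN_EXECUTIONS
    )


print(passes_promotion_gate({"IFo": 0.88, "IFs": 0.94, "IFg": 1.0}, successful_executions=5))
```

Only executions of the same `intent_signature_hash` inside the evaluation window should be counted toward `successful_executions`; that filtering is outside the scope of this sketch.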
| Trigger | Description | Resulting State |
|---|---|---|
| Schema drift | Output contract diverges from the declared contract | watchlist → deprecated |
| Policy change | The current policy invalidates the skill | active → demoted |
| Degraded IFs | Recurring drop below T2 | active → watchlist |
| High cost | Consistently low IFo | active → watchlist |
| Low reuse | Skill not used for N periods | watchlist → deprecated |
| Catalog incompatibility | Dataset or domain removed from the catalog | active → retired |
Skill Generation Metadata
Every promoted skill must carry provenance metadata. Without it, it is impossible to identify which skills were industrialized under a system version with a promotion bug, making the "Synthetic Legacy" untraceable and unrecoverable at scale.
skill_generation_metadata:
  promoted_at: iso8601
  promoted_by_system_version: "cfa_v2.1"      # version of CFA that executed the promotion
  policy_bundle_at_promotion: "v4.2"          # policy active at the time
  catalog_snapshot_at_promotion: "catalog_2026_03_22"
  promotion_scores:
    IFo: 0.88
    IFs: 0.94
    IFg: 1.0
  execution_count_at_promotion: 5
  evaluation_window: last_7_days
5.13 Fault Model
Errors in CFA are not exceptions. They are governed events. Each family has a detection point and a defined owner. Mixing families without making the stage explicit creates ambiguity about who detects, when it is detected, and which terminal state is possible.
| Family | Detected In | Owner | Action |
|---|---|---|---|
| Semantic Faults (governance, finops, contract) | Policy Engine | Governance / FinOps layer | replan or block before execution |
| Static Safety Faults (collect(), forbidden imports, raw PII) | Static Validation | Code analysis layer | immediate block, no execution |
| Runtime Behavioral Faults (cardinality drift, excessive shuffle, schema divergence) | Runtime Validation | Sandbox / Observability | quarantine, rollback, or partial state |
| Environmental Faults (revoked IAM, cluster loss, storage quota, mid-execution policy change) | Sandbox Monitor (external runtime) | Infrastructure layer | Panic Rollback: quarantine with no projection |
Structure of a Fault Event
error_signature:
  code: FINOPS_UNBOUNDED_SCAN
  family: semantic_faults
  severity: high
  stage: policy_engine
  detected_before_execution: true
  mandatory_action: replan_or_block
  remediation:
    - "Apply a temporal filter on processing_date"
5.14 Decision Engine
Consolidates the result of all validations and produces the final state of the intent. When components diverge — the Policy Engine approved, Runtime detected drift, Partial Policy allowed commit — the Decision Engine applies the precedence rule to determine who has authority.
Decision Precedence
Invariants and components have different roles in the decision. Invariants are not participants. They are the boundary of the field. If an invariant is violated, the result is determined before any other evaluation.
decision_precedence:
  veto_absolute:
    - invariants                # a violation cancels everything; no vote is possible
                                # they are not just another participant; they are the edge of the field
  ordered_authority:
    - policy_engine             # 1st: decides pre-execution (approve / replan / block)
    - runtime_validation        # 2nd: decides over the real result (may escalate severity)
    - partial_failure_policy    # 3rd: decides commit granularity
Example result: approved_with_warnings. Runtime escalated from info to warning, Partial Policy applied selective commit, and no invariant was violated.
Conflict Resolution Between Authorities
When policy_engine, runtime_validation, and partial_failure_policy produce divergent outcomes, the resolution strategy is strictest_outcome: the most restrictive result among the active authorities prevails:
decision_engine:
  conflict_resolution:
    strategy: strictest_outcome
    # if policy_engine = approved and runtime = quarantine → quarantine
    # if policy_engine = replanned and partial = partially_committed → replanned
    # severity ordering: blocked > quarantined > rolled_back > partially_committed
    #                    > approved_with_warnings > approved
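The strictest_outcome strategy reduces to taking the maximum over the declared severity ordering. The Python sketch below covers only the six states that appear in that ordering. The `replanned` state is not placed in the ordering by the comment above, so the sketch rejects it rather than guessing its rank.

```python
# Least to most restrictive, following the severity ordering in the comment above.
SEVERITY_ORDER = (
    "approved",
    "approved_with_warnings",
    "partially_committed",
    "rolled_back",
    "quarantined",
    "blocked",
)


def strictest_outcome(outcomes: list[str]) -> str:
    """Return the most restrictive outcome among the active authorities."""
    if not outcomes:
        raise ValueError("at least one authority outcome is required")
    # tuple.index raises ValueError for states outside the declared ordering
    return max(outcomes, key=SEVERITY_ORDER.index)


print(strictest_outcome(["approved", "quarantined", "partially_committed"]))
```

Keeping the ordering in a single tuple makes the precedence auditable: changing which state wins a conflict is a one-line, reviewable change.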
Possible States
decision_state:
  - approved                  # complete, validated, publishable execution
  - approved_with_warnings    # complete execution, non-blocking alerts
  - replanned                 # Policy Engine required changes; new cycle
  - blocked                   # unrecoverable pre-execution violation
  - partially_committed       # partial result; Partial Failure Policy applied
  - quarantined               # isolated units; investigation required
  - rolled_back               # state reverted due to critical failure
  - promotion_candidate       # IFo+IFs+IFg above the thresholds
5.15 Audit Trail
Immutable record of all decision events. It is not conventional logging. These are typed events, correlated by intent_id and versioned by policy_bundle. It serves two consumers with distinct needs that become incompatible if not addressed explicitly.
audit_trail:
  intent_id: fiscal_reconciliation_001
  correlation_id: session_4f9a2c
  policy_bundle_version: v4.2
  events:
    - stage: intent_normalizer
      event_type: semantic_resolution
      outcome: resolved
      confidence: 0.81
      confirmation_mode: hard_required
      environment_state_consulted: true
    - stage: policy_engine
      event_type: policy_replan
      outcome: replanned
      faults: [GOVERNANCE_RAW_PII_JOIN, FINOPS_MISSING_TEMPORAL_PREDICATE]
    - stage: static_validation
      event_type: code_analysis
      outcome: passed
    - stage: execution
      event_type: partial_commit
      outcome: partially_committed
      consistency_unit: partition
      successful: ["2026-01-01", "2026-01-02"]
      failed: ["2026-01-03"]
    - stage: state_projection
      event_type: environment_update
      outcome: projected
    - stage: promotion_engine
      event_type: promotion_evaluation
      outcome: not_eligible
      reason: partial_execution
Consumption Modes
Use: debugging, performance tuning, failure analysis, re-execution of failed partitions
Use: compliance, decision traceability, governance evidence
Properties of the Record
Immutable after write. Causally ordered by intent_id, not merely chronologically. Correlated across subintents. Versioned by the policy_bundle active at execution time, enabling historical replay with the original policy.
audit_trail:
  immutability: append_only # events are only appended, never modified or removed
  ordering: causal_order # not just timestamp order; causal order of events
  guarantees:
    - no_event_reordering # event A that caused B always appears before B
    - immutable_after_write
    - complete_per_intent # every intent_id has both start and end recorded
  decision_state_enum: # states are enumerated; no ad hoc states
    - approved
    - approved_with_warnings
    - replanned
    - blocked
    - partially_committed
    - quarantined
    - rolled_back
    - promotion_candidate
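The append_only and no_event_reordering guarantees can be illustrated with a minimal in-memory sketch. It assumes a hypothetical AuditTrail class and a hypothetical caused_by field that links an event to the sequence number of its cause; neither name comes from the specification, and a real backend (S3, Kafka, OpenLineage) would enforce immutability at the storage layer rather than in process memory.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditEvent:
    sequence: int
    intent_id: str
    stage: str
    event_type: str
    outcome: str
    policy_bundle_version: str
    caused_by: Optional[int] = None  # assumed field: sequence of the causing event


class AuditTrail:
    """Illustrative in-memory, append-only event list (hypothetical class)."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def append(self, intent_id: str, stage: str, event_type: str, outcome: str,
               policy_bundle_version: str,
               caused_by: Optional[int] = None) -> AuditEvent:
        sequence = len(self._events)
        # no_event_reordering: a cause must already be recorded in the trail.
        if caused_by is not None and not 0 <= caused_by < sequence:
            raise ValueError("caused_by must reference an already recorded event")
        event = AuditEvent(sequence, intent_id, stage, event_type, outcome,
                           policy_bundle_version, caused_by)
        self._events.append(event)  # the only write path: append
        return event

    def events_for(self, intent_id: str) -> Tuple[AuditEvent, ...]:
        # A tuple is returned so callers cannot mutate the underlying list.
        return tuple(e for e in self._events if e.intent_id == intent_id)
```

Because the class exposes no update or delete method and events are frozen, the only way to change the recorded history in this sketch is to append a new event.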
6. System Invariants
Invariants are the properties the system must always preserve. They differ from benefits: they do not describe what the system does well, but what it guarantees. An invariant violation is a system failure, not a user failure.
I1: No data classified as PII is published without explicit treatment (hashing or confirmed removal). This applies to all outputs, intermediate and final.
I2: No Silver or Gold layer receives writes without a merge key or a mutability policy defined and approved by the Policy Engine.
I3: No intent is processed without consulting the environment_state from the Context Registry. The state of the environment is a mandatory input to the Intent Normalizer.
I4: Every partial_execution_state must be projected into the Context Registry before the next intent is processed. If projection fails, the next intent is blocked until the environment is consistent.
I5: Every decision generates a structured event in the Audit Trail. Executions without an audit trail are treated as unauthorized executions.
I6 (precedence over I4): No approved execution contains forbidden operations. If a forbidden operation is detected at runtime (after Static Validation), execution is immediately interrupted and the generated partial state is quarantined before any projection into the Context Registry.
I7: No dataset in state partially_committed may be published as trusted. The publish_allowed flag is managed by the State Projection Protocol and is only released after complete resolution of the pending units.
I8: Every execution must be reproducible with exactly the same results from the same inputs: user_intent + context_registry.version_id + policy_bundle_version + catalog_snapshot_version. If the Audit Trail and the State Signature's execution_context do not allow exact re-execution, the execution violates I8 and the skill is permanently blocked from promotion.
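One way to make the I8 inputs checkable is to derive a deterministic key from the four values and store it with the execution record. The sketch below is an assumption about how that could be done, not part of the specification: the function names reproducibility_key and can_replay are hypothetical, and SHA-256 over canonical JSON is just one reasonable choice.

```python
import hashlib
import json
from typing import Mapping

# The four inputs named by the reproducibility invariant.
REQUIRED_INPUTS = (
    "user_intent",
    "context_registry_version_id",
    "policy_bundle_version",
    "catalog_snapshot_version",
)


def reproducibility_key(inputs: Mapping[str, str]) -> str:
    """Hash the four inputs in a canonical form (hypothetical helper)."""
    missing = [name for name in REQUIRED_INPUTS if not inputs.get(name)]
    if missing:
        raise ValueError(f"missing reproducibility inputs: {missing}")
    canonical = json.dumps(
        {name: inputs[name] for name in REQUIRED_INPUTS},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_replay(recorded: Mapping[str, str]) -> bool:
    """True when the recorded context carries every input needed for replay."""
    return all(recorded.get(name) for name in REQUIRED_INPUTS)
```

Two executions with the same key are expected to produce the same results; a record for which can_replay returns False would, under this reading, fail the invariant.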
Precedence Between Invariants
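I6 takes precedence over I4: when a forbidden operation is detected at runtime, the partial state is quarantined and is not projected into the Context Registry. Without this declared precedence rule, an implementer could interpret I4 (mandatory projection) as more important than I6 (immediate interruption) and project contaminated state into the Context Registry, which is exactly the scenario that destroys system integrity.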
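The ordering between I6 and I4 can be shown as a single branch that is evaluated before any projection. This is a minimal sketch under stated assumptions: the ContextRegistry class, its two lists, and the finalize_partial_state function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContextRegistry:
    """Hypothetical stand-in holding projected and quarantined states."""
    projected: List[Dict] = field(default_factory=list)
    quarantined: List[Dict] = field(default_factory=list)


def finalize_partial_state(registry: ContextRegistry, partial_state: Dict,
                           forbidden_operation_detected: bool) -> str:
    # I6 is checked first: contaminated state is quarantined and never projected.
    if forbidden_operation_detected:
        registry.quarantined.append(partial_state)
        return "quarantined"
    # I4 applies only to state that passed the I6 check.
    registry.projected.append(partial_state)
    return "projected"
```

Reversing the two branches would satisfy I4 in isolation while placing contaminated state in the registry, which is the failure the precedence rule exists to prevent.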
7. End-to-End Flow
| # | Step | Component | Invariant / Guarantee |
|---|---|---|---|
| 1 | Intent received | — | — |
| 2 | Consult environment_state (versioned snapshot) | Context Registry | I3 — latest_committed snapshot |
| 3 | Lock check — queue or reject by conflict_resolution | Concurrency Control | single_active_intent_per_scope |
| 4 | Normalization + semantic resolution | Intent Normalizer | — |
| 5 | Semantic escalation (auto / soft / hard / human) | Confirmation Orchestrator | fallback: block + notify on timeout |
| 6 | Generate State Signature + execution_context | State Signature | policy_bundle + catalog versioned |
| 7 | Apply policies (max 3 replans) | Policy Engine | I1, I2 |
| 8 | Pre-execution cost estimation | Pre-cost Estimator | feedback loop by signature_hash |
| 9 | Planning (DAG / Composite + idempotence) | Execution Planner | mandatory merge_key |
| 10 | Static code validation | Static Validation | I6 |
| 11 | Acquire Execution Lock (target_scope) | Concurrency Control | exclusivity before revalidation |
| 12 | Pre-execution revalidation (TOCTOU) | Pre-execution Revalidation | version_id + policy + catalog |
| 13 | Execution in the Sandbox | Execution | I6, idempotence, Panic Rollback if env changes |
| 14 | Runtime behavior validation | Runtime Validation | — |
| 15 | retry → partial_failure_policy → project_state | Partial Execution State | I7, consistency_unit |
| 16 | Projection + release lock (atomic, versioned) | State Projection Protocol | I4, I6 (precedence) |
| 17 | Final decision — strictest_outcome among authorities | Decision Engine | decision_state_enum |
| 18 | Lifecycle evaluation — IFo + IFs + IFg + IDI by hash | Promotion / Demotion Engine | I8 — reproducibility |
| 19 | Record in the Audit Trail (append_only, causal order) | Audit Trail | I5 |
Step 5 (Confirmation Orchestrator) is the point where human ambiguity is explicitly addressed before any planning. The lock (step 11) is acquired before revalidation (step 12), closing the residual race condition. IDI in step 18 detects domain drift before IFs detects result degradation.
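Step 12 can be read as a comparison between the versions captured when the plan was built and the versions observed after the lock is held. The following sketch assumes a hypothetical ExecutionContext record with the three fields listed in the table (version_id, policy, catalog) and a hypothetical drifted_fields helper; the actual structure of execution_context may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExecutionContext:
    """Versions the plan was built against (field names are assumptions)."""
    version_id: str                 # Context Registry snapshot version
    policy_bundle_version: str
    catalog_snapshot_version: str


def drifted_fields(planned: ExecutionContext, current: ExecutionContext) -> List[str]:
    """Return the names of the versions that changed since planning."""
    names = ("version_id", "policy_bundle_version", "catalog_snapshot_version")
    return [n for n in names if getattr(planned, n) != getattr(current, n)]


def revalidate(planned: ExecutionContext, current: ExecutionContext) -> bool:
    # Called only after the execution lock is held (step 11), so the values in
    # `current` cannot change between this check and the start of execution.
    return not drifted_fields(planned, current)
```

If revalidate returns False, the plan was built against a state that no longer exists, and the intent would need to be replanned or blocked rather than executed.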
8. Benefits
9. Extension Points
A closed system does not become a standard; it becomes one company's product. For CFA to be adoptable as a framework, it needs declared extension points where specific implementations can inject domain behavior without modifying the core.
Extension points are interface contracts, not implementations. Each point defines what CFA expects to receive and what it guarantees to return. What happens inside is the responsibility of the implementation.
extension_points:
  intent_normalizer:
    - custom_domain_resolver # domain-specific interpretation logic
    - catalog_enrichment_hook # external catalog enrichment before the Signature
    - confidence_scorer_override # custom semantic-confidence model
  policy_engine:
    - custom_rule_provider # external declarative rules (regulatory, domain)
    - external_compliance_validator # validation against non-AI legacy systems
  execution_planner:
    - custom_template_library # domain-specific code templates (Spark, SQL, dbt)
    - pre_cost_estimator_backend # cost-estimation backend (AQE, historical)
  context_registry:
    - storage_backend # Delta, DynamoDB, Redis, etc.
    - snapshot_provider # snapshot_consistent implementation
  promotion_engine:
    - custom_promotion_signal # external business-value signal
    - domain_window_provider # evaluation_window by domain
  audit_trail:
    - storage_backend # S3, Kafka, OpenLineage, etc.
    - regulatory_formatter # audit format by jurisdiction
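An extension point defined as an interface contract maps naturally onto a structural interface. The sketch below uses Python's typing.Protocol for the custom_rule_provider point; the method name rules_for, the rule dictionary shape, and the PolicyEngineCore and FiscalRules classes are all assumptions made for illustration, since the document does not define the contract's signature.

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class CustomRuleProvider(Protocol):
    """Assumed contract for policy_engine.custom_rule_provider."""

    def rules_for(self, domain: str) -> List[Dict[str, str]]:
        ...


class PolicyEngineCore:
    """Hypothetical core: it knows the contract, not the implementations."""

    def __init__(self) -> None:
        self._providers: List[CustomRuleProvider] = []

    def register(self, provider: CustomRuleProvider) -> None:
        # isinstance on a runtime_checkable Protocol checks method presence only.
        if not isinstance(provider, CustomRuleProvider):
            raise TypeError("provider does not satisfy CustomRuleProvider")
        self._providers.append(provider)

    def collect_rules(self, domain: str) -> List[Dict[str, str]]:
        rules: List[Dict[str, str]] = []
        for provider in self._providers:
            rules.extend(provider.rules_for(domain))
        return rules


class FiscalRules:
    """Example domain implementation injected without modifying the core."""

    def rules_for(self, domain: str) -> List[Dict[str, str]]:
        if domain != "fiscal":
            return []
        return [{"id": "GOVERNANCE_RAW_PII_JOIN", "action": "replan"}]
```

The core depends only on the protocol, so a regulatory or domain-specific provider can be swapped in without touching PolicyEngineCore.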
10. Execution Scope (v2)
CFA v2 was designed and validated for a specific execution model. Declaring it explicitly protects the document against improper extrapolation and defines where the architectural guarantees are valid.
execution_model:
  version: v2
  supported_modes:
    - batch
    - micro_batch
  concurrent_intent_support: false # one active intent per target scope
  streaming_support: planned_for_v3
  consistency_assumption: single_active_intent_per_target_scope
Why batch and micro-batch — rather than streaming
Three CFA v2 components implicitly assume discrete execution boundaries, which exist in batch and micro-batch but are problematic in continuous streaming:
| Component | Assumption in v2 | Problem in Streaming |
|---|---|---|
| Partial Execution State | Discrete consistency unit (partition, time window) | Streaming requires offset, epoch, or checkpoint semantics, not partitions |
| State Projection Protocol | Post-validation trigger with a clear boundary between execution and projection | In asynchronous streaming, the execution → projection linearity can break |
| Context Registry | Environmental state with enough coherence to guide the next intent | Under real concurrency, it requires locking, optimistic versioning, and causal ordering |
Concurrency Control
The declaration concurrent_intent_support: false is not sufficient on its own. It is necessary to define what happens when two intents arrive simultaneously over the same scope. Without that definition, race conditions are possible and Invariant 3 can be silently violated.
concurrency_control:
  model: single_active_intent_per_target_scope
  scope_definition:
    - dataset
    - dataset_partition
  lock:
    acquisition: before_pre_execution_revalidation # step 11 in the flow
    release: after_state_projection_confirmed # step 16 in the flow
  conflict_resolution:
    on_conflict: queue # queue | reject
    # queue: second intent waits for the scope to be released
    # reject: second intent receives blocked immediately
  locking:
    type: optimistic
    validation: pre_execution_revalidation
The queue mode is the recommended default for batch because it avoids losing work. The reject mode is appropriate for scenarios where latency matters more than execution guarantee.
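The difference between the two conflict modes can be sketched with a small single-process lock manager. The ScopeLockManager class and its return values are hypothetical, the sketch is not thread-safe, and a real deployment would need a distributed lock; it only illustrates the queue and reject behavior described above.

```python
from collections import deque
from typing import Deque, Dict, Optional


class ScopeLockManager:
    """Illustrative single_active_intent_per_target_scope lock (not thread-safe)."""

    def __init__(self, on_conflict: str = "queue") -> None:
        if on_conflict not in ("queue", "reject"):
            raise ValueError("on_conflict must be 'queue' or 'reject'")
        self.on_conflict = on_conflict
        self._active: Dict[str, str] = {}          # scope -> intent holding the lock
        self._waiting: Dict[str, Deque[str]] = {}  # scope -> queued intents (FIFO)

    def acquire(self, scope: str, intent_id: str) -> str:
        if scope not in self._active:
            self._active[scope] = intent_id
            return "acquired"
        if self.on_conflict == "reject":
            return "blocked"  # second intent is refused immediately
        self._waiting.setdefault(scope, deque()).append(intent_id)
        return "queued"       # second intent waits for release

    def release(self, scope: str, intent_id: str) -> Optional[str]:
        """Release after projection is confirmed; return the next holder, if any."""
        if self._active.get(scope) != intent_id:
            raise ValueError("intent does not hold the lock for this scope")
        waiting = self._waiting.get(scope)
        if waiting:
            next_intent = waiting.popleft()
            self._active[scope] = next_intent
            return next_intent
        del self._active[scope]
        return None
```

In queue mode the waiting intent becomes the holder on release and no work is lost; in reject mode the caller learns immediately that the scope is busy.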
Snapshot Selection
The ambiguity over which snapshot the Context Registry serves on each read is resolved by a single rule:
context_registry:
  snapshot_selection:
    strategy: latest_committed_before_intent_resolution
    # always the most recent snapshot that has already been atomically committed
    # a snapshot in progress (write underway) is never served
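The selection rule reduces to a filter followed by a maximum. The sketch below assumes a hypothetical Snapshot record in which committed_at is None while a write is underway, and it treats "before" as strictly earlier than the intent resolution time; both choices are assumptions, not part of the specification.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    version_id: int
    committed_at: Optional[float]  # None while the write is still underway (assumed)


def select_snapshot(snapshots: Iterable[Snapshot],
                    intent_resolution_time: float) -> Optional[Snapshot]:
    """Apply latest_committed_before_intent_resolution to a set of snapshots."""
    eligible = [
        s for s in snapshots
        # In-progress snapshots are never served; only commits that happened
        # strictly before intent resolution are considered.
        if s.committed_at is not None and s.committed_at < intent_resolution_time
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s.committed_at)
```

Returning None when nothing qualifies leaves the caller to decide how to handle an environment with no committed state; under I3 the intent could not proceed without one.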
Scope Roadmap
| Version | Added Scope | Impacted Components |
|---|---|---|
| v2 (current) | Batch and micro-batch, sequential intent per scope | — |
| v3 (planned) | Continuous streaming, concurrent intents | Partial Execution State, State Projection, Context Registry |
11. Limitations
| Limitation | Impact | Mitigation |
|---|---|---|
| Implementation complexity | High initial cost. Multiple interdependent components. | Phased implementation (see strategy below) |
| Dependence on a rich catalog | The Intent Normalizer is only as good as the catalog it consults. | Invest in a data catalog before CFA |
| Continuous threshold tuning | T1 and T2 for IFo and IFs require domain calibration. | Historical baselines + incremental adjustment |
| Policy complexity | The Policy Engine requires rule maintenance over time. | Version the policy_bundle in the Audit Trail |
| Additional latency | The validation pipeline adds pre-execution latency. | Auto mode for low-risk intents |
Phased Implementation Strategy
| Phase | Scope | Result |
|---|---|---|
| Phase 1 | Kernel without execution: Intent Normalizer + State Signature + Policy Engine | Decision system. Evaluates intents; does not execute. |
| Phase 2 | Controlled code generation + Static Validation | Deterministic execution over an approved plan. |
| Phase 3 | Sandbox + Runtime Validation + Partial Execution State | Partial-failure tolerance and real metrics. |
| Phase 4 | Context Registry + State Projection + Audit Trail | Persistent state and complete observability. |
| Phase 5 | Promotion / Demotion Engine + IFo/IFs/IFg | Adaptive industrialization with lifecycle. |
12. Non-Goals (Negative Scope)
A serious technical whitepaper states what the system does not aim to do. Non-goals protect against improper extrapolation and define where the system ends, which is as important as defining where it begins.
non_goals:
  # Execution scope
  - real_time_stream_processing # planned for v3
  - distributed_multi_writer_concurrency # requires a Context Registry redesign
  - sub_second_latency_execution # validation overhead is intentional
  # System responsibility
  - data_quality_enforcement # CFA governs execution, not the intrinsic quality of data
  - schema_inference # schema comes from the catalog; it is not inferred
  - business_rule_definition # CFA applies rules; it does not define them
  # Implementation
  - storage_engine # engine-agnostic (Spark, Databricks, etc.)
  - catalog_implementation # depends on an external catalog; does not implement it
  - policy_dsl_specification # rule contract is defined; DSL is an implementation choice
13. Conclusion
CFA v2 represents a paradigm shift in the relationship between AI and critical data systems:
What separates CFA v2 from an elegant concept is the presence of the elements that make real systems operable in production:
Formalization of what is uncertain: semantic resolution with confidence scores and confirmation modes treats ambiguity as system data, not as user failure.
Tolerance for what partially fails: Partial Execution State with consistency_unit transforms partial failure from an unrepresentable event into a governed state with explicit policy.
Memory of the state of the world: the Context Registry with environment_state closes the loop between what was executed and what may be executed next.
Formal guarantees: the eight invariants (I1 to I8) with precedence rules make the system definable, testable, and auditable.
CFA v2 is a deterministic architecture for governed execution that guarantees semantic consistency, state integrity, and complete auditability through typed intent resolution, explicit concurrency control, pre-execution validation, idempotent execution, versioned state projection, and invariant-based enforcement.
This document is the product of a collaboration among AI architects. Each revision incorporated real criticism about executability, not only conceptual elegance. The natural next step is the technical blueprint: Python classes, interfaces, and mapping of each component to implementable code.