Contextual Flux Architecture
CFA v2
Technical Whitepaper

Architecture for context-oriented semantic resolution, governed planning, validated execution, explicit environmental state management, and adaptive evolution based on invariants and auditable observability.

v2.1 Final | Technical Whitepaper | Architecture Reference | Co-authored by AI Architects
Section 01

1. Introduction

The Contextual Flux Architecture (CFA) is an architecture for AI-oriented systems that replaces the traditional paradigm of Agents + static Tools/Skills with a model in which intent resolution is treated as a first-class architectural entity.

CFA treats execution not as an immediate act, but as a grounded decision conditioned by understood intent, governance policies, cost constraints (FinOps), data contracts, and the current state of the environment.

CFA is an architecture of context-oriented semantic resolution, governed planning, validated execution, explicit environmental state management, and adaptive evolution based on invariants and auditable observability.

Formal Definition of Intent

The term intent is used with technical precision in CFA, not as a synonym for instruction or command:

Intent — Formal Definition
An intent is the description of a desired state transformation over datasets, under declared constraints, with a verifiable outcome. It does not specify how; it specifies what, over what, and under which conditions it is valid.

The fundamental CFA flow is:

Core Flow
Semantic Resolution → Governed Planning → Validated Execution → State Management → Adaptive Evolution

Section 02

2. The Problem CFA Solves

Traditional agent architectures built around static tools present four structural limitations that CFA was designed to solve:

P1
Fragile Routing

Dependence on embedding matching. Failure under complex compositions. Apparent understanding without real understanding.

P2
Static Catalog

Uncontrolled growth of tools. Difficult versioning. Low adaptability to novel scenarios.

P3
Missing Enforcement

Execution before deep validation. Risk of excessive cost, PII violations, and broken data contracts.

P4
Implicit State

Systems do not know the state of the environment after execution. Structured auditability is missing. Context is lost between intents.

Root Problem
In traditional architectures, AI operates as a glorified router: it receives natural language, performs embedding matching, and executes a tool. There is no formalization of intent, no real enforcement before execution, and no model of environmental state after execution.

Section 03

3. Foundational Principles

P1
Intent as the Primary Entity

Every execution begins with the formalization of intent. The system never starts from code; it starts from a structured understanding of what must happen.

P2
Mandatory Semantic Typing

The intent is converted into a structured, typed signature before any planning occurs. Raw text is not a valid planner input.

P3
Enforcement before Execution

No plan is executed without validation against governance policies, FinOps constraints, and data contracts. The cost of blocking early is always lower than the cost of reversing later.

P4
Separation between Discovery and Industrialization

Initial executions are ephemeral (JIT). Only executions that prove repeatable value and compliance are promoted to persistent artifacts.

P5
Explicit Environmental State

The system maintains and consults a continuous model of data state. No intent is processed without that context.

P6
Native Observability

Each decision is recorded as a typed, auditable event. Governance without explainability is not real governance.

Section 04

4. General Architecture

Resolution
01 User Intent

Natural language. System entry point.

02 Intent Normalizer

Converts natural language into a formal structure. Consults the Context Registry before generating the signature.

03 Semantic Resolution

Confidence score, ambiguity, and confirmation modes.

04 Confirmation Orchestrator

auto / soft / hard / human_escalation. PII + gold_write + high cost -> escalation. Timeout -> block + notify.

05 State Signature

Typed signature plus execution_context. Immutable formal contract of the intent.

Governance
06 Policy Engine

Applies governance, FinOps, and contract rules. Max 3 replans. approve / replan / block.

07 Pre-cost Estimator

Estimates cost by signature_hash (30 days). Forces replan if above the ceiling. Automatic feedback loop.

08 Execution Planner

Generates a governed DAG plus idempotence. Supports Composite Intent.

09 Static Validation

Analyzes code before execution. Detects forbidden tokens and contract violations.

Execution
10 Acquire Execution Lock

Acquires an exclusive lock over the target_scope. Holds it until after State Projection. Conflict -> queue or reject.

11 Pre-exec Revalidation

Revalidates version_id, policy_bundle_version, and catalog_snapshot_version. Closes the TOCTOU (time-of-check to time-of-use) window. Mismatch -> replan.

12 Execution (Sandbox)

Isolated and monitored execution. Immediate interruption on a forbidden operation or Environmental Fault.

13 Runtime Validation

Validates cardinality, real cost, final schema, and null ratio.

14 Partial Execution State

Sequence: retry -> partial_failure_policy -> project_state. Granular by consistency_unit.

State and Evolution
15 State Projection

Updates the Context Registry. Atomic, versioned. Lock released after confirmed projection.

16 Context Registry

Current state of the environment. Snapshot by latest_committed_before_intent_resolution.

17 Decision Engine

veto_absolute -> ordered_authority. Conflict -> strictest_outcome.

18 Promotion / Demotion

IFo + IFs + IFg + IDI by intent_signature_hash. Window by domain. generation_metadata.

19 Audit Trail

append_only, causal order. I8: verifiable reproducibility.

Section 05 — Components

5.1 Intent Normalizer + Semantic Resolution

The Intent Normalizer is the most critical component in the pipeline. An incorrect signature generated here contaminates every downstream component — the Policy Engine will govern the wrong problem, the Planner will build the wrong DAG, and the generated code will be deterministically incorrect without any apparent hallucination.

Central Risk
The Intent Normalizer performs the system's first controlled act of violence: transforming natural language into an operational contract. An error here is an elegant error — the system proceeds with bureaucratic perfection over a false premise.

Component Input

The Normalizer must consult the Context Registry before generating the signature. This is an architectural input, not an optional one:

YAML intent_normalizer.inputs
intent_normalizer:
  inputs:
    - user_intent                           # natural language
    - context_registry.environment_state   # current state of the environment
    - data_catalog                          # metadata and classifications
  outputs:
    - semantic_resolution

Semantic Resolution

YAML semantic_resolution.output
semantic_resolution:
  signature: {...}                       # generated State Signature
  confidence_score: 0.82                # [0.0 - 1.0]
  ambiguity_level: medium              # low | medium | high
  competing_interpretations:
    - join_nfe_clientes_silver
    - enrich_nfe_with_master_data
  confirmation_mode: hard_required     # auto | soft | hard_required
  environment_constraints_injected:    # constraints injected from the Context Registry
    - silver_documentos.state = partially_committed
    - silver_documentos.publish_allowed = false

Confirmation Modes

Mode | Criterion | Behavior
auto | High confidence, low risk, no PII | Proceeds without interruption
soft | Medium confidence, read intent | Displays the signature and proceeds if there is no conflict
hard_required | PII detected, write to Silver/Gold, high estimated cost, relevant semantic ambiguity, or multiple compatible datasets | Explicit confirmation required before proceeding
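
The criteria in the table can be read as a small decision function. The sketch below is illustrative only: the `ResolutionFacts` fields, the 0.85 and 0.65 confidence cut-offs, treating any non-low ambiguity as "relevant", and failing closed on low confidence are all assumptions, not values defined by CFA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolutionFacts:
    """Hypothetical summary of what Semantic Resolution learned about an intent."""
    confidence_score: float        # [0.0 - 1.0]
    contains_pii: bool
    writes_silver_or_gold: bool
    high_estimated_cost: bool
    ambiguity_level: str           # low | medium | high
    candidate_datasets: int        # number of compatible datasets found


# Assumed cut-offs; the table only says "high" and "medium" confidence.
HIGH_CONFIDENCE = 0.85
MEDIUM_CONFIDENCE = 0.65


def confirmation_mode(facts: ResolutionFacts) -> str:
    """Return auto, soft, or hard_required following the table's criteria."""
    # Any hard_required criterion wins, regardless of confidence.
    if (facts.contains_pii
            or facts.writes_silver_or_gold
            or facts.high_estimated_cost
            or facts.ambiguity_level != "low"
            or facts.candidate_datasets > 1):
        return "hard_required"
    if facts.confidence_score >= HIGH_CONFIDENCE:
        return "auto"
    if facts.confidence_score >= MEDIUM_CONFIDENCE:
        return "soft"
    # Low confidence is not covered by the table; fail closed (assumption).
    return "hard_required"
```

Checking the hard_required criteria first means a high confidence score can never downgrade a risky intent to auto.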

Integration with the Context Registry

If the Context Registry indicates silver_documentos.state = partially_committed, the Normalizer must adjust the interpretation of the intent, inject the constraint into the plan, or block intents that depend on publication of the affected dataset. Without that consultation, the system proposes plans over invalid assumptions about the state of the world.

5.1b Confirmation Orchestrator

The Intent Normalizer is the most dangerous single point of failure in the system — a wrong signature contaminates the entire pipeline with deterministic perfection. The Confirmation Orchestrator is the structural mitigation: it inserts an escalation layer between semantic resolution and the Policy Engine, activated selectively by risk.

It does not add friction in 90% of cases. In the 10% where risk is real — PII, ambiguity, writes to critical layers — it is mandatory.

YAML confirmation_orchestrator
confirmation_orchestrator:
  modes:
    - auto              # high confidence, low risk: proceeds without interruption
    - soft              # displays the signature and proceeds if there is no conflict
    - hard              # explicit confirmation required
    - human_escalation  # escalation to human review

  escalation_triggers:
    - confidence_score < 0.65 AND contains_pii == true
    - competing_interpretations > 1
    - silver_write AND context_registry.publish_allowed == false
    - gold_write                                  # always escalated
    - estimated_cost > cost_ceiling_dbu * 0.8    # pre-cost above 80% of the ceiling

  human_timeout_seconds: 300
  fallback_on_timeout:
    action: block
    reason: human_confirmation_timeout
    notify: governance_team
    audit_event: HUMAN_ESCALATION_TIMEOUT

Why human_escalation is architectural
Purely automated systems tend to optimize for what is expressible in code. There are classes of ambiguity — interpretation of regulatory context, undocumented business intent, conflict between domains — that no confidence score captures adequately. The human_escalation mode is not a weakness of the system: it is explicit recognition of its limits.
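
The escalation_triggers list above can be evaluated as independent predicates. The following is a minimal sketch under assumptions: the `ctx` field names, the reason labels, and the helper names are hypothetical; only the thresholds and the timeout fallback values come from the YAML.

```python
from typing import Any, List, Mapping, Optional


def escalation_reasons(ctx: Mapping[str, Any], cost_ceiling_dbu: float) -> List[str]:
    """Return a label for every escalation trigger that fires for ctx."""
    reasons = []
    if ctx["confidence_score"] < 0.65 and ctx["contains_pii"]:
        reasons.append("low_confidence_with_pii")
    if ctx["competing_interpretations"] > 1:
        reasons.append("competing_interpretations")
    if ctx["silver_write"] and not ctx["publish_allowed"]:
        reasons.append("silver_write_while_publish_blocked")
    if ctx["gold_write"]:                       # always escalated
        reasons.append("gold_write")
    if ctx["estimated_cost"] > cost_ceiling_dbu * 0.8:
        reasons.append("pre_cost_above_80_percent_of_ceiling")
    return reasons


def timeout_fallback(waited_seconds: float, human_timeout_seconds: int = 300) -> Optional[dict]:
    """Return the block decision once the human confirmation window has elapsed."""
    if waited_seconds < human_timeout_seconds:
        return None
    return {
        "action": "block",
        "reason": "human_confirmation_timeout",
        "notify": "governance_team",
        "audit_event": "HUMAN_ESCALATION_TIMEOUT",
    }


example = {
    "confidence_score": 0.82, "contains_pii": True, "competing_interpretations": 2,
    "silver_write": True, "publish_allowed": False, "gold_write": False,
    "estimated_cost": 35.0,
}
fired = escalation_reasons(example, cost_ceiling_dbu=50)
```

Returning every fired trigger, rather than stopping at the first, keeps the full justification available for the audit event.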

5.2 State Signature

Typed and enriched representation of the intent. It is the formal contract that governs the entire remainder of the pipeline. Once generated and confirmed, the Signature is immutable — any replanning generates a new Signature.

YAML state_signature.example
signature:
  domain: fiscal_data_processing
  intent: reconciliation_and_persist
  source_type: nested_json
  target_layer: bronze_to_silver
  datasets:
    - name: nfe
      size: 4TB
      classification: high_volume
    - name: clientes
      size: 500MB
      classification: sensitive
      pii: [cpf, email]
  constraints:
    no_pii_raw: true
    partition_by: [processing_date]
    enforce_types: true
    merge_key_required: true
  execution_context:                        # normative context of the execution
    policy_bundle_version: "v4.2"          # policies in force at that moment
    catalog_snapshot_version: "catalog_2026_03_22"  # state of the catalog

Why execution_context belongs to the Signature

The execution_context is not audit metadata — it is part of the execution contract. With it embedded in the Signature, the system can reproduce any historical execution with the policies and catalog that were in force at that moment, not merely explain what happened. Without it, the Audit Trail records the past but does not make it reproducible. The three elements — formalized intent, state of the environment, and normative context — form the complete contract of an execution.
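
The whitepaper does not define how a Signature is hashed. One common approach, assumed here, is SHA-256 over a canonical JSON serialization, so that key order does not affect the hash while any change to execution_context does. The function name `signature_hash` is hypothetical.

```python
import hashlib
import json


def signature_hash(signature: dict) -> str:
    """Hash a Signature deterministically (assumed scheme: canonical JSON + SHA-256)."""
    canonical = json.dumps(
        signature, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {
    "domain": "fiscal_data_processing",
    "intent": "reconciliation_and_persist",
    "execution_context": {
        "policy_bundle_version": "v4.2",
        "catalog_snapshot_version": "catalog_2026_03_22",
    },
}

# Same content, different key order: must hash identically.
reordered = {
    "execution_context": {
        "catalog_snapshot_version": "catalog_2026_03_22",
        "policy_bundle_version": "v4.2",
    },
    "intent": "reconciliation_and_persist",
    "domain": "fiscal_data_processing",
}

# A replan under a newer policy bundle yields a new Signature, hence a new hash.
replanned = {
    **original,
    "execution_context": {**original["execution_context"], "policy_bundle_version": "v4.3"},
}
```

Under this assumption, the same intent executed under a different policy bundle is a different contract, which matches the rule that any replanning generates a new Signature.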

5.3 Policy Engine

Applies all governance, safety, and FinOps rules before execution. It is the layer that guarantees that the system never executes what it should not.

Policy Type | Detects | Possible Action
Privacy Wall | Untreated PII usage, joins with sensitive data | Requires anonymization or blocks
FinOps Guard | Full scans on large volumes, joins without a temporal filter | Requires a filter or blocks
Contract Enforcement | Append to Silver without a merge key, write outside the proper layer | Converts the operation or blocks
Execution Safety | Forbidden imports in the Signature, declared dangerous code patterns | Blocks immediately

Possible Policy Engine outcomes: approve, replan (with declared mandatory interventions), or block.

Declarative Rule Contract

Policy Engine rules are expressed declaratively — condition plus action — and versioned with the policy_bundle_version. The whitepaper does not specify the implementation DSL, but it defines the minimum contract that any rule must satisfy:

YAML policy_rule.contract
policy_rule:
  name: forbid_raw_pii_in_silver
  condition: target_layer == "silver" AND contains_pii_raw == true
  action: block                       # approve | replan | block
  severity: critical
  fault_code: GOVERNANCE_RAW_PII_IN_SILVER
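
Since the implementation DSL is left open, the sketch below models a rule as a Python predicate plus an action, and combines fired rules by taking the strictest action. The `PolicyRule` class, the `evaluate` function, and the second rule (including its fault code) are hypothetical illustrations of the contract, not part of CFA.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Mapping, Tuple

# block is stricter than replan, which is stricter than approve.
STRICTNESS = {"approve": 0, "replan": 1, "block": 2}


@dataclass(frozen=True)
class PolicyRule:
    name: str
    condition: Callable[[Mapping[str, Any]], bool]
    action: str          # approve | replan | block
    severity: str
    fault_code: str


def evaluate(rules: Iterable[PolicyRule], facts: Mapping[str, Any]) -> Tuple[str, List[str]]:
    """Return the strictest action among fired rules and their fault codes."""
    outcome = "approve"
    fault_codes = []
    for rule in rules:
        if rule.condition(facts):
            fault_codes.append(rule.fault_code)
            if STRICTNESS[rule.action] > STRICTNESS[outcome]:
                outcome = rule.action
    return outcome, fault_codes


RULES = [
    PolicyRule(
        name="forbid_raw_pii_in_silver",
        condition=lambda f: f["target_layer"] == "silver" and f["contains_pii_raw"],
        action="block",
        severity="critical",
        fault_code="GOVERNANCE_RAW_PII_IN_SILVER",
    ),
    PolicyRule(  # hypothetical FinOps rule, added only for illustration
        name="require_temporal_filter",
        condition=lambda f: not f["has_temporal_filter"],
        action="replan",
        severity="high",
        fault_code="FINOPS_MISSING_TEMPORAL_FILTER",
    ),
]
```

Evaluating every rule instead of short-circuiting on the first block lets the audit record list all violations at once.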

Replanning Limit

The replanning cycle cannot be unlimited. A loop of replans is as problematic in production as a stuck execution, and it has no terminal state without this declared limit:

YAML replan_policy
replan_policy:
  max_attempts: 3
  on_max_exceeded:
    action: block
    decision_state: blocked
    reason: max_replan_attempts_exceeded
    audit_required: true
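
A bounded replanning loop might look like the sketch below. It assumes that max_attempts counts replans (so the plan is evaluated at most four times); the function names and the `policy_block` reason are hypothetical.

```python
from typing import Callable, Dict


def govern(plan: Dict, evaluate: Callable[[Dict], str],
           revise: Callable[[Dict], Dict], max_attempts: int = 3) -> Dict:
    """Evaluate a plan, replanning at most max_attempts times before blocking."""
    replans = 0
    while True:
        decision = evaluate(plan)
        if decision == "approve":
            return {"decision_state": "approved", "replans": replans, "plan": plan}
        if decision == "block":
            return {"decision_state": "blocked", "reason": "policy_block", "replans": replans}
        # decision == "replan": stop once the declared limit is reached
        if replans >= max_attempts:
            return {
                "decision_state": "blocked",
                "reason": "max_replan_attempts_exceeded",
                "audit_required": True,
                "replans": replans,
            }
        plan = revise(plan)
        replans += 1


def always_replan(plan: Dict) -> str:
    return "replan"


def approve_after_two(plan: Dict) -> str:
    return "approve" if plan.get("revision", 0) >= 2 else "replan"


def add_revision(plan: Dict) -> Dict:
    return {**plan, "revision": plan.get("revision", 0) + 1}


stuck = govern({}, always_replan, add_revision)
fixed = govern({}, approve_after_two, add_revision)
```

The loop always terminates: every path either returns or increments the replan counter toward the limit.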

5.4 Composite Intent Model

Composite intents are decomposed into sub-intents with explicit dependencies. In the conceptual model this is represented as an Intent Graph; in implementation, this capability may be absorbed internally by the Execution Planner.

YAML intent_graph.example
intent_graph:
  root_intent: fiscal_reconciliation_publish
  sub_intents:
    - id: ingest_nfe
    - id: anonymize_client_master
    - id: reconcile_documents
    - id: aggregate_by_state
    - id: publish_audit_view
  dependencies:
    - ingest_nfe -> reconcile_documents
    - anonymize_client_master -> reconcile_documents
    - reconcile_documents -> aggregate_by_state
    - aggregate_by_state -> publish_audit_view
  shared_context:
    consistency_unit: partition
    key: processing_date
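
The dependencies in the Intent Graph define a partial order. As a sketch, Python's standard `graphlib.TopologicalSorter` can group the sub-intents into waves that may run in parallel; the `execution_waves` helper is a hypothetical name, and how the Execution Planner actually schedules sub-intents is not specified by the whitepaper.

```python
from graphlib import TopologicalSorter
from typing import Iterable, List, Tuple

SUB_INTENTS = [
    "ingest_nfe", "anonymize_client_master", "reconcile_documents",
    "aggregate_by_state", "publish_audit_view",
]

# (upstream, downstream) pairs, copied from the intent_graph example
DEPENDENCIES = [
    ("ingest_nfe", "reconcile_documents"),
    ("anonymize_client_master", "reconcile_documents"),
    ("reconcile_documents", "aggregate_by_state"),
    ("aggregate_by_state", "publish_audit_view"),
]


def execution_waves(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group sub-intents into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter({node: set() for node in nodes})
    for upstream, downstream in edges:
        sorter.add(downstream, upstream)
    sorter.prepare()                      # raises graphlib.CycleError on a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(SUB_INTENTS, DEPENDENCIES)
```

Here the first wave contains both `ingest_nfe` and `anonymize_client_master`, since neither depends on the other.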

5.5 Execution Planner

Generates the execution DAG from the approved Signature. The Planner is not free — it fills templates, follows the plan approved by the Policy Engine, and respects every constraint declared in the Signature.

YAML execution_plan.example
execution_plan:
  - step: extract
    source: nfe
    filter: processing_date >= '2026-01-01'     # required by FinOps

  - step: anonymize
    source: clientes
    transform:
      - sha256(cpf) as cpf_hash
      - drop(email)

  - step: join
    type: broadcast
    condition: sha256(nfe.cpf) = clientes.cpf_hash   # hash the NFe side too: a raw cpf never equals cpf_hash

  - step: load
    type: merge
    target: silver_documentos
    key: nfe_id                              # required by Contract

Pre-cost Estimator

The FinOps Guard blocks excessive cost after the Policy Engine evaluates the Signature. The Pre-cost Estimator anticipates that evaluation — it estimates cost before the Signature is approved, using execution history for the same intent_signature_hash. This closes the gap between "estimated cost" and "real cost" with an automatic feedback loop.

YAML pre_cost_estimator
pre_cost_estimator:
  inputs:
    - signature
    - context_registry.environment_state
    - execution_history_by_signature_hash   # last 30 days

  outputs:
    estimated_dbu: float
    estimated_shuffle_gb: float
    cost_risk_score: float               # [0.0 - 1.0]
    confidence: float                    # based on N historical executions

  threshold_policy:
    - if estimated_dbu > cost_ceiling_dbu:
        action: force_replan_with_filter
        reason: PRE_COST_CEILING_EXCEEDED
    - if estimated_dbu > cost_ceiling_dbu * 0.8:
        action: escalate_to_confirmation_orchestrator
        mode: human_escalation

  feedback_loop:
    update_on: post_execution
    key: intent_signature_hash
    window: last_30_days
    model: rolling_weighted_average

Timing Difference
FinOps Guard (Policy Engine) blocks based on heuristics in the Signature. Pre-cost Estimator blocks based on historical evidence for the same hash. The two are complementary — the first is proactive by rule, the second is proactive by learning.
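
The estimator's rolling_weighted_average and threshold_policy can be sketched as follows. The weighting scheme is an assumption (linear decay with age, so a run from today weighs 1.0 and a run 15 days old weighs 0.5); the whitepaper names the model but not its weights. Function names and the `proceed` action are hypothetical.

```python
from typing import List, Optional, Tuple


def estimate_dbu(history: List[Tuple[int, float]], window_days: int = 30) -> Optional[float]:
    """Recency-weighted mean of past cost for one intent_signature_hash.

    history holds (age_in_days, observed_dbu) pairs. Runs outside the
    window are ignored. Returns None when no run is inside the window.
    """
    weighted = [
        (1 - age / window_days, dbu)
        for age, dbu in history
        if 0 <= age < window_days
    ]
    total_weight = sum(weight for weight, _ in weighted)
    if total_weight == 0:
        return None
    return sum(weight * dbu for weight, dbu in weighted) / total_weight


def threshold_action(estimated_dbu: float, cost_ceiling_dbu: float) -> str:
    """Apply threshold_policy in the order declared in the YAML."""
    if estimated_dbu > cost_ceiling_dbu:
        return "force_replan_with_filter"
    if estimated_dbu > cost_ceiling_dbu * 0.8:
        return "escalate_to_confirmation_orchestrator"
    return "proceed"


# Two past runs: 40 DBU today (weight 1.0) and 60 DBU 15 days ago (weight 0.5).
estimate = estimate_dbu([(0, 40.0), (15, 60.0)])   # (40 + 30) / 1.5
action = threshold_action(estimate, cost_ceiling_dbu=50)
```

With a ceiling of 50 DBU, the estimate of about 46.7 DBU sits between 80% of the ceiling and the ceiling itself, so the intent is escalated rather than replanned.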

Every execution generated by the Planner must be idempotent. On retry or re-execution after partial failure, the system must not duplicate data, create duplicate records, or produce a result different from what it would have produced on the first successful execution.

YAML execution_semantics
execution_semantics:
  idempotency:
    required: true
    strategy:
      - merge_with_deterministic_key   # merge uses the Signature's merge_key
      - partition_overwrite_safe        # overwriting an entire partition is idempotent
    forbidden:
      - append_without_dedup            # a plain append creates duplicates on retry
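
The difference between the allowed and forbidden strategies can be shown with in-memory data. This is a toy sketch: the dictionary stands in for a Silver table keyed by merge key, and the function names are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_by_key(target: Dict[object, Row], batch: List[Row], key: str) -> Dict[object, Row]:
    """Upsert each row by its merge key; replaying the same batch changes nothing."""
    merged = dict(target)
    for row in batch:
        merged[row[key]] = row
    return merged


def append_rows(target: List[Row], batch: List[Row]) -> List[Row]:
    """Plain append without dedup; replaying the same batch duplicates rows."""
    return target + batch


batch = [{"nfe_id": 1, "total": 10.0}, {"nfe_id": 2, "total": 20.0}]

first_run = merge_by_key({}, batch, "nfe_id")
after_retry = merge_by_key(first_run, batch, "nfe_id")      # same result as first_run

appended_twice = append_rows(append_rows([], batch), batch)  # four rows, two duplicated
```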

Consistency Unit: Closed Enum

The consistency_unit type is not free-form — it is selected by rule based on the execution context:

YAML consistency_unit.definition
consistency_unit:
  allowed_types:
    - partition          # standard for partitioned batch
    - dataset            # for atomic cross-dataset operations
    - dag_branch         # for composite intents with independent branches
    - time_window        # for micro-batch with a time window

  selection_rules:
    - if execution_mode == batch:           partition
    - if cross_dataset_atomic == true:      dataset
    - if composite_intent == true:         dag_branch
    - if execution_mode == micro_batch:    time_window

5.6 Static Validation

Analysis of the generated code before execution. Detects violations that can be identified without running the job. Belongs to the Static Safety Faults family.

YAML static_analysis.manifest
static_analysis:
  forbidden_tokens:
    - collect()
    - toPandas()
    - crossJoin()
    - cpf                   # raw PII
    - email                 # raw PII
  required_patterns:
    - filter(               # mandatory temporal predicate
    - merge(                # direct append is forbidden in Silver
  schema_contract:
    expected_columns: [nfe_id, cpf_hash, processing_date]
    forbidden_columns: [cpf, email]
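
A minimal text-based check against this manifest might look like the sketch below. It is an assumption that tokens are matched with regular expressions; a production validator would more likely inspect the syntax tree of the generated code. Identifiers are matched as whole words so that `cpf_hash` is not mistaken for the raw `cpf` column. The function name and violation labels are hypothetical.

```python
import re
from typing import List

FORBIDDEN_TOKENS = ["collect()", "toPandas()", "crossJoin()", "cpf", "email"]
REQUIRED_PATTERNS = ["filter(", "merge("]


def static_violations(code: str) -> List[str]:
    """Return manifest violations found by scanning the generated code as text."""
    violations = []
    for token in FORBIDDEN_TOKENS:
        if token.endswith("()"):
            # Method call with any arguments, e.g. collect() or crossJoin(other)
            pattern = rf"\b{re.escape(token[:-2])}\s*\("
        else:
            # Whole-word match: "cpf" is flagged, "cpf_hash" is not
            pattern = rf"\b{re.escape(token)}\b"
        if re.search(pattern, code):
            violations.append(f"forbidden_token:{token}")
    for required in REQUIRED_PATTERNS:
        if required not in code:
            violations.append(f"missing_pattern:{required}")
    return violations


compliant = (
    "df.filter(\"processing_date >= '2026-01-01'\")"
    ".select('nfe_id', 'cpf_hash', 'processing_date')"
    ".merge(target, 'nfe_id')"
)
unsafe = "df.select('cpf', 'email').collect()"
```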

5.7 Runtime Validation

Validates the actual runtime behavior of the execution in the Sandbox. Detects violations that are only visible with real data. Belongs to the Runtime Behavioral Faults family.

YAML runtime_metrics
runtime_metrics:
  max_rows_output: 10_000_000
  max_shuffle_size_mb: 500
  max_null_ratio: 0.05             # 5%
  expected_output_range:
    min: 1_000
    max: 10_000_000
  schema_drift_detection: true
  cost_ceiling_dbu: 50
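
These limits can be checked against observed metrics once the Sandbox run finishes. In the sketch below, the `observed` field names and all fault codes except RUNTIME_SHUFFLE_LIMIT_EXCEEDED are hypothetical, and schema drift is reduced to a column-list comparison against the expected_columns used in Static Validation.

```python
from typing import Any, List, Mapping, Sequence

LIMITS = {
    "max_rows_output": 10_000_000,
    "max_shuffle_size_mb": 500,
    "max_null_ratio": 0.05,
    "expected_output_range": (1_000, 10_000_000),
    "cost_ceiling_dbu": 50,
}
EXPECTED_COLUMNS = ["nfe_id", "cpf_hash", "processing_date"]


def runtime_faults(observed: Mapping[str, Any],
                   limits: Mapping[str, Any] = LIMITS,
                   expected_columns: Sequence[str] = EXPECTED_COLUMNS) -> List[str]:
    """Return a fault code for every runtime limit the execution violated."""
    faults = []
    low, high = limits["expected_output_range"]
    if observed["rows_output"] > limits["max_rows_output"]:
        faults.append("RUNTIME_MAX_ROWS_EXCEEDED")
    if not low <= observed["rows_output"] <= high:
        faults.append("RUNTIME_OUTPUT_OUT_OF_RANGE")
    if observed["shuffle_size_mb"] > limits["max_shuffle_size_mb"]:
        faults.append("RUNTIME_SHUFFLE_LIMIT_EXCEEDED")
    if observed["null_ratio"] > limits["max_null_ratio"]:
        faults.append("RUNTIME_NULL_RATIO_EXCEEDED")
    if observed["cost_dbu"] > limits["cost_ceiling_dbu"]:
        faults.append("RUNTIME_COST_CEILING_EXCEEDED")
    if list(observed["columns"]) != list(expected_columns):
        faults.append("RUNTIME_SCHEMA_DRIFT")
    return faults


healthy = {
    "rows_output": 250_000, "shuffle_size_mb": 120, "null_ratio": 0.01,
    "cost_dbu": 12, "columns": ["nfe_id", "cpf_hash", "processing_date"],
}
```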

5.8 Partial Execution State

In real data systems, total and clean failure is rare. The common case is partial failure: 80% of partitions processed, one branch of the DAG materialized, three out of four datasets conformant. CFA has explicit semantics for this.

Architectural Gap Resolved
Without Partial Execution State, the system governs the input well but becomes semantically loose at the output — part of the plan was materialized, part was not, and the system has no governed way to represent that fact.

Consistency Unit

The system must declare the minimum granularity of governance. Without that, "partial" is a vague word.

YAML partial_execution_state
partial_execution_state:
  overall_state: partially_committed

  consistency_unit:
    type: partition             # partition | dataset | dag_branch | time_window
    key: processing_date

  successful_units:
    - dataset: nfe
      partitions: ["2026-01-01", "2026-01-02"]
    - dataset: clientes_anon
      partitions: ["2026-01-01", "2026-01-02"]

  failed_units:
    - dataset: nfe
      partitions: ["2026-01-03"]
      error_code: RUNTIME_SHUFFLE_LIMIT_EXCEEDED
      fault_family: runtime_behavioral

  consistency_policy:
    mode: selective_quarantine
    rollback_required: false
    publish_allowed: false

Partial Failure Policy

The internal sequence of Partial Execution State is deterministic and cannot be reordered:

YAML partial_execution_flow
partial_execution_flow:
  sequence:
    - retry_failed_units          # attempts recovery of failed units before any decision
    - apply_partial_failure_policy # determines the mode (rollback / quarantine / commit / degraded)
    - project_state               # projects into the Context Registry only after policy is applied

YAML partial_failure_policy
partial_failure_policy:
  mode: selective_quarantine
  allowed_partial_commit: true
  publish_on_partial_success: false
  rollback_scope: failed_consistency_unit_only
  promotion_eligible: false

Mode | Behavior | When to use
full_rollback | Any failure invalidates everything | Atomic pipelines, critical data
selective_quarantine | Isolates failed units, preserves valid ones | Partitioned batch processing
partial_commit_no_publish | Writes partially, does not publish the layer | Tolerant ingestion pipelines
degraded_publish | Allows restricted use with degraded status | Cases with declared relaxed SLA

Retry Policy

Failures of individual consistency units are eligible for automatic retry within the same execution. Retry operates on the failed_units, not on the entire execution — preserving what has already been successfully committed.

YAML retry_policy
retry_policy:
  enabled: true
  max_attempts: 3
  retry_scope: failed_consistency_units_only   # never re-executes successful units
  backoff:
    strategy: exponential
    base_seconds: 30
  on_max_exceeded:
    action: quarantine_failed_units
    state: quarantined
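
The retry policy can be sketched as a loop over the failed units only. Two details are assumptions: the backoff doubles on each attempt (30 s, 60 s, 120 s), and the delay is applied before every retry, including the first. The function names and the injectable `sleep` parameter are hypothetical.

```python
import time
from typing import Callable, Dict, List


def backoff_delays(max_attempts: int = 3, base_seconds: int = 30) -> List[int]:
    """Exponential backoff: base, 2x base, 4x base, and so on."""
    return [base_seconds * 2 ** attempt for attempt in range(max_attempts)]


def retry_failed_units(failed_units: List[str],
                       run_unit: Callable[[str], bool],
                       max_attempts: int = 3,
                       base_seconds: int = 30,
                       sleep: Callable[[float], None] = time.sleep) -> Dict[str, List[str]]:
    """Retry only the failed consistency units; quarantine those that never recover."""
    pending = list(failed_units)
    recovered: List[str] = []
    for delay in backoff_delays(max_attempts, base_seconds):
        if not pending:
            break
        sleep(delay)
        still_failing = []
        for unit in pending:
            # run_unit returns True when the unit commits successfully
            (recovered if run_unit(unit) else still_failing).append(unit)
        pending = still_failing
    return {"recovered": recovered, "quarantined": pending}
```

Units that were already committed never appear in `failed_units`, so they are never re-executed.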

Publish Semantics

The publish_allowed flag has precise semantics. "Publishing" is not synonymous with "writing" — it means making the dataset available for downstream consumption as trusted data.

YAML publish_semantics
publish_semantics:
  definition: dataset_visible_and_trusted_for_downstream_consumption

  states:
    - committed_not_published   # data written, not available downstream
    - published                 # data available and trusted
    - degraded                  # available with declared restrictions (degraded_publish)

  write_vs_publish:
    write: data written to storage
    publish: data declared trustworthy for consumption
    distinction: it is possible to have write without publish — never publish without write

Panic Rollback — Environment Change During Execution

Partial Execution State handles failure of the execution. There is a distinct case: a change in the external environment during execution — a cluster losing nodes, IAM permission revoked, storage quota exceeded. These events are not execution errors — they are invalidations of the context in which execution was authorized. They require their own protocol.

YAML panic_rollback
panic_rollback:
  triggers:
    - external_permission_revoked      # IAM, ACL, credential expired at runtime
    - compute_resource_unavailable     # cluster loses nodes beyond the threshold
    - storage_quota_exceeded           # storage unavailable in the middle of the write
    - policy_bundle_changed_mid_exec   # policy changed while the job was running

  action: immediate_interrupt_and_isolate
  scope: all_in_flight_consistency_units

  post_panic:
    state: quarantined
    projection_allowed: false          # Context Registry is not updated
    audit_event: PANIC_ROLLBACK_TRIGGERED
    reason_required: true

  distinction_from_partial_execution_state:
    partial_execution_state: execution failed for an internal reason
    panic_rollback: the external environment invalidated the premises of the execution

Critical Distinction
Panic Rollback does not project state into the Context Registry. The environment that changed externally makes the generated partial state unreliable — projecting it would mean recording as fact the result of an execution whose premises were invalidated midstream.

5.9 Context Registry

The Context Registry is the cognitive infrastructure of the system. It is not an execution log; it is a living model of the state of the environment. It represents not only "what was done" but also "what state the result of that work is in now".

It is consulted by the Intent Normalizer before each new intent, and updated by the State Projection Protocol after each execution.

YAML context_registry
context_registry:
  environment_state:
    datasets:
      - name: silver_documentos
        state: partially_committed
        last_successful_partition: "2026-01-02"
        pending_partitions: ["2026-01-03"]
        publish_allowed: false
        quarantined_units: ["2026-01-03"]

      - name: clientes_anon
        state: committed
        publish_allowed: true

  execution_history:
    - intent_id: fiscal_reconciliation_001
      outcome: partially_committed
      consistency_unit: partition
      timestamp: "2026-03-22T14:30:00Z"
      policy_bundle_version: "v4.2"

Critical Protocol
A Context Registry with environment_state is useless if the Intent Normalizer does not consult it. Without that integration, the system proposes plans over false premises about the state of the world — violating Invariant 3.

Registry Versioning

The Context Registry requires explicit versioning. Without it, there is no temporal consistency — and the reproducibility guaranteed by the execution_context in the Signature becomes illusory if the state of the environment at execution time cannot be recovered.

YAML context_registry.versioning
context_registry:
  versioning:
    model: snapshot
    version_id: uuid                 # generated at each atomic write
    timestamp: iso8601

  read_mode:
    type: snapshot_consistent        # read always from a stable snapshot

  write_mode:
    type: atomic_commit              # atomic write through State Projection

Pre-execution Revalidation (TOCTOU)

There is a window between the Intent Normalizer's read of the Context Registry (step 02) and the actual execution in the Sandbox (step 12). During that interval, another execution may project new state onto the same dataset. Without revalidation, the system executes on a stale premise, silently violating Invariant 3.

YAML pre_execution_revalidation
pre_execution_revalidation:
  trigger: immediately_before_execution   # after the lock is acquired, before the Sandbox
  checks:
    - context_registry.version_id          # is the consulted snapshot still the latest_committed?
    - policy_bundle_version                 # has policy not changed since the Signature?
    - catalog_snapshot_version              # has the catalog not changed since the Signature?
  on_mismatch:
    action: replan_or_abort
    reason: stale_environment_state
    counts_as_replan_attempt: true         # counts toward the Policy Engine max_replan_attempts
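
The revalidation reduces to comparing the versions captured at planning time with the versions in force just before execution. A minimal sketch, assuming both are available as flat dictionaries (the key `context_registry_version_id`, the `proceed` action, and the function name are hypothetical):

```python
from typing import Dict

CHECKS = [
    "context_registry_version_id",
    "policy_bundle_version",
    "catalog_snapshot_version",
]


def revalidate(planned: Dict[str, str], current: Dict[str, str]) -> Dict[str, object]:
    """Compare planning-time versions with current ones; any mismatch forces a replan."""
    mismatched = [check for check in CHECKS if planned[check] != current[check]]
    if not mismatched:
        return {"action": "proceed", "mismatched": []}
    return {
        "action": "replan_or_abort",
        "reason": "stale_environment_state",
        "counts_as_replan_attempt": True,
        "mismatched": mismatched,
    }


planned = {
    "context_registry_version_id": "snapshot-a",
    "policy_bundle_version": "v4.2",
    "catalog_snapshot_version": "catalog_2026_03_22",
}
# Another execution projected new state after this plan was built.
moved = {**planned, "context_registry_version_id": "snapshot-b"}
```

Because the check runs after the execution lock is acquired, no other execution can change the target scope between this comparison and the Sandbox run.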

TTL and Obsolete State Policy

The Context Registry may accumulate obsolete state indefinitely. A dataset marked as partially_committed for weeks, with no new execution over it, continues to block publication without ever being resolved — silently poisoning future intents. An explicit expiration and stale-state transition policy is required.

YAML context_registry.ttl_policy
ttl_policy:
  partially_committed:
    ttl_days: 7                          # after 7 days without resolution
    on_expiry:
      action: transition_to_stale
      notify: true                       # generates an event in the Audit Trail

  quarantined:
    ttl_days: 30                         # long quarantine — requires manual review
    on_expiry:
      action: escalate_to_manual_review
      block_dependent_intents: true

  stale:
    definition: state not updated beyond the TTL without declared resolution
    behavior:
      new_intent_on_stale_dataset: hard_confirmation_required
      publish_blocked: true
      audit_event: CONTEXT_STATE_STALE

Why TTL is architectural, not operational
Without TTL, the Context Registry is not a model of the current state of the world — it is an accumulation of past states that never expire. Over time, every new intent would operate over an environment increasingly contaminated by resolved states that the system still treats as active. TTL is the hygiene layer of the Context Registry.
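
The TTL policy amounts to a state transition driven by the age of the last update. The sketch below assumes expiry happens strictly after the TTL has elapsed and that states without a TTL never expire; the function name and the shape of the returned dictionary are hypothetical.

```python
from datetime import datetime, timedelta, timezone

TTL_DAYS = {"partially_committed": 7, "quarantined": 30}


def apply_ttl(state: str, last_updated: datetime, now: datetime) -> dict:
    """Return the state a dataset should be in after applying the TTL policy."""
    ttl_days = TTL_DAYS.get(state)
    if ttl_days is None or now - last_updated <= timedelta(days=ttl_days):
        return {"state": state, "action": None}
    if state == "partially_committed":
        return {
            "state": "stale",
            "action": "transition_to_stale",
            "audit_event": "CONTEXT_STATE_STALE",
            "publish_blocked": True,
        }
    # Long quarantine: stays quarantined but is escalated for manual review.
    return {
        "state": "quarantined",
        "action": "escalate_to_manual_review",
        "block_dependent_intents": True,
    }


last_update = datetime(2026, 3, 22, 14, 30, tzinfo=timezone.utc)
after_8_days = apply_ttl("partially_committed", last_update, last_update + timedelta(days=8))
```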

5.10 State Projection Protocol

The bridge between execution and context. It guarantees that the result of each execution becomes system knowledge before any new intent is processed.

YAML state_projection
state_projection:
  trigger: post_runtime_validation      # occurs after validation, before the next intent

  inputs:
    - partial_execution_state
    - execution_metadata

  scope:
    - dataset_state
    - partition_availability
    - publish_allowed
    - quarantined_units
    - pending_partitions

  rules:
    - if overall_state == "partially_committed":
        dataset.state = "partially_committed"
        publish_allowed = false
    - if failed_units exist:
        dataset.pending_units = failed_units
    - if policy.mode == "selective_quarantine":
        dataset.quarantined_units = failed_units

  atomicity: required

  on_projection_failure:
    action: block_next_intent
    reason: environment_state_uncertain

Mandatory Properties

Deterministic: the same input always produces the same projected state.
Idempotent: re-execution does not corrupt the state of the environment.
Atomic: partial projection is treated as projection failure.
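
The three properties can be illustrated with a pure function over an in-memory registry. This is a sketch under assumptions: the registry is a plain dictionary keyed by dataset name, and atomicity is approximated by building a full copy and returning it only when every rule has been applied, so the caller swaps the registry in one step.

```python
import copy
from typing import Dict


def project_state(registry: Dict[str, dict], dataset: str,
                  partial_state: dict, policy_mode: str) -> Dict[str, dict]:
    """Apply the state_projection rules and return a new registry.

    The input registry is never mutated, so an exception midway leaves
    the previous state untouched.
    """
    projected = copy.deepcopy(registry)
    entry = projected.setdefault(dataset, {})
    failed = sorted(partial_state.get("failed_units", []))
    if partial_state["overall_state"] == "partially_committed":
        entry["state"] = "partially_committed"
        entry["publish_allowed"] = False
    if failed:
        entry["pending_units"] = failed
    if policy_mode == "selective_quarantine":
        entry["quarantined_units"] = failed
    return projected


registry = {"silver_documentos": {"state": "committed", "publish_allowed": True}}
partial = {"overall_state": "partially_committed", "failed_units": ["2026-01-03"]}

once = project_state(registry, "silver_documentos", partial, "selective_quarantine")
twice = project_state(once, "silver_documentos", partial, "selective_quarantine")
```

Projecting the same partial state twice yields the same registry, which is what makes a retried projection safe.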

5.11 Evaluation Indices

Three distinct indices capture different dimensions of execution quality. Each is independent and measurable.

IFo (Operational Fluidity Index)
IFo = (1 − normalized_latency) × (1 − normalized_cost) × execution_stability
  • normalized_latency ∈ [0,1] — compared to the historical baseline by workload type
  • normalized_cost ∈ [0,1] — DBUs, scanned GB, consumed shuffle
  • execution_stability ∈ [0,1] — success rate, retries, absence of critical failures
IFs (Semantic Fidelity Index)
IFs = output_contract_adherence × semantic_drift_absence × domain_invariant_preservation
  • output_contract_adherence — delivered schema versus expected schema, types, required columns
  • semantic_drift_absence — expected cardinality, consistent distribution
  • domain_invariant_preservation — key uniqueness, business rules preserved
IFg (Governance Index)
IFg = policy_compliance × absence_of_pii_exposure × layer_adherence
  • policy_compliance — no active policy violation detected post-execution
  • absence_of_pii_exposure — PII handled according to rule throughout the output
  • layer_adherence — writes respect Bronze/Silver/Gold as declared
IFg Is Binary by Design
IFg = 1 is the only acceptable value for an approved execution. Any value below 1 indicates a governance-invariant violation — something that should not be possible if the Policy Engine worked correctly. If IFg < 1 post-execution, the event should trigger a Policy Engine review, not merely a promotion block. IFg < 1 is a sign of systemic failure, not of a bad execution.

Promotion Criterion

Promotion Gate
Promote  ⟸  IFo ≥ T1  AND  IFs ≥ T2  AND  IFg = 1
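
As a worked example of the gate, the sketch below computes IFo from its three factors and applies the promotion condition. The default thresholds T1 = 0.75 and T2 = 0.90 are the values used in the promotion_policy example in Section 5.12; the function names are hypothetical.

```python
def ifo(normalized_latency: float, normalized_cost: float, execution_stability: float) -> float:
    """Operational Fluidity Index: product of three factors in [0, 1]."""
    return (1 - normalized_latency) * (1 - normalized_cost) * execution_stability


def promote(ifo_value: float, ifs_value: float, ifg_value: float,
            t1: float = 0.75, t2: float = 0.90) -> bool:
    """Promotion gate: IFo >= T1 AND IFs >= T2 AND IFg = 1."""
    return ifo_value >= t1 and ifs_value >= t2 and ifg_value == 1


# Latency and cost each at 10% of their normalized range, fully stable: 0.9 * 0.9 * 1.0
fluid = ifo(0.1, 0.1, 1.0)

# Latency and cost each at 20%: 0.8 * 0.8 * 1.0, which falls below T1 = 0.75
sluggish = ifo(0.2, 0.2, 1.0)
```

Because IFo is a product, moderate weakness in two factors compounds: two factors at 0.8 already fail a 0.75 threshold.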

IDI — Intent Drift Index

Fourth lifecycle signal, complementary to the three execution indices. IDI detects when the domain has changed and the skill has gone stale before IFs detects degradation in the result. It measures how often executions of the same hash need to be replanned:

IDI (Intent Drift Index)
IDI = 1 − (replanned_executions / total_executions_same_hash), computed over the last 30 days
  • IDI close to 1.0 → stable intent, domain consistent with the catalog
  • IDI below 0.75 → drift detected → skill goes directly to the watchlist
  • IDI below 0.50 → severe drift → immediate demotion with no observation window
Why IDI Is Necessary Beyond IFs
IFs detects when the result has diverged from the contract. IDI detects when the intent is diverging from the domain before any execution confirms the problem. A falling IDI is an early warning: the catalog changed, a new tax law came into effect, the domain was restructured. Acting on IDI means acting before failure, not after it.
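
The IDI formula and its two thresholds can be sketched as follows. Returning `None` and the label `insufficient_history` when a hash has no executions in the window is an assumption, since the whitepaper does not cover that case; the function names and result labels are hypothetical.

```python
from typing import Optional


def idi(replanned_executions: int, total_executions_same_hash: int) -> Optional[float]:
    """Intent Drift Index over the evaluation window; None when there is no history."""
    if total_executions_same_hash == 0:
        return None
    return 1 - replanned_executions / total_executions_same_hash


def idi_action(value: Optional[float]) -> str:
    """Map an IDI value to the lifecycle reaction described above."""
    if value is None:
        return "insufficient_history"   # assumption: not defined by the whitepaper
    if value < 0.50:
        return "immediate_demotion"     # severe drift, no observation window
    if value < 0.75:
        return "watchlist"              # drift detected
    return "stable"


# 3 of 10 executions needed a replan: IDI = 0.7, below the 0.75 threshold.
drifting = idi_action(idi(3, 10))
```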

Configurable Evaluation Window

The promotion evaluation window is not fixed. High-volatility domains such as tax and regulatory need shorter windows. Stable domains such as master data and registration can use longer windows:

YAML evaluation_window.by_domain
evaluation_window:
  default: last_7_days
  by_domain:
    fiscal: last_3_days           # high regulatory volatility
    master_data: last_30_days    # stable domain
    operational: last_7_days     # default
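One way the per-domain window could be resolved at evaluation time is sketched below. It assumes the configuration has already been loaded into a dictionary and that window labels always follow the `last_N_days` pattern used above; `window_for` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Mirror of the evaluation_window configuration, already parsed into a dict.
EVALUATION_WINDOW = {
    "default": "last_7_days",
    "by_domain": {
        "fiscal": "last_3_days",
        "master_data": "last_30_days",
        "operational": "last_7_days",
    },
}

_WINDOW_PATTERN = re.compile(r"^last_(\d+)_days$")


def window_for(domain: str, config: dict = EVALUATION_WINDOW) -> timedelta:
    """Return the evaluation window for a domain, falling back to the default."""
    label = config["by_domain"].get(domain, config["default"])
    match = _WINDOW_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unsupported window label: {label!r}")
    return timedelta(days=int(match.group(1)))


print(window_for("fiscal").days)          # 3
print(window_for("unknown_domain").days)  # 7 (default)
```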

5.12 Skill Lifecycle Management

A system that only promotes skills eventually creates a new graveyard of artifacts. Without a demotion cycle, CFA reproduces the static-catalog problem with architectural pedigree.

States

jit_candidate persisted_active persisted_watchlist deprecated retired

The transition persisted_active → demoted may occur directly, bypassing the watchlist, in the event of a critical contract or governance violation.

Promotion Criterion

Promotion requires evidence accumulated within a time window. A single successful execution is not enough for industrialization:

YAMLpromotion_policy
promotion_policy:
  min_executions: 3                     # minimum number of successful executions
  evaluation_window: last_7_days        # within this window
  thresholds:
    IFo: 0.75
    IFs: 0.90
    IFg: 1.0                             # binary — no exception
  gate: IFo >= T1 AND IFs >= T2 AND IFg = 1 AND executions >= min_executions
  promotion_unit:
    type: intent_signature_hash          # hash of the State Signature — avoids promoting lookalikes
    # two intents with different signatures = different skills, even if semantically close
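The promotion unit can be illustrated with a small hashing sketch. It assumes the State Signature can be serialized as JSON and that SHA-256 over a canonical (sorted-key) serialization is an acceptable hash; the signature fields shown are hypothetical placeholders, not the actual State Signature schema.

```python
import hashlib
import json


def intent_signature_hash(state_signature: dict) -> str:
    """Hash a State Signature via canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(state_signature, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical signatures: same domain, different target layer.
sig_a = {"domain": "fiscal", "target_layer": "silver", "operation": "reconcile"}
sig_b = {"domain": "fiscal", "target_layer": "gold", "operation": "reconcile"}
sig_a_reordered = {"operation": "reconcile", "target_layer": "silver", "domain": "fiscal"}

print(intent_signature_hash(sig_a) == intent_signature_hash(sig_a_reordered))  # True
print(intent_signature_hash(sig_a) == intent_signature_hash(sig_b))            # False
```

Canonical serialization keeps key order from changing the hash, while any change in a field value produces a different promotion unit, which is what prevents lookalike intents from sharing promotion evidence.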
| Trigger | Description | Resulting State |
| --- | --- | --- |
| Schema drift | Output contract diverges from the declared contract | watchlist → deprecated |
| Policy change | The current policy invalidates the skill | active → demoted |
| Degraded IFs | Recurring drop below T2 | active → watchlist |
| High cost | Consistently low IFo | active → watchlist |
| Low reuse | Skill not used for N periods | watchlist → deprecated |
| Catalog incompatibility | Dataset or domain removed from the catalog | active → retired |

Skill Generation Metadata

Every promoted skill must carry provenance metadata. Without it, it is impossible to identify which skills were industrialized under a system version with a promotion bug, making the "Synthetic Legacy" untraceable and unrecoverable at scale.

YAMLskill_generation_metadata
skill_generation_metadata:
  promoted_at: iso8601
  promoted_by_system_version: "cfa_v2.1"    # version of CFA that executed the promotion
  policy_bundle_at_promotion: "v4.2"        # policy active at the time
  catalog_snapshot_at_promotion: "catalog_2026_03_22"
  promotion_scores:
    IFo: 0.88
    IFs: 0.94
    IFg: 1.0
  execution_count_at_promotion: 5
  evaluation_window: last_7_days
Why Generation Metadata Is Architectural
If the promotion logic contained a bug in an earlier system version, generation metadata makes it possible to identify and demote in bulk all skills promoted under that version without having to analyze each skill individually. Without it, the system industrializes error at scale with no selective recovery mechanism.
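The bulk-recovery scenario can be sketched as a simple filter over provenance metadata. This assumes skill records are available in memory; `SkillRecord`, its `skill_id` field, and `skills_promoted_under` are hypothetical names, while the other two fields mirror the metadata keys above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillRecord:
    skill_id: str                     # hypothetical identifier
    promoted_by_system_version: str   # from skill_generation_metadata
    policy_bundle_at_promotion: str   # from skill_generation_metadata


def skills_promoted_under(skills, system_version: str):
    """Select every skill promoted by a given (possibly faulty) system version."""
    return [s for s in skills if s.promoted_by_system_version == system_version]


catalog = [
    SkillRecord("skill_a", "cfa_v2.0", "v4.1"),
    SkillRecord("skill_b", "cfa_v2.1", "v4.2"),
    SkillRecord("skill_c", "cfa_v2.0", "v4.2"),
]

# Suppose a promotion bug is later found in cfa_v2.0: list the demotion candidates.
suspect = skills_promoted_under(catalog, "cfa_v2.0")
print([s.skill_id for s in suspect])  # ['skill_a', 'skill_c']
```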

5.13 Fault Model

Errors in CFA are not exceptions. They are governed events. Each family has a detection point and a defined owner. Mixing families without making the stage explicit creates ambiguity about who detects, when it is detected, and which terminal state is possible.

| Family | Detected In | Owner | Action |
| --- | --- | --- | --- |
| Semantic Faults (governance, finops, contract) | Policy Engine | Governance / FinOps layer | replan or block before execution |
| Static Safety Faults (collect(), forbidden imports, raw PII) | Static Validation | Code analysis layer | immediate block, no execution |
| Runtime Behavioral Faults (cardinality drift, excessive shuffle, schema divergence) | Runtime Validation | Sandbox / Observability | quarantine, rollback, or partial state |
| Environmental Faults (revoked IAM, cluster loss, storage quota, mid-execution policy change) | Sandbox Monitor (external runtime) | Infrastructure layer | Panic Rollback: quarantine with no projection |

Structure of a Fault Event

YAMLerror_signature.example
error_signature:
  code: FINOPS_UNBOUNDED_SCAN
  family: semantic_faults
  severity: high
  stage: policy_engine
  detected_before_execution: true
  mandatory_action: replan_or_block
  remediation:
    - "Apply a temporal filter on processing_date"
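A fault event of this shape could be modeled as a typed record that refuses family/stage combinations the fault model does not allow. This is a sketch under assumptions: only `semantic_faults` and `policy_engine` appear verbatim in the example above; the other family and stage identifiers are assumed snake_case renderings of the fault-family table.

```python
from dataclasses import dataclass

# family -> (detection stage, detected before execution?)
# Identifiers other than "semantic_faults" / "policy_engine" are assumed names.
FAMILY_STAGE = {
    "semantic_faults": ("policy_engine", True),
    "static_safety_faults": ("static_validation", True),
    "runtime_behavioral_faults": ("runtime_validation", False),
    "environmental_faults": ("sandbox_monitor", False),
}


@dataclass(frozen=True)
class ErrorSignature:
    code: str
    family: str
    severity: str
    stage: str
    mandatory_action: str
    remediation: tuple = ()

    def __post_init__(self):
        if self.family not in FAMILY_STAGE:
            raise ValueError(f"unknown fault family: {self.family!r}")
        expected_stage = FAMILY_STAGE[self.family][0]
        if self.stage != expected_stage:
            raise ValueError(
                f"{self.family} is detected in {expected_stage}, not {self.stage}"
            )

    @property
    def detected_before_execution(self) -> bool:
        return FAMILY_STAGE[self.family][1]


fault = ErrorSignature(
    code="FINOPS_UNBOUNDED_SCAN",
    family="semantic_faults",
    severity="high",
    stage="policy_engine",
    mandatory_action="replan_or_block",
    remediation=("Apply a temporal filter on processing_date",),
)
print(fault.detected_before_execution)  # True
```

Deriving `detected_before_execution` from the family, rather than storing it, keeps the event from contradicting the table.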

5.14 Decision Engine

Consolidates the result of all validations and produces the final state of the intent. When components diverge — the Policy Engine approved, Runtime detected drift, Partial Policy allowed commit — the Decision Engine applies the precedence rule to determine who has authority.

Decision Precedence

Invariants and components have different roles in the decision. Invariants are not participants. They are the boundary of the field. If an invariant is violated, the result is determined before any other evaluation.

YAMLdecision_precedence
decision_precedence:
  veto_absolute:
    - invariants              # a violation cancels everything; no vote is possible
                               # they are not just another participant; they are the edge of the field
  ordered_authority:
    - policy_engine           # 1st: decides pre-execution (approve / replan / block)
    - runtime_validation      # 2nd: decides over the real result (may escalate severity)
    - partial_failure_policy  # 3rd: decides commit granularity
Example of a Resolved Conflict
The Policy Engine approved. Runtime detected mild drift within the threshold. Partial Policy allowed selective commit. Result: approved_with_warnings. Runtime escalated from info to warning, Partial Policy applied selective commit, and no invariant was violated.

Conflict Resolution Between Authorities

When policy_engine, runtime_validation, and partial_failure_policy produce divergent outcomes, the resolution strategy is strictest_outcome: the most restrictive result among the active authorities prevails:

YAMLdecision_engine.conflict_resolution
decision_engine:
  conflict_resolution:
    strategy: strictest_outcome
    # if policy_engine = approved and runtime = quarantine → quarantine
    # if policy_engine = replanned and partial = partially_committed → replanned
    # severity ordering: blocked > quarantined > rolled_back > partially_committed
    #                   > approved_with_warnings > approved
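The strictest_outcome strategy reduces to picking the highest-ranked state. The sketch below uses only the six states placed in the severity ordering above; `replanned` is not placed in that ordering, so this illustration rejects it rather than guessing its rank. The function name is hypothetical.

```python
# Least to most restrictive, following the documented severity ordering.
SEVERITY_ORDER = [
    "approved",
    "approved_with_warnings",
    "partially_committed",
    "rolled_back",
    "quarantined",
    "blocked",
]
_RANK = {state: rank for rank, state in enumerate(SEVERITY_ORDER)}


def strictest_outcome(outcomes):
    """Return the most restrictive outcome among the active authorities."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one authority outcome is required")
    unknown = [o for o in outcomes if o not in _RANK]
    if unknown:
        raise ValueError(f"outcomes without a documented rank: {unknown}")
    return max(outcomes, key=_RANK.__getitem__)


print(strictest_outcome(["approved", "quarantined"]))             # quarantined
print(strictest_outcome(["approved", "approved_with_warnings"]))  # approved_with_warnings
```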

Possible States

YAMLdecision_states
decision_state:
  - approved                  # complete, validated, publishable execution
  - approved_with_warnings    # complete execution, non-blocking alerts
  - replanned                 # Policy Engine required changes; new cycle
  - blocked                   # unrecoverable pre-execution violation
  - partially_committed       # partial result; Partial Failure Policy applied
  - quarantined              # isolated units; investigation required
  - rolled_back              # state reverted due to critical failure
  - promotion_candidate       # IFo+IFs+IFg above the thresholds

5.15 Audit Trail

Immutable record of all decision events. It is not conventional logging. These are typed events, correlated by intent_id and versioned by policy_bundle. It serves two consumers with distinct needs that become incompatible if not addressed explicitly.

YAMLaudit_trail.example
audit_trail:
  intent_id: fiscal_reconciliation_001
  correlation_id: session_4f9a2c
  policy_bundle_version: v4.2

  events:
    - stage: intent_normalizer
      event_type: semantic_resolution
      outcome: resolved
      confidence: 0.81
      confirmation_mode: hard_required
      environment_state_consulted: true

    - stage: policy_engine
      event_type: policy_replan
      outcome: replanned
      faults: [GOVERNANCE_RAW_PII_JOIN, FINOPS_MISSING_TEMPORAL_PREDICATE]

    - stage: static_validation
      event_type: code_analysis
      outcome: passed

    - stage: execution
      event_type: partial_commit
      outcome: partially_committed
      consistency_unit: partition
      successful: ["2026-01-01", "2026-01-02"]
      failed: ["2026-01-03"]

    - stage: state_projection
      event_type: environment_update
      outcome: projected

    - stage: promotion_engine
      event_type: promotion_evaluation
      outcome: not_eligible
      reason: partial_execution

Consumption Modes

Operational — Engineering
Format: structured JSON
Use: debugging, performance tuning, failure analysis, re-execution of failed partitions
Regulatory — Audit
Format: normalized version + cryptographic signature
Use: compliance, decision traceability, governance evidence

Properties of the Record

Immutable after write. Causally ordered by intent_id, not merely chronologically. Correlated across subintents. Versioned by the policy_bundle active at execution time, enabling historical replay with the original policy.

YAMLaudit_trail.guarantees
audit_trail:
  immutability: append_only              # events are only appended, never modified or removed
  ordering: causal_order              # not just timestamp order; causal order of events
  guarantees:
    - no_event_reordering             # event A that caused B always appears before B
    - immutable_after_write
    - complete_per_intent             # every intent_id has both start and end recorded

  decision_state_enum:               # states are enumerated; no ad hoc states
    - approved
    - approved_with_warnings
    - replanned
    - blocked
    - partially_committed
    - quarantined
    - rolled_back
    - promotion_candidate
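The append-only and causal-order guarantees can be sketched with an in-memory structure. This is illustrative only: a real Audit Trail would sit on a durable backend, and the `event_id` and `caused_by` fields are assumptions introduced here to make causality explicit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    event_id: str                    # hypothetical unique identifier
    intent_id: str
    stage: str
    outcome: str
    caused_by: Optional[str] = None  # hypothetical link to the causing event


class AuditTrail:
    """Append-only log: events are added, never modified or removed."""

    def __init__(self):
        self._events = []
        self._ids = set()

    def append(self, event: AuditEvent) -> None:
        if event.event_id in self._ids:
            raise ValueError("immutable_after_write: event_id already recorded")
        if event.caused_by is not None and event.caused_by not in self._ids:
            raise ValueError("no_event_reordering: cause must be recorded first")
        self._events.append(event)
        self._ids.add(event.event_id)

    def events(self) -> tuple:
        # A tuple of frozen events: callers cannot mutate the recorded history.
        return tuple(self._events)


trail = AuditTrail()
trail.append(AuditEvent("e1", "fiscal_reconciliation_001", "policy_engine", "replanned"))
trail.append(AuditEvent("e2", "fiscal_reconciliation_001", "static_validation", "passed",
                        caused_by="e1"))
print([e.stage for e in trail.events()])  # ['policy_engine', 'static_validation']
```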
Section 06

6. System Invariants

Invariants are the properties the system must always preserve. They differ from benefits: they do not describe what the system does well, but what it guarantees. An invariant violation is a system failure, not a user failure.

I1
PII PROTECTION

No data classified as PII is published without explicit treatment (hashing or confirmed removal). This applies to all outputs, intermediate and final.

I2
GOVERNED WRITING

No Silver or Gold layer receives writes without a merge key or a mutability policy defined and approved by the Policy Engine.

I3
STATE AWARENESS

No intent is processed without consulting the environment_state from the Context Registry. The state of the environment is a mandatory input to the Intent Normalizer.

I4
MANDATORY PROJECTION

Every partial_execution_state must be projected into the Context Registry before the next intent is processed. If projection fails, the next intent is blocked until the environment is consistent.

I5
COMPLETE AUDITABILITY

Every decision generates a structured event in the Audit Trail. Executions without an audit trail are treated as unauthorized executions.

I6
SAFE EXECUTION

No approved execution contains forbidden operations. If a forbidden operation is detected at runtime (after Static Validation), execution is immediately interrupted and the generated partial state is quarantined before any projection into the Context Registry.

I6 takes precedence over I4.
I7
PUBLICATION CONSISTENCY

No dataset in state partially_committed may be published as trusted. The publish_allowed flag is managed by the State Projection Protocol and is only released after complete resolution of the pending units.

I8
TOTAL REPRODUCIBILITY

Every execution must be reproducible with exactly the same results from the same inputs: user_intent + context_registry.version_id + policy_bundle_version + catalog_snapshot_version. If the Audit Trail and the State Signature's execution_context do not allow exact re-execution, the execution violates I8 and the skill is permanently blocked from promotion.

Permanently blocks promotion if violated

Precedence Between Invariants

Precedence Rule: Safety and state-protection invariants (I6, I7) take precedence over continuity invariants (I4). In case of conflict, specifically when an execution with a forbidden operation detected at runtime would generate partial state to be projected, the system must preserve environmental integrity before guaranteeing flow progression. Projection only occurs after confirmed quarantine of the affected units. I8 (Reproducibility) operates independently: its violation does not block future executions, but it permanently blocks promotion. It is a quality guarantee, not a safety guarantee.

Without this declared precedence rule, an implementer could interpret I4 (mandatory projection) as more important than I6 (immediate interruption), projecting contaminated state into the Context Registry, which is exactly the scenario that destroys system integrity.
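The precedence of I6 over I4 can be expressed as a guard in front of the projection step. This is a minimal sketch; `may_project` and its two flags are hypothetical names for the conditions described above.

```python
def may_project(forbidden_operation_detected: bool, quarantine_confirmed: bool) -> bool:
    """Decide whether partial state may be projected into the Context Registry.

    I4 demands projection, but I6 takes precedence: when a forbidden operation
    was detected at runtime, projection waits for confirmed quarantine.
    """
    if forbidden_operation_detected:
        return quarantine_confirmed
    return True


print(may_project(forbidden_operation_detected=False, quarantine_confirmed=False))  # True
print(may_project(forbidden_operation_detected=True, quarantine_confirmed=False))   # False
print(may_project(forbidden_operation_detected=True, quarantine_confirmed=True))    # True
```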

Section 07

7. End-to-End Flow

| # | Step | Component | Invariant / Guarantee |
| --- | --- | --- | --- |
| 1 | Intent received | | |
| 2 | Consult environment_state (versioned snapshot) | Context Registry | I3: latest_committed snapshot |
| 3 | Lock check: queue or reject by conflict_resolution | Concurrency Control | single_active_intent_per_scope |
| 4 | Normalization + semantic resolution | Intent Normalizer | |
| 5 | Semantic escalation (auto / soft / hard / human) | Confirmation Orchestrator | fallback: block + notify on timeout |
| 6 | Generate State Signature + execution_context | State Signature | policy_bundle + catalog versioned |
| 7 | Apply policies (max 3 replans) | Policy Engine | I1, I2 |
| 8 | Pre-execution cost estimation | Pre-cost Estimator | feedback loop by signature_hash |
| 9 | Planning (DAG / Composite + idempotence) | Execution Planner | mandatory merge_key |
| 10 | Static code validation | Static Validation | I6 |
| 11 | Acquire Execution Lock (target_scope) | Concurrency Control | exclusivity before revalidation |
| 12 | Pre-execution revalidation (TOCTOU) | Pre-execution Revalidation | version_id + policy + catalog |
| 13 | Execution in the Sandbox | Execution | I6, idempotence, Panic Rollback if env changes |
| 14 | Runtime behavior validation | Runtime Validation | |
| 15 | retry → partial_failure_policy → project_state | Partial Execution State | I7, consistency_unit |
| 16 | Projection + release lock (atomic, versioned) | State Projection Protocol | I4, I6 (precedence) |
| 17 | Final decision: strictest_outcome among authorities | Decision Engine | decision_state_enum |
| 18 | Lifecycle evaluation: IFo + IFs + IFg + IDI by hash | Promotion / Demotion Engine | I8: reproducibility |
| 19 | Record in the Audit Trail (append_only, causal order) | Audit Trail | I5 |

Step 5 (Confirmation Orchestrator) is the point where human ambiguity is explicitly addressed before any planning. The lock (step 11) is acquired before revalidation (step 12), closing the residual race condition. IDI in step 18 detects domain drift before IFs detects result degradation.

Section 08

8. Benefits

Governance
Real enforcement before execution. Explicit PII control. Structured auditability with two consumption modes.
FinOps
Cost prevention before execution. Blocking of full scans and predicate-less joins. Automatic plan optimization through replanning.
Operational Robustness
Partial-failure tolerance with explicit policy. Consistent state of the environment. No executions that leave invisible traces.
Adaptive Evolution
Selective industrialization based on evidence. Full skill lifecycle: promotion and demotion. No accumulation of obsolete artifacts.
Generalization
Less coupling between intent and implementation. JIT generation eliminates the static catalog. Support for intent composition.
Explainability
Every decision is traceable. The system explains why it blocked, replanned, or approved. Native regulatory compliance.
Section 09

9. Extension Points

A closed system does not become a standard; it becomes one company's product. For CFA to be adoptable as a framework, it needs declared extension points where specific implementations can inject domain behavior without modifying the core.

Extension points are interface contracts, not implementations. Each point defines what CFA expects to receive and what it guarantees to return. What happens inside is the responsibility of the implementation.

YAMLextension_points
extension_points:

  intent_normalizer:
    - custom_domain_resolver        # domain-specific interpretation logic
    - catalog_enrichment_hook        # external catalog enrichment before the Signature
    - confidence_scorer_override     # custom semantic-confidence model

  policy_engine:
    - custom_rule_provider           # external declarative rules (regulatory, domain)
    - external_compliance_validator  # validation against non-AI legacy systems

  execution_planner:
    - custom_template_library        # domain-specific code templates (Spark, SQL, dbt)
    - pre_cost_estimator_backend     # cost-estimation backend (AQE, historical)

  context_registry:
    - storage_backend                # Delta, DynamoDB, Redis, etc.
    - snapshot_provider              # snapshot_consistent implementation

  promotion_engine:
    - custom_promotion_signal        # external business-value signal
    - domain_window_provider         # evaluation_window by domain

  audit_trail:
    - storage_backend                # S3, Kafka, OpenLineage, etc.
    - regulatory_formatter           # audit format by jurisdiction
CFA Kernel (minimum adoptable set)
Intent Normalizer + State Signature + Policy Engine + Static Validation + Audit Trail. Real governance in two weeks. Extension points active but with default implementations.
CFA Full (complete architecture)
All 16 components + customized extension points. Persistent Context Registry, Partial Execution State, Promotion Engine with full lifecycle.
Why Extension Points Transform CFA
Without extension points, CFA is an architecture that a company implements from scratch. With extension points, CFA is a framework that a company adopts and extends. These are completely different adoption models. The latter is what allows multiple independent implementations to coexist under the same contract, which is the minimum requirement for any market standard.
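To make the idea of an interface contract concrete, here is a hedged sketch of what one extension point might look like in Python. The `CustomRuleProvider` protocol, its `rules_for` method, and the rule dictionaries are all assumptions for illustration; the document defines the extension point name `custom_rule_provider` but not its signature.

```python
from typing import Iterable, Protocol


class CustomRuleProvider(Protocol):
    """Hypothetical contract for the policy_engine.custom_rule_provider hook."""

    def rules_for(self, domain: str) -> Iterable[dict]:
        ...


class StaticRuleProvider:
    """Default-style implementation backed by an in-memory mapping."""

    def __init__(self, rules_by_domain: dict):
        self._rules = rules_by_domain

    def rules_for(self, domain: str) -> Iterable[dict]:
        return list(self._rules.get(domain, []))


def load_rules(provider: CustomRuleProvider, domain: str) -> list:
    """The core depends only on the contract, never on a concrete provider."""
    return list(provider.rules_for(domain))


provider = StaticRuleProvider({"fiscal": [{"id": "no_raw_pii_join"}]})
print(load_rules(provider, "fiscal"))       # [{'id': 'no_raw_pii_join'}]
print(load_rules(provider, "master_data"))  # []
```

Structural typing via `Protocol` lets an implementation satisfy the contract without inheriting from any core class, which matches the goal of injecting domain behavior without modifying the core.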
Section 10

10. Execution Scope (v2)

CFA v2 was designed and validated for a specific execution model. Declaring it explicitly protects the document against improper extrapolation and defines where the architectural guarantees are valid.

YAMLexecution_model
execution_model:
  version: v2
  supported_modes:
    - batch
    - micro_batch
  concurrent_intent_support: false         # one active intent per target scope
  streaming_support: planned_for_v3
  consistency_assumption: single_active_intent_per_target_scope

Why batch and micro-batch — rather than streaming

Three CFA v2 components implicitly assume discrete execution boundaries, which exist in batch and micro-batch but are problematic in continuous streaming:

| Component | Assumption in v2 | Problem in Streaming |
| --- | --- | --- |
| Partial Execution State | Discrete consistency unit (partition, time window) | Streaming requires offset, epoch, or checkpoint semantics, not partitions |
| State Projection Protocol | Post-validation trigger with a clear boundary between execution and projection | In asynchronous streaming, the execution → projection linearity can break |
| Context Registry | Environmental state with enough coherence to guide the next intent | Under real concurrency, it requires locking, optimistic versioning, and causal ordering |
Central Operational Hypothesis
CFA v2 operates under the hypothesis of single_active_intent_per_target_scope: two intents do not write to the same dataset/layer at the same time. This hypothesis is what allows the Context Registry to be consulted and projected consistently without requiring distributed concurrency mechanisms. Violating this hypothesis without the v3 adaptations results in untrustworthy environmental state.

Concurrency Control

The declaration concurrent_intent_support: false is not sufficient on its own. It is necessary to define what happens when two intents arrive simultaneously over the same scope. Without that definition, race conditions are possible and Invariant 3 can be silently violated.

YAMLconcurrency_control
concurrency_control:
  model: single_active_intent_per_target_scope

  scope_definition:
    - dataset
    - dataset_partition

  lock:
    acquisition: before_pre_execution_revalidation   # step 11 in the flow
    release: after_state_projection_confirmed        # step 16 in the flow

  conflict_resolution:
    on_conflict: queue              # queue | reject
    # queue: second intent waits for the scope to be released
    # reject: second intent receives blocked immediately

  locking:
    type: optimistic
    validation: pre_execution_revalidation

The queue mode is the recommended default for batch because it avoids losing work. The reject mode is appropriate for scenarios where latency matters more than execution guarantee.
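The two conflict modes can be sketched with a small in-memory lock manager. This is a single-process illustration that is not thread-safe and says nothing about how a real deployment would coordinate locks; `ScopeLockManager` and its return values are hypothetical.

```python
from collections import deque


class ScopeLockManager:
    """One active intent per target scope, with queue or reject on conflict."""

    def __init__(self, on_conflict: str = "queue"):
        if on_conflict not in ("queue", "reject"):
            raise ValueError("on_conflict must be 'queue' or 'reject'")
        self.on_conflict = on_conflict
        self._holder = {}   # scope -> intent_id currently holding the lock
        self._waiting = {}  # scope -> deque of queued intent_ids

    def acquire(self, scope: str, intent_id: str) -> str:
        if scope not in self._holder:
            self._holder[scope] = intent_id
            return "acquired"
        if self.on_conflict == "queue":
            self._waiting.setdefault(scope, deque()).append(intent_id)
            return "queued"
        return "blocked"  # reject mode: the second intent is blocked immediately

    def release(self, scope: str, intent_id: str):
        """Release after state projection; hand the scope to the next queued intent."""
        if self._holder.get(scope) != intent_id:
            raise ValueError("only the current holder may release the scope")
        waiting = self._waiting.get(scope)
        if waiting:
            next_intent = waiting.popleft()
            self._holder[scope] = next_intent
            return next_intent
        del self._holder[scope]
        return None


locks = ScopeLockManager(on_conflict="queue")
print(locks.acquire("sales.gold", "intent_a"))  # acquired
print(locks.acquire("sales.gold", "intent_b"))  # queued
print(locks.release("sales.gold", "intent_a"))  # intent_b
```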

Snapshot Selection

The ambiguity over which snapshot the Context Registry serves on each read is resolved by a single rule:

YAMLcontext_registry.snapshot_selection
context_registry:
  snapshot_selection:
    strategy: latest_committed_before_intent_resolution
    # always the most recent snapshot that has already been atomically committed
    # a snapshot in progress (write underway) is never served
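The selection rule can be sketched as a filter followed by a maximum. The `Snapshot` record and its `committed_at` field are hypothetical, and treating "before intent resolution" as "at or before" the resolution timestamp is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    version_id: int
    committed_at: Optional[float]  # None while the write is still in progress


def select_snapshot(snapshots: Sequence[Snapshot], intent_resolution_time: float) -> Snapshot:
    """Return the latest snapshot committed at or before intent resolution."""
    candidates = [
        s for s in snapshots
        if s.committed_at is not None and s.committed_at <= intent_resolution_time
    ]
    if not candidates:
        raise LookupError("no committed snapshot available for this intent")
    return max(candidates, key=lambda s: s.committed_at)


history = [
    Snapshot(version_id=1, committed_at=100.0),
    Snapshot(version_id=2, committed_at=200.0),
    Snapshot(version_id=3, committed_at=None),   # write underway: never served
]
print(select_snapshot(history, intent_resolution_time=250.0).version_id)  # 2
print(select_snapshot(history, intent_resolution_time=150.0).version_id)  # 1
```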

Scope Roadmap

| Version | Added Scope | Impacted Components |
| --- | --- | --- |
| v2 (current) | Batch and micro-batch, sequential intent per scope | |
| v3 (planned) | Continuous streaming, concurrent intents | Partial Execution State, State Projection, Context Registry |
Section 11

11. Limitations

Architectural Honesty
CFA solves real problems, but it introduces real complexity. The limitations below are not implementation details; they are structural trade-offs that must be evaluated before adoption.
| Limitation | Impact | Mitigation |
| --- | --- | --- |
| Implementation complexity | High initial cost. Multiple interdependent components. | Phased implementation (see strategy below) |
| Dependence on a rich catalog | The Intent Normalizer is only as good as the catalog it consults. | Invest in a data catalog before CFA |
| Continuous threshold tuning | T1 and T2 for IFo and IFs require domain calibration. | Historical baselines + incremental adjustment |
| Policy complexity | The Policy Engine requires rule maintenance over time. | Version the policy_bundle in the Audit Trail |
| Additional latency | The validation pipeline adds pre-execution latency. | Auto mode for low-risk intents |

Phased Implementation Strategy

| Phase | Scope | Result |
| --- | --- | --- |
| Phase 1 | Kernel without execution: Intent Normalizer + State Signature + Policy Engine | Decision system. Evaluates intents; does not execute. |
| Phase 2 | Controlled code generation + Static Validation | Deterministic execution over an approved plan. |
| Phase 3 | Sandbox + Runtime Validation + Partial Execution State | Partial-failure tolerance and real metrics. |
| Phase 4 | Context Registry + State Projection + Audit Trail | Persistent state and complete observability. |
| Phase 5 | Promotion / Demotion Engine + IFo/IFs/IFg | Adaptive industrialization with lifecycle. |
Section 12

12. Non-Goals (Negative Scope)

A serious technical whitepaper states what the system does not aim to do. Non-goals protect against improper extrapolation and define where the system ends, which is as important as defining where it begins.

YAMLnon_goals.v2
non_goals:
  # Execution scope
  - real_time_stream_processing       # planned for v3
  - distributed_multi_writer_concurrency  # requires a Context Registry redesign
  - sub_second_latency_execution       # validation overhead is intentional

  # System responsibility
  - data_quality_enforcement           # CFA governs execution, not the intrinsic quality of data
  - schema_inference                   # schema comes from the catalog; it is not inferred
  - business_rule_definition           # CFA applies rules; it does not define them

  # Implementation
  - storage_engine                     # engine-agnostic (Spark, Databricks, etc.)
  - catalog_implementation             # depends on an external catalog; does not implement it
  - policy_dsl_specification           # rule contract is defined; DSL is an implementation choice
Why Non-Goals Matter
An architect who tries to use CFA v2 for continuous streaming, concurrent multi-writer execution, or sub-second execution will face real problems, not due to implementation limitation, but due to architectural incompatibility. Declaring the non-goals protects both the system and its adopters.
Section 13

13. Conclusion

CFA v2 represents a paradigm shift in the relationship between AI and critical data systems:

From
Systems that execute commands: fast, fragile, and unguided.
To
Systems that understand intents, evaluate the state of the world, constrain themselves to what is permitted, execute with controlled granularity, and then record, learn, and evolve.

What separates CFA v2 from an elegant concept is the presence of the elements that make real systems operable in production:

  • Formalization of what is uncertain: semantic resolution with confidence scores and confirmation modes treats ambiguity as system data, not as user failure.
  • Tolerance for what partially fails: Partial Execution State with consistency_unit transforms partial failure from an unrepresentable event into a governed state with explicit policy.
  • Memory of the state of the world: the Context Registry with environment_state closes the loop between what was executed and what may be executed next.
  • Formal guarantees: the eight invariants (I1 to I8) with precedence rules make the system definable, testable, and auditable.

CFA v2 is a deterministic architecture for governed execution that guarantees semantic consistency, state integrity, and complete auditability through typed intent resolution, explicit concurrency control, pre-execution validation, idempotent execution, versioned state projection, and invariant-based enforcement.

This document is the product of a collaboration among AI architects. Each revision incorporated real criticism about executability, not only conceptual elegance. The natural next step is the technical blueprint: Python classes, interfaces, and mapping of each component to implementable code.