AI project generation

Project generation starts from one of two inputs:

explicit CLI parameters, when the user already knows source, target and mode;
natural-language intent, when ContractForge AI must extract the project specification first.

Both paths produce a ProjectPlan before writing files. The plan contains artifacts, assumptions, required decisions, warnings and traceability evidence.
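As a rough illustration of what such a plan carries, the sketch below models it as a Python dataclass. The field names, the `needs_review` helper and the sample values are assumptions made for this example; they are not taken from the ContractForge AI codebase.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectPlan:
    """Hypothetical in-memory shape of a plan; names are illustrative only."""

    artifacts: dict[str, str] = field(default_factory=dict)      # relative path -> file content
    assumptions: list[str] = field(default_factory=list)
    required_decisions: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)            # traceability notes

    def needs_review(self) -> bool:
        # An open decision means a human still has to confirm something.
        return bool(self.required_decisions)


# Sample values invented for the example.
plan = ProjectPlan(
    artifacts={"project.yaml": "name: supabase_products\n"},
    assumptions=["schedule starts disabled"],
    required_decisions=["merge keys for b_products"],
)
```

Keeping open decisions as data on the plan, rather than as log output, is one way to let a review report be rendered from the same object that drives file writing.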

Canonical files

ContractForge AI writes the same public ContractForge structure a platform team would write by hand:

project.yaml
environments/
  databricks.environment.yaml
  aws.environment.yaml
connections/
  source.yaml
contracts/
  bronze/
    b_products/
      b_products.ingestion.yaml
      b_products.annotations.yaml
      b_products.operations.yaml
      b_products.access.yaml
README.md
RUNBOOK.md
VALIDATION.md
DECISIONS.md
AI_REVIEW.html or PROJECT_REVIEW.html

The generator must not emit legacy flat fields such as target_table, target_schema, ctrl_schema or top-level source_system.
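A check for this rule could look like the following sketch, which scans an already-parsed YAML mapping for the legacy keys. The function name is hypothetical, and treating `target_table`, `target_schema` and `ctrl_schema` as forbidden at any depth is an assumption of this example; only `source_system` is restricted to the top level here, as the sentence above states.

```python
LEGACY_ANYWHERE = {"target_table", "target_schema", "ctrl_schema"}
LEGACY_TOP_LEVEL = {"source_system"}


def find_legacy_fields(doc: dict, path: str = "") -> list[str]:
    """Return dotted paths of legacy flat fields found in a parsed mapping."""
    hits = []
    for key, value in doc.items():
        here = f"{path}.{key}" if path else key
        if key in LEGACY_ANYWHERE or (not path and key in LEGACY_TOP_LEVEL):
            hits.append(here)
        # Only nested mappings are traversed; lists are skipped in this sketch.
        if isinstance(value, dict):
            hits.extend(find_legacy_fields(value, here))
    return hits


# Sample document invented for the example.
sample = {
    "source_system": "supabase",
    "target": {"table": "b_products"},
    "extras": {"target_table": "b_products", "source_system": "nested is allowed here"},
}
legacy_hits = find_legacy_fields(sample)
```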

Explicit generation

Use generate-project when the project is already specified:

contractforge-ai generate-project \
  --target aws-glue-iceberg \
  --schema schemas/products.json \
  --project-name supabase_products_aws \
  --connector postgres \
  --source-path "jdbc:postgresql://aws-1-us-east-1.pooler.supabase.com:6543/postgres?sslmode=require" \
  --target-catalog contractforge \
  --target-schema bronze \
  --target-table b_products \
  --mode hash_diff_upsert \
  --schedule-cron "0 6 * * *" \
  --schedule-timezone America/Sao_Paulo \
  --output-dir generated/supabase-aws

Equivalent Databricks generation uses the same contract intent:

contractforge-ai generate-project \
  --target databricks-dab \
  --schema schemas/products.json \
  --project-name supabase_products_databricks \
  --connector postgres \
  --source-path "jdbc:postgresql://aws-1-us-east-1.pooler.supabase.com:6543/postgres?sslmode=require" \
  --target-catalog contractforge \
  --target-schema bronze \
  --target-table b_products \
  --mode hash_diff_upsert \
  --schedule-cron "0 6 * * *" \
  --schedule-timezone America/Sao_Paulo \
  --output-dir generated/supabase-databricks

The meaningful differences between the two outputs should be confined to the project and environment files; the portable ingestion semantics stay the same.

Project YAML shape

project.yaml is the project inventory and scheduling boundary:

name: supabase_products

schedule:
  cron: "0 6 * * *"
  timezone: America/Sao_Paulo
  enabled: false

environments:
  databricks: environments/databricks.environment.yaml
  aws: environments/aws.environment.yaml

connections:
  supabase: connections/supabase.yaml

execution_order:
  - name: bronze_products
    depends_on: []
    contracts:
      databricks: contracts/bronze/b_products/b_products.ingestion.yaml
      aws: contracts/bronze/b_products/b_products.ingestion.yaml

The same contract path is preferred for Databricks and AWS. Separate paths are only needed when a reviewed adapter extension is necessary.
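A reviewer could spot steps that diverge between adapters with a small check like the one below. It is a sketch over a dict that mirrors the `execution_order` block above; the function name is hypothetical and no ContractForge API is used.

```python
def steps_with_divergent_contracts(project: dict) -> list[str]:
    """Return names of steps whose adapters point at different contract files."""
    divergent = []
    for step in project.get("execution_order", []):
        paths = set(step.get("contracts", {}).values())
        if len(paths) > 1:
            divergent.append(step["name"])
    return divergent


# Mirrors the project.yaml example above, already parsed into Python objects.
project = {
    "execution_order": [
        {
            "name": "bronze_products",
            "depends_on": [],
            "contracts": {
                "databricks": "contracts/bronze/b_products/b_products.ingestion.yaml",
                "aws": "contracts/bronze/b_products/b_products.ingestion.yaml",
            },
        }
    ]
}
divergent_steps = steps_with_divergent_contracts(project)
```

An empty result means every step shares one contract across adapters; any name in the list marks a step where an adapter extension should have been reviewed.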

Environment YAML shape

Environment files hold adapter and deployment settings, not dataset semantics:

name: aws
runtime: aws_glue_iceberg

artifacts:
  destination:
    type: s3
    path: s3://contractforge-artifacts/projects/supabase_products/

evidence:
  destination:
    type: iceberg_table
    database: cf_supabase_ops

extensions:
  aws:
    glue_version: "4.0"
    worker_type: G.1X

The ingestion contract still owns source, target, write mode, transforms, quality and access intent.

Guided generation

Use guided-project when one command should plan and scaffold:

contractforge-ai guided-project \
  --intent "Create a bronze to gold Supabase medallion project for AWS and Databricks. Run daily at 6 in America/Sao_Paulo. Use hash_diff_upsert for bronze products and append for movements." \
  --schema schemas/products.json \
  --target contractforge-yaml \
  --allow-review-required \
  --output-dir generated/supabase-medallion

The planner extracts:

source system and connector;
requested layers;
target platform hints;
write modes;
schedule and timezone;
governance and quality expectations;
required decisions such as merge keys and hash column policy.

Missing or unsafe decisions are not guessed. They are written to the review report.

Provider-enriched generation

Use --with-ai when a provider should enrich the deterministic project spec:

contractforge-ai guided-project \
  --intent "Create a REST GeoJSON medallion ingestion for USGS earthquakes into Databricks and AWS. Keep source portable and generate quality checks for magnitude and event_id." \
  --schema schemas/usgs-events.json \
  --target contractforge-yaml \
  --with-ai \
  --provider openai \
  --allow-review-required \
  --output-dir generated/usgs

Provider enrichment can propose drafts of:

transform and shape;
quality rules;
annotations;
operations metadata;
target selection when unresolved;
review questions and explanations.

Provider enrichment cannot silently change:

connector;
source path;
target catalog/schema/table;
layer;
write mode;
platform support status;
secrets;
deployment settings.

Behavior-changing suggestions stay review-required even when they are written into draft artifacts for inspection.
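One way to picture this boundary is a merge step that accepts draft additions but routes any change to a protected field into a review list. The sketch below assumes a flat spec keyed by dotted names; the key names, the `split_enrichment` function and the rule that an unresolved (`None`) value may be filled as a draft are all assumptions of this example, not the tool's actual implementation.

```python
# Hypothetical keys covering part of the protected list above.
PROTECTED_KEYS = {
    "source.connector",
    "source.path",
    "target.catalog",
    "target.schema",
    "target.table",
    "layer",
    "write_mode",
}


def split_enrichment(base: dict, proposed: dict) -> tuple[dict, list[str]]:
    """Merge provider proposals into base, holding back protected changes."""
    merged = dict(base)
    review_required = []
    for key, value in proposed.items():
        resolved = base.get(key) is not None
        if key in PROTECTED_KEYS and resolved and base[key] != value:
            # Never overwrite a resolved protected value; surface it instead.
            review_required.append(f"{key}: {base[key]!r} -> {value!r}")
        else:
            merged[key] = value
    return merged, review_required


# Sample values invented for the example.
base = {"source.connector": "postgres", "write_mode": "hash_diff_upsert", "target.table": None}
proposed = {
    "write_mode": "append",
    "target.table": "b_products",
    "quality.rules": ["event_id is not null"],
}
merged, review_required = split_enrichment(base, proposed)
```

Here the write-mode change is reported rather than applied, while the quality rules and the previously unresolved target table land in the draft.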

Multi-schema projects

When a prompt references several schemas, pass them together:

contractforge-ai generate \
  --prompt "Create a Supabase medallion project for products and product_movements. Use the same shared JDBC connection. Products use hash_diff_upsert; movements use append." \
  --schemas schemas/products.json schemas/product_movements.json \
  --with-ai \
  --provider openai \
  --output-dir generated/supabase-multi

The generator should use one shared connection YAML when the source connector is the same. Dataset-specific overrides stay in each ingestion contract.

Connection inheritance

Shared connection YAMLs centralize endpoint, auth and common read options:

# connections/supabase.yaml
source:
  type: connector
  connector: postgres
  system: supabase
  options:
    url: "{{ secret:supabase/jdbc_url }}"
    driver: org.postgresql.Driver
auth:
  type: basic
  username: "{{ secret:supabase/user }}"
  password: "{{ secret:supabase/password }}"
read:
  fetchsize: 20000

An ingestion contract can inherit and override only the dataset-specific fields:

source:
  type: connection
  connection_path: project://connections/supabase.yaml
  table: public.products
  read:
    partition_column: product_id
    lower_bound: 1
    upper_bound: 1000000
    num_partitions: 8

Ingestion-level values override the global connection. The core resolves the connection before adapters plan or execute the contract.
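The override rule can be sketched as a recursive merge in which the ingestion-level mapping wins on conflicts. This is only an illustration of the precedence: how the core actually maps connection fields onto the contract's `source` block is not described here, the `deep_merge` helper is hypothetical, and the `fetchsize` override is added to the example to show a conflict.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins; nested mappings merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Read options from the shared connection file.
connection_read = {"fetchsize": 20000}

# Dataset-specific read options from the ingestion contract.
# The fetchsize entry is a hypothetical override added for this example.
contract_read = {
    "partition_column": "product_id",
    "lower_bound": 1,
    "upper_bound": 1000000,
    "num_partitions": 8,
    "fetchsize": 5000,
}

resolved_read = deep_merge(connection_read, contract_read)
```

After the merge, `resolved_read` keeps the partitioning options from the contract and takes `fetchsize` from the contract as well, while `connection_read` itself is left unchanged.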

Project output

A generated project normally contains:

project.yaml
environments/
  databricks.environment.yaml
  aws.environment.yaml
connections/
  source.yaml
contracts/
  bronze/
    b_products/
      b_products.ingestion.yaml
      b_products.annotations.yaml
      b_products.operations.yaml
README.md
RUNBOOK.md
VALIDATION.md
DECISIONS.md
AI_REVIEW.html or PROJECT_REVIEW.html

Validate the folder after generation:

contractforge-ai validate-project-structure generated/supabase-multi \
  --adapter databricks \
  --adapter aws \
  --format html > generated/supabase-multi/project_validation.html
