AI project generation

Project generation starts from one of two inputs:

  • explicit CLI parameters, when the user already knows source, target and mode;
  • natural-language intent, when ContractForge AI must extract the project specification first.

Both paths produce a ProjectPlan before writing files. The plan contains artifacts, assumptions, required decisions, warnings and traceability evidence.
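As a rough illustration, the plan can be pictured as a small data structure with one field per item listed above. This is a minimal sketch, assuming a Python dataclass. Only the class name comes from the text. The field names, the `ready_to_write` helper and its link to `--allow-review-required` are hypothetical and are not the ContractForge AI API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProjectPlan:
    """Hypothetical in-memory shape of a plan produced before any file is written."""

    artifacts: list[str] = field(default_factory=list)           # relative paths to generate
    assumptions: list[str] = field(default_factory=list)         # defaults the planner chose
    required_decisions: list[str] = field(default_factory=list)  # e.g. merge keys
    warnings: list[str] = field(default_factory=list)
    traceability: dict[str, str] = field(default_factory=dict)   # artifact -> evidence

    def ready_to_write(self, allow_review_required: bool = False) -> bool:
        # Assumption: open decisions block writing unless the caller opts in
        # to review-required output.
        return allow_review_required or not self.required_decisions
```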

Canonical files

ContractForge AI writes the same public ContractForge structure a platform team would write by hand:

project.yaml
environments/
  databricks.environment.yaml
  aws.environment.yaml
connections/
  source.yaml
contracts/
  bronze/
    b_products/
      b_products.ingestion.yaml
      b_products.annotations.yaml
      b_products.operations.yaml
      b_products.access.yaml
README.md
RUNBOOK.md
VALIDATION.md
DECISIONS.md
AI_REVIEW.html or PROJECT_REVIEW.html

The generator must not emit legacy flat fields such as target_table, target_schema, ctrl_schema or top-level source_system.
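A generator or CI step could enforce this rule with a simple check like the one below. It is a sketch that assumes the contract has already been parsed into a dict, for example with a YAML loader. The helper name is hypothetical. Nested fields, such as `system` under `source`, are not flagged because only the flat top-level forms are legacy.

```python
from __future__ import annotations

# Field names taken from the rule above; illustrative, not an exhaustive list.
LEGACY_TOP_LEVEL_FIELDS = frozenset({"target_table", "target_schema", "ctrl_schema", "source_system"})


def legacy_fields(contract: dict) -> list[str]:
    """Hypothetical helper: legacy flat fields found at the top level of a parsed contract."""
    return sorted(LEGACY_TOP_LEVEL_FIELDS.intersection(contract))
```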

Explicit generation

Use generate-project when the project is already specified:

contractforge-ai generate-project \
--target aws-glue-iceberg \
--schema schemas/products.json \
--project-name supabase_products_aws \
--connector postgres \
--source-path "jdbc:postgresql://aws-1-us-east-1.pooler.supabase.com:6543/postgres?sslmode=require" \
--target-catalog contractforge \
--target-schema bronze \
--target-table b_products \
--mode hash_diff_upsert \
--schedule-cron "0 6 * * *" \
--schedule-timezone America/Sao_Paulo \
--output-dir generated/supabase-aws

Equivalent Databricks generation uses the same contract intent:

contractforge-ai generate-project \
--target databricks-dab \
--schema schemas/products.json \
--project-name supabase_products_databricks \
--connector postgres \
--source-path "jdbc:postgresql://aws-1-us-east-1.pooler.supabase.com:6543/postgres?sslmode=require" \
--target-catalog contractforge \
--target-schema bronze \
--target-table b_products \
--mode hash_diff_upsert \
--schedule-cron "0 6 * * *" \
--schedule-timezone America/Sao_Paulo \
--output-dir generated/supabase-databricks

The meaningful differences between the two outputs should be limited to the project and environment files. The portable ingestion semantics should stay the same.
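One way to check this after generating both folders is to list which files differ between them. This is a minimal sketch using only the standard library. The helper names are hypothetical, and byte-level comparison will also flag non-semantic differences such as project names.

```python
from __future__ import annotations

from pathlib import Path


def snapshot(root: str) -> dict[str, bytes]:
    """Read every file under a generated project into {relative/posix/path: contents}."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): p.read_bytes()
        for p in base.rglob("*")
        if p.is_file()
    }


def differing_files(a: dict[str, bytes], b: dict[str, bytes]) -> list[str]:
    """Paths whose contents differ, or that exist in only one of the two snapshots."""
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))
```

For example, `differing_files(snapshot("generated/supabase-aws"), snapshot("generated/supabase-databricks"))` lists the changed paths. Any path under `contracts/` in that list is worth a closer look.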

Project YAML shape

project.yaml is the project inventory and scheduling boundary:

name: supabase_products

schedule:
  cron: "0 6 * * *"
  timezone: America/Sao_Paulo
  enabled: false

environments:
  databricks: environments/databricks.environment.yaml
  aws: environments/aws.environment.yaml

connections:
  supabase: connections/supabase.yaml

execution_order:
  - name: bronze_products
    depends_on: []
    contracts:
      databricks: contracts/bronze/b_products/b_products.ingestion.yaml
      aws: contracts/bronze/b_products/b_products.ingestion.yaml

The same contract path is preferred for Databricks and AWS. Separate paths are only needed when a reviewed adapter extension is necessary.
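Suppose `depends_on` lists the `name` values of entries that must finish first. The example above only shows an empty list, so this interpretation is an assumption. Under it, a valid run order follows from a topological sort. The sketch below works on the parsed `execution_order` list with the standard library's `graphlib`. It is not the ContractForge scheduler.

```python
from __future__ import annotations

from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_order(execution_order: list[dict]) -> list[str]:
    """Hypothetical helper: order entries so each runs after everything in its depends_on.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    graph = {step["name"]: set(step.get("depends_on", [])) for step in execution_order}
    return list(TopologicalSorter(graph).static_order())
```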

Environment YAML shape

Environment files hold adapter and deployment settings, not dataset semantics:

name: aws
runtime: aws_glue_iceberg

artifacts:
  destination:
    type: s3
    path: s3://contractforge-artifacts/projects/supabase_products/

evidence:
  destination:
    type: iceberg_table
    database: cf_supabase_ops

extensions:
  aws:
    glue_version: "4.0"
    worker_type: G.1X

The ingestion contract still owns source, target, write mode, transforms, quality and access intent.
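A review step could flag environment files that drift into dataset semantics by walking the parsed file for contract-owned keys. In this sketch, the key names in `CONTRACT_OWNED_KEYS` are hypothetical stand-ins for the concerns listed above. A real check would need the actual schema to avoid false positives.

```python
from __future__ import annotations

# Hypothetical key names for concerns owned by the ingestion contract.
CONTRACT_OWNED_KEYS = frozenset({"source", "target", "write_mode", "transforms", "quality", "access"})


def contract_owned_paths(env: dict, prefix: str = "") -> list[str]:
    """Return dotted paths in a parsed environment file that look like dataset semantics."""
    found: list[str] = []
    for key, value in env.items():
        path = f"{prefix}{key}"
        if key in CONTRACT_OWNED_KEYS:
            found.append(path)
        elif isinstance(value, dict):
            # Recurse into nested mappings such as extensions.aws.
            found.extend(contract_owned_paths(value, prefix=f"{path}."))
    return found
```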

Guided generation

Use guided-project when one command should plan and scaffold:

contractforge-ai guided-project \
--intent "Create a bronze to gold Supabase medallion project for AWS and Databricks. Run daily at 6 in America/Sao_Paulo. Use hash_diff_upsert for bronze products and append for movements." \
--schema schemas/products.json \
--target contractforge-yaml \
--allow-review-required \
--output-dir generated/supabase-medallion

The planner extracts:

  • source system and connector;
  • requested layers;
  • target platform hints;
  • write modes;
  • schedule and timezone;
  • governance and quality expectations;
  • required decisions such as merge keys and hash column policy.

Missing or unsafe decisions are not guessed. They are written to the review report.
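The keyword-based sketch below illustrates the idea for a few of these items: layers, platform hints, write modes and timezone. It is not how the ContractForge AI planner works. The vocabularies, `IntentSpec` and `extract_intent` are hypothetical. It does follow the rule above: when `hash_diff_upsert` is requested, the merge key and hash column choices are recorded as required decisions rather than guessed.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Illustrative vocabularies only; a real planner would cover far more cases.
KNOWN_LAYERS = ("bronze", "silver", "gold")
KNOWN_PLATFORMS = ("aws", "databricks")
KNOWN_MODES = ("hash_diff_upsert", "append")


@dataclass
class IntentSpec:
    layers: list[str] = field(default_factory=list)
    platforms: list[str] = field(default_factory=list)
    write_modes: list[str] = field(default_factory=list)
    timezone: str | None = None
    required_decisions: list[str] = field(default_factory=list)


def _mentioned(words: tuple[str, ...], text: str) -> list[str]:
    """Return the vocabulary words that appear as whole words in text."""
    return [w for w in words if re.search(rf"\b{re.escape(w)}\b", text)]


def extract_intent(intent: str) -> IntentSpec:
    lowered = intent.lower()
    tz = re.search(r"\b[A-Z][a-z]+/[A-Za-z_]+\b", intent)  # IANA-style Area/City
    spec = IntentSpec(
        layers=_mentioned(KNOWN_LAYERS, lowered),
        platforms=_mentioned(KNOWN_PLATFORMS, lowered),
        write_modes=_mentioned(KNOWN_MODES, lowered),
        timezone=tz.group(0) if tz else None,
    )
    # Never guess merge keys or hash policy: record them as open decisions instead.
    if "hash_diff_upsert" in spec.write_modes:
        spec.required_decisions += ["merge_keys", "hash_column_policy"]
    return spec
```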

Provider-enriched generation

Use --with-ai when a provider should enrich the deterministic project spec:

contractforge-ai guided-project \
--intent "Create a REST GeoJSON medallion ingestion for USGS earthquakes into Databricks and AWS. Keep source portable and generate quality checks for magnitude and event_id." \
--schema schemas/usgs-events.json \
--target contractforge-yaml \
--with-ai \
--provider openai \
--allow-review-required \
--output-dir generated/usgs

Provider enrichment can propose drafts of:

  • transform and shape;
  • quality rules;
  • annotations;
  • operations metadata;
  • target selection when unresolved;
  • review questions and explanations.

Provider enrichment cannot silently change:

  • connector;
  • source path;
  • target catalog/schema/table;
  • layer;
  • write mode;
  • platform support status;
  • secrets;
  • deployment settings.

Behavior-changing suggestions stay review-required even when they are written into draft artifacts for inspection.
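The guard can be sketched as a merge that accepts draft-only proposals but never silently changes protected fields. This sketch represents the spec as a flat dict with dotted keys. That representation and the key names are assumptions, chosen to avoid the legacy flat field names. In this policy, a proposal that fills an unresolved protected field is written into the draft but still reported as review-required. A proposal that changes a resolved field is not applied.

```python
from __future__ import annotations

# Hypothetical dotted keys for the fields that enrichment must not change silently.
PROTECTED_FIELDS = frozenset({
    "source.connector", "source.path",
    "target.catalog", "target.schema", "target.table",
    "layer", "write_mode", "platform_support_status",
    "secrets", "deployment",
})


def apply_enrichment(base: dict, proposal: dict) -> tuple[dict, list[str]]:
    """Merge provider proposals into a draft and report fields that need review."""
    draft = dict(base)
    review_required: list[str] = []
    for key, value in proposal.items():
        if key not in PROTECTED_FIELDS:
            draft[key] = value  # e.g. quality rules or annotations, kept as draft
        elif base.get(key) is None:
            draft[key] = value  # fills an unresolved field, such as target selection
            review_required.append(key)  # ...but a human still has to confirm it
        elif base[key] != value:
            review_required.append(key)  # keep the deterministic value, surface the suggestion
    return draft, sorted(review_required)
```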

Multi-schema projects

When a prompt references many schemas, pass them together:

contractforge-ai generate \
--prompt "Create a Supabase medallion project for products and product_movements. Use the same shared JDBC connection. Products use hash_diff_upsert; movements use append." \
--schemas schemas/products.json schemas/product_movements.json \
--with-ai \
--provider openai \
--output-dir generated/supabase-multi

The generator should use one shared connection YAML when the datasets share the same source connector. Dataset-specific overrides stay in each ingestion contract.
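As a rough sketch of that grouping step, assume each dataset is described by hypothetical `name`, `system` and `connector` keys. Datasets that share both values map to one connection file. The key includes the system as well as the connector, on the assumption that two different Postgres sources would still need separate endpoints.

```python
from __future__ import annotations

from collections import defaultdict


def group_by_connection(datasets: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Hypothetical helper: group datasets that can share one connection YAML."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for dataset in datasets:
        groups[(dataset["system"], dataset["connector"])].append(dataset["name"])
    return dict(groups)
```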

Connection inheritance

Shared connection YAMLs centralize endpoint, auth and common read options:

# connections/supabase.yaml
source:
  type: connector
  connector: postgres
  system: supabase
  options:
    url: "{{ secret:supabase/jdbc_url }}"
    driver: org.postgresql.Driver
  auth:
    type: basic
    username: "{{ secret:supabase/user }}"
    password: "{{ secret:supabase/password }}"
  read:
    fetchsize: 20000

An ingestion contract references the shared connection, inherits its settings and overrides only the dataset-specific fields:

source:
  type: connection
  connection_path: project://connections/supabase.yaml
  table: public.products
  read:
    partition_column: product_id
    lower_bound: 1
    upper_bound: 1000000
    num_partitions: 8

Ingestion-level values override the values inherited from the shared connection. The core resolves the connection before adapters plan or execute the contract.
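The override rule can be sketched as a recursive dictionary merge. The sketch assumes both files are already parsed into dicts and that nested mappings such as `read` merge key by key, so `fetchsize` survives alongside the partitioning options. The function names are hypothetical. Dropping `type: connection` and `connection_path` after resolution is an assumption about how the core treats the reference fields.

```python
from __future__ import annotations

import copy

# Keys that only point at the shared connection; assumed to be dropped once resolved.
REFERENCE_KEYS = ("type", "connection_path")


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_source(connection: dict, ingestion: dict) -> dict:
    """Resolve an ingestion `source` that references a shared connection file."""
    overrides = {k: v for k, v in ingestion["source"].items() if k not in REFERENCE_KEYS}
    return deep_merge(connection["source"], overrides)
```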

Project output

A generated project normally contains:

project.yaml
environments/
  databricks.environment.yaml
  aws.environment.yaml
connections/
  source.yaml
contracts/
  bronze/
    b_products/
      b_products.ingestion.yaml
      b_products.annotations.yaml
      b_products.operations.yaml
README.md
RUNBOOK.md
VALIDATION.md
DECISIONS.md
AI_REVIEW.html or PROJECT_REVIEW.html

Validate the folder after generation:

contractforge-ai validate-project-structure generated/supabase-multi \
--adapter databricks \
--adapter aws \
--format html > generated/supabase-multi/project_validation.html
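Before running the validator, a quick local pre-check can confirm that the top-level entries listed above exist. This is a hypothetical helper that looks only at file and folder names. It does not reproduce what `validate-project-structure` checks, and since a generated project only "normally" contains these entries, a missing item is a prompt to look rather than a hard failure.

```python
from __future__ import annotations

from pathlib import Path

# Top-level entries a generated project normally contains, per the listing above.
REQUIRED_ENTRIES = (
    "project.yaml", "environments", "connections", "contracts",
    "README.md", "RUNBOOK.md", "VALIDATION.md", "DECISIONS.md",
)
REVIEW_REPORTS = ("AI_REVIEW.html", "PROJECT_REVIEW.html")


def missing_entries(project_dir: str) -> list[str]:
    """List expected top-level files or folders that are absent from a generated project."""
    root = Path(project_dir)
    missing = [name for name in REQUIRED_ENTRIES if not (root / name).exists()]
    # Either review report satisfies the requirement.
    if not any((root / name).is_file() for name in REVIEW_REPORTS):
        missing.append(" or ".join(REVIEW_REPORTS))
    return missing
```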