AI project generation
Project generation starts from one of two inputs:
- explicit CLI parameters, when the user already knows source, target and mode;
- natural-language intent, when ContractForge AI must extract the project specification first.
Both paths produce a ProjectPlan before writing files. The plan contains
artifacts, assumptions, required decisions, warnings and traceability evidence.
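As a rough illustration, the plan can be pictured as a small record with one field per item listed above. This is a sketch only: the class layout, the field names and the `needs_review` helper are assumptions made for this example, not the actual ContractForge AI data model, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectPlan:
    """Hypothetical shape; the fields mirror the items named in the text."""

    artifacts: dict[str, str] = field(default_factory=dict)  # relative path -> file content
    assumptions: list[str] = field(default_factory=list)
    required_decisions: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)  # traceability notes

    def needs_review(self) -> bool:
        # Any open decision means a human has to review the plan first.
        return bool(self.required_decisions)


# Illustrative values only.
plan = ProjectPlan(
    artifacts={"project.yaml": "name: supabase_products\n"},
    required_decisions=["merge keys for b_products"],
)
print(plan.needs_review())
```

Keeping open decisions as data on the plan, rather than resolving them inline, is what allows the files to be written only after the plan has been inspected.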
Canonical files
ContractForge AI writes the same public ContractForge structure a platform team would write by hand:
project.yaml
environments/
  databricks.environment.yaml
  aws.environment.yaml
connections/
  source.yaml
contracts/
  bronze/
    b_products/
      b_products.ingestion.yaml
      b_products.annotations.yaml
      b_products.operations.yaml
      b_products.access.yaml
README.md
RUNBOOK.md
VALIDATION.md
DECISIONS.md
AI_REVIEW.html or PROJECT_REVIEW.html
The generator must not emit legacy flat fields such as target_table,
target_schema, ctrl_schema or top-level source_system.
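A check for these fields could look like the sketch below. It assumes the contract has already been parsed into a dictionary and that "flat" means a top-level key; `find_legacy_fields` is a hypothetical helper for this example, not part of the ContractForge CLI.

```python
# Legacy flat field names taken from the rule above.
LEGACY_FLAT_FIELDS = ("target_table", "target_schema", "ctrl_schema", "source_system")


def find_legacy_fields(contract: dict) -> list[str]:
    """Return the legacy flat fields present at the top level of a parsed contract."""
    return [name for name in LEGACY_FLAT_FIELDS if name in contract]


# Assumed example input: a contract that still uses two legacy fields.
legacy_contract = {
    "target_table": "b_products",
    "source_system": "supabase",
    "source": {"type": "connection"},
}
print(find_legacy_fields(legacy_contract))
```

A nested key such as `source.system` is not flagged, because only top-level keys are inspected.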
Explicit generation
Use generate-project when the project is already specified:
contractforge-ai generate-project \
--target aws-glue-iceberg \
--schema schemas/products.json \
--project-name supabase_products_aws \
--connector postgres \
--source-path "jdbc:postgresql://aws-1-us-east-1.pooler.supabase.com:6543/postgres?sslmode=require" \
--target-catalog contractforge \
--target-schema bronze \
--target-table b_products \
--mode hash_diff_upsert \
--schedule-cron "0 6 * * *" \
--schedule-timezone America/Sao_Paulo \
--output-dir generated/supabase-aws
Equivalent Databricks generation uses the same contract intent:
contractforge-ai generate-project \
--target databricks-dab \
--schema schemas/products.json \
--project-name supabase_products_databricks \
--connector postgres \
--source-path "jdbc:postgresql://aws-1-us-east-1.pooler.supabase.com:6543/postgres?sslmode=require" \
--target-catalog contractforge \
--target-schema bronze \
--target-table b_products \
--mode hash_diff_upsert \
--schedule-cron "0 6 * * *" \
--schedule-timezone America/Sao_Paulo \
--output-dir generated/supabase-databricks
The two generated projects should differ only in their project and environment files; the portable ingestion semantics stay the same.
Project YAML shape
project.yaml is the project inventory and scheduling boundary:
name: supabase_products
schedule:
  cron: "0 6 * * *"
  timezone: America/Sao_Paulo
  enabled: false
environments:
  databricks: environments/databricks.environment.yaml
  aws: environments/aws.environment.yaml
connections:
  supabase: connections/supabase.yaml
execution_order:
  - name: bronze_products
    depends_on: []
    contracts:
      databricks: contracts/bronze/b_products/b_products.ingestion.yaml
      aws: contracts/bronze/b_products/b_products.ingestion.yaml
The same contract path is preferred for Databricks and AWS. Separate paths are only needed when a reviewed adapter extension is necessary.
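A reviewer could spot steps that diverge between adapters with a small check like the one below. It is a sketch that assumes `project.yaml` has been parsed into a dictionary and that `contracts` is nested under each `execution_order` entry; `split_contract_steps` is a hypothetical helper name.

```python
def split_contract_steps(execution_order: list[dict]) -> list[str]:
    """Return the names of steps whose adapters point at different contract files."""
    split = []
    for step in execution_order:
        paths = set(step.get("contracts", {}).values())
        if len(paths) > 1:
            split.append(step["name"])
    return split


# Assumed parsed form of the execution_order entry shown above.
execution_order = [
    {
        "name": "bronze_products",
        "depends_on": [],
        "contracts": {
            "databricks": "contracts/bronze/b_products/b_products.ingestion.yaml",
            "aws": "contracts/bronze/b_products/b_products.ingestion.yaml",
        },
    }
]
print(split_contract_steps(execution_order))
```

Any step name returned here uses a separate contract per adapter and is a candidate for the extension review mentioned above.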
Environment YAML shape
Environment files hold adapter and deployment settings, not dataset semantics:
name: aws
runtime: aws_glue_iceberg
artifacts:
  destination:
    type: s3
    path: s3://contractforge-artifacts/projects/supabase_products/
evidence:
  destination:
    type: iceberg_table
    database: cf_supabase_ops
extensions:
  aws:
    glue_version: "4.0"
    worker_type: G.1X
The ingestion contract still owns source, target, write mode, transforms, quality and access intent.
Guided generation
Use guided-project when one command should plan and scaffold:
contractforge-ai guided-project \
--intent "Create a bronze to gold Supabase medallion project for AWS and Databricks. Run daily at 6 in America/Sao_Paulo. Use hash_diff_upsert for bronze products and append for movements." \
--schema schemas/products.json \
--target contractforge-yaml \
--allow-review-required \
--output-dir generated/supabase-medallion
The planner extracts:
- source system and connector;
- requested layers;
- target platform hints;
- write modes;
- schedule and timezone;
- governance and quality expectations;
- required decisions such as merge keys and hash column policy.
Missing or unsafe decisions are not guessed. They are written to the review report.
Provider-enriched generation
Use --with-ai when a provider should enrich the deterministic project spec:
contractforge-ai guided-project \
--intent "Create a REST GeoJSON medallion ingestion for USGS earthquakes into Databricks and AWS. Keep source portable and generate quality checks for magnitude and event_id." \
--schema schemas/usgs-events.json \
--target contractforge-yaml \
--with-ai \
--provider openai \
--allow-review-required \
--output-dir generated/usgs
Provider enrichment can propose draft:
- transform and shape;
- quality rules;
- annotations;
- operations metadata;
- target selection when unresolved;
- review questions and explanations.
Provider enrichment cannot silently change:
- connector;
- source path;
- target catalog/schema/table;
- layer;
- write mode;
- platform support status;
- secrets;
- deployment settings.
Behavior-changing suggestions stay review-required even when they are written into draft artifacts for inspection.
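One way to picture this guard is a comparison between the deterministic spec and the provider's proposal. The sketch below is illustrative only: the flat dictionary layout, the key names in `PROTECTED_FIELDS` and the function `review_required_changes` are assumptions for this example, not the real enrichment pipeline.

```python
# Hypothetical key names for the protected items listed above.
PROTECTED_FIELDS = (
    "connector", "source_path", "target_catalog", "target_schema", "target_table",
    "layer", "mode", "platform_support", "secrets", "deployment",
)


def review_required_changes(deterministic: dict, enriched: dict) -> dict:
    """Map each protected field the provider changed to its (before, after) pair."""
    changes = {}
    for name in PROTECTED_FIELDS:
        before, after = deterministic.get(name), enriched.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


base = {"connector": "postgres", "layer": "bronze", "mode": "hash_diff_upsert"}
# The provider adds a quality rule (allowed) and also changes the write mode (protected).
proposal = dict(base, mode="append", quality_rules=["event_id is not null"])
print(review_required_changes(base, proposal))
```

Under these assumptions, a non-empty result marks the proposal as review-required; additions outside the protected set, such as the draft quality rule, pass through unflagged.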
Multi-schema projects
When a prompt references several schemas, pass them together in one command:
contractforge-ai generate \
--prompt "Create a Supabase medallion project for products and product_movements. Use the same shared JDBC connection. Products use hash_diff_upsert; movements use append." \
--schemas schemas/products.json schemas/product_movements.json \
--with-ai \
--provider openai \
--output-dir generated/supabase-multi
The generator should use one shared connection YAML when the source connector is the same. Dataset-specific overrides stay in each ingestion contract.
Connection inheritance
Shared connection YAMLs centralize endpoint, auth and common read options:
# connections/supabase.yaml
source:
  type: connector
  connector: postgres
  system: supabase
  options:
    url: "{{ secret:supabase/jdbc_url }}"
    driver: org.postgresql.Driver
  auth:
    type: basic
    username: "{{ secret:supabase/user }}"
    password: "{{ secret:supabase/password }}"
  read:
    fetchsize: 20000
An ingestion contract can reference the shared connection and override only the dataset-specific fields:
source:
  type: connection
  connection_path: project://connections/supabase.yaml
  table: public.products
  read:
    partition_column: product_id
    lower_bound: 1
    upper_bound: 1000000
    num_partitions: 8
Ingestion-level values override the global connection. The core resolves the connection before adapters plan or execute the contract.
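The override rule can be approximated as a recursive dictionary merge in which ingestion-level values win. This is a minimal sketch under stated assumptions: `deep_merge` is a hypothetical helper, both YAML files are assumed to be parsed into dictionaries already, and the real resolver may treat lists, secrets or the `type` field differently.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values from override win, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed versions of the two `source` blocks shown above.
shared = {"type": "connector", "connector": "postgres", "read": {"fetchsize": 20000}}
ingestion = {
    "type": "connection",
    "connection_path": "project://connections/supabase.yaml",
    "table": "public.products",
    "read": {"partition_column": "product_id", "num_partitions": 8},
}

# Assumption: the reference keys only locate the connection and are not merged.
overrides = {k: v for k, v in ingestion.items() if k not in ("type", "connection_path")}
resolved = deep_merge(shared, overrides)
print(resolved["read"])
```

In this sketch the shared `fetchsize` survives alongside the dataset-specific partition settings, and the shared connection dictionary is left unmodified so other contracts can reuse it.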
Project output
A generated project normally contains:
project.yaml
environments/
  databricks.environment.yaml
  aws.environment.yaml
connections/
  source.yaml
contracts/
  bronze/
    b_products/
      b_products.ingestion.yaml
      b_products.annotations.yaml
      b_products.operations.yaml
README.md
RUNBOOK.md
VALIDATION.md
DECISIONS.md
AI_REVIEW.html or PROJECT_REVIEW.html
Validate the folder after generation:
contractforge-ai validate-project-structure generated/supabase-multi \
--adapter databricks \
--adapter aws \
--format html > generated/supabase-multi/project_validation.html