Relational extraction
Use JDBC for PostgreSQL, MySQL, SQL Server, Oracle and compatible managed services.
Connector
Use JDBC for relational databases when the Spark runtime has the driver, network route and credentials required to read from the source. ContractForge builds the reader contract, records safe metadata and lets write modes handle target semantics.
Use JDBC for PostgreSQL, MySQL, SQL Server, Oracle and compatible managed services.
Combine source predicates, ContractForge watermarks and deterministic deduplication before merge.
Use partition columns, bounds, partitions and fetch size to avoid single-threaded extraction.
Validate routes, firewalls, security groups, drivers and credentials before tuning the contract.
| Requirement | Details |
|---|---|
| JDBC driver | The matching database driver must be installed or available to Spark. |
| Network route | The runtime must reach the database host and port. |
| Credentials | Use Databricks secrets or runtime-managed identity. Do not commit database credentials in contracts. |
| Source permissions | The source user must be allowed to read the selected table/query and metadata needed by the driver. |
JDBC on serverless is viable only when three layers are available at the same time: a compatible JDBC driver, a network route to the database endpoint and credentials that the Spark reader can use. ContractForge can build the connector contract and generate RDS IAM tokens, but it cannot bypass missing network routes or missing driver support.
| Concern | What to check | Typical fix |
|---|---|---|
| Driver | No suitable driver, class not found or unsupported connector package. | Use a runtime policy, installed library or cluster type that exposes the database driver. |
| Network | DNS resolves but TCP connection times out, or the database rejects the runtime source. | Configure public access rules, firewall allowlist, PrivateLink, VPC/VNet peering or the workspace network policy. |
| Authentication | Password/token works outside Spark but fails in the job. | Validate secret names, token expiry, SSL options, IAM policy and database user grants. |
| Extraction shape | Reads succeed but are slow or create merge conflicts. | Add predicates, partitioning, fetch size, deterministic deduplication and key quality rules. |
| Connector | Typical driver | Notes |
|---|---|---|
jdbc | User-provided | Generic Spark JDBC reader. |
postgres | org.postgresql.Driver | Postgres, Supabase, Aurora PostgreSQL and RDS PostgreSQL. |
mysql | MySQL Connector/J | Requires matching runtime driver. |
sqlserver | Microsoft JDBC Driver | Use encrypted connection options as required. |
oracle | Oracle JDBC Driver | Driver licensing/distribution is usually handled by the platform. |
source:
type: connector
connector: postgres
options:
url: "{{ secret:crm/postgres_url }}"
dbtable: public.orders
driver: org.postgresql.Driver
auth:
type: basic
username: "{{ secret:crm/postgres_user }}"
password: "{{ secret:crm/postgres_password }}"
read:
fetchsize: 10000
target:
catalog: main
schema: bronze_crm
table: b_orders_jdbc
layer: bronze
mode: scd0_append
schema_policy: additive_only
For current-state tables, combine source watermarking, deterministic deduplication and key quality gates before MERGE.
source:
type: connector
connector: postgres
options:
url: "{{ secret:crm/postgres_url }}"
dbtable: public.orders
driver: org.postgresql.Driver
auth:
type: basic
username: "{{ secret:crm/postgres_user }}"
password: "{{ secret:crm/postgres_password }}"
incremental:
watermark_column: updated_at
read:
fetchsize: 10000
predicate: "updated_at >= timestamp '2026-01-01'"
target:
catalog: main
schema: silver_sales
table: s_orders
layer: silver
mode: scd1_upsert
merge_keys: [order_id]
watermark_columns: [updated_at]
transform:
deduplicate:
keys: [order_id]
order_by: updated_at DESC NULLS LAST
quality_rules:
not_null: [order_id]
unique_key: [order_id]
Use JDBC partitioning for large tables. Partitioning only works well when the partition column is numeric/date-like, reasonably distributed and bounded correctly.
source:
type: connector
connector: postgres
options:
url: "{{ secret:crm/postgres_url }}"
dbtable: public.order_items
driver: org.postgresql.Driver
auth:
type: basic
username: "{{ secret:crm/postgres_user }}"
password: "{{ secret:crm/postgres_password }}"
read:
partition_column: order_item_id
lower_bound: 1
upper_bound: 50000000
num_partitions: 16
fetchsize: 20000
Spark JDBC bounds define partition stride. They do not necessarily restrict the result set by themselves. Use predicates or subqueries when you need source-side filtering.
Use query or a subquery-style dbtable when the extraction needs a reviewed predicate or a join that belongs to the source system. Keep heavy business transformation outside the connector.
source:
type: connector
connector: jdbc
options:
url: "{{ secret:erp/jdbc_url }}"
query: |
SELECT id, updated_at, status, total_amount
FROM sales.orders
WHERE updated_at >= TIMESTAMP '2026-01-01'
AND status IN ('closed', 'cancelled')
driver: org.postgresql.Driver
auth:
type: basic
username: "{{ secret:erp/user }}"
password: "{{ secret:erp/password }}"
Use hash diff when the target should append only rows whose business content changed compared with the latest target state.
target:
catalog: main
schema: silver_sales
table: s_product_prices
layer: silver
mode: scd1_hash_diff
hash_keys: [product_id]
dedup_order_expr: updated_at DESC NULLS LAST
quality_rules:
not_null: [product_id]
unique_key: [product_id]
Hash diff depends on a reliable latest-state comparison. Use explicit dedup ordering when the source can contain multiple versions per key.
For RDS/Aurora PostgreSQL with IAM authentication, ContractForge can generate an auth token in the Python driver. The runtime still needs network reachability and a compatible JDBC driver.
source:
type: connector
connector: postgres
options:
host: mydb.abc123.us-east-1.rds.amazonaws.com
port: 5432
database: app
dbtable: public.orders
driver: org.postgresql.Driver
auth:
type: rds_iam
username: "{{ secret:contractforge-aws/rds_username }}"
region: us-east-1
credential_provider: default_chain
sslmode: require
| Credential provider | Use |
|---|---|
explicit | Use access key, secret key and optional session token from secrets. |
env | Use AWS credentials already available as environment variables. |
default_chain | Use botocore's default credential chain. Requires contractforge[aws] or runtime-provided botocore. |
JDBC failures are often platform/network issues, not contract issues. Validate the path before tuning the contract.
JDBC source metadata is redacted before persistence. Use it to verify connector, auth type, partitioning and source options without exposing credentials.
SELECT
run_id,
source_connector,
source_provider,
source_options_json,
source_auth_redacted_json,
source_read_redacted_json,
source_metrics_json
FROM main.ops.ctrl_ingestion_runs
WHERE source_connector IN ('jdbc', 'postgres', 'mysql', 'sqlserver', 'oracle')
ORDER BY started_at_utc DESC;
| Symptom | Likely cause | Action |
|---|---|---|
| No suitable driver | JDBC jar is not installed. | Install the matching driver on cluster/serverless environment. |
| Connection timeout | No network route to database. | Fix VPC, peering, PrivateLink, firewall or public access rules. |
| Authentication failed | Wrong secret, expired password/token or IAM policy issue. | Validate credentials outside ContractForge first. |
| Slow extraction | Single JDBC partition or no predicate pushdown. | Add partitioning, fetch size and source predicates. |
| MERGE duplicate key error | Source has multiple records per key. | Add transform.deduplicate and quality key rules. |