JDBC

validated

Use JDBC for relational databases when the Spark runtime has the driver, network route and credentials required to read from the source. ContractForge builds the reader contract, records safe metadata and lets write modes handle target semantics.

When to use it

Operational databases

Relational extraction

Use JDBC for PostgreSQL, MySQL, SQL Server, Oracle and compatible managed services.

Incremental loads

Watermark columns

Combine source predicates, ContractForge watermarks and deterministic deduplication before merge.

Large tables

Partitioned reads

Use partition columns, bounds, partitions and fetch size to avoid single-threaded extraction.

Cloud databases

Network first

Validate routes, firewalls, security groups, drivers and credentials before tuning the contract.

Runtime requirements

Requirement	Details
JDBC driver	The matching database driver must be installed or available to Spark.
Network route	The runtime must reach the database host and port.
Credentials	Use Databricks secrets or runtime-managed identity. Do not commit database credentials in contracts.
Source permissions	The source user must be allowed to read the selected table/query and metadata needed by the driver.

Databricks serverless

JDBC on serverless is viable only when three layers are available at the same time: a compatible JDBC driver, a network route to the database endpoint and credentials that the Spark reader can use. ContractForge can build the connector contract and generate RDS IAM tokens, but it cannot bypass missing network routes or missing driver support.

Concern	What to check	Typical fix
Driver	`No suitable driver`, class not found or unsupported connector package.	Use a runtime policy, installed library or cluster type that exposes the database driver.
Network	DNS resolves but TCP connection times out, or the database rejects the runtime source.	Configure public access rules, firewall allowlist, PrivateLink, VPC/VNet peering or the workspace network policy.
Authentication	Password/token works outside Spark but fails in the job.	Validate secret names, token expiry, SSL options, IAM policy and database user grants.
Extraction shape	Reads succeed but are slow or create merge conflicts.	Add predicates, partitioning, fetch size, deterministic deduplication and key quality rules.

Supported aliases

Connector	Typical driver	Notes
`jdbc`	User-provided	Generic Spark JDBC reader.
`postgres`	`org.postgresql.Driver`	Postgres, Supabase, Aurora PostgreSQL and RDS PostgreSQL.
`mysql`	MySQL Connector/J	Requires matching runtime driver.
`sqlserver`	Microsoft JDBC Driver	Use encrypted connection options as required.
`oracle`	Oracle JDBC Driver	Driver licensing/distribution is usually handled by the platform.

Basic authentication

source:
  type: connector
  connector: postgres
  options:
    url: "{{ secret:crm/postgres_url }}"
    dbtable: public.orders
    driver: org.postgresql.Driver
  auth:
    type: basic
    username: "{{ secret:crm/postgres_user }}"
    password: "{{ secret:crm/postgres_password }}"
  read:
    fetchsize: 10000

target:
  catalog: main
  schema: bronze_crm
  table: b_orders_jdbc

layer: bronze
mode: append
schema_policy: additive_only

Incremental current-state pattern

For current-state tables, combine source watermarking, deterministic deduplication and key quality gates before MERGE.

source:
  type: connector
  connector: postgres
  options:
    url: "{{ secret:crm/postgres_url }}"
    dbtable: public.orders
    driver: org.postgresql.Driver
  auth:
    type: basic
    username: "{{ secret:crm/postgres_user }}"
    password: "{{ secret:crm/postgres_password }}"
  incremental:
    watermark_column: updated_at
  read:
    fetchsize: 10000
    predicate: "updated_at >= timestamp '2026-01-01'"

target:
  catalog: main
  schema: silver_sales
  table: s_orders

layer: silver
mode: upsert
merge_keys: [order_id]
watermark_columns: [updated_at]
transform:
  deduplicate:
    keys: [order_id]
    order_by: updated_at DESC NULLS LAST
quality_rules:
  not_null: [order_id]
  unique_key: [order_id]

Partitioned reads

Use JDBC partitioning for large tables. Partitioning only works well when the partition column is numeric/date-like, reasonably distributed and bounded correctly.

source:
  type: connector
  connector: postgres
  options:
    url: "{{ secret:crm/postgres_url }}"
    dbtable: public.order_items
    driver: org.postgresql.Driver
  auth:
    type: basic
    username: "{{ secret:crm/postgres_user }}"
    password: "{{ secret:crm/postgres_password }}"
  read:
    partition_column: order_item_id
    lower_bound: 1
    upper_bound: 50000000
    num_partitions: 16
    fetchsize: 20000

Bounds are not filters

Spark JDBC bounds define partition stride. They do not necessarily restrict the result set by themselves. Use predicates or subqueries when you need source-side filtering.

Query and pushdown pattern

Use query or a subquery-style dbtable when the extraction needs a reviewed predicate or a join that belongs to the source system. Keep heavy business transformation outside the connector.

source:
  type: connector
  connector: jdbc
  options:
    url: "{{ secret:erp/jdbc_url }}"
    query: |
      SELECT id, updated_at, status, total_amount
      FROM sales.orders
      WHERE updated_at >= TIMESTAMP '2026-01-01'
        AND status IN ('closed', 'cancelled')
    driver: org.postgresql.Driver
  auth:
    type: basic
    username: "{{ secret:erp/user }}"
    password: "{{ secret:erp/password }}"

hash-diff upsert pattern

Use hash diff when the target should append only rows whose business content changed compared with the latest target state.

target:
  catalog: main
  schema: silver_sales
  table: s_product_prices

layer: silver
mode: hash_diff_upsert
merge_keys: [product_id]
hash_keys: [sku, name, brand, price_amount]
dedup_order_expr: updated_at DESC NULLS LAST
quality_rules:
  not_null: [product_id]
  unique_key: [product_id]

Deterministic latest state

Hash diff depends on a reliable latest-state comparison. Use explicit dedup ordering when the source can contain multiple versions per key.

Amazon RDS/Aurora IAM

For RDS/Aurora PostgreSQL with IAM authentication, ContractForge can generate an auth token in the Python driver. The runtime still needs network reachability and a compatible JDBC driver.

source:
  type: connector
  connector: postgres
  options:
    host: mydb.abc123.us-east-1.rds.amazonaws.com
    port: 5432
    database: app
    dbtable: public.orders
    driver: org.postgresql.Driver
  auth:
    type: rds_iam
    username: "{{ secret:contractforge-aws/rds_username }}"
    region: us-east-1
    credential_provider: default_chain
    sslmode: require

Credential provider	Use
`explicit`	Use access key, secret key and optional session token from secrets.
`env`	Use AWS credentials already available as environment variables.
`default_chain`	Use botocore's default credential chain. Requires `contractforge[aws]` or runtime-provided `botocore`.

Network prerequisites

JDBC failures are often platform/network issues, not contract issues. Validate the path before tuning the contract.

The database hostname must resolve from the Databricks runtime.
Security groups, firewalls and database allowlists must permit the runtime source IP or private route.
Private databases usually need VPC peering, PrivateLink, VPN or same-cloud networking.
The JDBC driver must be installed or available in the runtime classpath.
Serverless runtimes may restrict custom drivers or network routes depending on workspace configuration.

Operational metadata

JDBC source metadata is redacted before persistence. Use it to verify connector, auth type, partitioning and source options without exposing credentials.

SELECT
  run_id,
  source_connector,
  source_provider,
  source_options_json,
  source_auth_redacted_json,
  source_read_redacted_json,
  source_metrics_json
FROM main.ops.ctrl_ingestion_runs
WHERE source_connector IN ('jdbc', 'postgres', 'mysql', 'sqlserver', 'oracle')
ORDER BY started_at_utc DESC;

Common issues

Symptom	Likely cause	Action
No suitable driver	JDBC jar is not installed.	Install the matching driver on cluster/serverless environment.
Connection timeout	No network route to database.	Fix VPC, peering, PrivateLink, firewall or public access rules.
Authentication failed	Wrong secret, expired password/token or IAM policy issue.	Validate credentials outside ContractForge first.
Slow extraction	Single JDBC partition or no predicate pushdown.	Add partitioning, fetch size and source predicates.
MERGE duplicate key error	Source has multiple records per key.	Add `transform.deduplicate` and quality key rules.

When to use it​

Relational extraction​

Watermark columns​

Partitioned reads​

Network first​

Runtime requirements​

Databricks serverless​

Supported aliases​

Basic authentication​

Incremental current-state pattern​

Partitioned reads​

Query and pushdown pattern​

hash-diff upsert pattern​

Amazon RDS/Aurora IAM​

Network prerequisites​

Operational metadata​

Common issues​

When to use it

Relational extraction

Watermark columns

Partitioned reads

Network first

Runtime requirements

Databricks serverless

Supported aliases

Basic authentication

Incremental current-state pattern

Partitioned reads

Query and pushdown pattern

hash-diff upsert pattern

Amazon RDS/Aurora IAM

Network prerequisites

Operational metadata

Common issues