Skip to main content

JDBC

validated

Use JDBC for relational databases when the Spark runtime has the driver, network route and credentials required to read from the source. ContractForge builds the reader contract, records safe metadata and lets write modes handle target semantics.

When to use it

Operational databases

Relational extraction

Use JDBC for PostgreSQL, MySQL, SQL Server, Oracle and compatible managed services.

Incremental loads

Watermark columns

Combine source predicates, ContractForge watermarks and deterministic deduplication before merge.

Large tables

Partitioned reads

Use partition columns, bounds, partitions and fetch size to avoid single-threaded extraction.

Cloud databases

Network first

Validate routes, firewalls, security groups, drivers and credentials before tuning the contract.

Runtime requirements

RequirementDetails
JDBC driverThe matching database driver must be installed or available to Spark.
Network routeThe runtime must reach the database host and port.
CredentialsUse Databricks secrets or runtime-managed identity. Do not commit database credentials in contracts.
Source permissionsThe source user must be allowed to read the selected table/query and metadata needed by the driver.

Databricks serverless

JDBC on serverless is viable only when three layers are available at the same time: a compatible JDBC driver, a network route to the database endpoint and credentials that the Spark reader can use. ContractForge can build the connector contract and generate RDS IAM tokens, but it cannot bypass missing network routes or missing driver support.

ConcernWhat to checkTypical fix
DriverNo suitable driver, class not found or unsupported connector package.Use a runtime policy, installed library or cluster type that exposes the database driver.
NetworkDNS resolves but TCP connection times out, or the database rejects the runtime source.Configure public access rules, firewall allowlist, PrivateLink, VPC/VNet peering or the workspace network policy.
AuthenticationPassword/token works outside Spark but fails in the job.Validate secret names, token expiry, SSL options, IAM policy and database user grants.
Extraction shapeReads succeed but are slow or create merge conflicts.Add predicates, partitioning, fetch size, deterministic deduplication and key quality rules.

Supported aliases

ConnectorTypical driverNotes
jdbcUser-providedGeneric Spark JDBC reader.
postgresorg.postgresql.DriverPostgres, Supabase, Aurora PostgreSQL and RDS PostgreSQL.
mysqlMySQL Connector/JRequires matching runtime driver.
sqlserverMicrosoft JDBC DriverUse encrypted connection options as required.
oracleOracle JDBC DriverDriver licensing/distribution is usually handled by the platform.

Basic authentication

source:
type: connector
connector: postgres
options:
url: "{{ secret:crm/postgres_url }}"
dbtable: public.orders
driver: org.postgresql.Driver
auth:
type: basic
username: "{{ secret:crm/postgres_user }}"
password: "{{ secret:crm/postgres_password }}"
read:
fetchsize: 10000

target:
catalog: main
schema: bronze_crm
table: b_orders_jdbc

layer: bronze
mode: append
schema_policy: additive_only

Incremental current-state pattern

For current-state tables, combine source watermarking, deterministic deduplication and key quality gates before MERGE.

source:
type: connector
connector: postgres
options:
url: "{{ secret:crm/postgres_url }}"
dbtable: public.orders
driver: org.postgresql.Driver
auth:
type: basic
username: "{{ secret:crm/postgres_user }}"
password: "{{ secret:crm/postgres_password }}"
incremental:
watermark_column: updated_at
read:
fetchsize: 10000
predicate: "updated_at >= timestamp '2026-01-01'"

target:
catalog: main
schema: silver_sales
table: s_orders

layer: silver
mode: upsert
merge_keys: [order_id]
watermark_columns: [updated_at]
transform:
deduplicate:
keys: [order_id]
order_by: updated_at DESC NULLS LAST
quality_rules:
not_null: [order_id]
unique_key: [order_id]

Partitioned reads

Use JDBC partitioning for large tables. Partitioning only works well when the partition column is numeric/date-like, reasonably distributed and bounded correctly.

source:
type: connector
connector: postgres
options:
url: "{{ secret:crm/postgres_url }}"
dbtable: public.order_items
driver: org.postgresql.Driver
auth:
type: basic
username: "{{ secret:crm/postgres_user }}"
password: "{{ secret:crm/postgres_password }}"
read:
partition_column: order_item_id
lower_bound: 1
upper_bound: 50000000
num_partitions: 16
fetchsize: 20000

Bounds are not filters

Spark JDBC bounds define partition stride. They do not necessarily restrict the result set by themselves. Use predicates or subqueries when you need source-side filtering.

Query and pushdown pattern

Use query or a subquery-style dbtable when the extraction needs a reviewed predicate or a join that belongs to the source system. Keep heavy business transformation outside the connector.

source:
type: connector
connector: jdbc
options:
url: "{{ secret:erp/jdbc_url }}"
query: |
SELECT id, updated_at, status, total_amount
FROM sales.orders
WHERE updated_at >= TIMESTAMP '2026-01-01'
AND status IN ('closed', 'cancelled')
driver: org.postgresql.Driver
auth:
type: basic
username: "{{ secret:erp/user }}"
password: "{{ secret:erp/password }}"

hash-diff upsert pattern

Use hash diff when the target should append only rows whose business content changed compared with the latest target state.

target:
catalog: main
schema: silver_sales
table: s_product_prices

layer: silver
mode: hash_diff_upsert
merge_keys: [product_id]
hash_keys: [sku, name, brand, price_amount]
dedup_order_expr: updated_at DESC NULLS LAST
quality_rules:
not_null: [product_id]
unique_key: [product_id]

Deterministic latest state

Hash diff depends on a reliable latest-state comparison. Use explicit dedup ordering when the source can contain multiple versions per key.

Amazon RDS/Aurora IAM

For RDS/Aurora PostgreSQL with IAM authentication, ContractForge can generate an auth token in the Python driver. The runtime still needs network reachability and a compatible JDBC driver.

source:
type: connector
connector: postgres
options:
host: mydb.abc123.us-east-1.rds.amazonaws.com
port: 5432
database: app
dbtable: public.orders
driver: org.postgresql.Driver
auth:
type: rds_iam
username: "{{ secret:contractforge-aws/rds_username }}"
region: us-east-1
credential_provider: default_chain
sslmode: require
Credential providerUse
explicitUse access key, secret key and optional session token from secrets.
envUse AWS credentials already available as environment variables.
default_chainUse botocore's default credential chain. Requires contractforge[aws] or runtime-provided botocore.

Network prerequisites

JDBC failures are often platform/network issues, not contract issues. Validate the path before tuning the contract.

  • The database hostname must resolve from the Databricks runtime.
  • Security groups, firewalls and database allowlists must permit the runtime source IP or private route.
  • Private databases usually need VPC peering, PrivateLink, VPN or same-cloud networking.
  • The JDBC driver must be installed or available in the runtime classpath.
  • Serverless runtimes may restrict custom drivers or network routes depending on workspace configuration.

Operational metadata

JDBC source metadata is redacted before persistence. Use it to verify connector, auth type, partitioning and source options without exposing credentials.

SELECT
run_id,
source_connector,
source_provider,
source_options_json,
source_auth_redacted_json,
source_read_redacted_json,
source_metrics_json
FROM main.ops.ctrl_ingestion_runs
WHERE source_connector IN ('jdbc', 'postgres', 'mysql', 'sqlserver', 'oracle')
ORDER BY started_at_utc DESC;

Common issues

SymptomLikely causeAction
No suitable driverJDBC jar is not installed.Install the matching driver on cluster/serverless environment.
Connection timeoutNo network route to database.Fix VPC, peering, PrivateLink, firewall or public access rules.
Authentication failedWrong secret, expired password/token or IAM policy issue.Validate credentials outside ContractForge first.
Slow extractionSingle JDBC partition or no predicate pushdown.Add partitioning, fetch size and source predicates.
MERGE duplicate key errorSource has multiple records per key.Add transform.deduplicate and quality key rules.