When to use it

Operational databases

Relational extraction

Use JDBC for PostgreSQL, MySQL, SQL Server, Oracle and compatible managed services.

Incremental loads

Watermark columns

Combine source predicates, ContractForge watermarks and deterministic deduplication before merge.

Large tables

Partitioned reads

Use partition columns, bounds, partitions and fetch size to avoid single-threaded extraction.

Cloud databases

Network first

Validate routes, firewalls, security groups, drivers and credentials before tuning the contract.

Runtime requirements

RequirementDetails
JDBC driverThe matching database driver must be installed or available to Spark.
Network routeThe runtime must reach the database host and port.
CredentialsUse Databricks secrets or runtime-managed identity. Do not commit database credentials in contracts.
Source permissionsThe source user must be allowed to read the selected table/query and metadata needed by the driver.

Databricks serverless

JDBC on serverless is viable only when three layers are available at the same time: a compatible JDBC driver, a network route to the database endpoint and credentials that the Spark reader can use. ContractForge can build the connector contract and generate RDS IAM tokens, but it cannot bypass missing network routes or missing driver support.

ConcernWhat to checkTypical fix
DriverNo suitable driver, class not found or unsupported connector package.Use a runtime policy, installed library or cluster type that exposes the database driver.
NetworkDNS resolves but TCP connection times out, or the database rejects the runtime source.Configure public access rules, firewall allowlist, PrivateLink, VPC/VNet peering or the workspace network policy.
AuthenticationPassword/token works outside Spark but fails in the job.Validate secret names, token expiry, SSL options, IAM policy and database user grants.
Extraction shapeReads succeed but are slow or create merge conflicts.Add predicates, partitioning, fetch size, deterministic deduplication and key quality rules.

Supported aliases

ConnectorTypical driverNotes
jdbcUser-providedGeneric Spark JDBC reader.
postgresorg.postgresql.DriverPostgres, Supabase, Aurora PostgreSQL and RDS PostgreSQL.
mysqlMySQL Connector/JRequires matching runtime driver.
sqlserverMicrosoft JDBC DriverUse encrypted connection options as required.
oracleOracle JDBC DriverDriver licensing/distribution is usually handled by the platform.

Basic authentication

source:
  type: connector
  connector: postgres
  options:
    url: "{{ secret:crm/postgres_url }}"
    dbtable: public.orders
    driver: org.postgresql.Driver
  auth:
    type: basic
    username: "{{ secret:crm/postgres_user }}"
    password: "{{ secret:crm/postgres_password }}"
  read:
    fetchsize: 10000

target:
  catalog: main
  schema: bronze_crm
  table: b_orders_jdbc

layer: bronze
mode: scd0_append
schema_policy: additive_only

Incremental SCD1 pattern

For current-state tables, combine source watermarking, deterministic deduplication and key quality gates before MERGE.

source:
  type: connector
  connector: postgres
  options:
    url: "{{ secret:crm/postgres_url }}"
    dbtable: public.orders
    driver: org.postgresql.Driver
  auth:
    type: basic
    username: "{{ secret:crm/postgres_user }}"
    password: "{{ secret:crm/postgres_password }}"
  incremental:
    watermark_column: updated_at
  read:
    fetchsize: 10000
    predicate: "updated_at >= timestamp '2026-01-01'"

target:
  catalog: main
  schema: silver_sales
  table: s_orders

layer: silver
mode: scd1_upsert
merge_keys: [order_id]
watermark_columns: [updated_at]
transform:
  deduplicate:
    keys: [order_id]
    order_by: updated_at DESC NULLS LAST
quality_rules:
  not_null: [order_id]
  unique_key: [order_id]

Partitioned reads

Use JDBC partitioning for large tables. Partitioning only works well when the partition column is numeric/date-like, reasonably distributed and bounded correctly.

source:
  type: connector
  connector: postgres
  options:
    url: "{{ secret:crm/postgres_url }}"
    dbtable: public.order_items
    driver: org.postgresql.Driver
  auth:
    type: basic
    username: "{{ secret:crm/postgres_user }}"
    password: "{{ secret:crm/postgres_password }}"
  read:
    partition_column: order_item_id
    lower_bound: 1
    upper_bound: 50000000
    num_partitions: 16
    fetchsize: 20000
Bounds are not filters

Spark JDBC bounds define partition stride. They do not necessarily restrict the result set by themselves. Use predicates or subqueries when you need source-side filtering.

Query and pushdown pattern

Use query or a subquery-style dbtable when the extraction needs a reviewed predicate or a join that belongs to the source system. Keep heavy business transformation outside the connector.

source:
  type: connector
  connector: jdbc
  options:
    url: "{{ secret:erp/jdbc_url }}"
    query: |
      SELECT id, updated_at, status, total_amount
      FROM sales.orders
      WHERE updated_at >= TIMESTAMP '2026-01-01'
        AND status IN ('closed', 'cancelled')
    driver: org.postgresql.Driver
  auth:
    type: basic
    username: "{{ secret:erp/user }}"
    password: "{{ secret:erp/password }}"

SCD1 hash diff pattern

Use hash diff when the target should append only rows whose business content changed compared with the latest target state.

target:
  catalog: main
  schema: silver_sales
  table: s_product_prices

layer: silver
mode: scd1_hash_diff
hash_keys: [product_id]
dedup_order_expr: updated_at DESC NULLS LAST
quality_rules:
  not_null: [product_id]
  unique_key: [product_id]
Deterministic latest state

Hash diff depends on a reliable latest-state comparison. Use explicit dedup ordering when the source can contain multiple versions per key.

Amazon RDS/Aurora IAM

For RDS/Aurora PostgreSQL with IAM authentication, ContractForge can generate an auth token in the Python driver. The runtime still needs network reachability and a compatible JDBC driver.

source:
  type: connector
  connector: postgres
  options:
    host: mydb.abc123.us-east-1.rds.amazonaws.com
    port: 5432
    database: app
    dbtable: public.orders
    driver: org.postgresql.Driver
  auth:
    type: rds_iam
    username: "{{ secret:contractforge-aws/rds_username }}"
    region: us-east-1
    credential_provider: default_chain
    sslmode: require
Credential providerUse
explicitUse access key, secret key and optional session token from secrets.
envUse AWS credentials already available as environment variables.
default_chainUse botocore's default credential chain. Requires contractforge[aws] or runtime-provided botocore.

Network prerequisites

JDBC failures are often platform/network issues, not contract issues. Validate the path before tuning the contract.

  • The database hostname must resolve from the Databricks runtime.
  • Security groups, firewalls and database allowlists must permit the runtime source IP or private route.
  • Private databases usually need VPC peering, PrivateLink, VPN or same-cloud networking.
  • The JDBC driver must be installed or available in the runtime classpath.
  • Serverless runtimes may restrict custom drivers or network routes depending on workspace configuration.

Operational metadata

JDBC source metadata is redacted before persistence. Use it to verify connector, auth type, partitioning and source options without exposing credentials.

SELECT
  run_id,
  source_connector,
  source_provider,
  source_options_json,
  source_auth_redacted_json,
  source_read_redacted_json,
  source_metrics_json
FROM main.ops.ctrl_ingestion_runs
WHERE source_connector IN ('jdbc', 'postgres', 'mysql', 'sqlserver', 'oracle')
ORDER BY started_at_utc DESC;

Common issues

SymptomLikely causeAction
No suitable driverJDBC jar is not installed.Install the matching driver on cluster/serverless environment.
Connection timeoutNo network route to database.Fix VPC, peering, PrivateLink, firewall or public access rules.
Authentication failedWrong secret, expired password/token or IAM policy issue.Validate credentials outside ContractForge first.
Slow extractionSingle JDBC partition or no predicate pushdown.Add partitioning, fetch size and source predicates.
MERGE duplicate key errorSource has multiple records per key.Add transform.deduplicate and quality key rules.