Connectors | ContractForge

Connector	Summary
Table and SQL	Spark catalog tables, views and SQL queries.
Files	Parquet, Delta path, JSON, CSV, ORC, Avro, XML and text.
HTTP file	Download public files through Python when Spark cannot read HTTPS directly.
Object storage	Provider-neutral object storage configuration.
S3	External Location on serverless, S3A credentials on classic/local runtimes.
Azure Blob	SAS on classic runtimes, External Location/Volume on serverless.
JDBC	Databases with partitioning, pushdown and RDS IAM support.
REST API	Paginated APIs, raw payloads and record extraction.
Auto Loader	Databricks available-now ingestion with checkpoints.
Snowflake	Read tables and queries through the Spark Snowflake connector.
BigQuery	Read tables and queries through the Spark BigQuery connector.

Static diagnostics

contractforge connectors list
contractforge connectors show rest_api jdbc s3 azure_blob
contractforge connectors doctor rest_api jdbc s3 azure_blob

`connectors doctor` does not open network connections. It reports the runtime capabilities each connector expects, so reviewers can catch missing drivers, missing cloud access or platform constraints early.
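A static check of this kind can be approximated without any network call. The sketch below is not ContractForge's implementation: the `EXPECTED_MODULES` mapping and the `static_doctor` helper are hypothetical, and the check only tests whether Python modules can be located in the current environment.

```python
import importlib.util

# Hypothetical mapping of connector ids to Python modules they might rely on.
# The real requirements are documented on each connector page.
EXPECTED_MODULES = {
    "rest_api": ["json", "urllib.request"],
    "jdbc": ["pyspark"],
}


def static_doctor(connector):
    """Report which expected modules can be located, without opening connections."""
    report = {}
    for module in EXPECTED_MODULES.get(connector, []):
        try:
            # find_spec locates the module on the import path; it performs no network I/O.
            report[module] = importlib.util.find_spec(module) is not None
        except (ModuleNotFoundError, ValueError):
            # Raised when a parent package of a dotted name is missing.
            report[module] = False
    return report


print(static_doctor("rest_api"))
```

A `False` entry in the report points at a dependency to install or a platform constraint to resolve before the first run.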

Connector matrix

Use this matrix to choose the first connector to test. The detailed page for each connector contains contract examples, runtime constraints and operational notes.

Source type	Connectors	Best fit	Common follow-up
Catalog data	`table`, `delta_table`, `view`, `sql`	Data already registered in Spark or Unity Catalog.	Use write modes and quality gates directly.
Files	`csv`, `json`, `jsonl`, `ndjson`, `parquet`, `delta`, `orc`, `avro`, `xml`, `text`	Static folders, backfills and finite file sets reachable by Spark.	Declare schema, use folder filters and apply `transform.shape` when nested.
HTTP files	`http_file`, `http_csv`, `http_json`, `http_text`	Bounded public/authenticated files when Spark cannot read HTTPS directly.	Protect with byte/time limits; move high-volume feeds to storage.
Object storage	`object_storage`, `blob`, `s3`, `azure_blob`	S3, ADLS, Blob or governed storage paths.	Choose External Location/Volume for serverless or direct credentials on classic clusters.
Relational databases	`jdbc`, `postgres`, `mysql`, `sqlserver`, `oracle`	Database extraction with partitioning, predicates and watermarks.	Install drivers, validate network, deduplicate before MERGE.
APIs	`rest_api`	Bounded API extraction, pagination and raw payload capture.	Use `transform.shape.parse_json` for nested documents.
File streams	`autoloader`	Databricks cloudFiles `available_now` with checkpoints.	Inspect `ctrl_ingestion_streams` and child run metrics.
External systems	`snowflake`, `bigquery`	Sources supported by installed Spark connector packages.	Keep provider credentials outside contracts and validate installed libraries.

Connector responsibility boundary

A connector is responsible for retrieving bytes or records and exposing safe source metadata. It should not contain business transformations. Use `transform.shape`, `column_mapping`, quality rules and write modes for contract-specific behavior.

Stage	Role	Covers
Connector	Read the source	URL, path, table, SQL, API request, pagination, credentials, driver options and runtime-specific access setup.
Transform	Shape the data	Parse JSON strings, flatten structs, explode arrays, zip parallel arrays, project fields, cast columns and deduplicate.
Quality	Validate intent	Required keys, accepted values, expressions, uniqueness and row-level quarantine decisions.
Writer	Commit semantics	Append, overwrite, SCD1, hash diff, SCD2 or snapshot soft delete with metrics and control-table evidence.
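The boundary between these four stages can be illustrated with plain Python functions. This is a minimal sketch over in-memory records, not ContractForge code: every function name and the sample data are hypothetical, and a real contract would express these steps declaratively.

```python
def read_source():
    # Connector: retrieve records only, with no business logic.
    return [
        {"id": "1", "amount": "10.5"},
        {"id": None, "amount": "3"},
    ]


def shape(records):
    # Transform: cast the amount column from string to float.
    return [{**r, "amount": float(r["amount"])} for r in records]


def validate(records):
    # Quality: required-key check with a row-level quarantine decision.
    valid = [r for r in records if r["id"] is not None]
    quarantined = [r for r in records if r["id"] is None]
    return valid, quarantined


def write_append(target, records):
    # Writer: append semantics, returning a metric as evidence.
    target.extend(records)
    return {"rows_written": len(records)}


target = []
valid, quarantined = validate(shape(read_source()))
metrics = write_append(target, valid)
print(metrics, len(quarantined))
```

Because the connector step returns raw records, the cast and the required-key rule can change without touching how the source is read.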

Runtime compatibility guide

Serverless is not a weaker runtime, but it is more opinionated. Prefer governed access paths and platform-managed connectivity before adding connector-level credentials.

Connector	Serverless preference	Classic/local preference	What to validate first
Files on workspace/Volumes	Use Volumes, Workspace files or External Locations with explicit schema.	Use any Spark-readable path and installed format libraries.	Path grants, schema, recursive lookup and format dependencies.
S3	Use Unity Catalog External Locations or Volumes. Direct S3A credentials can be blocked by Spark Connect controls.	Use S3A credentials, instance profile, assumed role or the AWS default chain.	List/read grants, path prefix, session token expiry and Hadoop AWS libraries.
Azure Blob / ADLS	Use External Locations or Volumes. Direct SAS/Hadoop configuration is not the durable serverless path.	Use SAS, service principal, managed identity or ABFS configuration when allowed.	Storage credential validation, list permission and endpoint reachability.
HTTP file	Use for bounded files when Spark cannot read HTTPS directly.	Same pattern; classic networking is usually easier to customize.	Driver egress, file size, timeout, retries and response format.
REST API	Driver-side HTTP. Keep page, byte, record and timeout limits explicit.	Same contract model; cluster networking may allow more custom routing.	DNS, proxy, allowlists, API rate limits and payload shape.
JDBC / RDS IAM	Requires driver availability, database route and, for IAM, AWS credentials visible to the Python driver.	Install driver and configure network, security groups, peering or PrivateLink.	TCP route, driver class, SSL options, IAM policy and source query bounds.
Snowflake	Requires connector/JDBC availability and Snowflake network policy that allows the Databricks egress path.	Use connector/JDBC packages and validate credentials from the cluster.	Service user, role, warehouse, PAT/JWT support and network policy.
BigQuery	Prefer Unity Catalog Lakehouse Federation when available; use the direct Spark connector only when dependencies and credentials are supported.	Use the Spark BigQuery connector with service-account/runtime credentials.	Federated connection, materialization dataset, service-account permissions and cost controls.
Auto Loader	Use supported `cloudFiles` paths through Volumes or External Locations.	Use cluster/cloud IAM and checkpoint/schema locations.	Checkpoint path, schema location, source access and child-run stream metrics.
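The REST API row asks for explicit page and record limits. The sketch below shows one way such limits could bound a cursor-based pagination loop. It is an assumption-laden illustration: `fetch_bounded`, the `records` and `next` keys, and the stand-in `fake_fetch` function are hypothetical, and byte and timeout limits are left out for brevity.

```python
def fetch_bounded(fetch_page, max_pages, max_records):
    """Collect records page by page, stopping at explicit page and record limits."""
    records = []
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        records.extend(page["records"])
        if len(records) >= max_records:
            # Truncate so the record limit is never exceeded.
            return records[:max_records]
        cursor = page.get("next")
        if cursor is None:
            # The API reports no further pages.
            break
    return records


# Stand-in for a driver-side HTTP call, keyed by cursor.
PAGES = {
    None: {"records": [1, 2], "next": "p2"},
    "p2": {"records": [3, 4], "next": "p3"},
    "p3": {"records": [5], "next": None},
}


def fake_fetch(cursor):
    return PAGES[cursor]


print(fetch_bounded(fake_fetch, max_pages=2, max_records=100))
```

Passing the page fetcher as an argument keeps the limit logic testable without network access.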

Serverless decision path

Use governed paths first

For object storage, start with Unity Catalog External Locations or Volumes. They make permissions auditable and avoid runtime-specific Hadoop credential mutation.

Use federation when available

For BigQuery and other catalog-integrated sources, a federated table can be consumed through `table` or `sql`, which avoids connector packaging and credential-file handling in jobs.

Use direct connectors deliberately

Direct JDBC, Snowflake and BigQuery connectors remain useful when dependencies, credentials and network routes are explicitly supported by the runtime.

Fail clearly

If a platform blocks network, driver or filesystem configuration, fix the platform capability. Do not hide the issue behind runtime-specific behavior.
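The decision path above can be condensed into a small lookup. This is only a sketch of the guidance on this page, not a ContractForge API: the function name, the connector groupings and the returned strings are hypothetical.

```python
# Connectors for which this page recommends governed storage paths on serverless.
GOVERNED_FIRST = {"s3", "azure_blob", "object_storage", "blob", "autoloader"}
# Connectors for which this page recommends federation when available.
FEDERATION_FIRST = {"bigquery"}


def first_access_path(connector, runtime):
    """Return the access path to try first for a connector on a given runtime."""
    if runtime != "serverless":
        return "direct connector or cluster credentials"
    if connector in GOVERNED_FIRST:
        return "Unity Catalog External Location or Volume"
    if connector in FEDERATION_FIRST:
        return "federated table read through table or sql"
    return "direct connector, if the runtime supports its dependencies and network route"


print(first_access_path("s3", "serverless"))
```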

Common connector fields

Most connectors use the same top-level shape. Individual connector pages document connector-specific options, but these fields are the common vocabulary.

Field	Use
`type`	Use `connector` for registry-based connectors.
`connector`	Connector id, such as `csv`, `s3`, `azure_blob`, `jdbc`, `rest_api` or `autoloader`.
`name`	Logical source name for observability.
`provider`	Cloud or source provider, when useful for operations.
`format`	Data format for file-like connectors.
`path`	File, folder, object-storage or URL path.
`account_url`, `container`	Object-storage location parts for Azure Blob/ADLS-style sources.
`table`, `query`	Table or SQL query for catalog/JDBC sources.
`options`	Spark/DataSource connector options and runtime-specific configuration.
`read`	Read semantics such as schema, multiline, recursive lookup, partitioning, bounds or fetch size.
`request`	HTTP method, URL, headers and request parameters.
`auth`	Authentication metadata. Secrets should use `{{ secret:scope/key }}` references.
`pagination`	REST pagination strategy.
`response`	REST response extraction and record path handling.
`incremental`	Connector-level incremental settings.
`limits`	Timeout, retry and payload safety limits supported by the connector.
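As an illustration of the common vocabulary, the sketch below models a source definition as a Python dictionary and checks that a credential uses the secret-reference syntax. The top-level field names come from the table above. The nested keys (`method`, `url`, `token`, `timeout_seconds`), the sample values and the `is_secret_ref` helper are assumptions made for the example.

```python
import re

# Matches references of the form {{ secret:scope/key }}.
SECRET_REF = re.compile(r"^\{\{\s*secret:[^/\s]+/[^\s}]+\s*\}\}$")

source = {
    "type": "connector",
    "connector": "rest_api",
    "name": "orders_api",  # hypothetical logical source name
    "request": {"method": "GET", "url": "https://example.com/orders"},
    "auth": {"token": "{{ secret:ingest/orders_token }}"},
    "limits": {"timeout_seconds": 30},  # hypothetical limit key
}


def is_secret_ref(value):
    """Return True when a value is a secret reference rather than a literal."""
    return isinstance(value, str) and SECRET_REF.match(value) is not None


print(is_secret_ref(source["auth"]["token"]))
```

A review step could run a check like this over credential fields to catch literals committed to a contract by mistake.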

Source metadata

Every connector contributes source metadata to the run payload and to `ctrl_ingestion_runs`. Secrets are redacted before persistence.

SELECT
  source_connector,
  source_format,
  source_path,
  source_options_json,
  source_read_json,
  source_auth_json,
  source_metrics_json
FROM main.ops.ctrl_ingestion_runs
ORDER BY started_at_utc DESC
LIMIT 20;
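Redaction before persistence can be pictured as a recursive pass over the metadata. The sketch below is an assumption, not the actual ContractForge redaction logic: the `SENSITIVE_KEYS` set, the `***` marker and the sample `auth` payload are hypothetical.

```python
import json

# Hypothetical set of key names treated as secrets (compared in lower case).
SENSITIVE_KEYS = {"password", "token", "secret", "sas_token", "authorization"}


def redact(value):
    """Replace values under sensitive keys, descending into dicts and lists."""
    if isinstance(value, dict):
        return {
            key: "***" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


auth = {
    "type": "bearer",
    "token": "abc123",
    "headers": [{"Authorization": "Bearer abc123"}],
}

# Serialize the redacted structure, as a column such as source_auth_json might store it.
source_auth_json = json.dumps(redact(auth), sort_keys=True)
print(source_auth_json)
```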