Static diagnostics

contractforge connectors list
contractforge connectors show rest_api jdbc s3 azure_blob
contractforge connectors doctor rest_api jdbc s3 azure_blob

The connectors doctor command does not open network connections. It documents the runtime capabilities each connector expects, so reviewers can catch missing drivers, missing cloud access or platform constraints early.

Connector matrix

Use this matrix to choose the first connector to test. The detailed page for each connector contains contract examples, runtime constraints and operational notes.

| Source type | Connectors | Best fit | Common follow-up |
| --- | --- | --- | --- |
| Catalog data | table, delta_table, view, sql | Data already registered in Spark or Unity Catalog. | Use write modes and quality gates directly. |
| Files | csv, json, jsonl, ndjson, parquet, delta, orc, avro, xml, text | Static folders, backfills and finite file sets reachable by Spark. | Declare schema, use folder filters and apply transform.shape when nested. |
| HTTP files | http_file, http_csv, http_json, http_text | Bounded public/authenticated files when Spark cannot read HTTPS directly. | Protect with byte/time limits; move high-volume feeds to storage. |
| Object storage | object_storage, blob, s3, azure_blob | S3, ADLS, Blob or governed storage paths. | Choose External Location/Volume for serverless or direct credentials on classic clusters. |
| Relational databases | jdbc, postgres, mysql, sqlserver, oracle | Database extraction with partitioning, predicates and watermarks. | Install drivers, validate network, deduplicate before MERGE. |
| APIs | rest_api | Bounded API extraction, pagination and raw payload capture. | Use transform.shape.parse_json for nested documents. |
| File streams | autoloader | Databricks cloudFiles available_now with checkpoints. | Inspect ctrl_ingestion_streams and child run metrics. |
| External systems | snowflake, bigquery | Sources supported by installed Spark connector packages. | Keep provider credentials outside contracts and validate installed libraries. |

Connector responsibility boundary

A connector is responsible for retrieving bytes or records and exposing safe source metadata. It should not contain business transformations. Use transform.shape, column_mapping, quality rules and write modes for the contract-specific behavior.

| Stage | Responsibility | Covers |
| --- | --- | --- |
| Connector | Read the source | URL, path, table, SQL, API request, pagination, credentials, driver options and runtime-specific access setup. |
| Transform | Shape the data | Parse JSON strings, flatten structs, explode arrays, zip parallel arrays, project fields, cast columns and deduplicate. |
| Quality | Validate intent | Required keys, accepted values, expressions, uniqueness and row-level quarantine decisions. |
| Writer | Commit semantics | Append, overwrite, SCD1, hash diff, SCD2 or snapshot soft delete with metrics and control-table evidence. |
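The boundary above can be pictured as four separate stages. The following is a minimal Python sketch, not ContractForge's implementation: the function names (read_source, shape, validate, write, run_pipeline) and the in-memory records are hypothetical. It only illustrates that the connector stage returns records plus safe metadata and leaves shaping, validation and commit semantics to later stages.

```python
import json
from typing import Any

Record = dict[str, Any]


def read_source() -> tuple[list[Record], dict[str, Any]]:
    """Connector stage: return raw records and safe source metadata only."""
    records = [
        {"id": "1", "payload": '{"amount": "10.5"}'},
        {"id": "1", "payload": '{"amount": "10.5"}'},  # duplicate on purpose
        {"id": None, "payload": '{"amount": "3"}'},    # missing required key
    ]
    metadata = {"connector": "rest_api", "records_read": len(records)}
    return records, metadata


def shape(records: list[Record]) -> list[Record]:
    """Transform stage: parse the JSON string, cast the amount, deduplicate."""
    shaped: list[Record] = []
    seen: set[tuple[Any, float]] = set()
    for rec in records:
        amount = float(json.loads(rec["payload"])["amount"])
        key = (rec["id"], amount)
        if key not in seen:
            seen.add(key)
            shaped.append({"id": rec["id"], "amount": amount})
    return shaped


def validate(rows: list[Record]) -> tuple[list[Record], list[Record]]:
    """Quality stage: rows without the required key are quarantined."""
    valid = [row for row in rows if row["id"] is not None]
    quarantined = [row for row in rows if row["id"] is None]
    return valid, quarantined


def write(rows: list[Record], target: list[Record]) -> dict[str, Any]:
    """Writer stage: append-only commit that reports metrics."""
    target.extend(rows)
    return {"mode": "append", "rows_written": len(rows)}


def run_pipeline(target: list[Record]) -> dict[str, Any]:
    raw, metadata = read_source()
    valid, quarantined = validate(shape(raw))
    metrics = write(valid, target)
    return {**metadata, **metrics, "rows_quarantined": len(quarantined)}
```

Because read_source never parses or filters the payload, swapping the source does not touch the transform, quality or writer logic.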

Runtime compatibility guide

Serverless is not a weaker runtime, but it is more opinionated. Prefer governed access paths and platform-managed connectivity before adding connector-level credentials.

| Connector | Serverless preference | Classic/local preference | What to validate first |
| --- | --- | --- | --- |
| Files on workspace/Volumes | Use Volumes, Workspace files or External Locations with explicit schema. | Use any Spark-readable path and installed format libraries. | Path grants, schema, recursive lookup and format dependencies. |
| S3 | Use Unity Catalog External Locations or Volumes. Direct S3A credentials can be blocked by Spark Connect controls. | Use S3A credentials, instance profile, assumed role or the AWS default chain. | List/read grants, path prefix, session token expiry and Hadoop AWS libraries. |
| Azure Blob / ADLS | Use External Locations or Volumes. Direct SAS/Hadoop configuration is not the durable serverless path. | Use SAS, service principal, managed identity or ABFS configuration when allowed. | Storage credential validation, list permission and endpoint reachability. |
| HTTP file | Use for bounded files when Spark cannot read HTTPS directly. | Same pattern; classic networking is usually easier to customize. | Driver egress, file size, timeout, retries and response format. |
| REST API | Driver-side HTTP. Keep page, byte, record and timeout limits explicit. | Same contract model; cluster networking may allow more custom routing. | DNS, proxy, allowlists, API rate limits and payload shape. |
| JDBC / RDS IAM | Requires driver availability, database route and, for IAM, AWS credentials visible to the Python driver. | Install driver and configure network, security groups, peering or PrivateLink. | TCP route, driver class, SSL options, IAM policy and source query bounds. |
| Snowflake | Requires connector/JDBC availability and Snowflake network policy that allows the Databricks egress path. | Use connector/JDBC packages and validate credentials from the cluster. | Service user, role, warehouse, PAT/JWT support and network policy. |
| BigQuery | Prefer Unity Catalog Lakehouse Federation when available; use the direct Spark connector only when dependencies and credentials are supported. | Use the Spark BigQuery connector with service-account/runtime credentials. | Federated connection, materialization dataset, service-account permissions and cost controls. |
| Auto Loader | Use supported cloudFiles paths through Volumes or External Locations. | Use cluster/cloud IAM and checkpoint/schema locations. | Checkpoint path, schema location, source access and child-run stream metrics. |
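The REST API and HTTP file rows stress explicit page, byte, record and timeout limits. As an illustration only, the sketch below bounds a paginated extraction loop in plain Python. The Limits dataclass, the fetch_page callable and the limit names are hypothetical and are not ContractForge options; a real connector would also apply a timeout to each individual HTTP request.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Limits:
    # Hypothetical limit names, chosen for this sketch.
    max_pages: int = 10
    max_records: int = 1_000
    max_bytes: int = 1_000_000
    timeout_seconds: float = 30.0


class LimitExceeded(RuntimeError):
    """Raised when a limit is hit, so the run fails instead of truncating."""


def extract_bounded(
    fetch_page: Callable[[int], Optional[list[dict]]],
    limits: Limits,
) -> list[dict]:
    """Collect pages until fetch_page returns an empty page or a limit is hit."""
    records: list[dict] = []
    total_bytes = 0
    deadline = time.monotonic() + limits.timeout_seconds
    page = 0
    while True:
        if time.monotonic() > deadline:
            raise LimitExceeded("overall timeout reached")
        batch = fetch_page(page)
        if not batch:
            return records  # source exhausted within the limits
        if page >= limits.max_pages:
            raise LimitExceeded(f"more than {limits.max_pages} pages")
        # Approximate payload size by the serialized JSON length.
        total_bytes += len(json.dumps(batch).encode("utf-8"))
        if total_bytes > limits.max_bytes:
            raise LimitExceeded(f"more than {limits.max_bytes} bytes")
        records.extend(batch)
        if len(records) > limits.max_records:
            raise LimitExceeded(f"more than {limits.max_records} records")
        page += 1
```

Raising on a breached limit, rather than silently returning a partial result, is a design choice made for this sketch; it keeps an oversized feed visible so it can be moved to storage.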

Serverless decision path

1. Use governed paths first

For object storage, start with Unity Catalog External Locations or Volumes. They make permissions auditable and avoid runtime-specific Hadoop credential mutation.

2. Use federation when available

For BigQuery and other catalog-integrated sources, a federated table can be consumed through table or sql, which avoids connector packaging and credential-file handling in jobs.

3. Use direct connectors deliberately

Direct JDBC, Snowflake and BigQuery connectors remain useful when dependencies, credentials and network routes are explicitly supported by the runtime.

4. Fail clearly

If a platform blocks network, driver or filesystem configuration, fix the platform capability. Do not hide the issue behind runtime-specific behavior.
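The four steps can be read as an ordered check. The Python sketch below encodes that order; the RuntimeCapabilities fields and the returned labels are hypothetical names chosen for illustration and do not correspond to ContractForge settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeCapabilities:
    # All three flags are assumptions made for this sketch.
    governed_path_available: bool     # an External Location or Volume covers the source
    federation_available: bool        # a federated table exists for the source
    direct_connector_supported: bool  # dependencies, credentials and routes are supported


def choose_access_path(caps: RuntimeCapabilities) -> str:
    """Return the first access path that applies, in the documented order."""
    if caps.governed_path_available:
        return "governed_path"
    if caps.federation_available:
        return "federation"
    if caps.direct_connector_supported:
        return "direct_connector"
    # Step 4: fail clearly instead of falling back to runtime-specific tricks.
    raise RuntimeError(
        "No supported access path; fix the platform capability "
        "(network, driver or filesystem configuration)."
    )
```

For example, choose_access_path(RuntimeCapabilities(False, True, True)) returns "federation", because federation is checked before the direct connector.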

Common connector fields

Most connectors use the same top-level shape. Individual connector pages document connector-specific options, but these fields are the common vocabulary.

| Field | Use |
| --- | --- |
| type | Use connector for registry-based connectors. |
| connector | Connector id, such as csv, s3, azure_blob, jdbc, rest_api or autoloader. |
| name | Logical source name for observability. |
| provider | Cloud or source provider, when useful for operations. |
| format | Data format for file-like connectors. |
| path | File, folder, object-storage or URL path. |
| account_url, container | Object-storage location parts for Azure Blob/ADLS-style sources. |
| table, query | Table or SQL query for catalog/JDBC sources. |
| options | Spark/DataSource connector options and runtime-specific configuration. |
| read | Read semantics such as schema, multiline, recursive lookup, partitioning, bounds or fetch size. |
| request | HTTP method, URL, headers and request parameters. |
| auth | Authentication metadata. Secrets should use {{ secret:scope/key }} references. |
| pagination | REST pagination strategy. |
| response | REST response extraction and record path handling. |
| incremental | Connector-level incremental settings. |
| limits | Timeout, retry and payload safety limits supported by the connector. |
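To make the vocabulary concrete, here is a hedged Python sketch of a source block written as a dictionary, with two small checks: one flags unknown top-level fields, the other recognizes the {{ secret:scope/key }} reference form. The dictionary layout, the example URL, the token key inside auth, the timeout_seconds key inside limits and both helper functions are assumptions for illustration; the connector pages define the real contract syntax.

```python
import re
from typing import Optional

# Top-level field names taken from the table above.
COMMON_FIELDS = {
    "type", "connector", "name", "provider", "format", "path",
    "account_url", "container", "table", "query", "options", "read",
    "request", "auth", "pagination", "response", "incremental", "limits",
}

# Matches "{{ secret:scope/key }}" and captures scope and key.
SECRET_REF = re.compile(r"^\{\{\s*secret:([^/\s}]+)/([^\s}]+)\s*\}\}$")

# Hypothetical source block; nested keys are illustrative only.
source = {
    "type": "connector",
    "connector": "rest_api",
    "name": "orders_api",
    "request": {"method": "GET", "url": "https://api.example.com/orders"},
    "auth": {"token": "{{ secret:ingest/orders_token }}"},
    "limits": {"timeout_seconds": 30},
}


def unknown_fields(src: dict) -> list[str]:
    """Return top-level keys that are not part of the common vocabulary."""
    return sorted(set(src) - COMMON_FIELDS)


def parse_secret_ref(value: str) -> Optional[tuple[str, str]]:
    """Return (scope, key) for a secret reference, or None for plain text."""
    match = SECRET_REF.match(value)
    return (match.group(1), match.group(2)) if match else None
```

A check like parse_secret_ref can be used in review to spot auth values that were pasted as plain text instead of being referenced.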

Source metadata

Every connector contributes source metadata to the run payload and ctrl_ingestion_runs. Secrets are redacted before persistence.

SELECT
  source_connector,
  source_format,
  source_path,
  source_options_json,
  source_read_json,
  source_auth_json,
  source_metrics_json
FROM main.ops.ctrl_ingestion_runs
ORDER BY started_at_utc DESC
LIMIT 20;
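Redaction before persistence, mentioned at the start of this section, could look roughly like the following Python sketch. It is not ContractForge's redaction logic: the list of sensitive key names, the placeholder string and the redact function are assumptions used to illustrate masking values before they are serialized into a column such as source_auth_json.

```python
import json
from typing import Any

# Hypothetical set of key names treated as sensitive in this sketch.
SENSITIVE_KEYS = {"password", "token", "secret", "sas_token", "client_secret"}
REDACTED = "***REDACTED***"


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive dictionary entries masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


# Example auth metadata with a nested secret.
auth = {"type": "bearer", "token": "abc123", "extra": [{"password": "p"}]}

# Serialize only the redacted copy; the original dictionary is left unchanged.
source_auth_json = json.dumps(redact(auth), sort_keys=True)
```

Matching on key names is a simplification; it misses secrets stored under unexpected keys, so a stricter approach would also mask any value resolved from a secret reference.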