S3

validated

Use the S3 connector for `s3a://` paths when the ContractForge contract should declare the source path and format explicitly. The runtime (serverless or classic cluster) still determines how credentials and network access are resolved.

When to use it

AWS serverless

External Location or Volume

Use Unity Catalog governed storage when the Databricks workspace runs on AWS and serverless compute should not receive direct S3A credentials.

Cross-cloud serverless

External Location or network policy

Use a governed path when the workspace is not running in AWS or when direct filesystem credentials are blocked by platform policy.

Classic cluster

Direct S3A access

Use scoped access keys, temporary session tokens, instance profiles or Hadoop AWS configuration when the cluster runtime allows direct S3A setup.

Connector role

Path, format and evidence

Keep object access explicit in the `source` block, and use transform and quality rules for business normalization after the files are read.

Runtime guidance

Runtime	Recommended access path
Databricks serverless on AWS	Unity Catalog external location or volume backed by S3.
Databricks serverless outside AWS	External location or network policy if supported by the workspace.
Classic cluster	Direct `s3a://` access with Hadoop AWS configuration, instance profile or scoped access keys.

Serverless is stricter by design. If direct credentials are not available to the Spark runtime, use an external location and point the connector at the volume or external path.
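The runtime table and the note above amount to a small decision rule. The sketch below encodes that rule so a pipeline could check it. The `AccessPath` labels, the `recommended_access_path` function and the `cloud` spellings are hypothetical and are not part of ContractForge.

```python
from enum import Enum


class AccessPath(Enum):
    """Hypothetical labels for the three access paths in the runtime table."""
    UC_EXTERNAL_LOCATION_OR_VOLUME = "Unity Catalog external location or volume backed by S3"
    EXTERNAL_LOCATION_OR_NETWORK_POLICY = "External location or network policy"
    DIRECT_S3A = "Direct s3a:// access"


def recommended_access_path(
    serverless: bool, cloud: str, direct_credentials_allowed: bool = True
) -> AccessPath:
    """Pick an access path for a runtime.

    Assumption: `cloud` is the cloud the workspace runs on, e.g. "aws", "azure", "gcp".
    """
    if not serverless and direct_credentials_allowed:
        # Classic clusters may be configured for direct S3A access.
        return AccessPath.DIRECT_S3A
    if cloud.strip().lower() == "aws":
        # No direct credentials on AWS: use governed storage backed by S3.
        return AccessPath.UC_EXTERNAL_LOCATION_OR_VOLUME
    # Outside AWS (or credentials blocked elsewhere): use a governed path.
    return AccessPath.EXTERNAL_LOCATION_OR_NETWORK_POLICY
```

Passing `direct_credentials_allowed=False` models the case where platform policy blocks filesystem credentials even on a classic cluster.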

Parquet folder

YAML

```yaml
source:
  type: connector
  connector: s3
  path: s3a://company-landing/orders/
  format: parquet
  options:
    recursiveFileLookup: true
```

Python

```python
source = {
    "type": "connector",
    "connector": "s3",
    "path": "s3a://company-landing/orders/",
    "format": "parquet",
    "options": {"recursiveFileLookup": True},
}
```
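Conceptually, a source like this corresponds to a Spark `DataFrameReader` call: set the format, set the options, then load the path. The sketch below is an assumption about that mapping, not ContractForge's implementation, and `to_reader_args` is a hypothetical name. It converts option values to strings because Spark reader options are string-valued; PySpark lowercases booleans the same way.

```python
from typing import Any, Dict, Tuple


def _to_option_str(value: Any) -> str:
    # Spark reader options are strings; booleans become "true"/"false".
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_reader_args(source: Dict[str, Any]) -> Tuple[str, Dict[str, str], str]:
    """Hypothetical mapping from an S3 connector source to (format, options, path)."""
    if source.get("type") != "connector" or source.get("connector") != "s3":
        raise ValueError("expected type 'connector' and connector 's3'")
    for key in ("path", "format"):
        if not source.get(key):
            raise ValueError(f"source.{key} is required")
    options = {k: _to_option_str(v) for k, v in source.get("options", {}).items()}
    return source["format"], options, source["path"]


source = {
    "type": "connector",
    "connector": "s3",
    "path": "s3a://company-landing/orders/",
    "format": "parquet",
    "options": {"recursiveFileLookup": True},
}
fmt, opts, path = to_reader_args(source)
print(fmt, opts, path)
# parquet {'recursiveFileLookup': 'true'} s3a://company-landing/orders/
```

On a cluster, the three values would feed `spark.read.format(fmt).options(**opts).load(path)`.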

Direct credentials on classic clusters

YAML

```yaml
source:
  type: connector
  connector: s3
  path: s3a://company-landing/orders/
  format: json
  auth:
    type: access_key
    access_key_id: "{{ secret:contractforge-aws/aws_access_key_id }}"
    secret_access_key: "{{ secret:contractforge-aws/aws_secret_access_key }}"
    session_token: "{{ secret:contractforge-aws/aws_session_token }}"
```

Python

```python
source = {
    "type": "connector",
    "connector": "s3",
    "path": "s3a://company-landing/orders/",
    "format": "json",
    "auth": {
        "type": "access_key",
        "access_key_id": "{{ secret:contractforge-aws/aws_access_key_id }}",
        "secret_access_key": "{{ secret:contractforge-aws/aws_secret_access_key }}",
        "session_token": "{{ secret:contractforge-aws/aws_session_token }}",
    },
}
```
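On a classic cluster, an `access_key` auth block ultimately has to become Hadoop S3A properties. The sketch below shows one plausible mapping using real S3A property names (`fs.s3a.access.key`, `fs.s3a.secret.key`, `fs.s3a.session.token`, `fs.s3a.aws.credentials.provider`), scoped to a single bucket through S3A per-bucket configuration (`fs.s3a.bucket.<bucket>.*`). How ContractForge applies credentials is not documented here, so treat `s3a_properties` as a hypothetical helper. It assumes the values have already been resolved from secrets.

```python
from typing import Dict
from urllib.parse import urlparse

TEMPORARY_PROVIDER = "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"
SIMPLE_PROVIDER = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"


def s3a_properties(path: str, auth: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical: bucket-scoped S3A properties for an access_key auth block."""
    parsed = urlparse(path)
    if parsed.scheme != "s3a" or not parsed.netloc:
        raise ValueError(f"expected an s3a://bucket/... path, got {path!r}")
    if auth.get("type") != "access_key":
        raise ValueError("this sketch only handles auth.type 'access_key'")

    # Per-bucket keys apply only to this bucket, not to every s3a:// path.
    prefix = f"fs.s3a.bucket.{parsed.netloc}."
    props = {
        prefix + "access.key": auth["access_key_id"],
        prefix + "secret.key": auth["secret_access_key"],
    }
    token = auth.get("session_token")
    if token:
        # Temporary (STS) credentials need the token and the temporary provider.
        props[prefix + "session.token"] = token
        props[prefix + "aws.credentials.provider"] = TEMPORARY_PROVIDER
    else:
        props[prefix + "aws.credentials.provider"] = SIMPLE_PROVIDER
    return props
```

Scoping the keys per bucket keeps them from applying to other `s3a://` buckets the cluster reads, which matches the scoped-access-keys guidance above.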

`session_token` is required when the AWS credentials are temporary (for example, credentials issued by AWS STS). Keep tokens short-lived and store all values in Databricks secrets.
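The `{{ secret:scope/key }}` placeholders keep raw credentials out of the contract. A minimal resolver might look like the sketch below. The placeholder grammar is inferred from the examples above, and `resolve_secrets` is hypothetical. The `get_secret` callable is injected so the sketch runs anywhere; on Databricks it could wrap `dbutils.secrets.get(scope=..., key=...)`.

```python
import re
from typing import Any, Callable

# Assumed grammar, inferred from the examples: {{ secret:<scope>/<key> }}
_SECRET_RE = re.compile(r"^\{\{\s*secret:([^/\s]+)/([^\s}]+)\s*\}\}$")


def resolve_secrets(value: Any, get_secret: Callable[[str, str], str]) -> Any:
    """Recursively replace secret placeholders in dicts, lists and strings."""
    if isinstance(value, dict):
        return {k: resolve_secrets(v, get_secret) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_secrets(v, get_secret) for v in value]
    if isinstance(value, str):
        match = _SECRET_RE.match(value)
        if match:
            scope, key = match.groups()
            return get_secret(scope, key)
    # Anything else (plain strings, numbers, booleans) passes through unchanged.
    return value
```

On Databricks, a call could look like `resolve_secrets(source, lambda scope, key: dbutils.secrets.get(scope=scope, key=key))`, done as late as possible so resolved values are not logged or persisted.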

Options

Option	Purpose
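`recursiveFileLookup`	Reads files in nested folders recursively. Spark disables partition discovery when this is enabled.
`pathGlobFilter`	Filters file names using Hadoop glob syntax (`org.apache.hadoop.fs.GlobFilter`). Partition discovery is unaffected.
`mergeSchema`	Merges compatible schemas across files (Parquet and ORC).
`header`, `delimiter`, `mode`	CSV parsing options. `mode` also controls malformed-record handling for JSON.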

Use `filter_expression` for row filtering after the read. Use connector options only for reader-level behavior.
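Spark generally ignores reader options it does not recognize, so a misplaced option tends to have no effect rather than fail. The sketch below checks options against the formats they apply to, based on Spark's data source documentation, and flags row-filter-like keys that belong in `filter_expression`. It covers only the options in the table above; `check_options`, `OPTION_FORMATS` and `ROW_FILTER_KEYS` are hypothetical names.

```python
from typing import Dict, List, Optional, Set

# Formats each option applies to, per Spark's data source docs.
# None marks a generic file-source option that applies to every file format.
OPTION_FORMATS: Dict[str, Optional[Set[str]]] = {
    "recursiveFileLookup": None,
    "pathGlobFilter": None,
    "mergeSchema": {"parquet", "orc"},
    "header": {"csv"},
    "delimiter": {"csv"},
    "mode": {"csv", "json"},
}

# Hypothetical keys suggesting row filtering was put in reader options.
ROW_FILTER_KEYS = {"filter", "where", "filter_expression"}


def check_options(fmt: str, options: Dict[str, object]) -> List[str]:
    """Return warnings for options that look misplaced.

    Spark option names are case-insensitive; this sketch compares them
    exactly as spelled in the table.
    """
    warnings = []
    fmt = fmt.lower()
    for name in options:
        if name in ROW_FILTER_KEYS:
            warnings.append(f"{name}: row filtering belongs in filter_expression")
        elif name not in OPTION_FORMATS:
            warnings.append(f"{name}: not in the connector options table")
        else:
            formats = OPTION_FORMATS[name]
            if formats is not None and fmt not in formats:
                warnings.append(f"{name}: has no effect for format {fmt!r}")
    return warnings
```

Returning warnings instead of raising keeps the check advisory, since the options table may not list every reader option a runtime accepts.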
