S3

validated

Use the S3 connector for s3a:// paths when ContractForge should declare the source path and format. The runtime still determines how credentials and network access are resolved.

When to use it

AWS serverless

External Location or Volume

Use Unity Catalog governed storage when the Databricks workspace runs on AWS and serverless compute should not receive direct S3A credentials.

Cross-cloud serverless

External Location or network policy

Use a governed path when the workspace is not running in AWS or when direct filesystem credentials are blocked by platform policy.

Classic cluster

Direct S3A access

Use scoped access keys, temporary session tokens, instance profiles or Hadoop AWS configuration when the cluster runtime allows direct S3A setup.

Connector role

Path, format and evidence

Keep object access explicit in source; use transform and quality rules for business normalization after the files are read.

Runtime guidance

| Runtime | Recommended access path |
| --- | --- |
| Databricks serverless on AWS | Unity Catalog external location or volume backed by S3. |
| Databricks serverless outside AWS | External location or network policy if supported by the workspace. |
| Classic cluster | Direct s3a:// access with Hadoop AWS configuration, instance profile or scoped access keys. |

Serverless is stricter by design. If direct credentials are not available to the Spark runtime, use an external location and point the connector at the volume or external path.
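
A minimal sketch of the governed variant, assuming the connector accepts a Unity Catalog volume path in place of an s3a:// URI. The catalog, schema and volume names (main, landing, orders_raw) are hypothetical. No auth block appears because access is expected to be resolved by Unity Catalog rather than by credentials handed to the Spark runtime.

```yaml
source:
  type: connector
  connector: s3
  # Hypothetical volume path: /Volumes/<catalog>/<schema>/<volume>/<folder>/
  path: /Volumes/main/landing/orders_raw/orders/
  format: parquet
```

If the workspace exposes the bucket as an external location instead of a volume, the same shape should apply with the external path in place of the volume path.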

Parquet folder

source:
  type: connector
  connector: s3
  path: s3a://company-landing/orders/
  format: parquet
  options:
    recursiveFileLookup: true

Direct credentials on classic clusters

source:
  type: connector
  connector: s3
  path: s3a://company-landing/orders/
  format: json
  auth:
    type: access_key
    access_key_id: "{{ secret:contractforge-aws/aws_access_key_id }}"
    secret_access_key: "{{ secret:contractforge-aws/aws_secret_access_key }}"
    session_token: "{{ secret:contractforge-aws/aws_session_token }}"

session_token is required when the AWS credentials are temporary. Keep tokens short-lived and store all values in Databricks secrets.

Options

| Option | Purpose |
| --- | --- |
| recursiveFileLookup | Reads nested folders. |
| pathGlobFilter | Filters file names using Spark glob syntax. |
| mergeSchema | Enables schema merging for compatible formats. |
| header, delimiter, mode | CSV parsing options. |

Use filter_expression for row filtering after read. Use connector options for reader-level behavior only.
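
As an illustration of the reader-level options, here is a hedged sketch for a folder of CSV files. It assumes csv is an accepted format value and that options are passed through to the Spark reader unchanged. The option values are illustrative only.

```yaml
source:
  type: connector
  connector: s3
  path: s3a://company-landing/orders/
  format: csv   # assumed value; this page only shows parquet and json
  options:
    recursiveFileLookup: true   # descend into nested folders
    pathGlobFilter: "*.csv"     # keep only files whose names end in .csv
    header: true                # first line of each file holds column names
    delimiter: ";"              # field separator
    mode: PERMISSIVE            # Spark CSV parse mode; others are DROPMALFORMED and FAILFAST
```

Everything here affects how files are discovered and parsed. Conditions on row values belong in filter_expression rather than in options.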