S3
Use the S3 connector for s3a:// paths when ContractForge should declare the
source path and format. The runtime still determines how credentials and network
access are resolved.
When to use it
External Location or Volume
Use Unity Catalog governed storage when the Databricks workspace runs on AWS and serverless compute should not receive direct S3A credentials.
External Location or network policy
Use a governed path when the workspace is not running in AWS or when direct filesystem credentials are blocked by platform policy.
Direct S3A access
Use scoped access keys, temporary session tokens, instance profiles or Hadoop AWS configuration when the cluster runtime allows direct S3A setup.
Path, format and evidence
Keep object access explicit in source; use transform and quality rules for
business normalization after the files are read.
Runtime guidance
| Runtime | Recommended access path |
|---|---|
| Databricks serverless on AWS | Unity Catalog external location or volume backed by S3. |
| Databricks serverless outside AWS | External location or network policy if supported by the workspace. |
| Classic cluster | Direct s3a:// access with Hadoop AWS configuration, instance profile or scoped access keys. |
Serverless is stricter by design. If direct credentials are not available to the Spark runtime, use a Unity Catalog external location or volume and point the connector at that governed path.
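The runtime guidance above can be read as a simple lookup. The sketch below is illustrative only: the runtime labels (`serverless_aws`, `serverless_non_aws`, `classic`) and the `recommended_access` helper are hypothetical names chosen for this example, not part of ContractForge.

```python
# Hypothetical helper: encodes the runtime guidance table as a lookup.
# The keys are illustrative labels, not ContractForge identifiers.
RECOMMENDED_ACCESS = {
    "serverless_aws": "Unity Catalog external location or volume backed by S3",
    "serverless_non_aws": "External location or network policy, if supported by the workspace",
    "classic": "Direct s3a:// access with Hadoop AWS configuration, instance profile or scoped access keys",
}


def recommended_access(runtime: str) -> str:
    """Return the recommended access path for a runtime label."""
    try:
        return RECOMMENDED_ACCESS[runtime]
    except KeyError:
        # Fail loudly instead of silently falling back to direct credentials.
        raise ValueError(f"unknown runtime: {runtime!r}") from None
```

Raising on an unknown label, rather than defaulting to direct S3A access, keeps the stricter serverless behavior from being bypassed by accident.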
Parquet folder
YAML:

source:
  type: connector
  connector: s3
  path: s3a://company-landing/orders/
  format: parquet
  options:
    recursiveFileLookup: true

Python:

source = {
    "type": "connector",
    "connector": "s3",
    "path": "s3a://company-landing/orders/",
    "format": "parquet",
    "options": {"recursiveFileLookup": True},
}
Direct credentials on classic clusters
YAML:

source:
  type: connector
  connector: s3
  path: s3a://company-landing/orders/
  format: json
  auth:
    type: access_key
    access_key_id: "{{ secret:contractforge-aws/aws_access_key_id }}"
    secret_access_key: "{{ secret:contractforge-aws/aws_secret_access_key }}"
    session_token: "{{ secret:contractforge-aws/aws_session_token }}"

Python:

source = {
    "type": "connector",
    "connector": "s3",
    "path": "s3a://company-landing/orders/",
    "format": "json",
    "auth": {
        "type": "access_key",
        "access_key_id": "{{ secret:contractforge-aws/aws_access_key_id }}",
        "secret_access_key": "{{ secret:contractforge-aws/aws_secret_access_key }}",
        "session_token": "{{ secret:contractforge-aws/aws_session_token }}",
    },
}
session_token is required when the AWS credentials are temporary. Keep tokens
short-lived and store all values in Databricks secrets.
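A minimal sketch of building the `auth` block so that `session_token` is included only for temporary credentials. The helper names `secret_ref` and `build_access_key_auth` are hypothetical. Omitting `session_token` for long-lived keys is an assumption inferred from the note above, not confirmed connector behavior.

```python
def secret_ref(scope: str, key: str) -> str:
    """Format a secret placeholder in the style used by the examples above."""
    return "{{ secret:%s/%s }}" % (scope, key)


def build_access_key_auth(scope: str, temporary: bool) -> dict:
    """Build an access_key auth block that references secrets only."""
    auth = {
        "type": "access_key",
        "access_key_id": secret_ref(scope, "aws_access_key_id"),
        "secret_access_key": secret_ref(scope, "aws_secret_access_key"),
    }
    if temporary:
        # Temporary AWS credentials are only valid together with their token.
        auth["session_token"] = secret_ref(scope, "aws_session_token")
    return auth


auth = build_access_key_auth("contractforge-aws", temporary=True)
```

Because every value is a secret reference, no raw key material appears in the contract itself.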
Options
| Option | Purpose |
|---|---|
| recursiveFileLookup | Reads nested folders. |
| pathGlobFilter | Filters file names using Spark glob syntax. |
| mergeSchema | Enables schema merging for compatible formats. |
| header, delimiter, mode | CSV parsing options. |
Use filter_expression for row filtering after read. Use connector options for
reader-level behavior only.
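One way to keep row filters out of the connector options is to check the keys before the contract is submitted. The sketch below is an assumption-laden illustration: `READER_OPTIONS` and `split_options` are hypothetical names, and the set contains only the options listed in the table above, which may not be exhaustive.

```python
# Reader-level options listed in the table above (possibly not exhaustive).
READER_OPTIONS = {
    "recursiveFileLookup",
    "pathGlobFilter",
    "mergeSchema",
    "header",
    "delimiter",
    "mode",
}


def split_options(options: dict) -> tuple:
    """Return (reader_options, unexpected_keys) for a connector options dict."""
    reader = {k: v for k, v in options.items() if k in READER_OPTIONS}
    unexpected = sorted(k for k in options if k not in READER_OPTIONS)
    return reader, unexpected


# A row filter placed in options by mistake is flagged for review.
options = {
    "header": True,
    "delimiter": ";",
    "pathGlobFilter": "*.csv",
    "filter_expression": "amount > 0",  # hypothetical filter, misplaced here
}
reader, unexpected = split_options(options)
print(unexpected)  # ['filter_expression']
```

Treat flagged keys as prompts to review rather than hard errors, since Spark readers accept options beyond those listed here.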