Object storage

validated

Use the object_storage (or blob) connector when the contract should stay provider-neutral and the runtime handles access to cloud storage.

When to use it

Choose the provider-neutral connector when the runtime already exposes a readable cloud path and the contract should not encode provider-specific authentication details. Use the dedicated S3 or Azure Blob pages when the contract must configure direct credentials or platform-specific access behavior.

Portable contracts

Provider-neutral path

Keep the source contract focused on path, format and schema, while IAM, a Unity Catalog (UC) External Location or mounts handle access.

Runtime-owned auth

No explicit secrets

Use this when credentials are already configured by the platform and do not belong in the contract.

File semantics

Spark reader behavior

Parsing, recursive lookup, schema and filters follow the same expectations as file connectors.

Governed storage

Volumes and external locations

Good fit for Databricks environments where Unity Catalog governs storage access.

Runtime requirements

Requirement | Details
Readable path | The runtime must already be able to read and list the declared path.
Format support | The declared format must be supported by Spark or by installed runtime libraries.
Governance grants | For UC Volumes or External Locations, the job principal needs the appropriate catalog/storage permissions.
Complete source evidence | Set source_complete: true only when the folder represents the full source slice required by the write mode.

Basic example

source:
  type: connector
  connector: object_storage
  provider: adls
  path: abfss://landing@account.dfs.core.windows.net/orders/
  format: json
  read:
    schema: "order_id STRING, updated_at TIMESTAMP, amount DOUBLE"
    recursiveFileLookup: true
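
To make the "Spark reader behavior" expectation concrete, the following is a minimal Python sketch of how a source block like the one above could be split into the pieces a Spark reader call needs. The helper name build_reader_plan and the rule that every key under read other than schema is passed through as a reader option are assumptions made for illustration; this page does not document how the runtime actually translates the contract.

```python
from typing import Any


def build_reader_plan(source: dict[str, Any]) -> dict[str, Any]:
    """Split a source block into format, path, schema and reader options.

    Hypothetical helper: the real runtime's mapping is not documented here.
    """
    read = dict(source.get("read") or {})
    # The schema is a DDL string and is handled separately from options.
    schema = read.pop("schema", None)
    # Assumption: all remaining read keys are plain Spark reader options.
    # Booleans are rendered as "true"/"false", other values as strings.
    options = {
        key: str(value).lower() if isinstance(value, bool) else str(value)
        for key, value in read.items()
    }
    return {
        "format": source["format"],
        "path": source["path"],
        "schema": schema,
        "options": options,
    }


# The basic example from this page, expressed as a Python dict.
source = {
    "type": "connector",
    "connector": "object_storage",
    "provider": "adls",
    "path": "abfss://landing@account.dfs.core.windows.net/orders/",
    "format": "json",
    "read": {
        "schema": "order_id STRING, updated_at TIMESTAMP, amount DOUBLE",
        "recursiveFileLookup": True,
    },
}

plan = build_reader_plan(source)

# With a SparkSession named `spark`, the plan could be applied roughly as:
#   reader = spark.read.format(plan["format"]).options(**plan["options"])
#   if plan["schema"]:
#       reader = reader.schema(plan["schema"])
#   df = reader.load(plan["path"])
```

Note that nothing in the plan carries credentials: the path is loaded as-is, which only works when the runtime can already read it.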

Supported providers

  • S3: AWS S3 with External Location or S3A credentials.
  • Azure Blob: Azure Blob/ADLS with UC External Location, Volume or SAS on allowed runtimes.

Provider-neutral pattern

Use this connector when the path is already readable by the runtime and the contract should not care whether access is provided by IAM, Unity Catalog, mounted storage or local filesystem configuration.

source:
  type: connector
  connector: object_storage
  provider: s3
  path: s3://company-landing/orders/
  format: parquet
  read:
    schema: "order_id STRING, updated_at TIMESTAMP, amount DOUBLE"

Operational validation

SELECT run_id, status, source_connector, source_provider, source_format, source_path, rows_read, rows_written
FROM main.ops.ctrl_ingestion_runs
WHERE source_connector IN ('object_storage', 'blob')
ORDER BY started_at_utc DESC;

Common issues

Symptom | Likely cause | Action
Path not found | The runtime cannot resolve the cloud path or the prefix is wrong. | Validate the path with Spark or platform file listing first.
Access denied | Missing IAM, External Location, Volume or mount permissions. | Fix runtime access rather than embedding provider-specific logic in this connector.
Format error | Declared format does not match file contents or runtime library is missing. | Validate the format with a minimal schema and sample path.
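
Before listing the path on the runtime, a purely syntactic check can catch some "path not found" causes such as a mistyped scheme or a missing bucket. The sketch below is illustrative only: the function check_path and the scheme sets per provider value are assumptions, not part of the connector. Paths without a scheme (for example a Volume or mount path) are deliberately not flagged, and the check does not replace a real file listing.

```python
from urllib.parse import urlsplit

# Assumption: URI schemes commonly seen for each provider value.
# Adjust to whatever the runtime actually accepts.
KNOWN_SCHEMES = {
    "s3": {"s3", "s3a"},
    "adls": {"abfs", "abfss"},
}


def check_path(provider: str, path: str) -> list[str]:
    """Return a list of syntactic problems found in a declared path."""
    problems: list[str] = []
    parts = urlsplit(path)

    # Scheme-less paths (Volumes, mounts, local paths) are left to the runtime.
    if not parts.scheme:
        return problems

    expected = KNOWN_SCHEMES.get(provider)
    if expected is None:
        problems.append(f"unknown provider {provider!r}")
    elif parts.scheme not in expected:
        problems.append(
            f"scheme {parts.scheme!r} is unusual for provider {provider!r}"
        )

    # The authority part holds the bucket (S3) or container@account (ADLS).
    if not parts.netloc:
        problems.append("path has no bucket or container")
    elif parts.scheme in KNOWN_SCHEMES["adls"] and "@" not in parts.netloc:
        problems.append("expected container@account in the path authority")

    return problems


ok_adls = check_path("adls", "abfss://landing@account.dfs.core.windows.net/orders/")
ok_s3 = check_path("s3", "s3://company-landing/orders/")
mismatch = check_path("s3", "abfss://landing@account.dfs.core.windows.net/orders/")
```

An empty list only means the path looks well-formed. Access and existence still have to be confirmed with Spark or the platform's file listing, as the table above recommends.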

How this connector fits the contract

Keep extraction concerns in source, structural normalization in transform, validation in quality_rules and target semantics in mode. This separation keeps examples portable and prevents connector-specific workarounds from becoming hidden business logic.