Provider-neutral path
Keep the source contract focused on path, format and schema while IAM, a Unity Catalog (UC) External Location or mounts handle access.
Connector
Use object_storage or blob when the contract should stay provider-neutral while the runtime handles access to cloud storage.
Choose the provider-neutral connector when the runtime already exposes a readable cloud path and the contract should not encode provider-specific authentication details. Use the dedicated S3 or Azure Blob pages when the contract must configure direct credentials or platform-specific access behavior.
Use this when credentials are already configured by the platform and do not belong in the contract.
Parsing, recursive lookup, schema and filters follow the same expectations as file connectors.
Good fit for Databricks environments where Unity Catalog governs storage access.
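
One way to enforce the "credentials do not belong in the contract" rule is a small lint step run before a contract is accepted. The sketch below is only an illustration: the function name `find_credential_keys` and the list of credential-like key names are assumptions, not part of the connector, and the contract is assumed to have been parsed into a plain Python dictionary already.

```python
from __future__ import annotations

from typing import Any

# Hypothetical key names that suggest embedded credentials.
# Adjust this set to whatever your contracts could realistically contain.
CREDENTIAL_KEYS = {
    "access_key",
    "secret_key",
    "sas_token",
    "account_key",
    "connection_string",
    "client_secret",
}


def find_credential_keys(node: Any, prefix: str = "") -> list[str]:
    """Return dotted paths of keys that look like embedded credentials."""
    found: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if str(key).lower() in CREDENTIAL_KEYS:
                found.append(path)
            # Recurse so nested option blocks are checked as well.
            found.extend(find_credential_keys(value, path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_credential_keys(item, f"{prefix}[{index}]"))
    return found


# A provider-neutral source block, already parsed into a dictionary.
source = {
    "type": "connector",
    "connector": "object_storage",
    "provider": "adls",
    "path": "abfss://landing@account.dfs.core.windows.net/orders/",
    "format": "json",
    "read": {
        "schema": "order_id STRING, updated_at TIMESTAMP, amount DOUBLE",
        "recursiveFileLookup": True,
    },
}

leaks = find_credential_keys(source)
```

An empty result means the source block carries only path, format and read options. A non-empty result suggests the contract may belong on the dedicated S3 or Azure Blob pages instead.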
| Requirement | Details |
|---|---|
| Readable path | The runtime must already be able to read/list the declared path. |
| Format support | The declared format must be supported by Spark or by installed runtime libraries. |
| Governance grants | For UC Volumes or External Locations, the job principal needs the appropriate catalog/storage permissions. |
| Complete source evidence | Set source_complete: true only when the folder represents the full source slice required by the write mode. |
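
The first two requirements can be checked before a run starts. The following is a minimal sketch under stated assumptions: `preflight` is a hypothetical helper, the set of supported formats is supplied by the caller, and `list_path` is any callable that lists a path in your runtime (on Databricks a thin wrapper around `dbutils.fs.ls` could serve, for example). Governance grants are not checked directly; a missing grant would normally surface as a listing failure.

```python
from __future__ import annotations

from typing import Callable, Iterable


def preflight(
    source: dict,
    list_path: Callable[[str], Iterable],
    supported_formats: set[str],
) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: list[str] = []
    path = source.get("path", "")
    fmt = source.get("format", "")

    # Format support: the declared format must be known to the runtime.
    if fmt not in supported_formats:
        problems.append(f"format {fmt!r} is not in the supported set")

    # Readable path: the runtime must be able to list the declared path.
    try:
        list(list_path(path))
    except Exception as exc:  # listing errors differ per platform
        problems.append(f"cannot list {path!r}: {exc}")

    return problems


# Stand-in lister for illustration only; replace with a real listing call.
def fake_lister(path: str) -> list[str]:
    return [path + "part-0000.parquet"]


result = preflight(
    {"path": "s3://company-landing/orders/", "format": "parquet"},
    fake_lister,
    {"parquet", "json", "csv"},
)
```

Injecting the lister keeps the check provider-neutral: the helper never needs to know whether access comes from IAM, an External Location or a mount.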

    source:
      type: connector
      connector: object_storage
      provider: adls
      path: abfss://landing@account.dfs.core.windows.net/orders/
      format: json
      read:
        schema: "order_id STRING, updated_at TIMESTAMP, amount DOUBLE"
        recursiveFileLookup: true

Use this connector when the path is already readable by the runtime and the contract should not care whether access is provided by IAM, Unity Catalog, mounted storage or local filesystem configuration.

    source:
      type: connector
      connector: object_storage
      provider: s3
      path: s3://company-landing/orders/
      format: parquet
      read:
        schema: "order_id STRING, updated_at TIMESTAMP, amount DOUBLE"

To review recent runs that used this connector, query the ingestion run table:

    SELECT run_id, status, source_connector, source_provider,
           source_format, source_path, rows_read, rows_written
    FROM main.ops.ctrl_ingestion_runs
    WHERE source_connector IN ('object_storage', 'blob')
    ORDER BY started_at_utc DESC;

| Symptom | Likely cause | Action |
|---|---|---|
| Path not found | The runtime cannot resolve the cloud path or the prefix is wrong. | Validate the path with Spark or platform file listing first. |
| Access denied | Missing IAM, External Location, Volume or mount permissions. | Fix runtime access rather than embedding provider-specific logic in this connector. |
| Format error | Declared format does not match file contents or runtime library is missing. | Validate the format with a minimal schema and sample path. |
Keep extraction concerns in source, structural normalization in transform, validation in quality_rules and target semantics in mode. This separation keeps examples portable and prevents connector-specific workarounds from becoming hidden business logic.