When to use it

Choose the provider-neutral connector when the runtime already exposes a readable cloud path and the contract should not encode provider-specific authentication details. Use the dedicated S3 or Azure Blob pages when the contract must configure direct credentials or platform-specific access behavior.

Portable contracts

Provider-neutral path

Keep the source contract focused on path, format and schema, while IAM, Unity Catalog (UC) External Locations or mounts handle access.

Runtime-owned auth

No explicit secrets

Use this when credentials are already configured by the platform and do not belong in the contract.

File semantics

Spark reader behavior

Parsing, recursive lookup, schema and filters follow the same expectations as file connectors.

Governed storage

Volumes and external locations

Good fit for Databricks environments where Unity Catalog governs storage access.
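The "No explicit secrets" guidance above can be checked mechanically before a contract is deployed. The following is a minimal sketch, not part of the connector: the function name find_credential_keys and the list of suspect key fragments are hypothetical, and the "leaky" example only illustrates a setting that should live in runtime configuration instead of the contract.

```python
from typing import Any

# Hypothetical fragments that suggest a credential; adjust to your own conventions.
SUSPECT_KEY_PARTS = ("secret", "password", "token", "access_key", "account_key")


def find_credential_keys(node: Any, prefix: str = "") -> list[str]:
    """Return dotted paths of keys whose names look like credentials."""
    hits: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            dotted = f"{prefix}.{key}" if prefix else str(key)
            if any(part in str(key).lower() for part in SUSPECT_KEY_PARTS):
                hits.append(dotted)
            # Recurse so nested blocks such as `read` are inspected too.
            hits.extend(find_credential_keys(value, dotted))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_credential_keys(item, f"{prefix}[{index}]"))
    return hits


clean_source = {
    "connector": "object_storage",
    "provider": "s3",
    "path": "s3://company-landing/orders/",
    "format": "parquet",
}
# Illustration only: a credential-style setting placed inside the contract.
leaky_source = {**clean_source, "read": {"fs.s3a.secret.key": "..."}}

clean_hits = find_credential_keys(clean_source)
leaky_hits = find_credential_keys(leaky_source)
```

A non-empty result is a hint that the contract probably belongs on the dedicated S3 or Azure Blob pages, or that the setting should move to the platform.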

Runtime requirements

| Requirement | Details |
| --- | --- |
| Readable path | The runtime must already be able to read/list the declared path. |
| Format support | The declared format must be supported by Spark or by installed runtime libraries. |
| Governance grants | For UC Volumes or External Locations, the job principal needs the appropriate catalog/storage permissions. |
| Complete source evidence | Set source_complete: true only when the folder represents the full source slice required by the write mode. |

Basic example

source:
  type: connector
  connector: object_storage
  provider: adls
  path: abfss://landing@account.dfs.core.windows.net/orders/
  format: json
  read:
    schema: "order_id STRING, updated_at TIMESTAMP, amount DOUBLE"
    recursiveFileLookup: true
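To make the relationship to Spark reader behavior concrete, the sketch below shows roughly how a source block like the one above could translate into a DataFrameReader call. It assumes the connector passes schema and the remaining read keys straight through to Spark, which this page does not confirm; reader_settings and load_source are hypothetical helper names, not part of the connector.

```python
from typing import Any


def reader_settings(source: dict[str, Any]) -> dict[str, Any]:
    """Split a source block into the pieces a Spark DataFrameReader needs."""
    read = dict(source.get("read", {}))
    schema = read.pop("schema", None)  # DDL string such as "order_id STRING, ..."
    # Normalise option values to strings; booleans become "true"/"false".
    options = {
        key: str(value).lower() if isinstance(value, bool) else str(value)
        for key, value in read.items()
    }
    return {
        "path": source["path"],
        "format": source["format"],
        "schema": schema,
        "options": options,
    }


def load_source(spark, source: dict[str, Any]):
    """Read the source with an existing SparkSession (not exercised in this sketch)."""
    settings = reader_settings(source)
    reader = spark.read.format(settings["format"]).options(**settings["options"])
    if settings["schema"] is not None:
        reader = reader.schema(settings["schema"])
    return reader.load(settings["path"])


source = {
    "type": "connector",
    "connector": "object_storage",
    "provider": "adls",
    "path": "abfss://landing@account.dfs.core.windows.net/orders/",
    "format": "json",
    "read": {
        "schema": "order_id STRING, updated_at TIMESTAMP, amount DOUBLE",
        "recursiveFileLookup": True,
    },
}
settings = reader_settings(source)
```

Nothing in this sketch handles authentication: load_source only works if the runtime can already read the path, which is the premise of the provider-neutral connector.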

Supported providers

Provider-neutral pattern

Use this connector when the path is already readable by the runtime and the contract should not care whether access is provided by IAM, Unity Catalog, mounted storage or local filesystem configuration.

source:
  type: connector
  connector: object_storage
  provider: s3
  path: s3://company-landing/orders/
  format: parquet
  read:
    schema: "order_id STRING, updated_at TIMESTAMP, amount DOUBLE"

Operational validation

SELECT run_id, status, source_connector, source_provider, source_format, source_path, rows_read, rows_written
FROM main.ops.ctrl_ingestion_runs
WHERE source_connector IN ('object_storage', 'blob')
ORDER BY started_at_utc DESC;

Common issues

| Symptom | Likely cause | Action |
| --- | --- | --- |
| Path not found | The runtime cannot resolve the cloud path or the prefix is wrong. | Validate the path with Spark or platform file listing first. |
| Access denied | Missing IAM, External Location, Volume or mount permissions. | Fix runtime access rather than embedding provider-specific logic in this connector. |
| Format error | Declared format does not match file contents or runtime library is missing. | Validate the format with a minimal schema and sample path. |

How this connector fits the contract

Keep extraction concerns in source, structural normalization in transform, validation in quality_rules and target semantics in mode. This separation keeps examples portable and prevents connector-specific workarounds from becoming hidden business logic.

source → transform → quality_rules → mode → control tables