When to use it

| Runtime | Recommended access | Notes |
| --- | --- | --- |
| Databricks serverless | External Location or Volume | Direct SAS/Hadoop config can be blocked by Spark Connect/serverless. |
| Classic/job cluster | SAS token or platform identity | Cluster can usually receive storage account filesystem configuration. |
| Azure Databricks with Unity Catalog | Storage Credential + External Location | Best governance model for shared teams. |
| Local Spark | ABFS/SAS configuration | Useful for development when Hadoop Azure libraries exist. |

Runtime requirements

| Runtime | Access model | Use when |
| --- | --- | --- |
| Databricks serverless | Unity Catalog External Location or Volume | Direct SAS/Hadoop configuration is blocked or managed by platform policy. |
| Classic cluster | SAS token, ABFS/SAS configuration or service principal setup | You control cluster-level Spark/Hadoop configuration. |
| Local Spark | ABFS/WASBS configuration or local credential provider | Development and compatibility tests outside Databricks. |

Basic example

When the path is already governed by Unity Catalog, the contract should reference the readable ABFS path or Volume path. Credential setup belongs to the platform.

source:
  type: connector
  connector: azure_blob
  path: abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste/csv/
  format: csv
  options:
    header: true
  read:
    schema: "id STRING, event_ts TIMESTAMP, amount DOUBLE"
    source_complete: true

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_csv

layer: bronze
mode: scd0_overwrite

SAS-token classic cluster pattern

Use this pattern when a classic/job cluster can configure Azure filesystem access and a SAS token is stored in a secret scope.

source:
  type: connector
  connector: azure_blob
  account_url: https://generalcafe.blob.core.windows.net/
  container: databricksdata
  path: blob_teste/json/
  format: json
  auth:
    sas_token: "{{ secret:contractforge-azure/blob_sas_token }}"
  read:
    schema: "id STRING, event_ts TIMESTAMP, payload STRUCT<kind:STRING,value:DOUBLE>"
    recursiveFileLookup: true

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_json

layer: bronze
mode: scd0_append
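On a classic cluster, a SAS-based source like the one above eventually has to become session-level Hadoop configuration plus a readable URI. The sketch below shows one way that mapping could look for the ABFS driver. It is an illustration, not ContractForge's actual implementation: the helper names `abfs_sas_conf` and `abfss_uri` are hypothetical, and the code assumes the public Azure cloud suffix `core.windows.net`. The three `fs.azure.*` keys and the `FixedSASTokenProvider` class are the ones documented for fixed SAS tokens with the ABFS driver.

```python
from urllib.parse import urlparse


def _account_name(account_url: str) -> str:
    """Extract the storage account from a URL such as https://<account>.blob.core.windows.net/."""
    host = urlparse(account_url).hostname or ""
    account = host.split(".")[0]
    if not account:
        raise ValueError(f"cannot derive storage account from {account_url!r}")
    return account


def abfs_sas_conf(account_url: str, sas_token: str) -> dict[str, str]:
    """Build session-level Hadoop settings for fixed-SAS access through ABFS (hypothetical helper)."""
    # Assumption: public Azure cloud endpoint suffix.
    suffix = f"{_account_name(account_url)}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "SAS",
        f"fs.azure.sas.token.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
        # A leading "?" copied along with the token is not part of the token itself.
        f"fs.azure.sas.fixed.token.{suffix}": sas_token.lstrip("?"),
    }


def abfss_uri(account_url: str, container: str, path: str) -> str:
    """Combine account_url, container and path into a single abfss:// URI (hypothetical helper)."""
    account = _account_name(account_url)
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"
```

On a cluster that allows it, each key/value pair would be applied with `spark.conf.set(key, value)`, with the token read from the secret scope rather than written into the contract.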

SAS permissions

Folder reads require the list permission. Reading an individual file may work with a read-only SAS, but anything that enumerates a folder (directory discovery, recursive reads, regex selection) also needs list.
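A SAS token carries its granted permissions in the `sp` query field, where `r` means read and `l` means list, so a missing list permission can be caught before the read is attempted. The following is a minimal sketch of such a pre-flight check. The function name `missing_sas_permissions` is hypothetical, and the token strings in the usage lines are made-up placeholders.

```python
from urllib.parse import parse_qs


def missing_sas_permissions(sas_token: str, required: str = "rl") -> set[str]:
    """Return the permission letters in `required` that the SAS token does not grant."""
    # The token is a URL query string; tolerate a leading "?".
    params = parse_qs(sas_token.lstrip("?"))
    # `sp` holds the signed permissions as a string of letters, e.g. "rl".
    granted = set(params.get("sp", [""])[0])
    return set(required) - granted


# Placeholder tokens, for illustration only.
read_only = "sv=2022-11-02&sp=r&sig=abc"
read_list = "?sv=2022-11-02&sp=rl&sig=abc"
print(missing_sas_permissions(read_only))  # the list permission is missing
print(missing_sas_permissions(read_list))  # nothing is missing
```

This check only inspects what the token claims. It does not verify the signature, the expiry time or any storage firewall rule.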

Creating an External Location

The exact SQL depends on how the storage credential is configured. The storage credential is a Unity Catalog object, not a Databricks secret scope.

CREATE EXTERNAL LOCATION IF NOT EXISTS contractforge_blob_teste
URL 'abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste'
WITH (STORAGE CREDENTIAL my_storage_credential);

GRANT READ FILES ON EXTERNAL LOCATION contractforge_blob_teste TO `data-engineers`;

Credential validation

Unity Catalog validates a new location by reading from it and, in many cases, by writing a validation marker. A read-only credential can therefore fail location creation even though plain file reads would succeed.

Multi-format examples

The connector delegates parsing to Spark readers. For formats whose schema inference is ambiguous, keep reader options and schema explicit.
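To make the delegation concrete, here is a rough sketch of how a `source` block could be turned into a generic Spark `DataFrameReader` call. It is an assumption about the mapping, not the connector's real code: `read_source` is a hypothetical name, and only `format`, `options`, `read.schema` and `path` are handled.

```python
def read_source(spark, source: dict):
    """Load a contract `source` block through Spark's generic DataFrameReader (hypothetical helper)."""
    reader = spark.read.format(source["format"])
    # Reader options such as `header` or `rowTag` are passed through unchanged.
    reader = reader.options(**source.get("options", {}))
    schema = source.get("read", {}).get("schema")
    if schema:
        # A DDL string such as "id STRING, amount DOUBLE" replaces schema inference.
        reader = reader.schema(schema)
    return reader.load(source["path"])
```

With a live session this would be called as `read_source(spark, contract["source"])`. Because the schema is supplied up front, the result does not depend on what Spark would have inferred from a sample of the files.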

CSV

source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/csv/
  format: csv
  options:
    header: true
  read:
    schema: "id STRING, amount DOUBLE"

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_csv_extract

layer: bronze
mode: scd0_append

JSON

source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/json/
  format: json
  read:
    schema: "id STRING, payload STRUCT<a:STRING>"

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_nested_json

layer: bronze
mode: scd0_append

Parquet

source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/parquet/
  format: parquet
  read:
    recursiveFileLookup: true

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_parquet

layer: bronze
mode: scd0_append

XML

source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/xml/
  format: xml
  options:
    rowTag: item
  read:
    schema: "id STRING, amount DOUBLE"

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_xml

layer: bronze
mode: scd0_append

Serverless limitation handling

ContractForge should fail clearly when a serverless/Spark Connect runtime blocks direct filesystem credential configuration. Do not depend on a hidden fallback that only works in one workspace.
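One way to fail clearly is to wrap the configuration step so that any rejection by the runtime becomes a single, actionable error. The sketch below illustrates the idea under stated assumptions: `StorageConfigBlockedError` and `apply_storage_conf` are hypothetical names, and the broad `except` is deliberate because the exact exception type raised by a blocked `spark.conf.set` varies by runtime.

```python
class StorageConfigBlockedError(RuntimeError):
    """Raised when the runtime refuses session-level filesystem credentials."""


def apply_storage_conf(spark, conf: dict[str, str]) -> None:
    """Apply Hadoop settings, converting any rejection into one clear error."""
    for key, value in conf.items():
        try:
            spark.conf.set(key, value)
        except Exception as exc:  # exception type differs between runtimes
            # Report the key only; the value may be a secret such as a SAS token.
            raise StorageConfigBlockedError(
                f"Runtime rejected Spark config key {key!r}. "
                "Direct SAS/Hadoop configuration may be blocked here; "
                "use a Unity Catalog External Location or Volume instead."
            ) from exc
```

The error names the blocked key and the supported alternative, and there is no silent retry through another access path, which matches the no-hidden-fallback rule above.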

| Symptom | Likely cause | Action |
| --- | --- | --- |
| Cloud storage access failure during External Location creation | Storage credential cannot validate read/write/list behavior. | Grant the service principal the required Azure Storage role and verify the URL prefix. |
| SAS works on classic but fails on serverless | Serverless blocks runtime Hadoop credential configuration. | Use External Location, Volume or workspace network/storage policy. |
| Directory read fails but single file works | SAS lacks list permission. | Regenerate SAS with list permission for folder reads. |
| DNS/egress errors | Workspace network cannot reach the storage endpoint. | Fix firewall, public network access, private endpoint or network policy. |

Operational metadata

SELECT
  run_id,
  source_connector,
  source_path,
  source_auth_redacted_json,
  source_read_redacted_json,
  source_metrics_json
FROM main.ops.ctrl_ingestion_runs
WHERE source_connector = 'azure_blob'
ORDER BY started_at_utc DESC;

Common issues

| Symptom | Likely cause | Action |
| --- | --- | --- |
| Access denied on serverless | Direct SAS/Hadoop config is blocked by the runtime. | Use External Location, Volume or workspace network policy. |
| Credential validation fails | The storage credential cannot read/write the validation path. | Grant the required Azure Storage role to the service principal or managed identity. |
| Path lists no files | Wrong container, prefix or recursive lookup setting. | Validate the path with a minimal listing and check folder casing. |
| Format parsing fails | Schema/options do not match the file format. | Start with a small sample and explicit schema. |