Azure Blob and ADLS

validated

Use the Azure Blob/ADLS connector to ingest files from Azure Blob Storage or ADLS Gen2. On Databricks serverless, prefer Unity Catalog External Locations or Volumes. On classic/job clusters that allow runtime filesystem settings, you can configure access with a SAS token instead.

When to use it

| Scenario | Access model | Use when |
|---|---|---|
| Databricks serverless | External Location or Volume | Spark Connect/serverless blocks direct SAS/Hadoop filesystem configuration, so access must go through governed storage. |
| Classic/job cluster | SAS token or platform identity | The cluster can receive Azure filesystem configuration and the contract should declare direct storage access. |
| Unity Catalog | Storage Credential and External Location | Shared teams need governed access, auditing and a clear separation between credentials and ingestion contracts. |
| Local Spark | ABFS/SAS configuration | Development and compatibility tests where the Hadoop Azure libraries and local credentials are available. |

Runtime requirements

| Runtime | Access model | Use when |
|---|---|---|
| Databricks serverless | Unity Catalog External Location or Volume | Direct SAS/Hadoop configuration is blocked or managed by platform policy. |
| Classic cluster | SAS token, ABFS/SAS configuration or service principal setup | You control cluster-level Spark/Hadoop configuration. |
| Local Spark | ABFS/WASBS configuration or local credential provider | Development and compatibility tests outside Databricks. |

Basic example

When the path is already governed by Unity Catalog, the contract only needs to reference an ABFS path or Volume path that the runtime can already read. Credential setup belongs to the platform, not to the contract.

source:
  type: connector
  connector: azure_blob
  path: abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste/csv/
  format: csv
  options:
    header: true
  read:
    schema: "id STRING, event_ts TIMESTAMP, amount DOUBLE"
    source_complete: true

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_csv

layer: bronze
mode: overwrite
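An ABFS path packs the container, the storage account and the folder prefix into one URI: abfss://<container>@<account>.dfs.core.windows.net/<path>. The sketch below splits it apart using only the Python standard library, which can help when comparing a contract path against an External Location URL. parse_abfss and AbfsPath are hypothetical helper names, not part of ContractForge.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

DFS_SUFFIX = ".dfs.core.windows.net"


@dataclass(frozen=True)
class AbfsPath:
    container: str
    account: str
    path: str


def parse_abfss(uri: str) -> AbfsPath:
    """Split abfss://<container>@<account>.dfs.core.windows.net/<path> into its parts.

    Hypothetical helper for illustration; not a ContractForge API.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"expected an abfs/abfss URI, got scheme {parsed.scheme!r}")
    # The authority is "<container>@<account>.dfs.core.windows.net".
    container, sep, host = parsed.netloc.partition("@")
    if not sep or not container or not host.endswith(DFS_SUFFIX):
        raise ValueError(f"unexpected ABFS authority: {parsed.netloc!r}")
    account = host[: -len(DFS_SUFFIX)]
    # Path is relative to the container root; a trailing slash marks a folder prefix.
    return AbfsPath(container=container, account=account, path=parsed.path.lstrip("/"))
```

For the path in the basic example, parse_abfss returns container databricksdata, account generalcafe and path blob_teste/csv/.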

SAS-token classic cluster pattern

Use this pattern when a classic/job cluster can configure Azure filesystem access and a SAS token is stored in a secret scope.

source:
  type: connector
  connector: azure_blob
  account_url: https://generalcafe.blob.core.windows.net/
  container: databricksdata
  path: blob_teste/json/
  format: json
  auth:
    sas_token: "{{ secret:contractforge-azure/blob_sas_token }}"
  read:
    schema: "id STRING, event_ts TIMESTAMP, payload STRUCT<kind:STRING,value:DOUBLE>"
    recursiveFileLookup: true

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_json

layer: bronze
mode: append
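On a classic cluster, a SAS token usually reaches Spark as Hadoop ABFS settings. The sketch below builds the three settings that Databricks documents for fixed-SAS access over abfss://, on runtimes that ship FixedSASTokenProvider. It assumes the connector reads through the dfs endpoint of the account named in account_url; whether ContractForge does exactly this is not stated here, and account_from_blob_url and abfs_fixed_sas_conf are hypothetical names.

```python
from urllib.parse import urlparse

BLOB_SUFFIX = ".blob.core.windows.net"
FIXED_SAS_PROVIDER = "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider"


def account_from_blob_url(account_url: str) -> str:
    """Return <account> from https://<account>.blob.core.windows.net/ (hypothetical helper)."""
    host = urlparse(account_url).hostname or ""
    if not host.endswith(BLOB_SUFFIX):
        raise ValueError(f"not an Azure Blob endpoint: {account_url!r}")
    return host[: -len(BLOB_SUFFIX)]


def abfs_fixed_sas_conf(account: str, sas_token: str) -> dict[str, str]:
    """Hadoop ABFS settings attaching one fixed SAS token to <account>.dfs.core.windows.net.

    Assumption: reads go through the dfs endpoint (abfss://), not wasbs://.
    """
    host = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "SAS",
        f"fs.azure.sas.token.provider.type.{host}": FIXED_SAS_PROVIDER,
        f"fs.azure.sas.fixed.token.{host}": sas_token,
    }
```

On a classic cluster each pair would be applied with spark.conf.set(key, value), reading the token with dbutils.secrets.get(scope="contractforge-azure", key="blob_sas_token"); mapping the contract's secret reference onto that scope and key is an assumption. These are the runtime settings that serverless blocks.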

SAS permissions

Folder reads require the list (l) permission. A SAS with only read (r) permission can be enough for a single, known file, but directory discovery, recursive reads and regex-based file selection all need to list the container, so they fail without list permission.
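A SAS token carries its granted permissions in the sp query parameter (for example sp=rl). The minimal sketch below reads that parameter and reports which of read and list are missing for a folder read. It only inspects the token string: it does not prove that the token is valid, unexpired or scoped to the right container. The function names are hypothetical.

```python
from urllib.parse import parse_qs

# Read (r) and list (l) are what folder, recursive and regex-based reads need.
FOLDER_READ_PERMISSIONS = {"r", "l"}


def sas_permissions(sas_token: str) -> set[str]:
    """Return the permission letters from the SAS `sp` parameter, e.g. "rl" -> {"r", "l"}."""
    params = parse_qs(sas_token.lstrip("?"))
    return set(params.get("sp", [""])[0])


def missing_for_folder_read(sas_token: str) -> set[str]:
    """Permissions a folder read needs that this SAS token does not grant."""
    return FOLDER_READ_PERMISSIONS - sas_permissions(sas_token)
```

A read-only token such as "sv=2022-11-02&sp=r&sig=..." reports {"l"} as missing, which matches the "directory read fails but single file works" symptom.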

Creating an External Location

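The storage credential is a Unity Catalog object (backed, for example, by an Azure managed identity or service principal), not a Databricks secret scope. The exact SQL depends on how that credential was created; the statements below assume one named my_storage_credential already exists.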

CREATE EXTERNAL LOCATION IF NOT EXISTS contractforge_blob_teste
URL 'abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste'
WITH (STORAGE CREDENTIAL my_storage_credential);

GRANT READ FILES ON EXTERNAL LOCATION contractforge_blob_teste TO `data-engineers`;
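An External Location covers its URL and everything beneath it, so a contract path is readable through it only if the path starts with the location URL at a folder boundary. The sketch below is a hypothetical pre-flight check for that prefix. It is a plain string comparison and does not normalize host casing or ".." segments.

```python
def covers(location_url: str, path: str) -> bool:
    """True if `path` equals the External Location URL or sits below it.

    Compares whole path segments, so .../blob_teste does not cover .../blob_teste2.
    Hypothetical helper; plain string comparison without normalization.
    """
    base = location_url.rstrip("/")
    target = path.rstrip("/")
    return target == base or target.startswith(base + "/")
```

With the location created above, covers returns True for .../blob_teste/csv/ and False for a sibling folder such as .../blob_teste2/csv/.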

Credential validation

When you create an External Location, Unity Catalog validates it by trying to read the path and often by writing a validation marker. A read-only credential can therefore fail location creation even though file reads through it would succeed.

Multi-format examples

The connector delegates parsing to Spark readers. For formats whose schema inference is ambiguous, such as CSV, JSON and XML, keep reader options and the schema explicit in the contract.

CSV

source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/csv/
  format: csv
  options:
    header: true
  read:
    schema: "id STRING, amount DOUBLE"

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_csv_extract

layer: bronze
mode: append

JSON

source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/json/
  format: json
  read:
    schema: "id STRING, payload STRUCT<a:STRING>"

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_nested_json

layer: bronze
mode: append

Parquet

source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/parquet/
  format: parquet
  read:
    recursiveFileLookup: true

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_parquet

layer: bronze
mode: append

XML

source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/xml/
  format: xml
  options:
    rowTag: item
  read:
    schema: "id STRING, amount DOUBLE"

target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_xml

layer: bronze
mode: append
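The examples above follow the same rule: text-based formats carry an explicit schema and their key reader options, while Parquet is self-describing. A hypothetical pre-flight lint over a parsed source: mapping (as yaml.safe_load would return it) could flag contracts that drift from that rule. This is a sketch of the idea, not ContractForge's own validation.

```python
# Formats where Spark's schema inference can produce unexpected types.
INFERENCE_SENSITIVE = {"csv", "json", "xml"}


def lint_source(source: dict) -> list[str]:
    """Return warnings for a parsed `source:` mapping (hypothetical pre-flight check)."""
    fmt = str(source.get("format", "")).lower()
    options = source.get("options") or {}
    read = source.get("read") or {}
    warnings = []
    if fmt in INFERENCE_SENSITIVE and "schema" not in read:
        warnings.append(f"{fmt}: set read.schema instead of relying on inference")
    if fmt == "csv" and "header" not in options:
        warnings.append("csv: set options.header explicitly")
    if fmt == "xml" and "rowTag" not in options:
        warnings.append("xml: set options.rowTag to the element that marks one row")
    return warnings
```

All four examples above pass this check; a JSON source without read.schema would produce one warning.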

Serverless limitation handling

ContractForge should fail clearly when a serverless/Spark Connect runtime blocks direct filesystem credential configuration. Do not depend on a hidden fallback that only works in one workspace.

| Symptom | Likely cause | Action |
|---|---|---|
| Cloud storage access failure during External Location creation | The storage credential cannot validate read/write/list behavior. | Grant the service principal the required Azure Storage role and verify the URL prefix. |
| SAS works on classic but fails on serverless | Serverless blocks runtime Hadoop credential configuration. | Use an External Location, a Volume or a workspace network/storage policy. |
| Directory read fails but single file works | The SAS lacks list permission. | Regenerate the SAS with list permission for folder reads. |
| DNS/egress errors | The workspace network cannot reach the storage endpoint. | Fix the firewall, public network access, private endpoint or network policy. |
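Failing clearly means rejecting an unsupported combination before any read starts, with a message that names the fix. A minimal sketch, assuming the runtime has already been detected and labelled (the labels "serverless", "classic" and "local", the exception class and the function are all hypothetical):

```python
class UnsupportedRuntimeAuth(RuntimeError):
    """Raised when a contract asks for credentials the runtime cannot apply."""


def check_runtime_auth(runtime: str, source: dict) -> None:
    """Fail before reading instead of falling back silently (hypothetical check).

    `runtime` is an assumed label such as "serverless", "classic" or "local".
    """
    auth = source.get("auth") or {}
    if runtime == "serverless" and "sas_token" in auth:
        raise UnsupportedRuntimeAuth(
            "auth.sas_token needs runtime Hadoop credential settings, which "
            "serverless/Spark Connect blocks. Point source.path at a Unity Catalog "
            "External Location or Volume path and drop auth.sas_token."
        )
```

The check deliberately does not retry with another credential path, so a contract that only works in one workspace fails the same way everywhere.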

Operational metadata

SELECT
run_id,
source_connector,
source_path,
source_auth_redacted_json,
source_read_redacted_json,
source_metrics_json
FROM main.ops.ctrl_ingestion_runs
WHERE source_connector = 'azure_blob'
ORDER BY started_at_utc DESC;

Common issues

| Symptom | Likely cause | Action |
|---|---|---|
| Access denied on serverless | Direct SAS/Hadoop configuration is blocked by the runtime. | Use an External Location, a Volume or a workspace network policy. |
| Credential validation fails | The storage credential cannot read/write the validation path. | Grant the required Azure Storage role to the service principal or managed identity. |
| Path lists no files | Wrong container, prefix or recursive lookup setting. | Validate the path with a minimal listing and check folder casing. |
| Format parsing fails | Schema/options do not match the file format. | Test against a small sample with an explicit schema. |