Azure Blob and ADLS
Use the Azure Blob/ADLS connector for files in Azure Storage. On Databricks serverless, prefer Unity Catalog External Locations or Volumes. Use SAS-token configuration on classic/job clusters when runtime filesystem settings are allowed.
When to use it
- External Location or Volume: Use governed storage access when Spark Connect/serverless blocks direct SAS/Hadoop filesystem configuration.
- SAS token or platform identity: Use this path when the cluster can receive Azure filesystem configuration and the contract should declare direct storage access.
- Storage Credential and External Location: Prefer this model for shared teams that need governed access, auditing and a clear separation between credentials and ingestion contracts.
- ABFS/SAS configuration: Use this for development and compatibility tests outside Databricks, when Hadoop Azure libraries and local credentials are available.
Runtime requirements
| Runtime | Access model | Use when |
|---|---|---|
| Databricks serverless | Unity Catalog External Location or Volume | Direct SAS/Hadoop configuration is blocked or managed by platform policy. |
| Classic cluster | SAS token, ABFS/SAS configuration or service principal setup | You control cluster-level Spark/Hadoop configuration. |
| Local Spark | ABFS/WASBS configuration or local credential provider | Development and compatibility tests outside Databricks. |
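The table above can be read as a simple lookup. The following is a minimal sketch only: the runtime labels (`serverless`, `classic`, `local`) and the function name are hypothetical and are not ContractForge or Databricks identifiers.

```python
# Hypothetical lookup mirroring the runtime table; the keys are illustrative
# labels chosen for this sketch, not ContractForge or Databricks identifiers.
ACCESS_MODELS = {
    "serverless": "Unity Catalog External Location or Volume",
    "classic": "SAS token, ABFS/SAS configuration or service principal setup",
    "local": "ABFS/WASBS configuration or local credential provider",
}


def recommended_access_model(runtime: str) -> str:
    """Return the access model the table suggests for a runtime label."""
    try:
        return ACCESS_MODELS[runtime.strip().lower()]
    except KeyError:
        raise ValueError(
            f"Unknown runtime {runtime!r}; expected one of {sorted(ACCESS_MODELS)}"
        ) from None
```

Failing on an unknown label, rather than returning a default, keeps the choice of access model explicit.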
Basic example
When the path is already governed by Unity Catalog, the contract should reference the readable ABFS path or Volume path. Credential setup belongs to the platform.
YAML

    source:
      type: connector
      connector: azure_blob
      path: abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste/csv/
      format: csv
      options:
        header: true
      read:
        schema: "id STRING, event_ts TIMESTAMP, amount DOUBLE"
        source_complete: true
    target:
      catalog: contractforge
      schema: bronze_examples
      table: b_blob_csv
    layer: bronze
    mode: overwrite
Python

    from contractforge import ingest

    result = ingest(
        source={
            "type": "connector",
            "connector": "azure_blob",
            "path": "abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste/csv/",
            "format": "csv",
            "options": {"header": True},
            "read": {
                "schema": "id STRING, event_ts TIMESTAMP, amount DOUBLE",
                "source_complete": True,
            },
        },
        catalog="contractforge",
        target_schema="bronze_examples",
        target_table="b_blob_csv",
        layer="bronze",
        mode="overwrite",
    )
SAS-token classic cluster pattern
Use this pattern when a classic/job cluster can configure Azure filesystem access and a SAS token is stored in a secret scope.
YAML

    source:
      type: connector
      connector: azure_blob
      account_url: https://generalcafe.blob.core.windows.net/
      container: databricksdata
      path: blob_teste/json/
      format: json
      auth:
        sas_token: "{{ secret:contractforge-azure/blob_sas_token }}"
      read:
        schema: "id STRING, event_ts TIMESTAMP, payload STRUCT<kind:STRING,value:DOUBLE>"
        recursiveFileLookup: true
    target:
      catalog: contractforge
      schema: bronze_examples
      table: b_blob_json
    layer: bronze
    mode: append
Python

    from contractforge import ingest

    result = ingest(
        source={
            "type": "connector",
            "connector": "azure_blob",
            "account_url": "https://generalcafe.blob.core.windows.net/",
            "container": "databricksdata",
            "path": "blob_teste/json/",
            "format": "json",
            "auth": {"sas_token": "{{ secret:contractforge-azure/blob_sas_token }}"},
            "read": {
                "schema": "id STRING, event_ts TIMESTAMP, payload STRUCT<kind:STRING,value:DOUBLE>",
                "recursiveFileLookup": True,
            },
        },
        catalog="contractforge",
        target_schema="bronze_examples",
        target_table="b_blob_json",
        layer="bronze",
        mode="append",
    )
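For orientation, the sketch below shows what direct SAS configuration typically amounts to on a classic cluster. It is not taken from ContractForge: the helper names `abfss_uri` and `sas_spark_conf` are hypothetical, the public-cloud endpoint suffix `dfs.core.windows.net` is assumed, and how the connector itself resolves `account_url`, `container` and `path` is not specified on this page. The three `fs.azure.*` keys are the standard ABFS settings for a fixed SAS token.

```python
from urllib.parse import urlparse

# Token provider class shipped with the Hadoop ABFS driver.
FIXED_SAS_PROVIDER = "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider"


def abfss_uri(account_url: str, container: str, path: str) -> str:
    """Build an abfss:// URI from an account URL, container and relative path.

    Assumes the public Azure cloud, where the account name is the first host label.
    """
    host = urlparse(account_url).hostname  # e.g. generalcafe.blob.core.windows.net
    if not host:
        raise ValueError(f"Cannot parse account URL: {account_url!r}")
    account = host.split(".")[0]
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def sas_spark_conf(account: str, sas_token: str) -> dict[str, str]:
    """Return the ABFS settings for fixed-SAS authentication on one account."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "SAS",
        f"fs.azure.sas.token.provider.type.{suffix}": FIXED_SAS_PROVIDER,
        # Drop a leading "?" in case the token was copied from a full URL.
        f"fs.azure.sas.fixed.token.{suffix}": sas_token.lstrip("?"),
    }
```

On a classic cluster, each pair would be applied with `spark.conf.set(key, value)`, with the token read from a secret scope rather than written inline. On serverless, these settings are exactly what the runtime is expected to reject.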
SAS permissions
Folder reads require list permission. Reading an individual file may work with a read-only SAS, but directory discovery, recursive reads and regex selection all need list permission as well.
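A quick pre-flight check can catch a token without list permission before a folder read fails. This is a minimal sketch with hypothetical helper names; it relies on the `sp` (signed permissions) field of a SAS token, where `r` means read and `l` means list. A token bound to a stored access policy may carry no `sp` field, in which case this check cannot tell what is allowed.

```python
from urllib.parse import parse_qs


def sas_permissions(sas_token: str) -> set[str]:
    """Return the permission letters from the `sp` field of a SAS token."""
    fields = parse_qs(sas_token.lstrip("?"))
    # parse_qs maps each field to a list of values; take the first `sp` value.
    return set(fields.get("sp", [""])[0])


def can_read_folders(sas_token: str) -> bool:
    """Folder reads need both read (r) and list (l) permission."""
    return {"r", "l"} <= sas_permissions(sas_token)
```

Running the check against the secret value at contract validation time gives a clearer message than a storage error raised in the middle of a Spark job.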
Creating an External Location
The exact SQL depends on how the storage credential is configured. The storage credential is a Unity Catalog object, not a Databricks secret scope.
    CREATE EXTERNAL LOCATION IF NOT EXISTS contractforge_blob_teste
      URL 'abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste'
      WITH (STORAGE CREDENTIAL my_storage_credential);

    GRANT READ FILES ON EXTERNAL LOCATION contractforge_blob_teste TO `data-engineers`;
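A contract path is only governed by an External Location when it sits at or below the location URL. The sketch below is a plain string comparison with a hypothetical function name; Unity Catalog performs the authoritative resolution, so treat this as a sanity check for typos in the prefix.

```python
def is_covered_by(path: str, location_url: str) -> bool:
    """True when `path` equals the location URL or sits below it."""
    base = location_url.rstrip("/")
    target = path.rstrip("/")
    # Require a "/" boundary so that ".../blob_teste2" does not match ".../blob_teste".
    return target == base or target.startswith(base + "/")
```

For the location created above, `abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste/csv/` is covered, while a sibling folder such as `blob_teste2/` is not.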
Credential validation
Unity Catalog validates the location by trying to read from it and, in many cases, to create a validation marker. Read-only credentials can therefore fail location creation even when file reads would work.
Multi-format examples
The connector delegates parsing to Spark readers. Keep reader options and schema explicit for formats that have ambiguous inference behavior.
CSV
YAML

    source:
      type: connector
      connector: azure_blob
      path: abfss://container@account.dfs.core.windows.net/blob_teste/csv/
      format: csv
      options:
        header: true
      read:
        schema: "id STRING, amount DOUBLE"
    target:
      catalog: contractforge
      schema: bronze_examples
      table: b_blob_csv_extract
    layer: bronze
    mode: append
Python

    from contractforge import ingest

    result = ingest(
        source={
            "type": "connector",
            "connector": "azure_blob",
            "path": "abfss://container@account.dfs.core.windows.net/blob_teste/csv/",
            "format": "csv",
            "options": {"header": True},
            "read": {"schema": "id STRING, amount DOUBLE"},
        },
        catalog="contractforge",
        target_schema="bronze_examples",
        target_table="b_blob_csv_extract",
        layer="bronze",
        mode="append",
    )
JSON
YAML

    source:
      type: connector
      connector: azure_blob
      path: abfss://container@account.dfs.core.windows.net/blob_teste/json/
      format: json
      read:
        schema: "id STRING, payload STRUCT<a:STRING>"
    target:
      catalog: contractforge
      schema: bronze_examples
      table: b_blob_nested_json
    layer: bronze
    mode: append
Python

    from contractforge import ingest

    result = ingest(
        source={
            "type": "connector",
            "connector": "azure_blob",
            "path": "abfss://container@account.dfs.core.windows.net/blob_teste/json/",
            "format": "json",
            "read": {"schema": "id STRING, payload STRUCT<a:STRING>"},
        },
        catalog="contractforge",
        target_schema="bronze_examples",
        target_table="b_blob_nested_json",
        layer="bronze",
        mode="append",
    )
Parquet
YAML

    source:
      type: connector
      connector: azure_blob
      path: abfss://container@account.dfs.core.windows.net/blob_teste/parquet/
      format: parquet
      read:
        recursiveFileLookup: true
    target:
      catalog: contractforge
      schema: bronze_examples
      table: b_blob_parquet
    layer: bronze
    mode: append
Python

    from contractforge import ingest

    result = ingest(
        source={
            "type": "connector",
            "connector": "azure_blob",
            "path": "abfss://container@account.dfs.core.windows.net/blob_teste/parquet/",
            "format": "parquet",
            "read": {"recursiveFileLookup": True},
        },
        catalog="contractforge",
        target_schema="bronze_examples",
        target_table="b_blob_parquet",
        layer="bronze",
        mode="append",
    )
XML
YAML

    source:
      type: connector
      connector: azure_blob
      path: abfss://container@account.dfs.core.windows.net/blob_teste/xml/
      format: xml
      options:
        rowTag: item
      read:
        schema: "id STRING, amount DOUBLE"
    target:
      catalog: contractforge
      schema: bronze_examples
      table: b_blob_xml
    layer: bronze
    mode: append
Python

    from contractforge import ingest

    result = ingest(
        source={
            "type": "connector",
            "connector": "azure_blob",
            "path": "abfss://container@account.dfs.core.windows.net/blob_teste/xml/",
            "format": "xml",
            "options": {"rowTag": "item"},
            "read": {"schema": "id STRING, amount DOUBLE"},
        },
        catalog="contractforge",
        target_schema="bronze_examples",
        target_table="b_blob_xml",
        layer="bronze",
        mode="append",
    )
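The advice to keep schema and reader options explicit can be turned into a small lint over the `source` dictionary. The rules below are an assumption drawn from the examples in this section (CSV, JSON and XML carry a schema, XML also carries `rowTag`, Parquet relies on its embedded schema); the function name is hypothetical and this is not a ContractForge validation API.

```python
# Formats whose Spark readers infer types from the data when no schema is given.
INFERRED_FORMATS = {"csv", "json", "xml"}


def explicitness_gaps(source: dict) -> list[str]:
    """List settings that a source leaves to reader inference or defaults."""
    fmt = source.get("format")
    gaps = []
    if fmt in INFERRED_FORMATS and not source.get("read", {}).get("schema"):
        gaps.append(f"{fmt}: read.schema is not set, so types would be inferred")
    if fmt == "xml" and not source.get("options", {}).get("rowTag"):
        gaps.append("xml: options.rowTag is not set")
    return gaps
```

An empty list means the source is as explicit as the examples above; any entry is a prompt to add the missing key before the first production run.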
Serverless limitation handling
ContractForge should fail clearly when a serverless/Spark Connect runtime blocks direct filesystem credential configuration. Do not depend on a hidden fallback that only works in one workspace.
| Symptom | Likely cause | Action |
|---|---|---|
| Cloud storage access failure during External Location creation | Storage credential cannot validate read/write/list behavior. | Grant the service principal the required Azure Storage role and verify the URL prefix. |
| SAS works on classic but fails on serverless | Serverless blocks runtime Hadoop credential configuration. | Use External Location, Volume or workspace network/storage policy. |
| Directory read fails but single file works | SAS lacks list permission. | Regenerate SAS with list permission for folder reads. |
| DNS/egress errors | Workspace network cannot reach the storage endpoint. | Fix firewall, public network access, private endpoint or network policy. |
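Failing clearly can be as simple as wrapping the configuration step and translating any rejection into one descriptive error. This is a sketch under assumptions, not ContractForge's implementation: `DirectStorageConfigBlocked` and `apply_storage_conf` are hypothetical names, and the exception a serverless runtime actually raises is not documented here, so the sketch catches broadly.

```python
class DirectStorageConfigBlocked(RuntimeError):
    """Raised when the runtime rejects direct filesystem credential settings."""


def apply_storage_conf(set_conf, settings: dict[str, str]) -> None:
    """Apply each setting via `set_conf(key, value)` and stop at the first rejection."""
    for key, value in settings.items():
        try:
            set_conf(key, value)
        except Exception as exc:  # the concrete exception type depends on the runtime
            # Report the key only; the value may be a secret such as a SAS token.
            raise DirectStorageConfigBlocked(
                f"Runtime rejected '{key}'. Direct SAS/Hadoop configuration appears "
                "to be blocked; use a Unity Catalog External Location or Volume instead."
            ) from exc
```

With a live session, the setter would be `spark.conf.set`. Passing it in as a callable keeps the function free of a Spark dependency and avoids any silent fallback path.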
Operational metadata
    SELECT
      run_id,
      source_connector,
      source_path,
      source_auth_redacted_json,
      source_read_redacted_json,
      source_metrics_json
    FROM main.ops.ctrl_ingestion_runs
    WHERE source_connector = 'azure_blob'
    ORDER BY started_at_utc DESC;
Common issues
| Symptom | Likely cause | Action |
|---|---|---|
| Access denied on serverless | Direct SAS/Hadoop config is blocked by the runtime. | Use External Location, Volume or workspace network policy. |
| Credential validation fails | The storage credential cannot read/write the validation path. | Grant the required Azure Storage role to the service principal or managed identity. |
| Path lists no files | Wrong container, prefix or recursive lookup setting. | Validate the path with a minimal listing and check folder casing. |
| Format parsing fails | Schema/options do not match the file format. | Start with a small sample and explicit schema. |
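For the "Path lists no files" row, folder casing is easy to check once a minimal listing of the parent folder is available. The sketch below uses a hypothetical helper name and takes the listed names as plain strings, so it runs anywhere; on Databricks the names could come from the `name` field of `dbutils.fs.ls(parent)` entries.

```python
def casing_mismatches(expected: str, listed_names: list[str]) -> list[str]:
    """Return listed names that match `expected` only when case is ignored."""
    wanted = expected.strip("/")
    return [
        name
        for name in listed_names
        if name.strip("/") != wanted and name.strip("/").lower() == wanted.lower()
    ]
```

A non-empty result means the folder exists under a different casing, which Azure Storage treats as a different path.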