Azure Blob and ADLS
Use the Azure Blob/ADLS connector for files in Azure Storage. On Databricks serverless, prefer Unity Catalog External Locations or Volumes. Use SAS-token configuration on classic/job clusters when runtime filesystem settings are allowed.
When to use it
| Runtime | Recommended access | Notes |
|---|---|---|
| Databricks serverless | External Location or Volume | Direct SAS/Hadoop config can be blocked by Spark Connect/serverless. |
| Classic/job cluster | SAS token or platform identity | Cluster can usually receive storage account filesystem configuration. |
| Azure Databricks with Unity Catalog | Storage Credential + External Location | Best governance model for shared teams. |
| Local Spark | ABFS/SAS configuration | Useful for development when the Hadoop Azure (`hadoop-azure`) libraries are on the classpath. |
Runtime requirements
| Runtime | Access model | Use when |
|---|---|---|
| Databricks serverless | Unity Catalog External Location or Volume | Direct SAS/Hadoop configuration is blocked or managed by platform policy. |
| Classic cluster | SAS token, ABFS/SAS configuration or service principal setup | You control cluster-level Spark/Hadoop configuration. |
| Local Spark | ABFS/WASBS configuration or local credential provider | Development and compatibility tests outside Databricks. |
Basic example
When the path is already governed by Unity Catalog, the contract should reference only the ABFS path or Volume path that the caller can already read. It carries no `auth` block, because credential setup belongs to the platform.
source:
  type: connector
  connector: azure_blob
  path: abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste/csv/
  format: csv
  options:
    header: true
  read:
    schema: "id STRING, event_ts TIMESTAMP, amount DOUBLE"
    source_complete: true
target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_csv
layer: bronze
mode: scd0_overwrite
from contractforge import ingest

result = ingest(
    source={
        "type": "connector",
        "connector": "azure_blob",
        "path": "abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste/csv/",
        "format": "csv",
        "options": {"header": True},
        "read": {
            "schema": "id STRING, event_ts TIMESTAMP, amount DOUBLE",
            "source_complete": True,
        },
    },
    catalog="contractforge",
    target_schema="bronze_examples",
    target_table="b_blob_csv",
    layer="bronze",
    mode="scd0_overwrite",
)
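The ABFS path in the example above packs the container, storage account and directory into a single URI. The following is a minimal sketch, using only the Python standard library, of how such a path can be split for logging or pre-flight checks. `parse_abfss` is a hypothetical helper written for this page, not part of ContractForge.

```python
from urllib.parse import urlsplit


def parse_abfss(uri: str) -> dict:
    """Split an abfs(s):// URI into container, storage account, host and path."""
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"Not an ABFS URI: {uri!r}")
    # ABFS URIs have the form scheme://<container>@<account-host>/<path>
    if not parts.username or not parts.hostname:
        raise ValueError(f"Expected <container>@<account-host> in {uri!r}")
    return {
        "container": parts.username,
        "account": parts.hostname.split(".", 1)[0],
        "host": parts.hostname,
        "path": parts.path.lstrip("/"),
    }


info = parse_abfss(
    "abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste/csv/"
)
print(info["container"], info["account"], info["path"])
```

The helper only inspects the string. It does not contact Azure Storage or confirm that the path exists.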
SAS-token classic cluster pattern
Use this pattern when a classic/job cluster can configure Azure filesystem access and a SAS token is stored in a secret scope.
source:
  type: connector
  connector: azure_blob
  account_url: https://generalcafe.blob.core.windows.net/
  container: databricksdata
  path: blob_teste/json/
  format: json
  auth:
    sas_token: "{{ secret:contractforge-azure/blob_sas_token }}"
  read:
    schema: "id STRING, event_ts TIMESTAMP, payload STRUCT<kind:STRING,value:DOUBLE>"
    recursiveFileLookup: true
target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_json
layer: bronze
mode: scd0_append
from contractforge import ingest

result = ingest(
    source={
        "type": "connector",
        "connector": "azure_blob",
        "account_url": "https://generalcafe.blob.core.windows.net/",
        "container": "databricksdata",
        "path": "blob_teste/json/",
        "format": "json",
        "auth": {"sas_token": "{{ secret:contractforge-azure/blob_sas_token }}"},
        "read": {
            "schema": "id STRING, event_ts TIMESTAMP, payload STRUCT<kind:STRING,value:DOUBLE>",
            "recursiveFileLookup": True,
        },
    },
    catalog="contractforge",
    target_schema="bronze_examples",
    target_table="b_blob_json",
    layer="bronze",
    mode="scd0_append",
)
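The `sas_token` value above is a placeholder of the form `{{ secret:<scope>/<key> }}`, not the token itself. How ContractForge resolves it is not shown on this page. The sketch below is one plausible reading, under the assumption that the scope and key are extracted and handed to a secret getter. `parse_secret_ref` and `resolve_secret` are hypothetical names.

```python
import re
from typing import Callable

# Matches the placeholder form used in the contract: {{ secret:<scope>/<key> }}
_SECRET_REF = re.compile(r"^\{\{\s*secret:([^/\s]+)/([^\s}]+)\s*\}\}$")


def parse_secret_ref(value: str) -> tuple[str, str]:
    """Return (scope, key) from a secret placeholder, or raise ValueError."""
    match = _SECRET_REF.match(value.strip())
    if match is None:
        # Do not echo the value: it might be a raw token pasted by mistake.
        raise ValueError("Value is not a '{{ secret:scope/key }}' reference")
    return match.group(1), match.group(2)


def resolve_secret(value: str, getter: Callable[[str, str], str]) -> str:
    """Resolve a placeholder through an injected getter(scope, key)."""
    scope, key = parse_secret_ref(value)
    return getter(scope, key)


scope, key = parse_secret_ref("{{ secret:contractforge-azure/blob_sas_token }}")
print(scope, key)
```

In a Databricks notebook the getter could be `dbutils.secrets.get`, which accepts a scope and a key. Injecting it keeps the sketch runnable outside Databricks.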
Folder reads require list permission on the SAS token. A read-only SAS may be enough to read individual files, but directory discovery, recursive reads and regex selection all need list permission.
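A SAS token carries its signed permissions in the `sp` query field, where `r` means read and `l` means list. The sketch below checks that field before a folder read is attempted. The helper names and the sample tokens are illustrative only. The check does not verify the signature, the expiry or the resource scope of the token.

```python
from urllib.parse import parse_qs


def sas_permissions(sas_token: str) -> set[str]:
    """Return the permission letters found in the token's `sp` field."""
    query = parse_qs(sas_token.lstrip("?"))
    return set(query.get("sp", [""])[0])


def can_read_folders(sas_token: str) -> bool:
    """Folder reads need both read (r) and list (l) permission."""
    return {"r", "l"} <= sas_permissions(sas_token)


# Dummy tokens for illustration; the signatures are not real.
print(can_read_folders("?sv=2022-11-02&sp=rl&sig=REDACTED"))
print(can_read_folders("sv=2022-11-02&sp=r&sig=REDACTED"))
```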
Creating an External Location
The exact SQL depends on how the storage credential is configured. The storage credential is a Unity Catalog object, not a Databricks secret scope.
CREATE EXTERNAL LOCATION IF NOT EXISTS contractforge_blob_teste
URL 'abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste'
WITH (STORAGE CREDENTIAL my_storage_credential);
GRANT READ FILES ON EXTERNAL LOCATION contractforge_blob_teste TO `data-engineers`;
Unity Catalog validates the location by attempting to read from it and, often, to create a validation marker. Read-only credentials can therefore fail location creation even when file reads would work.
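A contract path is only covered by an External Location when it sits at or below the location URL. The sketch below is a local sanity check for that prefix relationship, assuming a plain string comparison is sufficient. `is_under_location` is a hypothetical helper, and Unity Catalog remains the authority on what is actually permitted.

```python
def is_under_location(path: str, location_url: str) -> bool:
    """True when `path` equals the location URL or sits below it."""
    base = location_url.rstrip("/")
    candidate = path.rstrip("/")
    # Require a "/" boundary so "blob_teste_other" does not match "blob_teste".
    return candidate == base or candidate.startswith(base + "/")


location = "abfss://databricksdata@generalcafe.dfs.core.windows.net/blob_teste"
print(is_under_location(location + "/csv/", location))
print(is_under_location(location + "_other/csv/", location))
```

The comparison is case-sensitive, so a folder-casing mismatch is reported as not covered.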
Multi-format examples
The connector delegates parsing to Spark readers. Keep reader options and schema explicit for formats that have ambiguous inference behavior.
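One way to enforce explicit reader settings is a small lint step that runs before ingestion. The sketch below assumes the `source` dictionary shape used in the examples on this page. The list of formats treated as ambiguous and the helper name `explicitness_warnings` are assumptions of this sketch, not ContractForge behavior.

```python
# Formats treated here as needing an explicit schema (an assumption).
SCHEMA_EXPECTED = {"csv", "json", "xml"}


def explicitness_warnings(source: dict) -> list[str]:
    """Return human-readable warnings for missing schema or reader options."""
    fmt = str(source.get("format", "")).lower()
    options = source.get("options") or {}
    read = source.get("read") or {}
    warnings = []
    if fmt in SCHEMA_EXPECTED and not read.get("schema"):
        warnings.append(f"{fmt}: read.schema is not set")
    if fmt == "csv" and "header" not in options:
        warnings.append("csv: options.header is not set")
    if fmt == "xml" and not options.get("rowTag"):
        warnings.append("xml: options.rowTag is not set")
    return warnings


print(explicitness_warnings({"format": "xml"}))
```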
CSV
source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/csv/
  format: csv
  options:
    header: true
  read:
    schema: "id STRING, amount DOUBLE"
target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_csv_extract
layer: bronze
mode: scd0_append
from contractforge import ingest

result = ingest(
    source={
        "type": "connector",
        "connector": "azure_blob",
        "path": "abfss://container@account.dfs.core.windows.net/blob_teste/csv/",
        "format": "csv",
        "options": {"header": True},
        "read": {"schema": "id STRING, amount DOUBLE"},
    },
    catalog="contractforge",
    target_schema="bronze_examples",
    target_table="b_blob_csv_extract",
    layer="bronze",
    mode="scd0_append",
)
JSON
source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/json/
  format: json
  read:
    schema: "id STRING, payload STRUCT<a:STRING>"
target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_nested_json
layer: bronze
mode: scd0_append
from contractforge import ingest

result = ingest(
    source={
        "type": "connector",
        "connector": "azure_blob",
        "path": "abfss://container@account.dfs.core.windows.net/blob_teste/json/",
        "format": "json",
        "read": {"schema": "id STRING, payload STRUCT<a:STRING>"},
    },
    catalog="contractforge",
    target_schema="bronze_examples",
    target_table="b_blob_nested_json",
    layer="bronze",
    mode="scd0_append",
)
Parquet
source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/parquet/
  format: parquet
  read:
    recursiveFileLookup: true
target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_parquet
layer: bronze
mode: scd0_append
from contractforge import ingest

result = ingest(
    source={
        "type": "connector",
        "connector": "azure_blob",
        "path": "abfss://container@account.dfs.core.windows.net/blob_teste/parquet/",
        "format": "parquet",
        "read": {"recursiveFileLookup": True},
    },
    catalog="contractforge",
    target_schema="bronze_examples",
    target_table="b_blob_parquet",
    layer="bronze",
    mode="scd0_append",
)
XML
source:
  type: connector
  connector: azure_blob
  path: abfss://container@account.dfs.core.windows.net/blob_teste/xml/
  format: xml
  options:
    rowTag: item
  read:
    schema: "id STRING, amount DOUBLE"
target:
  catalog: contractforge
  schema: bronze_examples
  table: b_blob_xml
layer: bronze
mode: scd0_append
from contractforge import ingest

result = ingest(
    source={
        "type": "connector",
        "connector": "azure_blob",
        "path": "abfss://container@account.dfs.core.windows.net/blob_teste/xml/",
        "format": "xml",
        "options": {"rowTag": "item"},
        "read": {"schema": "id STRING, amount DOUBLE"},
    },
    catalog="contractforge",
    target_schema="bronze_examples",
    target_table="b_blob_xml",
    layer="bronze",
    mode="scd0_append",
)
Serverless limitation handling
ContractForge should fail clearly when a serverless/Spark Connect runtime blocks direct filesystem credential configuration. Do not depend on a hidden fallback that only works in one workspace.
| Symptom | Likely cause | Action |
|---|---|---|
| Cloud storage access failure during External Location creation | Storage credential cannot validate read/write/list behavior. | Grant the service principal the required Azure Storage role and verify the URL prefix. |
| SAS works on classic but fails on serverless | Serverless blocks runtime Hadoop credential configuration. | Use External Location, Volume or workspace network/storage policy. |
| Directory read fails but single file works | SAS lacks list permission. | Regenerate SAS with list permission for folder reads. |
| DNS/egress errors | Workspace network cannot reach the storage endpoint. | Fix firewall, public network access, private endpoint or network policy. |
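The "fail clearly" guidance at the start of this section can be illustrated with a pre-flight check that rejects a SAS-based source on serverless before any read is attempted. This is a sketch under stated assumptions: the exception class and function are hypothetical, and runtime detection is left to the caller through the `is_serverless` flag because this page does not describe how ContractForge detects it.

```python
class ServerlessCredentialError(RuntimeError):
    """Raised when a source needs credential config that serverless may block."""


def check_serverless_compatible(source: dict, is_serverless: bool) -> None:
    """Raise a descriptive error instead of relying on a hidden fallback."""
    auth = source.get("auth") or {}
    if is_serverless and "sas_token" in auth:
        # The message names the cause and the supported alternative.
        raise ServerlessCredentialError(
            "azure_blob source sets auth.sas_token, but this runtime is "
            "serverless and may block direct filesystem credential "
            "configuration. Reference a Unity Catalog External Location "
            "or Volume path instead."
        )


sas_source = {"connector": "azure_blob", "auth": {"sas_token": "{{ secret:scope/key }}"}}
check_serverless_compatible(sas_source, is_serverless=False)  # passes silently
```

Raising before the read keeps the failure close to its cause, rather than surfacing later as a generic access-denied error from the storage layer.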
Operational metadata
SELECT
  run_id,
  source_connector,
  source_path,
  source_auth_redacted_json,
  source_read_redacted_json,
  source_metrics_json
FROM main.ops.ctrl_ingestion_runs
WHERE source_connector = 'azure_blob'
ORDER BY started_at_utc DESC;
Common issues
| Symptom | Likely cause | Action |
|---|---|---|
| Access denied on serverless | Direct SAS/Hadoop config is blocked by the runtime. | Use External Location, Volume or workspace network policy. |
| Credential validation fails | The storage credential cannot read/write the validation path. | Grant the required Azure Storage role to the service principal or managed identity. |
| Path lists no files | Wrong container, prefix or recursive lookup setting. | Validate the path with a minimal listing and check folder casing. |
| Format parsing fails | Schema/options do not match the file format. | Start with a small sample and explicit schema. |