Operations
ContractForge is runtime-aware, but storage access, connector dependencies and Unity Catalog features still depend on where the code runs.
| Capability | Classic cluster | Serverless / Spark Connect | Local Spark |
|---|---|---|---|
| Delta batch writes | Supported | Supported | Supported with Delta dependencies |
| Python DeltaTable APIs | Usually supported | Limited | Depends on installed Delta dependencies |
| SQL MERGE | Supported | Supported | Supported with Delta |
| Auto Loader | Databricks only | Databricks only | Not available |
| Unity Catalog annotations | Depends on permissions | Depends on permissions | Not available |
| Direct object-storage credentials | Common pattern | Prefer External Location or Volume | Environment-dependent |
| Federated sources | Supported when UC connection exists | Recommended for governed external systems | Not available |
| JDBC | Requires driver/network | Requires driver/network support | Requires driver/network |
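
The matrix above can also be kept as data so that tooling can query it before a job starts. The following is a minimal sketch, not ContractForge's actual API: `RUNTIMES`, `CAPABILITIES`, `capability_status` and `is_available` are hypothetical names, and the values simply mirror three rows of the table.

```python
# Hypothetical encoding of part of the capability matrix above.
# Keys and helper names are illustrative, not ContractForge identifiers.
RUNTIMES = ("classic", "serverless", "local")

CAPABILITIES = {
    "delta_batch_writes": (
        "Supported",
        "Supported",
        "Supported with Delta dependencies",
    ),
    "auto_loader": ("Databricks only", "Databricks only", "Not available"),
    "federated_sources": (
        "Supported when UC connection exists",
        "Recommended for governed external systems",
        "Not available",
    ),
}


def capability_status(feature: str, runtime: str) -> str:
    """Return the table cell for a feature on a given runtime."""
    if runtime not in RUNTIMES:
        raise ValueError(f"unknown runtime: {runtime!r}")
    if feature not in CAPABILITIES:
        raise KeyError(f"unknown feature: {feature!r}")
    return CAPABILITIES[feature][RUNTIMES.index(runtime)]


def is_available(feature: str, runtime: str) -> bool:
    """True unless the matrix marks the feature as 'Not available'."""
    return capability_status(feature, runtime) != "Not available"
```

Keeping the cell text verbatim means conditional entries such as "Supported with Delta dependencies" stay visible to the caller instead of being collapsed into a yes/no answer.
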

- CSV, JSON, Parquet, Delta, ORC and text work wherever the Spark runtime ships the corresponding data source; local Spark needs the Delta dependencies for Delta.
- XML, Avro, Snowflake and BigQuery connectors, and any JDBC drivers, must be installed in the runtime that executes the read.
- Serverless should use external locations or volumes; classic clusters can use Spark/Hadoop credentials when policy allows it.
- REST and HTTP file connectors run on the Spark driver, so they depend on the driver's network egress, DNS, proxy rules and API limits.

Use serverless for governed, repeatable ingestion where the access path is configured by the platform. Use classic clusters when you need to control low-level libraries, JVM options or filesystem credentials directly.
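
The rule of thumb above can be written down as a small decision helper. This is a sketch under stated assumptions: `WorkloadNeeds` and `recommend_runtime` are hypothetical names that are not part of ContractForge, and the three flags only cover the criteria named in the paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadNeeds:
    """Hypothetical description of what an ingestion job requires."""

    custom_libraries: bool = False
    jvm_options: bool = False
    direct_filesystem_credentials: bool = False


def recommend_runtime(needs: WorkloadNeeds) -> str:
    """Suggest 'classic' when low-level control is needed, else 'serverless'."""
    needs_low_level_control = (
        needs.custom_libraries
        or needs.jvm_options
        or needs.direct_filesystem_credentials
    )
    return "classic" if needs_low_level_control else "serverless"
```

For example, `recommend_runtime(WorkloadNeeds(jvm_options=True))` suggests a classic cluster, while a job with no special requirements defaults to serverless.
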
| Source | Preferred serverless pattern | When to use classic instead |
|---|---|---|
| S3 / Azure Blob / ADLS | Unity Catalog External Location or Volume; ContractForge reads the governed path. | Direct S3A/ABFS/SAS credentials, custom Hadoop libraries or nonstandard credential providers. |
| BigQuery | Lakehouse Federation, then table or sql connector. | Direct Spark BigQuery connector with service-account file, materialization dataset and custom package control. |
| Snowflake | Direct connector with service user, PAT/JWT and Snowflake network policy allowing the Databricks egress path. | Connector/JDBC package control, custom networking or PrivateLink-only setups not exposed to serverless. |
| JDBC / RDS IAM | Supported when driver, route, SSL and IAM credentials are available to the runtime. | Private databases that require VPC/VNet-level customization, driver installation or long-running extraction tuning. |
| REST / HTTP file | Driver-side HTTP with explicit timeouts, retries and size/page limits. | Large payloads that should first be landed in object storage or API routes requiring custom proxies. |
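
The "explicit timeouts, retries and size/page limits" pattern in the REST / HTTP row could look roughly like the sketch below. It uses only the Python standard library and is not ContractForge's connector code: `fetch_limited`, its parameters and its defaults are assumptions chosen for illustration.

```python
import time
import urllib.request


def fetch_limited(url, *, timeout=10.0, retries=3, backoff=1.0,
                  max_bytes=10 * 1024 * 1024,
                  opener=urllib.request.urlopen):
    """Fetch a URL with a timeout, bounded retries and a response size cap."""
    last_error = None
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as response:
                # Read one byte past the cap so oversized bodies are detected.
                body = response.read(max_bytes + 1)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            last_error = exc
            if attempt < retries - 1:
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
            continue
        if len(body) > max_bytes:
            # Not retried: an oversized payload will not shrink on a second try.
            raise ValueError(f"response exceeds {max_bytes} bytes: {url}")
        return body
    raise RuntimeError(
        f"giving up on {url} after {retries} attempts"
    ) from last_error
```

The `opener` parameter exists so the function can be exercised without network access. A production version would probably retry only on transient failures rather than on every `OSError`, and would hand payloads above the cap to object storage, as the table suggests.
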
If a feature depends on a platform capability, document it on the connector page and raise a clear error when that capability is missing. Do not silently fall back to weaker behavior that may hide a runtime misconfiguration.
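
A minimal sketch of the "clear error, no silent fallback" rule follows. The exception class, the `AVAILABLE` mapping and the runtime and capability names are all hypothetical; they illustrate the shape of the check, not ContractForge's real identifiers.

```python
class UnsupportedCapabilityError(RuntimeError):
    """Raised when the current runtime lacks a required platform capability."""


# Hypothetical capability names per runtime, for illustration only.
AVAILABLE = {
    "databricks_classic": {"auto_loader", "sql_merge"},
    "local": {"sql_merge"},
}


def require_capability(runtime: str, capability: str) -> None:
    """Raise a descriptive error instead of falling back to weaker behavior."""
    if capability not in AVAILABLE.get(runtime, set()):
        raise UnsupportedCapabilityError(
            f"{capability!r} is not available on runtime {runtime!r}; "
            "refusing to fall back to a different ingestion mode."
        )
```

Calling `require_capability("local", "auto_loader")` raises immediately, so the misconfiguration shows up at the start of the run rather than as a quietly degraded result.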