SAP S/4HANA and Databricks integration
SAP S/4HANA runs procurement, finance, and supply chain. Databricks stores and transforms that data at scale. Connecting them moves supplier invoices, purchase orders, GL accounts, cost centers, and business partner master records from S/4HANA into Databricks tables, where your data team can build analytics, reconciliation reports, and up-to-date dashboards without waiting for month-end exports. Every record carries a full audit trail and can be replayed if a downstream transformation fails.
What moves between them
Procurement and finance data flows in one direction: from SAP S/4HANA to Databricks. Supplier invoices, purchase orders, GL account line items, cost centers, business partners, and journal entry references are read from S/4HANA on a schedule tied to your finance close calendar and written as Databricks tables. Reference data such as GL account text and cost center descriptions is replicated so downstream joins and analytics can filter and aggregate by those dimensions. S/4HANA GL accounts and cost centers are read-only on the SAP side, so ml-connector never writes back into SAP's chart of accounts or cost structure.
How ml-connector handles it
ml-connector stores the S/4HANA OAuth client credentials and the Databricks service principal secret encrypted at rest. It reads the tenant-specific token endpoint from the Communication Arrangement OAuth details rather than constructing it, and refreshes S/4HANA tokens before the 12-hour expiry window closes. For Databricks, it uses the workspace-specific or account-level token endpoint and refreshes the access token before its 3600-second lifetime ends. OData filters on LastChangeDateTime keep syncs incremental, so unchanged records are not re-read. ml-connector runs each sync on a schedule you define, typically aligned with your daily or weekly close routine. S/4HANA's read-only GL Account Line Items API is not optimized for large bulk extracts, so ml-connector paginates and stages results. Databricks table writes use SQL so that data is materialized into the target schema. Every record pulled from S/4HANA carries a change timestamp and source system identifier in Databricks, enabling full audit and replay if a downstream transformation fails.
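The refresh and incremental-filter logic described above can be sketched roughly as follows. This is a minimal illustration, not ml-connector's actual code: the names `CachedToken`, `needs_refresh`, and `incremental_params`, the five-minute safety margin, and the page size are all assumptions. The filter literal uses OData V4 syntax; an OData V2 service would expect a `datetimeoffset'...'` literal instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CachedToken:
    access_token: str
    expires_at: datetime  # absolute UTC expiry, computed when the token was issued


def needs_refresh(token, now, margin=timedelta(minutes=5)):
    """Refresh slightly early so a token never expires mid-request (margin is an assumption)."""
    return now >= token.expires_at - margin


def incremental_params(watermark, page_size=1000, skip=0):
    """Build OData query options that select only records changed after the watermark."""
    stamp = watermark.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "$filter": f"LastChangeDateTime gt {stamp}",  # OData V4 literal (assumed)
        "$orderby": "LastChangeDateTime asc",         # stable order for paging
        "$top": str(page_size),
        "$skip": str(skip),
    }


# Example: request the second page of changes since the last successful sync.
last_sync = datetime(2024, 3, 1, 8, 30, tzinfo=timezone.utc)
params = incremental_params(last_sync, page_size=500, skip=500)
```

After a successful run, the watermark would be advanced to the largest LastChangeDateTime seen, so the next run picks up where this one stopped.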
A real-world example
A global manufacturing company runs SAP S/4HANA across three regions for procurement and finance. The shared services finance team needs to analyze supplier spending and GL accruals by cost center every week but today must export supplier invoice data and GL account balances manually from S/4HANA, load them into spreadsheets, and then copy them into a data warehouse. With S/4HANA and Databricks connected, supplier invoices and GL line items flow automatically into Databricks tables on a weekly schedule aligned with the close calendar. The finance team writes SQL directly against those tables to build dynamic dashboards and exception reports without re-keying, and month-end reconciliation starts with all supplier and GL data already synced and ready to aggregate.
What you can do
- Replicate supplier invoices, purchase orders, GL account line items, and business partner master records from SAP S/4HANA to Databricks tables on a schedule you control.
- Handle S/4HANA OAuth 2.0 client credentials scoped to Communication Arrangements, with automatic token refresh before expiry.
- Map SAP tenant-specific endpoints to Databricks workspace URLs and store both systems' credentials encrypted at rest.
- Poll S/4HANA via LastChangeDateTime filters to sync only new and changed records, avoiding bulk re-extracts.
- Trace every record with source timestamps and audit metadata so downstream transformations can be replayed if they fail.
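As a rough illustration of the audit-and-replay capability in the list above, the sketch below attaches audit columns to each record and selects a replay window from staged rows. The column names (`_source_system`, `_source_changed_at`, `_extracted_at`), the helper names, and the sample field values are hypothetical, not ml-connector's real schema. It assumes change timestamps are UTC ISO 8601 strings in one consistent format, so plain string comparison orders them correctly.

```python
from datetime import datetime, timezone


def with_audit(record, source_system, extracted_at):
    """Return a copy of an S/4HANA record with audit columns attached."""
    return {
        **record,
        "_source_system": source_system,
        "_source_changed_at": record["LastChangeDateTime"],
        "_extracted_at": extracted_at.isoformat(),
    }


def replay_window(rows, since):
    """Select staged rows whose source change time is at or after `since`."""
    return [r for r in rows if r["_source_changed_at"] >= since]


# Illustrative records only; the field values are made up.
extracted = datetime(2024, 3, 2, tzinfo=timezone.utc)
staged = [
    with_audit({"SupplierInvoice": "A", "LastChangeDateTime": "2024-02-28T10:00:00Z"}, "S4H", extracted),
    with_audit({"SupplierInvoice": "B", "LastChangeDateTime": "2024-03-01T08:30:00Z"}, "S4H", extracted),
]
to_replay = replay_window(staged, "2024-03-01T00:00:00Z")
```

If a downstream transformation fails, only the rows in `to_replay` would need to be fed through it again, rather than a fresh extract from S/4HANA.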
Questions
- Do I need SAP or Databricks admin setup before the integration starts?
- Yes. An SAP admin must create a Communication System, OAuth User, and Communication Arrangement with the correct OAuth 2.0 scopes before ml-connector can authenticate. On the Databricks side, a workspace admin must create a service principal with the appropriate REST API permissions. ml-connector cannot create these; it assumes they already exist.
- Why does SAP S/4HANA require a tenant-specific token endpoint?
- SAP does not publish a shared token endpoint. Instead, the endpoint is unique to each tenant and is exposed in the Communication Arrangement OAuth details. ml-connector uses that endpoint exactly as it appears there; constructing it manually will fail. This is SAP's multi-tenant isolation pattern.
- Can the integration write data back into SAP?
- No. GL accounts and cost centers in SAP are maintained in SAP only and are published as read-only via OData. ml-connector reads those dimensions and pushes them to Databricks for analytics and joins, but never writes back into SAP. Data flows in one direction: S/4HANA to Databricks.
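To make the token-endpoint answer above concrete, here is a minimal sketch of a standard OAuth 2.0 client-credentials request that takes the endpoint as given rather than deriving it. The function name `build_token_request` and the example URL are hypothetical. Sending the client ID and secret via HTTP Basic authentication is an assumption; the required scope value, if any, depends on the target system. The request is only built here, not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(token_url, client_id, client_secret, scope=None):
    """Build (but do not send) a client-credentials token request."""
    # The URL must be supplied whole, e.g. copied from the OAuth details;
    # this sketch refuses anything that is not a full https URL.
    if not token_url.startswith("https://"):
        raise ValueError("token_url must be a full https URL")
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return Request(
        token_url,
        data=urlencode(form).encode(),
        headers={
            "Authorization": f"Basic {basic}",  # Basic auth is an assumption
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder endpoint and credentials for illustration only.
req = build_token_request("https://tenant.example.com/oauth/token", "id", "secret")
```

The same helper could serve both systems, since each one's token endpoint and credentials are passed in as configuration.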
Connect SAP S/4HANA and Databricks
Free to use. Add your credentials, ping your real systems, and see if we fit.
Get started