ml-connector

SAP S/4HANA and Databricks integration

SAP S/4HANA runs procurement, finance, and supply chain. Databricks stores and transforms that data at scale. Connecting them moves supplier invoices, purchase orders, GL accounts, cost centers, and business partner master records from S/4HANA into Databricks tables, where your data team can build analytics, reconciliation reports, and real-time dashboards without waiting for month-end exports. Every record carries a full audit trail and can be replayed if a downstream transformation fails.

How SAP S/4HANA works

SAP S/4HANA exposes business partner, supplier, customer, purchase order, invoice, GL account, cost center, and journal entry data through OData V2 and OData V4 REST APIs published at tenant-specific URLs. Authentication uses OAuth 2.0 client credentials scoped to a Communication Arrangement configured by an SAP admin. The token endpoint varies per tenant and is listed in the Communication Arrangement's OAuth details. It must be read from there, not constructed manually. Access tokens expire, typically after 12 hours, so ml-connector must cache them and refresh them before expiry. S/4HANA has no native webhooks, so ml-connector polls using LastChangeDateTime filters or delta tokens. The GL Account and Cost Center APIs are read-only, and the GL Account Line Items API is not optimized for large bulk extracts.
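The incremental polling described above can be sketched as a function that builds an OData query URL filtered on LastChangeDateTime. This is a minimal sketch, not ml-connector's implementation. It assumes the target entity set exposes a LastChangeDateTime property of type DateTimeOffset. Any service path or entity set name you pass in (for example a hypothetical `.../API_PURCHASEORDER_PROCESS_SRV` with `A_PurchaseOrder`) is a placeholder to confirm against your tenant's API catalog. OData V2 wraps the timestamp in a `datetimeoffset'...'` literal, while OData V4 accepts the bare ISO 8601 value.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def incremental_query_url(
    base_url: str,
    entity_set: str,
    since: datetime,
    odata_version: int = 2,
) -> str:
    """Build an OData query URL that returns only records changed after `since`.

    Assumes the entity set has a LastChangeDateTime property (Edm.DateTimeOffset).
    """
    if since.tzinfo is None:
        # A naive datetime would be silently interpreted in local time.
        raise ValueError("since must be timezone-aware")
    ts = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # OData V2 needs a typed literal; OData V4 accepts the bare ISO 8601 value.
    literal = f"datetimeoffset'{ts}'" if odata_version == 2 else ts
    params = {
        "$filter": f"LastChangeDateTime gt {literal}",
        # Oldest changes first, so the newest timestamp is the last one processed.
        "$orderby": "LastChangeDateTime asc",
    }
    if odata_version == 2:
        # SAP OData V2 services default to Atom XML unless JSON is requested.
        params["$format"] = "json"

    # Keep $, quotes, and colons readable; encode spaces as %20 rather than '+'.
    query = urlencode(params, safe="$',:", quote_via=quote)
    return f"{base_url.rstrip('/')}/{entity_set}?{query}"
```

The caller supplies `since` from the last successfully synced change timestamp, so each poll only asks S/4HANA for records modified after the previous run.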

How Databricks works

Databricks is a cloud data platform built on Apache Spark and Delta Lake. It exposes REST APIs over HTTPS at workspace-specific URLs and authenticates with OAuth 2.0 client credentials for a service principal, using either a workspace-level or an account-level token endpoint. Bearer tokens expire after 3600 seconds and must be refreshed. Databricks has no native finance or ERP objects; in this integration it is a data destination only. Table operations through the REST API are metadata-only; actual data is loaded with SQL or Spark. MLflow webhooks cover model registry events, but compute and table events require polling. Account-level APIs require an account_id parameter.
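Both token lifetimes (typically 12 hours for S/4HANA, 3600 seconds for Databricks) can be handled by one cache that refreshes shortly before expiry. The sketch below is illustrative only. It assumes a standard OAuth 2.0 client credentials response containing `access_token` and `expires_in`, and the 300-second refresh margin is an arbitrary choice. The token URL is passed in unchanged, matching the rule that the S/4HANA endpoint is copied from the Communication Arrangement rather than constructed. For Databricks service principals, requests typically include `scope=all-apis`. Treat that value as an assumption to confirm for your workspace.

```python
import base64
import json
import time
import urllib.parse
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# A fetcher returns (access_token, expires_in_seconds).
Fetcher = Callable[[], Tuple[str, int]]


def client_credentials_fetcher(
    token_url: str, client_id: str, client_secret: str, scope: Optional[str] = None
) -> Fetcher:
    """Return a callable that POSTs a client_credentials grant to token_url."""
    def fetch() -> Tuple[str, int]:
        form = {"grant_type": "client_credentials"}
        if scope:
            form["scope"] = scope  # e.g. "all-apis" for Databricks (assumption)
        basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
        req = urllib.request.Request(
            token_url,
            data=urllib.parse.urlencode(form).encode(),
            headers={
                "Authorization": f"Basic {basic}",
                "Content-Type": "application/x-www-form-urlencoded",
            },
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = json.load(resp)
        return body["access_token"], int(body["expires_in"])
    return fetch


@dataclass
class TokenCache:
    """Caches one bearer token and refreshes it `refresh_margin` seconds early."""
    fetch: Fetcher
    refresh_margin: float = 300.0
    clock: Callable[[], float] = time.monotonic
    _token: Optional[str] = field(default=None, init=False)
    _expires_at: float = field(default=0.0, init=False)

    def get(self) -> str:
        now = self.clock()
        if self._token is None or now >= self._expires_at - self.refresh_margin:
            token, expires_in = self.fetch()
            self._token = token
            # Measured from before the request, so the stored expiry is conservative.
            self._expires_at = now + expires_in
        return self._token
```

Creating a fetcher performs no network I/O; the first `get()` call does. The injectable `clock` makes expiry testable without waiting an hour.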

What moves between them

Procurement and finance data flows in one direction: from SAP S/4HANA to Databricks. Supplier invoices, purchase orders, GL account line items, cost centers, business partners, and journal entry references are read from S/4HANA on a schedule tied to your finance close calendar and written to Databricks tables. Reference data such as GL account texts and cost center descriptions is replicated so that downstream joins and analytics can filter and aggregate by those dimensions. S/4HANA GL accounts and cost centers are read-only on the SAP side, and ml-connector never writes back into SAP's chart of accounts or cost structure.
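Writing replicated records to Databricks tables usually means upserting a staged batch into a Delta table, so a record changed twice in S/4HANA ends up as one current row. This is a minimal sketch that only generates the SQL text. It assumes the batch has already been loaded into a staging table with the same columns as the target, and that the staging table holds at most one row per key, since MERGE rejects multiple source rows matching the same target row. The table names and key columns in the test (`SupplierInvoice`, `FiscalYear`) are hypothetical examples. It relies on Delta Lake's `MERGE INTO` with `UPDATE SET *` and `INSERT *`.

```python
import re
from typing import Sequence

# Allows plain or dotted (catalog.schema.table) identifiers only.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}")


def merge_statement(target: str, staging: str, keys: Sequence[str]) -> str:
    """Build a Delta Lake MERGE that upserts staging rows into target by key."""
    if not keys:
        raise ValueError("at least one key column is required")
    for name in (target, staging, *keys):
        # Identifiers are interpolated into SQL, so reject anything unexpected.
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    on = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {staging} AS s\n"
        f"ON {on}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )
```

The generated statement can then be run from a SQL warehouse or notebook; that execution step is deliberately left out of the sketch.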

How ml-connector handles it

ml-connector stores the S/4HANA OAuth client credentials and the Databricks service principal secret encrypted at rest. It reads the tenant-specific token endpoint from the Communication Arrangement OAuth details rather than constructing it, and refreshes S/4HANA tokens before the 12-hour expiry. For Databricks, it uses the workspace-level or account-level token endpoint and refreshes tokens before their 3600-second lifetime ends.

ml-connector polls both systems on a schedule you define, typically aligned with your daily or weekly close routine. OData filters on LastChangeDateTime keep syncs incremental, so unchanged records are not re-read. Because the GL Account Line Items API is not optimized for large bulk extracts, ml-connector paginates those requests and stages the results before loading. Writes to Databricks use SQL so that data is materialized into the target schema, not just registered as table metadata.

Every record pulled from S/4HANA carries a change timestamp and a source system identifier in Databricks, enabling full audit and replay if a downstream transformation fails.
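Paginating and staging can be sketched as a loop that follows OData server-driven paging links until none remain. This is a hedged illustration rather than ml-connector's code. `get_json` and `stage` are hypothetical callables standing in for an authenticated HTTP GET and a write into a staging table. The loop reads the next-page link from `__next` in OData V2 JSON responses and from `@odata.nextLink` in OData V4, and resolves relative links against the current URL.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional
from urllib.parse import urljoin

Row = Dict[str, Any]
# Hypothetical: performs an authenticated GET and returns the parsed JSON body.
JsonGetter = Callable[[str], Dict[str, Any]]


def iter_pages(first_url: str, get_json: JsonGetter) -> Iterator[List[Row]]:
    """Yield one page of rows at a time, following server-driven paging links."""
    url: Optional[str] = first_url
    while url:
        body = get_json(url)
        if "d" in body:
            # OData V2 JSON: {"d": {"results": [...], "__next": "..."}}
            payload = body["d"]
            rows = payload.get("results", [])
            next_link = payload.get("__next")
        else:
            # OData V4 JSON: {"value": [...], "@odata.nextLink": "..."}
            rows = body.get("value", [])
            next_link = body.get("@odata.nextLink")
        yield rows
        # Next links may be relative, so resolve them against the current URL.
        url = urljoin(url, next_link) if next_link else None


def stage_all(first_url: str, get_json: JsonGetter,
              stage: Callable[[List[Row]], None]) -> int:
    """Stage every non-empty page and return the total number of rows staged."""
    total = 0
    for rows in iter_pages(first_url, get_json):
        if rows:
            stage(rows)  # hypothetical: e.g. append to a staging table
            total += len(rows)
    return total
```

One reasonable design is to advance the stored LastChangeDateTime watermark only after `stage_all` returns. That way, a run that fails mid-extract re-reads the same window on the next poll instead of skipping records.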

A real-world example

A global manufacturing company runs SAP S/4HANA across three regions for procurement and finance. The shared services finance team needs to analyze supplier spending and GL accruals by cost center every week but today must export supplier invoice data and GL account balances manually from S/4HANA, load them into spreadsheets, and then copy them into a data warehouse. With S/4HANA and Databricks connected, supplier invoices and GL line items flow automatically into Databricks tables on a weekly schedule aligned with the close calendar. The finance team writes SQL directly against those tables to build dynamic dashboards and exception reports without re-keying, and month-end reconciliation starts with all supplier and GL data already synced and ready to aggregate.

What you can do

  • Replicate supplier invoices, purchase orders, GL account line items, and business partner master records from SAP S/4HANA to Databricks tables on a schedule you control.
  • Handle S/4HANA OAuth 2.0 client credentials scoped to Communication Arrangements, with automatic token refresh before expiry.
  • Map SAP tenant-specific endpoints to Databricks workspace URLs, and store both systems' credentials encrypted at rest.
  • Poll S/4HANA via LastChangeDateTime filters to sync only new and changed records, avoiding bulk re-extracts.
  • Trace every record with source timestamps and audit metadata so downstream transformations can be replayed if they fail.
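The audit metadata in the last bullet can be sketched as a function that stamps each extracted record with lineage columns before staging. This is a minimal sketch under stated assumptions. The column names (`_source_system`, `_source_changed_at`, `_batch_id`, `_ingested_at`) and any source system identifier you pass in are hypothetical, not ml-connector's actual schema. Filtering on `_batch_id` is one way to select exactly what a failed run loaded so it can be replayed.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def with_audit_columns(
    rows: Iterable[Row],
    source_system: str,
    changed_at_field: str = "LastChangeDateTime",
    batch_id: Optional[str] = None,
    ingested_at: Optional[str] = None,
) -> List[Row]:
    """Return copies of `rows` stamped with lineage metadata for audit and replay."""
    # One batch id and ingestion time per call, shared by every row in the batch.
    batch_id = batch_id or str(uuid.uuid4())
    ingested_at = ingested_at or datetime.now(timezone.utc).isoformat()
    stamped: List[Row] = []
    for row in rows:
        out = dict(row)  # copy so the caller's data is not mutated
        out["_source_system"] = source_system
        out["_source_changed_at"] = row.get(changed_at_field)
        out["_batch_id"] = batch_id
        out["_ingested_at"] = ingested_at
        stamped.append(out)
    return stamped
```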

Questions

Do I need SAP or Databricks admin setup before the integration starts?
Yes. An SAP admin must create a Communication System, OAuth User, and Communication Arrangement with the correct OAuth 2.0 scopes before ml-connector can authenticate. On the Databricks side, a workspace admin must create a service principal with the appropriate REST API permissions. ml-connector cannot create these; it assumes they already exist.
Why does SAP S/4HANA require a tenant-specific token endpoint?
SAP does not publish a shared token endpoint. Instead, the endpoint is unique to each tenant and is exposed in the Communication Arrangement OAuth details. ml-connector must copy that endpoint from the Communication Arrangement; constructing it manually will fail. This is SAP's multi-tenant isolation pattern.
Can the integration write data back into SAP?
No. GL accounts and cost centers in SAP are maintained in SAP only and are published as read-only via OData. ml-connector reads those dimensions and pushes them to Databricks for analytics and joins, but never writes back into SAP. Data flows one direction: S/4HANA to Databricks.


Connect SAP S/4HANA and Databricks

Free to use. Add your credentials, ping your real systems, and see if we fit.
