SAP ECC and Databricks integration
SAP ECC stores your financial master data and operational records. Databricks is your analytics and data intelligence platform. Connecting the two moves SAP's vendors, general ledger accounts, cost centers, employee rosters, and materials into Databricks tables so your finance analytics team can build reports and dashboards without manual exports. New vendors, cost centers, or GL postings in SAP appear in Databricks after the next scheduled poll, which on a nightly schedule means within hours.
What moves between them
The main flow runs from SAP ECC into Databricks. ml-connector polls SAP on a schedule tied to your batch windows (typically nightly or weekly) and reads vendors, cost centers, employees, materials, and general ledger accounts with their balances. It writes these records as tables into a Databricks schema you designate, one table per entity type (e.g., sap_vendors, sap_costcenters, sap_employees, sap_glaccounts, sap_materials). By default each run appends new rows; a table can instead be configured to upsert (update-or-insert) on a key field such as vendor number or cost center ID. Databricks keeps the history of these loads for audit and trend analysis. The flow is read-only from SAP: no updates flow back into SAP ECC from Databricks.
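The upsert-on-key behavior described above can be pictured as a Delta Lake MERGE statement. The following is a minimal sketch, not ml-connector's actual implementation: the function name build_merge_sql, the finance schema, and the staging table are hypothetical, and LIFNR (SAP's vendor number field) is assumed to be the key column.

```python
from typing import List


def build_merge_sql(target: str, staging: str, key_columns: List[str]) -> str:
    """Return a Delta Lake MERGE statement that upserts staging rows into target."""
    if not key_columns:
        raise ValueError("at least one key column is required for an upsert")
    # Rows match when every key column is equal in target (t) and staging (s).
    on_clause = " AND ".join(f"t.{col} = s.{col}" for col in key_columns)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {staging} AS s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    # Hypothetical table names; LIFNR is assumed to be the vendor key.
    print(build_merge_sql("finance.sap_vendors", "finance.sap_vendors_staging", ["LIFNR"]))
```

Because matched rows are updated rather than inserted again, re-running the same poll does not duplicate rows.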
How ml-connector handles it
ml-connector reaches SAP ECC through an on-premises agent on your network. It connects to the agent's gateway with the HTTP Basic Auth credentials you provide, and the agent issues the RFC calls (such as RFC_READ_TABLE) to SAP ECC. For each entity (vendors, cost centers, employees, etc.), ml-connector requests the relevant BAPI or RFC function, pages through large result sets, and normalizes character encoding to prevent garbled text.
On the Databricks side, ml-connector refreshes the OAuth 2.0 bearer token every 50 minutes (well before the 60-minute expiry) and writes records to Unity Catalog tables as an INSERT or UPSERT. Row-level transformations are applied along the way (e.g., mapping SAP's internal cost center codes to readable department names), and every record carries the timestamps of both the SAP read and the Databricks write.
If a network failure or timeout occurs during a poll, ml-connector retries with exponential backoff up to a configured limit. Records that still fail are logged to Databricks error tables and can be replayed manually once the issue is resolved. Because SAP ECC is on-premises and has no webhooks, polling is scheduled at a cadence that suits your finance closing cycle (weekly for month-end preparation, nightly for daily reporting).
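To make the paging and retry behavior concrete, here is a minimal sketch of one common way to page RFC_READ_TABLE, using its ROWSKIPS and ROWCOUNT parameters, with each page wrapped in an exponential-backoff retry. Whether ml-connector pages this way is not stated. The call_rfc callable is a hypothetical stand-in for whatever the on-premises agent exposes, and the helper names and sample vendor rows are invented for illustration.

```python
import time
from typing import Callable, Dict, Iterator, List


def with_backoff(fn: Callable[[], dict], max_attempts: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fn, retrying connection and timeout errors with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted; the caller logs the failure
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")


def read_table_paged(call_rfc: Callable[..., dict], table: str, fields: List[str],
                     page_size: int = 1000, delimiter: str = "|") -> Iterator[Dict[str, str]]:
    """Yield rows of an SAP table, one RFC_READ_TABLE page at a time."""
    skip = 0
    while True:
        result = with_backoff(lambda: call_rfc(
            "RFC_READ_TABLE",
            QUERY_TABLE=table,
            DELIMITER=delimiter,
            FIELDS=[{"FIELDNAME": name} for name in fields],
            ROWSKIPS=skip,
            ROWCOUNT=page_size,
        ))
        rows = result.get("DATA", [])
        for row in rows:
            # Each row arrives as one delimited string in the WA field.
            values = [value.strip() for value in row["WA"].split(delimiter)]
            yield dict(zip(fields, values))
        if len(rows) < page_size:
            return  # a short page means the table is exhausted
        skip += page_size


if __name__ == "__main__":
    # In-memory stand-in for the agent call, serving invented vendor rows.
    sample = [{"WA": "0000100001|Acme GmbH"}, {"WA": "0000100002|Borealis AG"},
              {"WA": "0000100003|Cobalt Ltd"}]

    def fake_rfc(name, **params):
        start = params["ROWSKIPS"]
        return {"DATA": sample[start:start + params["ROWCOUNT"]]}

    for vendor in read_table_paged(fake_rfc, "LFA1", ["LIFNR", "NAME1"], page_size=2):
        print(vendor)
```

Injecting the sleep function keeps the retry logic testable without real waiting.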
A real-world example
A mid-sized manufacturing company runs SAP ECC on-premises for procurement, production, and finance. Their finance analytics team uses Databricks for reporting and cost analysis but has been manually exporting vendor lists, GL account balances, and cost center assignments from SAP every week, then uploading them to Databricks and re-checking for changes and errors. With SAP ECC and Databricks connected, each scheduled poll fetches the latest vendors, cost centers, employees, and GL balances from SAP and writes them to Databricks tables automatically. The analytics team can now build reports that reflect SAP data as of the latest poll, drill into cost center performance by employee assignment, and reconcile GL postings without re-export cycles. Newly onboarded vendors show up in Databricks after the next poll, so procurement dashboards cover the full supplier base without a manual refresh.
What you can do
- Poll vendors, cost centers, employees, materials, and general ledger accounts from SAP ECC on a schedule and write them to Databricks Unity Catalog tables.
- Authenticate to SAP ECC through the on-premises gateway for RFC/BAPI calls, and to Databricks with OAuth 2.0 Client Credentials, with automatic token refresh every 50 minutes.
- Apply row-level transformations (e.g., cost center code to department name mapping) and maintain historical records with read and write timestamps in Databricks.
- Detect and recover from RFC timeouts and Databricks network failures with exponential backoff retry and full error logging to separate error tables.
- Organize extracted entities into separate Databricks tables (sap_vendors, sap_costcenters, sap_employees, sap_glaccounts, sap_materials) with upsert-on-key (e.g., vendor number or cost center ID) for idempotent updates.
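The 50-minute token refresh in the list above can be sketched as a small cache that reuses a token until it reaches the refresh age. This is an illustration under assumptions, not ml-connector's code: the TokenCache class is hypothetical, and the client-credentials request itself is left to a caller-supplied fetch_token function.

```python
import time
from typing import Callable, Optional

REFRESH_AFTER_SECONDS = 50 * 60  # refresh at 50 minutes, ahead of the 60-minute expiry


class TokenCache:
    """Cache a bearer token and fetch a new one once it is 50 minutes old."""

    def __init__(self, fetch_token: Callable[[], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_token = fetch_token  # hypothetical: performs the OAuth request
        self._clock = clock              # injectable so the logic can be tested
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached token, refreshing it first if it is too old."""
        now = self._clock()
        if self._token is None or now - self._fetched_at >= REFRESH_AFTER_SECONDS:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token


if __name__ == "__main__":
    # Placeholder fetcher; a real one would call the OAuth token endpoint.
    cache = TokenCache(lambda: "example-token")
    print(cache.get())
```

Refreshing on demand at write time, rather than on a background timer, means a token is only requested when a poll actually needs one.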
Questions
- What is the on-premises agent requirement for SAP ECC polling?
- SAP ECC RFC/BAPI calls cannot originate from the cloud, so you must run an on-premises agent on your network using SAP NCo (.NET Connector) or JCo (Java Connector). ml-connector connects to this agent via a secure gateway you configure with HTTP Basic Auth credentials. The agent runs on Windows or Linux and initiates all RFC requests on your behalf.
- How often can ml-connector poll SAP ECC without overloading it?
- SAP ECC typically handles 10 to 50 concurrent RFC calls safely, depending on your system load and configuration. ml-connector is designed to poll sequentially or with limited parallelism (e.g., one entity at a time) to avoid hitting RFC_READ_TABLE bottlenecks. You control the schedule: daily, weekly, or after every payroll or close cycle.
- What happens if a Databricks table schema does not match the incoming SAP data?
- ml-connector auto-creates Databricks tables with inferred schemas on the first run, using standard data types (STRING for text, BIGINT for numbers, TIMESTAMP for dates). If a new SAP field is added later, ml-connector detects the schema mismatch and either adds the column or alerts you to the change. You can also pre-define table schemas in Databricks and have ml-connector validate incoming records against them.
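The schema check in the last answer can be sketched as a comparison between a table's existing columns and the types inferred from an incoming record. This is a hypothetical illustration rather than ml-connector's logic. The function names, the finance schema, and the HEADCOUNT field are invented, and values that are neither integers nor datetimes are assumed to fall back to STRING.

```python
from datetime import datetime
from typing import Any, Dict, List


def infer_type(value: Any) -> str:
    """Map a Python value to STRING, BIGINT, or TIMESTAMP."""
    if isinstance(value, datetime):
        return "TIMESTAMP"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return "BIGINT"
    return "STRING"  # assumed fallback for everything else


def diff_schema(existing: Dict[str, str], record: Dict[str, Any]) -> Dict[str, dict]:
    """Report columns that are new, or whose inferred type differs from the table's."""
    incoming = {name: infer_type(value) for name, value in record.items()}
    new_columns = {c: t for c, t in incoming.items() if c not in existing}
    type_changes = {c: (existing[c], t) for c, t in incoming.items()
                    if c in existing and existing[c] != t}
    return {"new_columns": new_columns, "type_changes": type_changes}


def add_column_statements(table: str, new_columns: Dict[str, str]) -> List[str]:
    """Build one ALTER TABLE statement per new column."""
    return [f"ALTER TABLE {table} ADD COLUMNS ({name} {sql_type})"
            for name, sql_type in new_columns.items()]


if __name__ == "__main__":
    existing = {"KOSTL": "STRING", "KTEXT": "STRING"}
    record = {"KOSTL": "0000004711", "KTEXT": "Plant maintenance", "HEADCOUNT": 12}
    drift = diff_schema(existing, record)
    print(drift)
    print(add_column_statements("finance.sap_costcenters", drift["new_columns"]))
```

A new column can be added automatically, while a type change is the kind of mismatch that would more plausibly raise an alert than be applied silently.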
Connect SAP ECC and Databricks
Free to use. Add your credentials, ping your real systems, and see if we fit.