SAP ECC and Databricks integration

SAP ECC stores your financial master data and operational records. Databricks is your analytics and data intelligence platform. Connecting the two moves SAP's vendors, general ledger accounts, cost centers, employee rosters, and materials into Databricks tables so your finance analytics team can build reports and dashboards without manual exports. New vendors, cost centers, or GL postings in SAP appear in Databricks after the next scheduled poll, typically within hours.

How SAP ECC works

SAP ECC exposes data through RFC/BAPI function modules, OData v2 REST services, and SOAP web services. Because RFC calls cannot originate from the cloud, an on-premises agent running SAP NCo (.NET Connector) or JCo (Java Connector) is required. Authentication uses HTTP Basic Auth for OData and IDoc, and RFC Basic Auth for BAPI calls. Key entities include vendors (LFA1/LFB1), general ledger accounts and balances (from the accounting document header and line-item tables BKPF/BSEG), cost centers, employees, materials, and purchase orders. SAP ECC has no native webhook system, so data is read by polling RFC_READ_TABLE on demand or on a scheduled interval, with optional outbound IDoc push if the customer has configured WE21/WE20 in SAP Basis.
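To make the polling model concrete, here is a minimal sketch of paging through a table with RFC_READ_TABLE. It assumes a hypothetical `call(function_name, **params)` callable that forwards the request to your on-premises agent and returns the result as a dict; the helper name `read_table` and the page size are illustrative and not part of ml-connector. The parameter names (QUERY_TABLE, DELIMITER, FIELDS, ROWSKIPS, ROWCOUNT) and the `WA` row field are those of the standard function module.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical: any callable that forwards an RFC call to the on-premises
# agent and returns the function module's result as a dict.
RfcCall = Callable[..., Dict[str, Any]]


def read_table(call: RfcCall, table: str, fields: List[str],
               page_size: int = 1000, delimiter: str = "|") -> Iterator[Dict[str, str]]:
    """Yield rows of `table` as dicts, paging with ROWSKIPS/ROWCOUNT."""
    skip = 0
    while True:
        result = call(
            "RFC_READ_TABLE",
            QUERY_TABLE=table,
            DELIMITER=delimiter,
            FIELDS=[{"FIELDNAME": name} for name in fields],
            ROWSKIPS=skip,
            ROWCOUNT=page_size,
        )
        rows = result.get("DATA", [])
        for row in rows:
            # Each row arrives as one delimited string in the WA field.
            values = [value.strip() for value in row["WA"].split(delimiter)]
            yield dict(zip(fields, values))
        if len(rows) < page_size:
            break  # a short (or empty) page means the table is exhausted
        skip += len(rows)
```

With a real RFC client, `call` would wrap the agent's request method; in tests it can be replaced by a stub that slices an in-memory list.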

How Databricks works

Databricks provides REST APIs at workspace-specific URLs (https://<workspace-id>.cloud.databricks.com for AWS, https://<workspace-name>.azuredatabricks.net for Azure, https://<workspace-id>.gcp.databricks.com for GCP). Authentication uses OAuth 2.0 Client Credentials (a Service Principal with client_id and client_secret), with bearer tokens that expire after 3600 seconds (one hour). Databricks is a data platform with no native finance or ERP objects; it provides compute clusters, SQL warehouses, and Unity Catalog (data governance), where tables and schemas hold the incoming data. Webhooks are available only for MLflow Model Registry, not for data events, so polling is required for compute and table operations. The REST table APIs manage metadata only; the rows themselves are written through SQL or Spark.
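Because the bearer token expires after 3600 seconds, a client has to renew it before that point. The sketch below shows one way to cache a token and refresh it early. The `TokenCache` class, the injected `fetch` callable (which would perform the actual client-credentials request), and the 600-second safety margin are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional


class TokenCache:
    """Cache a bearer token and renew it shortly before it expires."""

    def __init__(self, fetch: Callable[[], Dict[str, Any]],
                 margin_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # hypothetical: returns access_token and expires_in
        self._margin = margin_seconds  # assumed safety margin before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._refresh_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._refresh_at:
            payload = self._fetch()
            self._token = payload["access_token"]
            # Refresh `margin` seconds before the reported expiry.
            self._refresh_at = now + float(payload["expires_in"]) - self._margin
        return self._token
```

With a 3600-second token and a 600-second margin, `get()` requests a new token once 3000 seconds (50 minutes) have passed since the last fetch.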

What moves between them

The main flow runs from SAP ECC into Databricks. ml-connector polls SAP on a schedule tied to your batch windows (typically nightly or weekly) and reads vendors, cost centers, employees, materials, and general ledger accounts and their balances. These records are then written as tables into a Databricks schema you designate, organized by entity type (e.g., sap_vendors, sap_costcenters, sap_employees, sap_glaccounts). By default each run appends new rows; tables can instead be configured to upsert (update-or-insert) based on a key field like vendor number or cost center ID. The appended rows preserve history in Databricks for audit and trend analysis. The flow is read-only from SAP; no updates flow back into SAP ECC from Databricks.
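An upsert on a key field can be expressed in Databricks SQL as a MERGE INTO statement. The following sketch builds such a statement from a list of columns. The function name `build_merge`, the staging-table approach, and the table names in the usage note are assumptions for illustration; the source does not describe how ml-connector issues its upserts internally.

```python
import re
from typing import List

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str, allow_dots: bool = False) -> str:
    """Reject anything that is not a plain (optionally dotted) identifier."""
    parts = name.split(".") if allow_dots else [name]
    if not all(_IDENTIFIER.match(part) for part in parts):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_merge(target: str, staging: str, key: str, columns: List[str]) -> str:
    """Build a MERGE INTO statement that upserts staging rows into target."""
    _check(target, allow_dots=True)
    _check(staging, allow_dots=True)
    for column in columns:
        _check(column)
    if key not in columns:
        raise ValueError("key must be one of the columns")
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("need at least one non-key column to update")
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in non_key)
    column_list = ", ".join(columns)
    value_list = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} AS t USING {staging} AS s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({column_list}) VALUES ({value_list})"
    )
```

For example, `build_merge("finance.sap_vendors", "finance.sap_vendors_staging", "LIFNR", ["LIFNR", "NAME1"])` produces a statement that updates NAME1 for existing vendor numbers and inserts the rest. Running the same batch twice leaves the target unchanged, which is what makes the upsert idempotent.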

How ml-connector handles it

ml-connector connects through a secure gateway to the on-premises agent you run, and that agent issues RFC_READ_TABLE calls to SAP ECC on your behalf using the HTTP Basic Auth credentials you provide. For each entity (vendors, cost centers, employees, etc.), ml-connector requests the relevant BAPI or RFC function, pages through large result sets, and normalizes character encoding to prevent garbled text. Because SAP ECC is on-premises and has no webhooks, polling runs at a cadence matched to your finance closing cycle (weekly for month-end preparation, nightly for daily reporting).

On the Databricks side, ml-connector refreshes the OAuth 2.0 bearer token every 50 minutes (well before the 60-minute expiry) and writes records as an INSERT or UPSERT into Unity Catalog tables. Row-level transformations are applied (e.g., mapping SAP's internal cost center codes to readable department names), and every record carries the timestamp of the SAP read and the Databricks write.

If a network failure or timeout occurs during the poll, ml-connector backs off exponentially and retries up to a configured limit. Failed records are logged to Databricks error tables and can be manually replayed once the issue is resolved.
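The retry behavior described above can be sketched as follows. This is a generic exponential backoff helper, not ml-connector's actual implementation: the function name `retry_with_backoff`, the default delays, and the choice of which exceptions count as retryable are assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, doubling the wait after each retryable failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # limit reached: surface the error for logging and replay
            # Delays grow 1s, 2s, 4s, ... and are capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
            attempt += 1
```

Once the attempt limit is reached the last exception propagates, which is the point at which a caller would write the failed records to an error table for later replay.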

A real-world example

A mid-sized manufacturing company runs SAP ECC on-premises for procurement, production, and finance. Their finance analytics team uses Databricks for reporting and cost analysis, but every week they manually export vendor lists, GL account balances, and cost center assignments from SAP, upload them to Databricks, and re-check them for changes and errors. With SAP ECC and Databricks connected, each scheduled poll fetches the latest vendors, cost centers, employees, and GL balances from SAP and writes them to Databricks tables automatically. The analytics team can now build reports that reflect SAP data as of the latest poll, drill into cost center performance by employee assignment, and reconcile GL postings without re-export cycles. Newly onboarded vendors also show up in Databricks within hours, so procurement dashboards cover the full supplier base without waiting for the next manual export.

What you can do

Poll vendors, cost centers, employees, materials, and general ledger accounts from SAP ECC on a schedule and write them to Databricks Unity Catalog tables.
Handle RFC/BAPI authentication via the on-premises SAP gateway and Databricks OAuth 2.0 Client Credentials, with automatic token refresh every 50 minutes.
Apply row-level transformations (e.g., cost center code to department name mapping) and maintain historical records with read and write timestamps in Databricks.
Detect and recover from RFC timeouts and Databricks network failures with exponential backoff retry and full error logging to separate error tables.
Organize extracted entities into separate Databricks tables (sap_vendors, sap_costcenters, sap_employees, sap_glaccounts, sap_materials) with upsert-on-key (e.g., vendor number or cost center ID) for idempotent updates.
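As an illustration of the row-level transformation and timestamping listed above, here is a minimal sketch for a cost center record. The function name, the output column names (`department_name`, `sap_read_at`, `databricks_write_at`), and the sample mapping are hypothetical; KOSTL is the standard SAP cost center field.

```python
from datetime import datetime, timezone
from typing import Dict


def transform_cost_center(record: Dict[str, str],
                          department_names: Dict[str, str],
                          read_at: datetime,
                          written_at: datetime) -> Dict[str, str]:
    """Add a readable department name and both timestamps to one record."""
    out = dict(record)  # leave the caller's record untouched
    code = record["KOSTL"]
    # Fall back to the raw code when no mapping exists (an assumed policy).
    out["department_name"] = department_names.get(code, code)
    out["sap_read_at"] = read_at.isoformat()
    out["databricks_write_at"] = written_at.isoformat()
    return out


# Example usage with a hypothetical mapping and fixed timestamps.
row = transform_cost_center(
    {"KOSTL": "0000004100"},
    {"0000004100": "Plant Maintenance"},
    read_at=datetime(2024, 1, 31, 2, 0, tzinfo=timezone.utc),
    written_at=datetime(2024, 1, 31, 2, 5, tzinfo=timezone.utc),
)
```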

Questions

What is the on-premises agent requirement for SAP ECC polling?

SAP ECC RFC/BAPI calls cannot originate from the cloud, so you must run an on-premises agent on your network using SAP NCo (.NET Connector) or JCo (Java Connector). ml-connector connects to this agent via a secure gateway you configure with HTTP Basic Auth credentials. The agent runs on Windows or Linux and initiates all RFC requests on your behalf.

How often can ml-connector poll SAP ECC without overloading it?

SAP ECC typically handles 10 to 50 concurrent RFC calls safely, depending on your system load and configuration. ml-connector is designed to poll sequentially or with limited parallelism (e.g., one entity at a time) to avoid hitting RFC_READ_TABLE bottlenecks. You control the schedule: daily, weekly, or after every payroll or close cycle.

What happens if a Databricks table schema does not match the incoming SAP data?

ml-connector auto-creates Databricks tables with inferred schemas on the first run, using standard data types (STRING for text, BIGINT for numbers, TIMESTAMP for dates). If a new SAP field is added later, ml-connector detects the schema mismatch and either adds the column or alerts you to the change. You can also pre-define table schemas in Databricks and have ml-connector validate incoming records against them.
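The schema check in the last answer can be sketched as a comparison between the known columns and an incoming record. The helpers below are illustrative assumptions rather than ml-connector's code; they apply the same type mapping (STRING, BIGINT, TIMESTAMP) and produce an ALTER TABLE ... ADD COLUMNS statement for any new fields. The table and field names in the usage note are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict


def infer_type(value: Any) -> str:
    """Map a Python value to one of the three column types named above."""
    if isinstance(value, datetime):
        return "TIMESTAMP"
    if isinstance(value, int) and not isinstance(value, bool):
        return "BIGINT"
    return "STRING"


def find_new_columns(existing: Dict[str, str], record: Dict[str, Any]) -> Dict[str, str]:
    """Return columns present in the record but missing from the table."""
    return {name: infer_type(value)
            for name, value in record.items() if name not in existing}


def add_columns_sql(table: str, new_columns: Dict[str, str]) -> str:
    """Build the statement that adds the missing columns."""
    if not new_columns:
        raise ValueError("no new columns to add")
    columns = ", ".join(f"{name} {dtype}" for name, dtype in new_columns.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({columns})"
```

If a vendor record arrives with a previously unseen integer field such as `ZTERM_DAYS`, `find_new_columns` reports it as BIGINT. The caller can then either run the generated statement or raise an alert instead, matching the two outcomes described in the answer.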

Connect SAP ECC and Databricks

Free to use. Add your credentials, ping your real systems, and see if we fit.
