SAP ECC and Google BigQuery integration

SAP ECC holds your operational and financial master data and transactional records. Google BigQuery stores structured data at scale for analytics, reporting, and ML. Connecting SAP ECC to BigQuery lets your analytics team build reports on live vendor and customer data, invoice status, purchase order trends, and GL account activity without copying data manually or writing RFC code. ml-connector extracts on a schedule and lands every record in BigQuery with a full audit trail so you can trace any number back to SAP.

How SAP ECC works

SAP ECC exposes vendors, customers, GL accounts, cost centers, purchase orders, invoices, and goods receipts through RFC/BAPI function modules, OData v2 via SAP Gateway, or SOAP web services. An on-premises agent running SAP .NET Connector or Java Connector is required to execute RFC calls on the customer network; OData calls use HTTP Basic Auth against a SAP Gateway hostname configured by the customer's SAP Basis team. SAP ECC has no native webhook system, so records are extracted by polling RFC_READ_TABLE on a schedule, with concurrency typically capped at 10 to 50 calls to avoid SYSTEM_FAILURE exceptions. Key extraction concerns include the 512-character row width limit on classic RFC_READ_TABLE, character encoding for non-ASCII data, and following every write BAPI with an explicit BAPI_TRANSACTION_COMMIT to prevent locked documents.
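
The 512-character row width limit is commonly worked around by reading a wide table in several narrower passes and joining the partial rows on the key fields. The following is a minimal sketch of that field-batching step only, not ml-connector's actual implementation. The function name split_fields and the field widths in the usage note are hypothetical, and the sketch ignores any delimiter characters that would also count toward the row width.

```python
from typing import Dict, List

MAX_ROW_WIDTH = 512  # character limit on one classic RFC_READ_TABLE output row


def split_fields(field_widths: Dict[str, int], key_fields: List[str],
                 max_width: int = MAX_ROW_WIDTH) -> List[List[str]]:
    """Group fields into batches whose combined width fits in one row.

    Key fields are repeated in every batch so the partial rows can be
    joined back together after extraction.
    """
    key_width = sum(field_widths[k] for k in key_fields)
    if key_width >= max_width:
        raise ValueError("key fields alone exceed the row width limit")

    batches: List[List[str]] = []
    current: List[str] = list(key_fields)
    used = key_width
    for name, width in field_widths.items():
        if name in key_fields:
            continue
        if key_width + width > max_width:
            raise ValueError(f"field {name} cannot fit next to the key fields")
        if used + width > max_width:
            # Current batch is full: close it and start a new one with the keys.
            batches.append(current)
            current, used = list(key_fields), key_width
        current.append(name)
        used += width
    # Emit the last batch (or the keys alone if there were no other fields).
    if len(current) > len(key_fields) or not batches:
        batches.append(current)
    return batches
```

With illustrative widths such as {"LIFNR": 10, "NAME1": 300, "STRAS": 300, "ORT01": 100} and LIFNR as the key, the fields are split into two batches, each starting with LIFNR, so each pass stays under the limit.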

How Google BigQuery works

Google BigQuery accepts data as JSON over HTTPS REST via the tabledata.insertAll endpoint, authenticated with OAuth 2.0 service account credentials. The service account private key signs a JWT that is exchanged for an access token valid for 3600 seconds, and the token must be refreshed before it expires. BigQuery has no outbound webhooks or event push; all data discovery is pull-only via SQL queries on timestamp columns or the _PARTITIONTIME pseudo-column. Streaming inserts use best-effort deduplication with insertId, and the service account requires at minimum the bigquery.dataEditor and bigquery.jobUser roles. BigQuery tables are customer-defined; there are no pre-built invoice or order schemas.
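
To make the insertId mechanism concrete, here is a minimal sketch of building a tabledata.insertAll request body in which each row carries a deterministic insertId. The helper names make_insert_id and build_insert_all_body are hypothetical, and hashing the source key together with the row content is an assumption for illustration, not a documented ml-connector behavior. The sketch only builds the JSON body and does not send it.

```python
import hashlib
import json
from typing import Any, Dict, List


def make_insert_id(source_key: str, row: Dict[str, Any]) -> str:
    """Derive a stable insertId from the source key and the row content.

    The same row always yields the same id, so a replayed insert can be
    recognized by BigQuery's best-effort deduplication.
    """
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{source_key}|{canonical}".encode("utf-8"))
    return digest.hexdigest()[:32]


def build_insert_all_body(rows: List[Dict[str, Any]],
                          key_field: str) -> Dict[str, Any]:
    """Build the JSON body for a tabledata.insertAll call."""
    return {
        "skipInvalidRows": False,
        "ignoreUnknownValues": False,
        "rows": [
            {"insertId": make_insert_id(str(row[key_field]), row), "json": row}
            for row in rows
        ],
    }
```

Because the id depends only on the row itself, retrying the same batch produces identical insertId values, which is the property best-effort deduplication relies on.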

What moves between them

The integration runs one-way from SAP ECC into BigQuery. ml-connector extracts vendors, customers, purchase orders, invoice documents, GL posting headers and line items, and cost centers from SAP ECC on a schedule you configure - typically daily or weekly depending on your analytics refresh cadence. Records are inserted into BigQuery tables partitioned by document date or load timestamp. Because SAP ECC is pull-only and BigQuery has no push capability, there is no data flow back into SAP. Change detection on the BigQuery side filters on _PARTITIONTIME, so partition pruning restricts each subsequent run to new or modified records.
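
As an illustration of the _PARTITIONTIME filtering described above, the following sketch builds a query that reads only partitions at or after a given cutoff. The function name build_incremental_query and the table name in the usage note are hypothetical, and the sketch assumes an ingestion-time partitioned table, which is where the _PARTITIONTIME pseudo-column is available.

```python
import re
from datetime import datetime, timezone

# Accept only project.dataset.table identifiers built from safe characters.
_TABLE_RE = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def build_incremental_query(table: str, since: datetime) -> str:
    """Return SQL that scans only partitions at or after `since`."""
    if not _TABLE_RE.match(table):
        raise ValueError("table must look like project.dataset.table")
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    # Normalize to UTC so the literal matches BigQuery's default time zone.
    since_utc = since.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE _PARTITIONTIME >= TIMESTAMP('{since_utc}')"
    )
```

Calling it with a placeholder table such as my-project.sap_ecc.gl_line_items and the timestamp of the last successful run yields a query that BigQuery can prune to the recent partitions rather than scanning the whole table.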

How ml-connector handles it

ml-connector manages on-premises connectivity by accepting the SAP Gateway or RFC agent endpoint, username, and password, and storing them encrypted at rest in the cell database. For RFC calls, ml-connector uses the agent runtime to execute BAPI function modules and RFC_READ_TABLE reads; for OData, it sends HTTP Basic Auth headers against the SAP Gateway base URL you provide. On the BigQuery side, ml-connector stores the service account JSON private key encrypted and uses it to sign a JWT and exchange it for an access token on each session, refreshing the token before the 3600-second expiry. To prevent duplicate records on retry, ml-connector tracks extraction timestamps and uses SAP reference document fields (REF_DOC_NO for accounting documents) as deduplication keys. Extraction polls are scheduled via BullMQ cron, and each poll job carries a jobId for idempotency. Data reaches BigQuery either through load jobs, which are submitted asynchronously via the Jobs API and polled for completion, or through streaming inserts, which use insertId for best-effort deduplication. Every record loaded into BigQuery carries a source system timestamp, a load timestamp, and an audit trail entry showing which run extracted it.
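
The refresh-before-expiry behavior can be sketched as a small token cache. This is an illustrative sketch under stated assumptions, not ml-connector's code: the class name TokenCache, the 300-second safety margin, and the injected fetch callable (which would perform the JWT signing and exchange) are all hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self,
                 fetch: Callable[[], Tuple[str, int]],
                 margin_seconds: int = 300,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # fetch() returns (token, lifetime_in_seconds), e.g. ("abc", 3600).
        self._fetch = fetch
        self._margin = margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a token that is valid for at least the safety margin."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Injecting the clock and the fetch function keeps the refresh logic testable without network access or real waiting; with a 3600-second lifetime and a 300-second margin, a new token is requested once 3300 seconds have elapsed.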

A real-world example

A mid-market manufacturing company runs SAP ECC for procurement, accounting, and materials management across three plants. The finance team needs month-end close metrics on supplier payment timeliness, a purchasing analysis showing PO trends by vendor and plant, and GL account period reconciliation. Historically, they exported SAP reports to spreadsheets and rebuilt them in Excel for analysis. With SAP ECC and BigQuery connected, vendors, customers, POs, and GL entries flow nightly into BigQuery datasets. The analytics team writes SQL queries against those tables to build dashboards on PO aging, supplier invoice trends, and GL balance rollup by cost center. The extraction audit log lets them reconcile any metric back to the original SAP transaction, and they can refresh dashboards without waiting for month-end reporting runs.

What you can do

Extract vendors, customers, purchase orders, and invoices from SAP ECC via RFC or OData and load them into BigQuery datasets on a schedule.
Sync GL posting headers and line items so month-end GL balances can be analyzed and reconciled by cost center and account.
Handle SAP ECC Basic Auth, on-premises agent routing, and RFC gateway complexity, including token and session management.
Deduplicate on retry using SAP reference document fields and BigQuery insertId, preventing duplicate records when extractions are replayed.
Partition tables by document date or load timestamp so change detection queries use partition pruning and load only new or modified records.
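
The deduplicate-on-retry capability in the list above can be sketched as a filter that drops records whose reference key has already been loaded. This is a simplified illustration: the function name filter_new_records is hypothetical, and using REF_DOC_NO alone as the key is an assumption made for brevity (a real key would likely combine several SAP fields).

```python
from typing import Any, Dict, Iterable, List, Set


def filter_new_records(records: Iterable[Dict[str, Any]],
                       loaded_keys: Set[str],
                       key_field: str = "REF_DOC_NO") -> List[Dict[str, Any]]:
    """Return only records whose key is not already loaded.

    Duplicates inside the same batch are also dropped, keeping the first.
    The caller's loaded_keys set is not modified.
    """
    seen = set(loaded_keys)
    fresh: List[Dict[str, Any]] = []
    for record in records:
        key = str(record[key_field])
        if key in seen:
            continue
        seen.add(key)
        fresh.append(record)
    return fresh
```

Running the filter before each insert means a replayed extraction sends only the rows that are missing, with insertId acting as a second line of defense for anything that slips through.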

Questions

How does ml-connector reach SAP ECC from the cloud when it is on-premises?

ml-connector does not call SAP ECC directly from the cloud. Instead, you run an on-premises agent (SAP .NET Connector or Java Connector) on a server in your network with line-of-sight to the SAP ECC system. ml-connector submits RFC requests to the agent, which executes them against SAP and returns the results. For OData calls, you provide the SAP Gateway hostname instead, and ml-connector sends Basic Auth over HTTPS to that hostname.

What happens if an SAP extraction fails partway through?

ml-connector assigns each extraction run a jobId so it can be retried idempotently. If a run fails, ml-connector can re-submit the same extraction on the next scheduled poll. SAP documents carry reference fields (REF_DOC_NO for accounting entries, PO number for orders) that ml-connector uses to detect and skip records already loaded in BigQuery, and BigQuery insertId provides a second deduplication layer, so replaying an extraction does not create duplicate rows.

How often should extractions run, and do they interfere with SAP users?

Extraction frequency depends on your analytics refresh cadence - daily is typical, but weekly is common for GL and cost center data. ml-connector polls using standard RFC_READ_TABLE and OData queries, which SAP treats like normal user requests. Because RFC throughput is limited (10 to 50 concurrent calls is generally safe), ml-connector spaces out large extractions and respects SAP Basis recommendations to avoid SYSTEM_FAILURE errors. High-volume extractions that would otherwise coincide with peak SAP usage can be scheduled off-hours.
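
The concurrency limit mentioned in the last answer can be sketched with a bounded thread pool. This is a minimal illustration, assuming each extraction call is wrapped as a zero-argument callable; the function name run_capped and the default cap of 10 are hypothetical choices, not ml-connector settings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def run_capped(calls: Sequence[Callable[[], R]],
               max_concurrent: int = 10) -> List[R]:
    """Run the calls with at most max_concurrent in flight at once.

    Results are returned in the same order as the input calls. An
    exception raised by any call propagates to the caller.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(call) for call in calls]
        return [future.result() for future in futures]
```

The pool size acts as the cap: extra calls wait in the queue until a worker is free, so the SAP system never sees more simultaneous requests than the configured limit.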

Connect SAP ECC and Google BigQuery

Free to use. Add your credentials, ping your real systems, and see if we fit.