SAP ECC and Google BigQuery integration
SAP ECC holds your operational and financial master data and transactional records. Google BigQuery stores structured data at scale for analytics, reporting, and ML. Connecting SAP ECC to BigQuery lets your analytics team build reports on regularly refreshed vendor and customer data, invoice status, purchase order trends, and GL account activity without copying data manually or writing RFC code. ml-connector extracts data on a schedule and lands every record in BigQuery with a full audit trail, so you can trace any number back to SAP.
What moves between them
The integration runs one-way from SAP ECC into BigQuery. ml-connector extracts vendors, customers, purchase orders, invoice documents, GL posting headers and line items, and cost centers from SAP ECC on a schedule you configure, typically daily or weekly depending on your analytics refresh cadence. Records are inserted into BigQuery datasets as tables partitioned by document date or load timestamp. Because ml-connector only pulls from SAP ECC and BigQuery does not push data to other systems, no data flows back into SAP. Change detection relies on partition filters: subsequent runs filter on the partition column (or on _PARTITIONTIME for ingestion-time partitioned tables), so BigQuery prunes untouched partitions and only new or modified records are loaded.
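The partition filter described above can be sketched as a small query builder. This is an illustration only, not ml-connector's actual code: it assumes an ingestion-time partitioned table (the only kind that exposes _PARTITIONTIME), and the function name, table name, and parameter name are hypothetical.

```python
import re
from datetime import datetime, timezone

# Table identifiers cannot be passed as query parameters, so validate the
# fully qualified `project.dataset.table` name before interpolating it.
_TABLE_RE = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def build_incremental_query(table: str, since: datetime) -> tuple[str, dict]:
    """Return SQL plus named parameters that read only partitions at or after `since`.

    Filtering on _PARTITIONTIME lets BigQuery prune older partitions instead
    of scanning the whole table.
    """
    if not _TABLE_RE.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    sql = f"SELECT * FROM `{table}` WHERE _PARTITIONTIME >= @since"
    # @since is a BigQuery named query parameter; the value is sent separately.
    params = {"since": since.astimezone(timezone.utc).isoformat()}
    return sql, params
```

The returned SQL and parameters would then be handed to whichever BigQuery client you use; for a table partitioned on a document date column, the filter would target that column instead of _PARTITIONTIME.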
How ml-connector handles it
ml-connector manages on-premises connectivity: you supply the SAP Gateway or RFC agent endpoint, username, and password, which are stored encrypted at rest in the cell database. For RFC calls, ml-connector uses the agent runtime to execute BAPI function modules and RFC_READ_TABLE reads; for OData, it sends HTTP Basic Auth headers to the SAP Gateway base URL you provide. On the BigQuery side, ml-connector stores the service account JSON private key encrypted and uses it to sign a JWT, which it exchanges for an access token on each session, refreshing the token before its 3600-second expiry. To prevent duplicate records on retry, ml-connector tracks extraction timestamps and uses SAP reference document fields (REF_DOC_NO for accounting documents) as deduplication keys. Extraction polls are scheduled via BullMQ cron, and each poll job carries a jobId for idempotency. BigQuery load jobs are submitted asynchronously via the Jobs API and polled for completion; where streaming inserts are used, each row carries an insertId for best-effort deduplication. Every record loaded into BigQuery carries a source system timestamp, a load timestamp, and an audit trail entry showing which run extracted it.
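The token handling described above can be sketched as a small cache that refreshes ahead of expiry. This is a minimal sketch under stated assumptions, not ml-connector's implementation: `TokenCache`, `fetch_token`, and the 300-second refresh margin are hypothetical, and the signing step is left to the injected `fetch_token` callable, which would sign the claims with the service account key (RS256) and POST the result to the token endpoint using the `urn:ietf:params:oauth:grant-type:jwt-bearer` grant type.

```python
import time
from typing import Callable, Optional

TOKEN_URL = "https://oauth2.googleapis.com/token"
BIGQUERY_SCOPE = "https://www.googleapis.com/auth/bigquery"
TOKEN_LIFETIME_S = 3600  # access tokens from this flow last at most one hour


def build_jwt_claims(client_email: str, now: int) -> dict:
    """Claim set for the service-account JWT that is exchanged for an access token."""
    return {
        "iss": client_email,
        "scope": BIGQUERY_SCOPE,
        "aud": TOKEN_URL,
        "iat": now,
        "exp": now + TOKEN_LIFETIME_S,
    }


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[dict], tuple[str, int]],  # hypothetical: claims -> (token, expires_in)
        client_email: str,
        refresh_margin_s: int = 300,  # assumption: refresh 5 minutes early
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch_token
        self._email = client_email
        self._margin = refresh_margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0

    def get(self) -> str:
        now = int(self._clock())
        # Refresh when there is no token yet or the expiry is within the margin.
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch(build_jwt_claims(self._email, now))
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

Injecting the clock and the fetch function keeps the refresh logic testable without network access or a real key.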
A real-world example
A mid-market manufacturing company runs SAP ECC for procurement, accounting, and materials management across three plants. The finance team needs month-end close metrics on supplier payment timeliness, a purchasing analysis showing PO trends by vendor and plant, and GL account period reconciliation. Historically, they exported SAP reports to spreadsheets and rebuilt them in Excel for analysis. With SAP ECC and BigQuery connected, vendors, customers, POs, and GL entries flow nightly into BigQuery datasets. The analytics team writes SQL queries against those tables to build dashboards on PO aging, supplier invoice trends, and GL balance rollup by cost center. The extraction audit log lets them reconcile any metric back to the original SAP transaction, and they can refresh dashboards without waiting for month-end reporting runs.
What you can do
- Extract vendors, customers, purchase orders, and invoices from SAP ECC via RFC or OData and load them into BigQuery datasets on a schedule.
- Sync GL posting headers and line items so month-end GL balances can be analyzed and reconciled by cost center and account.
- Handle SAP ECC Basic Auth, on-premises agent routing, and RFC gateway complexity, along with token and session management.
- Deduplicate on retry using SAP reference document fields and BigQuery insertId, preventing duplicate records when extractions are replayed.
- Partition tables by document date or load timestamp so change detection queries use partition pruning and load only new or modified records.
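The insertId-based deduplication in the list above could look roughly like the following. It is a sketch, not ml-connector's actual key scheme: the helper names and the choice of key fields (`COMP_CODE`, `FISC_YEAR`, `REF_DOC_NO`) are illustrative assumptions. The row shape (`insertId` plus `json`) follows the BigQuery streaming insert request format.

```python
import hashlib
from typing import Iterable, List, Sequence


def make_insert_id(record: dict, key_fields: Sequence[str]) -> str:
    """Derive a stable insertId from the SAP fields that identify a record.

    The same record always yields the same id, so a replayed extraction
    presents BigQuery with ids it may already have seen.
    """
    # Join with a separator that cannot appear in SAP key values, so that
    # ("12", "3") and ("1", "23") do not collide.
    joined = "\x1f".join(str(record[field]) for field in key_fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:32]


def to_insert_rows(records: Iterable[dict], key_fields: Sequence[str]) -> List[dict]:
    """Wrap records in the row shape used by streaming inserts."""
    return [
        {"insertId": make_insert_id(record, key_fields), "json": record}
        for record in records
    ]
```

Because insertId deduplication is best-effort, a key-based check against rows already loaded remains the primary safeguard, with insertId acting as the second layer.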
Questions
- How does ml-connector reach SAP ECC from the cloud when it is on-premises?
- ml-connector does not call SAP ECC directly from the cloud. Instead, you run an on-premises agent (based on SAP .NET Connector or SAP Java Connector) on a server in your network that can reach the SAP ECC system. ml-connector submits RFC requests to the agent, which executes them against SAP and returns the results. For OData calls, you instead provide the SAP Gateway hostname, and ml-connector sends Basic Auth requests over HTTPS to that hostname.
- What happens if an SAP extraction fails partway through?
- ml-connector assigns each extraction run a jobId so it can be retried idempotently. If a run fails, ml-connector can re-submit the same extraction on the next scheduled poll. SAP documents carry reference fields (REF_DOC_NO for accounting entries, PO number for orders) that ml-connector uses to detect and skip records already loaded in BigQuery, and BigQuery insertId provides a second deduplication layer, so replaying an extraction does not create duplicate rows.
- How often should extractions run, and do they interfere with SAP users?
- Extraction frequency depends on your analytics refresh cadence: daily is typical, and weekly is common for GL and cost center data. ml-connector polls using standard RFC_READ_TABLE calls and OData queries, which SAP treats like normal user requests. Because RFC throughput is limited (roughly 10 to 50 concurrent calls can run safely), ml-connector spaces out large extractions and follows SAP Basis recommendations to avoid SYSTEM_FAILURE errors. High-volume extractions can be scheduled off-hours to stay clear of peak SAP usage.
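The concurrency limit mentioned in the last answer can be illustrated with a bounded worker pool. This is a minimal sketch, assuming each extraction chunk is wrapped in a zero-argument callable; `run_bounded` and its default cap of 10 are hypothetical and do not describe ml-connector's scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def run_bounded(tasks: Sequence[Callable[[], T]], max_concurrent: int = 10) -> List[T]:
    """Run extraction tasks with at most `max_concurrent` in flight at once.

    The pool size is the cap: extra tasks wait in the executor's queue until
    a worker frees up, so SAP never sees more simultaneous calls than allowed.
    Results are returned in the same order as `tasks`.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(task) for task in tasks]
        # result() re-raises any exception from the task, surfacing failures.
        return [future.result() for future in futures]
```

A caller would pass one callable per table or key range and pick a cap at the low end of the safe range for its SAP system.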
Connect SAP ECC and Google BigQuery
Free to use. Add your credentials, ping your real systems, and see if we fit.