SAP S/4HANA and Google BigQuery integration

SAP S/4HANA runs finance, procurement, and accounting as the system of record. Google BigQuery is a serverless data warehouse for analytics and reporting. Connecting the two means supplier invoices, purchase orders, journal entry line items, GL accounts, and cost centers flow out of SAP and into BigQuery tables on a schedule, ready for SQL queries and dashboards without manual exports. ml-connector handles the very different APIs on each side and keeps the warehouse current. Because BigQuery is a data store with no AP or PO objects of its own, the flow is one direction: SAP data into BigQuery.

How SAP S/4HANA works

SAP S/4HANA Cloud exposes its finance and procurement data through OData services on a tenant-specific URL of the form https://<tenant-id>-api.s4hana.ondemand.com. Key entities include business partners (suppliers and customers), supplier invoices, purchase orders, GL accounts, GL account line items, journal entries, and cost centers, each served by a named OData V2 service. Authentication is OAuth 2.0 client credentials, and an SAP admin must first create a Communication Arrangement before any service responds. SAP Cloud does not push HTTP webhooks natively, so records are read by polling with a $filter on LastChangeDateTime or PostingDate; event push is only available through SAP Event Mesh on BTP.
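As a rough illustration of the polling read described above, the sketch below builds one delta-query URL. The service name, entity set, URL prefix, and the `datetimeoffset` literal are assumptions chosen for the example; the correct literal depends on the property's EDM type in the service metadata, and `build_poll_url` is a hypothetical helper, not part of ml-connector.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

# Assumed URL prefix for OData V2 services; confirm against your tenant.
ODATA_ROOT = "/sap/opu/odata/sap"


def build_poll_url(base_url, service, entity_set, since, page_size=500):
    """Build a URL that asks for records changed after `since`."""
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = {
        # Assumes LastChangeDateTime is an Edm.DateTimeOffset property.
        "$filter": f"LastChangeDateTime gt datetimeoffset'{stamp}'",
        "$orderby": "LastChangeDateTime asc",
        "$top": str(page_size),
        "$format": "json",
    }
    # Keep "$" and "'" literal so the query stays readable in logs.
    query = urlencode(params, quote_via=quote, safe="$'")
    return f"{base_url.rstrip('/')}{ODATA_ROOT}/{service}/{entity_set}?{query}"


if __name__ == "__main__":
    print(build_poll_url(
        "https://my-tenant-api.s4hana.ondemand.com",
        "API_SUPPLIERINVOICE_PROCESS_SRV",  # illustrative service name
        "A_SupplierInvoice",                # illustrative entity set
        datetime(2024, 1, 1, tzinfo=timezone.utc),
    ))
```

Storing the highest LastChangeDateTime seen in a run and passing it as `since` on the next run is one simple way to keep each pass incremental.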

How Google BigQuery works

Google BigQuery exposes its data through the BigQuery REST API v2 at https://bigquery.googleapis.com/bigquery/v2, with paths scoped by project, dataset, and table. BigQuery has no predefined invoice or vendor objects, so the connector writes into customer-defined tables using the streaming insert endpoint (tabledata.insertAll) or batch load jobs (jobs.insert), and reads back through asynchronous query jobs that are polled until done. Authentication uses a Google service account with a JSON key: the connector signs a JWT and exchanges it for a one-hour bearer token. BigQuery sends no webhooks of any kind. Data is pushed in and pulled out through API calls, so any change detection is done by querying a timestamp column on a schedule.
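To make the streaming write concrete, here is a minimal sketch of how a tabledata.insertAll request could be assembled and how per-row failures could be read from the response. The project, dataset, table, and record field names are placeholders, and both helper functions are hypothetical; sending the request with a bearer token is left out.

```python
import json

BASE = "https://bigquery.googleapis.com/bigquery/v2"


def build_insert_all(project, dataset, table, rows):
    """Return (url, json_body) for a streaming insert.

    `rows` is a list of (insert_id, record_dict) pairs.
    """
    url = f"{BASE}/projects/{project}/datasets/{dataset}/tables/{table}/insertAll"
    body = {
        "skipInvalidRows": False,
        "ignoreUnknownValues": False,
        "rows": [{"insertId": iid, "json": rec} for iid, rec in rows],
    }
    return url, json.dumps(body)


def failed_row_indexes(response):
    """Indexes of rows BigQuery rejected, taken from `insertErrors`."""
    return [err["index"] for err in response.get("insertErrors", [])]


if __name__ == "__main__":
    url, body = build_insert_all(
        "my-project", "sap_finance", "invoices",      # placeholder names
        [("inv-5105600123-2024", {"SupplierInvoice": "5105600123"})],
    )
    print(url)
    print(body)
```

A successful HTTP status does not by itself mean every row landed, so checking `insertErrors` before advancing the polling watermark is a sensible precaution.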

What moves between them

The flow runs in one direction, from SAP S/4HANA into Google BigQuery. ml-connector reads supplier invoices, purchase orders, journal entry line items, and master data such as business partners, GL accounts, and cost centers from the SAP OData services, then loads them as rows into the matching BigQuery tables the customer has defined. Each SAP entity maps to a configurable dataset and table, so invoices land in an invoices table and POs in a purchase_orders table. You set the cadence, typically frequent enough to keep dashboards current, and each pass reads only the records changed since the last run. BigQuery holds no source records of its own to send back, so the connector never writes financial data from the warehouse into SAP.
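The entity-to-table mapping could be represented along the following lines. This is only a sketch: the entity keys, the `sap_finance` dataset, and every table name other than invoices and purchase_orders are invented for illustration and do not reflect ml-connector's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableTarget:
    dataset: str
    table: str


# Hypothetical mapping; in practice this would come from user configuration.
ENTITY_TABLES = {
    "supplier_invoice": TableTarget("sap_finance", "invoices"),
    "purchase_order": TableTarget("sap_finance", "purchase_orders"),
    "journal_entry_item": TableTarget("sap_finance", "journal_entry_items"),
    "cost_center": TableTarget("sap_master", "cost_centers"),
}


def resolve_target(entity, project):
    """Return the fully qualified table name for a mapped SAP entity."""
    try:
        target = ENTITY_TABLES[entity]
    except KeyError:
        raise KeyError(f"no BigQuery table configured for entity {entity!r}") from None
    return f"{project}.{target.dataset}.{target.table}"


if __name__ == "__main__":
    print(resolve_target("purchase_order", "my-project"))
```

Failing loudly on an unmapped entity, rather than guessing a table name, keeps rows from landing somewhere unexpected.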

How ml-connector handles it

ml-connector stores both credential sets encrypted. On the SAP side it requests an OAuth 2.0 client-credentials token from the tenant token URL, caches it, refreshes it before its roughly twelve-hour expiry, and sends it as a Bearer token on every OData call against the tenant-specific base URL. On the Google side it signs a JWT with the service account private key and exchanges it for a one-hour access token, again refreshing before expiry. It preserves the literal newlines in the PEM key so that JWT signing does not break.

Because SAP Cloud is pull-only, the connector polls each OData service on your schedule, using a $filter on LastChangeDateTime or PostingDate rather than scanning whole entity sets, and follows the server-supplied next-page link (__next in OData V2 responses, @odata.nextLink in V4) until the result set is exhausted. Rows are written to BigQuery with tabledata.insertAll for steady streams or batched into a load job for larger pulls.

SAP returns HTTP 429 with a Retry-After header under load and BigQuery enforces per-project quotas, so the connector backs off with jitter and retries. Each row carries an insertId derived from the SAP document key, so a re-read inside the dedup window does not double-load, and load jobs use a caller-supplied jobReference.jobId for idempotency. Reads from SAP need no X-CSRF-Token; the connector would fetch one automatically if a write-back path were ever enabled. Every record carries a full audit trail and can be replayed if a load fails.
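The retry behavior mentioned above might look roughly like this. It is a sketch of one common approach (exponential backoff with full jitter, using Retry-After as a floor), not a description of ml-connector's internals; `RateLimited`, `backoff_delay`, and `call_with_retry` are hypothetical names, and Retry-After is assumed to arrive as a number of seconds.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's HTTP layer on a 429 or quota error."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, if the server supplied it


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter exponential backoff; Retry-After acts as a minimum."""
    ceiling = min(cap, base * (2 ** attempt))
    delay = rng() * ceiling
    if retry_after is not None:
        delay = max(delay, float(retry_after))
    return delay


def call_with_retry(fn, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimited until max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            sleep(backoff_delay(attempt, exc.retry_after, rng=rng))
```

Passing `sleep` and `rng` as parameters keeps the function easy to test without real waiting.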

A real-world example

A mid-sized manufacturer with roughly 800 employees runs SAP S/4HANA Cloud for finance and procurement, and the finance and analytics teams want spend and AP reporting that SAP's standard reports do not cover. Before the integration, an analyst exported supplier invoices and purchase orders to spreadsheets each week and loaded them into BigQuery by hand, which meant dashboards were always a few days stale and a missed export left gaps. With SAP S/4HANA and Google BigQuery connected, invoices, POs, journal lines, and cost center master data load into BigQuery on a schedule, filtered to only what changed, so the spend dashboards refresh on their own. The analyst stops doing manual exports, and the reporting layer stays in step with the ledger.

What you can do

  • Load SAP S/4HANA supplier invoices, purchase orders, and journal entry line items into Google BigQuery tables on a schedule.
  • Sync SAP master data such as business partners, GL accounts, and cost centers into BigQuery for reporting joins.
  • Read only records changed since the last run using an OData $filter on LastChangeDateTime, then follow the nextLink cursor.
  • Authenticate SAP with OAuth 2.0 client credentials and BigQuery with a Google service account JWT, refreshing both tokens before expiry.
  • Dedupe each row with an insertId from the SAP document key, with quota-aware backoff and a full audit trail on every record.
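Following the next-page link mentioned in the list above can be sketched as a small generator. The example assumes the JSON envelopes used by OData V2 (`d.results` with `__next`) and V4 (`value` with `@odata.nextLink`); `fetch_page` is a hypothetical callable that performs the authenticated GET and returns the parsed JSON.

```python
def iter_records(fetch_page, first_url):
    """Yield every record across pages, following the server's next link."""
    url = first_url
    while url:
        payload = fetch_page(url)
        if "d" in payload:
            # OData V2 envelope: {"d": {"results": [...], "__next": "..."}}
            body = payload["d"]
            rows = body.get("results", [])
            url = body.get("__next")
        else:
            # OData V4 envelope: {"value": [...], "@odata.nextLink": "..."}
            rows = payload.get("value", [])
            url = payload.get("@odata.nextLink")
        yield from rows


if __name__ == "__main__":
    # Canned pages standing in for real HTTP responses.
    pages = {
        "page1": {"d": {"results": [{"id": 1}, {"id": 2}], "__next": "page2"}},
        "page2": {"d": {"results": [{"id": 3}]}},
    }
    print(list(iter_records(pages.__getitem__, "page1")))
```

Because the generator yields rows as each page arrives, a caller can stream them into BigQuery without holding the whole result set in memory.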

Questions

Which direction does data move between SAP S/4HANA and Google BigQuery?
The flow is one direction, from SAP S/4HANA into Google BigQuery. Supplier invoices, purchase orders, journal entry line items, and master data move out of SAP and into BigQuery tables for analytics. BigQuery is a data warehouse with no AP or PO objects of its own, so ml-connector does not write financial records from the warehouse back into SAP.

Does BigQuery send events, or does ml-connector poll SAP for changes?
Neither system pushes events in this setup, so ml-connector polls. SAP S/4HANA Cloud has no native HTTP webhooks and BigQuery sends none at all. The connector reads each SAP OData service on a schedule you control, using a $filter on LastChangeDateTime or PostingDate so it only pulls records that changed since the last run.

How does the integration avoid loading the same SAP record twice into BigQuery?
Each row is written with an insertId derived from the SAP document key, which gives BigQuery best-effort deduplication within its roughly one-minute streaming window. For larger pulls done as load jobs, ml-connector supplies a stable jobReference.jobId so resubmitting the same batch returns the existing job rather than creating a duplicate. Every record also carries an audit trail and can be replayed if a load fails.
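One way to derive stable identifiers of this kind is to hash the SAP key fields, as in the sketch below. The hashing scheme, the `sap_` job prefix, and the key field names are assumptions for illustration, not ml-connector's documented format. The job ID is restricted to letters, digits, underscores, and dashes, which is what BigQuery accepts.

```python
import hashlib
import re


def make_insert_id(entity, key_fields):
    """Deterministic insertId from an entity name and its SAP key fields."""
    # Sort the fields so the same key always hashes to the same value.
    raw = entity + "|" + "|".join(f"{k}={key_fields[k]}" for k in sorted(key_fields))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def make_job_id(entity, insert_ids):
    """Deterministic load-job ID for a batch, independent of row order."""
    digest = hashlib.sha256("\n".join(sorted(insert_ids)).encode("utf-8")).hexdigest()
    safe_entity = re.sub(r"[^A-Za-z0-9_-]", "_", entity)
    return f"sap_{safe_entity}_{digest[:32]}"


if __name__ == "__main__":
    # Illustrative key fields for a supplier invoice.
    row_id = make_insert_id(
        "supplier_invoice",
        {"SupplierInvoice": "5105600123", "FiscalYear": "2024"},
    )
    print(row_id)
    print(make_job_id("supplier_invoice", [row_id]))
```

Because both IDs depend only on the record keys, retrying the same batch after a timeout produces the same identifiers and can be recognized as a repeat.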

Connect SAP S/4HANA and Google BigQuery

Free to use. Add your credentials, ping your real systems, and see if we fit.
