Which direction does data move between DATEV and Google BigQuery?

The main flow is Google BigQuery into DATEV. Finalized booking rows are queried from customer-defined BigQuery tables and submitted into DATEV as EXTF or DXSO file jobs. DATEV finalized bookings are write-only and its chart of accounts is not readable through the API, so ml-connector does not read posted journal entries back into BigQuery.

Does Google BigQuery hold invoices or GL accounts that can post to DATEV?

BigQuery is a schema-flexible warehouse, not an ERP, so it has no native invoice or GL account objects. The customer defines tables such as gl_entries, invoices, and vendors inside a dataset, and ml-connector queries those rows and maps their columns to valid DATEV accounts and cost centers when it builds the booking file. The connector accepts a configurable dataset and table per record type.

How does the integration handle that neither system supports webhooks?

Both DATEV and Google BigQuery are pull-only. ml-connector queries BigQuery on a schedule with a timestamp filter so it only picks up new rows, submits the EXTF or DXSO file to DATEV, receives a job ID, and polls the DATEV job status endpoint with exponential backoff until it reports complete. BigQuery query jobs are themselves asynchronous, so the connector also polls each query job to DONE before reading its results.

DATEVGoogle BigQuery

DATEV and Google BigQuery integration

DATEV runs accounting and tax filing for German businesses. Google BigQuery is the serverless warehouse where finance data is assembled, modeled, and queried at scale. Connecting the two lets finalized booking rows that already live in BigQuery flow into DATEV as bookings without re-keying. ml-connector queries the agreed rows from BigQuery, formats them as DATEV EXTF CSV or DXSO XML files, and submits them into DATEV Rechnungswesen or DATEV Unternehmen Online on a schedule you control. Because both systems are pull-only, the connector drives every step by polling rather than waiting for a push.

How DATEV works

DATEV is not a conventional REST API. A REST surface exists for client lookup and document upload, but finalized bookings go through asynchronous file jobs. You submit an EXTF CSV file to DATEV Rechnungswesen or a DXSO XML file to DATEV Unternehmen Online, receive a job ID, then poll the job status endpoint until it completes, since there are no webhooks. Authentication is OAuth 2.0 Authorization Code with PKCE against the DATEV login provider, with a state value of at least 20 characters and 15-minute access tokens that require refresh. The standard DATEV chart of accounts cannot be read through the API, and finalized bookings are write-only, so posted journal entries cannot be read back.

How Google BigQuery works

Google BigQuery is a serverless data warehouse exposed through the BigQuery REST API v2, JSON over HTTPS, with every call carrying an OAuth 2.0 bearer token. It is schema-flexible rather than an ERP, so it has no native invoice, vendor, or GL account objects; the customer defines tables such as gl_entries, invoices, and vendors inside a dataset. Reads run as asynchronous query jobs that are submitted, polled until the job state is DONE, then paged through with a pageToken. Authentication is a Google service account that signs a JWT and exchanges it for a one-hour access token. BigQuery sends no webhooks, so it is a pull-only source that the connector queries on a schedule.

What moves between them

The main flow runs from Google BigQuery into DATEV. On each scheduled run, ml-connector queries the customer-defined BigQuery tables that hold finalized booking data, such as gl_entries, invoices, and payments, filtering on a timestamp column so only new rows since the last run are pulled. It writes those rows into DATEV as an EXTF booking batch for Rechnungswesen or a DXSO booking suggestion for Unternehmen Online, and uploads any linked invoice PDFs to DATEV Unternehmen Online. Reference data such as accounts and cost centers is aligned so each booking line lands on a valid DATEV dimension. Cadence follows your close process, from a nightly load to month-end. DATEV finalized bookings are write-only and its chart of accounts is not readable, so ml-connector does not read posted journals back into BigQuery.

How ml-connector handles it

ml-connector stores both credential sets encrypted. For DATEV it completes the OAuth 2.0 Authorization Code with PKCE login, refreshes the 15-minute access token with the client ID only, and sends the X-DATEV-Client-Id header on every call. For BigQuery it signs a JWT with the service account private key and exchanges it for an access token that it refreshes before the 3600 second expiry, preserving the literal newlines in the PEM key so signing does not break. It maps BigQuery columns to DATEV accounts and cost centers up front, because DATEV does not expose its chart of accounts and the EXTF format demands valid account numbers. Reads run as asynchronous query jobs with a caller-supplied jobId for idempotency, polled until DONE, then paged with a pageToken, and a timestamp filter on a partitioned column keeps each query cheap. EXTF files are written as UTF-8 with precomposed characters and deterministic filenames so DATEV duplicate detection allows safe retries, and document uploads use the GUID-based PUT so they are idempotent. Because neither side pushes events and the DATEV import is asynchronous, the connector polls each DATEV job to completion with exponential backoff. Every record carries a full audit trail and can be replayed if a step fails.

A real-world example

A mid-sized German services group with around 250 staff books its statutory accounts through a DATEV tax advisor and runs its finance reporting in Google BigQuery, where subsidiary ledgers and intercompany allocations are consolidated. Before the integration, an analyst exported the consolidated journal from BigQuery to a spreadsheet each month and a bookkeeper keyed the accruals and allocations into DATEV by hand, reconciling cost center codes line by line. With DATEV and Google BigQuery connected, the consolidated booking rows query out of BigQuery and post into DATEV as a booking batch allocated to the right accounts and cost centers, so the ledger reflects the warehouse figures without re-keying. The manual export-and-type step disappears and the reporting numbers and the books stay in step.

What you can do

Query finalized booking rows from customer-defined Google BigQuery tables and submit them into DATEV as EXTF or DXSO file jobs.
Map Google BigQuery columns to DATEV accounts and cost centers, since DATEV does not expose its chart of accounts through the API.
Bridge the DATEV OAuth 2.0 PKCE login session and the Google BigQuery service account JWT, refreshing each token before it expires.
Read BigQuery through asynchronous query jobs with a timestamp filter, then poll the DATEV import job to completion, since neither system supports webhooks.
Write retry-safe EXTF files with deterministic filenames and use a stable BigQuery jobId, with a full audit trail on every record.

Questions

Which direction does data move between DATEV and Google BigQuery?: The main flow is Google BigQuery into DATEV. Finalized booking rows are queried from customer-defined BigQuery tables and submitted into DATEV as EXTF or DXSO file jobs. DATEV finalized bookings are write-only and its chart of accounts is not readable through the API, so ml-connector does not read posted journal entries back into BigQuery.
Does Google BigQuery hold invoices or GL accounts that can post to DATEV?: BigQuery is a schema-flexible warehouse, not an ERP, so it has no native invoice or GL account objects. The customer defines tables such as gl_entries, invoices, and vendors inside a dataset, and ml-connector queries those rows and maps their columns to valid DATEV accounts and cost centers when it builds the booking file. The connector accepts a configurable dataset and table per record type.
How does the integration handle that neither system supports webhooks?: Both DATEV and Google BigQuery are pull-only. ml-connector queries BigQuery on a schedule with a timestamp filter so it only picks up new rows, submits the EXTF or DXSO file to DATEV, receives a job ID, and polls the DATEV job status endpoint with exponential backoff until it reports complete. BigQuery query jobs are themselves asynchronous, so the connector also polls each query job to DONE before reading its results.

Connect DATEV and Google BigQuery

Free to use. Add your credentials, ping your real systems, and see if we fit.

Get started