ml-connector
DATEVSnowflake

DATEV and Snowflake integration

DATEV runs accounting and tax filing for German businesses. Snowflake is the cloud warehouse where finance data is assembled, modeled, and queried at scale. Connecting the two lets finalized booking rows that already live in Snowflake flow into DATEV as bookings without re-keying. ml-connector queries the agreed rows from Snowflake, formats them as DATEV EXTF CSV or DXSO XML files, and submits them into DATEV Rechnungswesen or DATEV Unternehmen Online on a schedule you control. Because both systems are pull-only, the connector drives every step by polling rather than waiting for a push.

How DATEV works

DATEV is not a conventional REST API. A REST surface exists for client lookup and invoice document upload to DATEV Unternehmen Online, but finalized bookings go through asynchronous file jobs. You submit an EXTF CSV file to DATEV Rechnungswesen or a DXSO XML file to DATEV Unternehmen Online, receive a job ID, then poll the job status endpoint until it completes, since there are no webhooks. Authentication is OAuth 2.0 Authorization Code with PKCE against login.datev.de, requiring an interactive user sign-in, a code_challenge_method of S256, a state value of at least 20 characters, and 15-minute access tokens that require refresh. The standard DATEV chart of accounts cannot be read through the API, and finalized bookings are write-only, so posted journal entries cannot be read back.

How Snowflake works

Snowflake is a cloud data warehouse, not an ERP, so it has no native invoice, vendor, purchase order, or GL account objects; the customer defines tables such as gl_entries, invoices, and vendors. Every read and write runs through the SQL API at POST /api/v2/statements, with results paged through partition-based fetches rather than offset and limit, and an optional requestId UUID that makes a retried call idempotent. The recommended server-to-server auth is key pair, where a service user signs a short-lived RS256 JWT carrying the X-Snowflake-Authorization-Token-Type KEYPAIR_JWT header. Snowflake does not send webhooks to connector endpoints, so it is a pull-only source detected by a timestamp watermark or a Stream.

What moves between them

The main flow runs from Snowflake into DATEV. On each scheduled run, ml-connector queries the customer-defined Snowflake tables that hold finalized booking data, such as gl_entries, invoices, and payments, filtering on an updated_at column or reading a Stream so only new rows since the last run are pulled. It writes those rows into DATEV as an EXTF booking batch for Rechnungswesen or a DXSO booking suggestion for Unternehmen Online, and uploads any linked invoice PDFs to DATEV Unternehmen Online. Reference data such as accounts and cost centers is aligned so each booking line lands on a valid DATEV dimension. Cadence follows your close process, from a nightly load to month-end. DATEV finalized bookings are write-only and its chart of accounts is not readable, so ml-connector does not read posted journals back into Snowflake.

How ml-connector handles it

ml-connector stores both credential sets encrypted. For DATEV it completes the OAuth 2.0 Authorization Code with PKCE login, refreshes the 15-minute access token by sending the client_id only and never the client_secret, and sends the X-DATEV-Client-Id header on every call. For Snowflake it signs an RS256 JWT with the service user RSA private key and sets the KEYPAIR_JWT token-type header, regenerating the JWT each cycle because it expires within an hour, and it preserves the literal newlines in the PEM key so signing does not break. It maps Snowflake columns to DATEV accounts and cost centers up front, because DATEV does not expose its chart of accounts and the EXTF format demands valid account numbers. Reads run through POST /api/v2/statements with a requestId UUID for idempotency, then page through additional partitions, and a watermark filter on updated_at or a Stream keeps each query to new rows; line items stored in a VARIANT column are flattened during the read. EXTF files are written as UTF-8 with precomposed characters and deterministic filenames so DATEV duplicate detection allows safe retries, and document uploads use the GUID-based PUT so they are idempotent. Because neither side pushes events and the DATEV import is asynchronous, the connector polls each DATEV job to completion with exponential backoff and jitter starting near five seconds. Known edge cases are handled: the Snowflake virtual warehouse must have AUTO_RESUME on or queries fail while it is suspended, and a service user needs a network policy that allows the connector egress IPs. Every record carries a full audit trail and can be replayed if a step fails.

A real-world example

A mid-sized German services group with around 250 staff books its statutory accounts through a DATEV tax advisor and runs its finance reporting in Snowflake, where subsidiary ledgers and intercompany allocations are consolidated. Before the integration, an analyst exported the consolidated journal from Snowflake to a spreadsheet each month and a bookkeeper keyed the accruals and allocations into DATEV by hand, reconciling cost center codes line by line. With DATEV and Snowflake connected, the consolidated booking rows query out of Snowflake and post into DATEV as a booking batch allocated to the right accounts and cost centers, so the ledger reflects the warehouse figures without re-keying. The manual export-and-type step disappears and the reporting numbers and the books stay in step.

What you can do

  • Query finalized booking rows from customer-defined Snowflake tables and submit them into DATEV as EXTF or DXSO file jobs.
  • Map Snowflake columns to DATEV accounts and cost centers, since DATEV does not expose its chart of accounts through the API.
  • Bridge the DATEV OAuth 2.0 PKCE login session and the Snowflake service user key-pair JWT, refreshing each credential before it expires.
  • Read Snowflake through the SQL API with a watermark or a Stream, then poll the DATEV import job to completion, since neither system supports webhooks.
  • Write retry-safe EXTF files with deterministic filenames and use a Snowflake requestId for idempotency, with a full audit trail on every record.

Questions

Which direction does data move between DATEV and Snowflake?
The main flow is Snowflake into DATEV. Finalized booking rows are queried from customer-defined Snowflake tables and submitted into DATEV as EXTF or DXSO file jobs. DATEV finalized bookings are write-only and its chart of accounts is not readable through the API, so ml-connector does not read posted journal entries back into Snowflake.
Does Snowflake hold invoices or GL accounts that can post to DATEV?
Snowflake is a data warehouse, not an ERP, so it has no native invoice or GL account objects. The customer defines tables such as gl_entries, invoices, and vendors, and ml-connector queries those rows through the SQL API and maps their columns to valid DATEV accounts and cost centers when it builds the booking file. Line items stored in a VARIANT column are flattened during the read.
How does the integration handle that neither system supports webhooks?
Both DATEV and Snowflake are pull-only. ml-connector queries Snowflake on a schedule with an updated_at watermark or a Stream so it only picks up new rows, submits the EXTF or DXSO file to DATEV, receives a job ID, and polls the DATEV job status endpoint with exponential backoff until it reports complete. Snowflake queries can also run asynchronously, so the connector polls the statement handle to completion before reading the result.

Related integrations

Connect DATEV and Snowflake

Free to use. Add your credentials, ping your real systems, and see if we fit.

Get started