DATEV and Databricks integration

DATEV is a widely used accounting and tax backend for German businesses and their tax advisors. Databricks is a cloud data platform built on Delta Lake. This connection moves the finance data that passes through DATEV into Databricks for analytics and reconciliation. ml-connector reads DATEV client metadata, the invoice documents uploaded to DATEV Unternehmen Online, and the EXTF and DXSO booking batches you submit, then writes each record into Delta Lake tables. Because DATEV sends no webhooks and Databricks has no native finance objects, the work is staged: poll DATEV, normalize, and load into the warehouse on a schedule you control.

How DATEV works

DATEV is on-premise accounting software with a cloud document layer, and it does not behave like a conventional REST API. Only part of it is REST: the accounting:clients product reads client companies, and the accounting:documents product uploads invoice and receipt documents to DATEV Unternehmen Online (DUO). Finalized bookings go in as asynchronous file jobs, EXTF CSV files into DATEV Rechnungswesen and DXSO XML jobs into DUO, each submitted and then polled for status. Authentication is OAuth 2.0 Authorization Code with PKCE against login.datev.de. It requires an interactive user sign-in and a state value of at least 20 characters, and it issues a 15-minute access token that must be refreshed. DATEV has no webhooks, so everything is read by polling.
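The PKCE parameters described above can be generated with the Python standard library alone. This is a minimal sketch, not DATEV's reference client: the authorization endpoint, client_id, redirect URI and scope are placeholders you would take from your DATEV developer app and DATEV's own OpenID Connect configuration, and the function names are hypothetical. What it does show is the RFC 7636 S256 relationship between code_verifier and code_challenge, plus a state and nonce well above the 20-character minimum.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for code_challenge_method S256 (RFC 7636)."""
    # 32 random bytes -> 43-character base64url verifier, inside RFC 7636's 43..128 range
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorize_url(authorize_endpoint: str, client_id: str,
                        redirect_uri: str, scope: str) -> tuple[str, dict]:
    """Build the interactive sign-in URL; authorize_endpoint and scope are placeholders."""
    verifier, challenge = make_pkce_pair()
    state = secrets.token_urlsafe(32)  # 43 characters, above DATEV's 20-character minimum
    nonce = secrets.token_urlsafe(32)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "nonce": nonce,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    url = f"{authorize_endpoint}?{urlencode(params)}"
    # Keep these server-side: state and nonce to check the callback, verifier to redeem the code
    return url, {"code_verifier": verifier, "state": state, "nonce": nonce}
```

After the user signs in, the callback's state is compared with the stored value and the stored code_verifier is sent with the authorization code to the token endpoint.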

How Databricks works

Databricks exposes compute, data governance, and ML assets through a REST API on a workspace-specific URL. Calls authenticate with OAuth 2.0 client credentials from a service principal, which yield a one-hour access token. Tables, schemas, and catalogs are managed through Unity Catalog, but row data is written and read through Databricks Jobs, SQL INSERT statements, or the SQL Statement Execution API rather than through table metadata calls. The Jobs API accepts an idempotency_token so that a retried run does not duplicate a load. Databricks has no native invoice, vendor, purchase order, or general ledger object, so here it serves as a data destination, not a finance system of record.
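A minimal sketch of the Databricks sign-in, assuming a service principal with an OAuth secret: the token request goes to the workspace's /oidc/v1/token endpoint with grant_type=client_credentials and scope all-apis, authenticated with HTTP Basic. The helper names and the five-minute renewal margin are illustrative assumptions, and the request is only built here, not sent.

```python
import base64
import time
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def build_token_request(workspace_host: str, client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request for a service principal."""
    body = urlencode({"grant_type": "client_credentials", "scope": "all-apis"}).encode("ascii")
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{workspace_host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


class TokenCache:
    """Holds one bearer token and reports when to renew it, a margin ahead of expiry."""

    def __init__(self, renew_margin_s: float = 300.0) -> None:
        self.renew_margin_s = renew_margin_s  # assumed 5-minute safety margin
        self.access_token: Optional[str] = None
        self.expires_at = 0.0

    def store(self, token_response: dict, now: Optional[float] = None) -> None:
        """Record access_token and compute the absolute expiry from expires_in (seconds)."""
        now = time.time() if now is None else now
        self.access_token = token_response["access_token"]
        self.expires_at = now + float(token_response["expires_in"])

    def needs_renewal(self, now: Optional[float] = None) -> bool:
        """True if no token is held or expiry is within the renewal margin."""
        now = time.time() if now is None else now
        return self.access_token is None or now >= self.expires_at - self.renew_margin_s
```

Sending the request with urllib.request.urlopen returns JSON that includes access_token and expires_in, which is what TokenCache.store expects. Renewing a few minutes early avoids a load failing on a token that expires mid-run.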

What moves between them

Data moves in one direction, from DATEV into Databricks. ml-connector reads the DATEV client list, the invoice and receipt documents uploaded to DATEV Unternehmen Online, and the EXTF and DXSO booking batches submitted for each client, then loads those records into Delta Lake tables in a target catalog and schema. The cadence is tied to your booking and document workflow rather than a real-time stream, because DATEV processes file jobs asynchronously and publishes no events. Nothing flows back from Databricks into DATEV; finalized bookings in DATEV are write-only and cannot be read back through the API, and Databricks holds no finance records to return.
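Rows reach the target catalog and schema through SQL run on a warehouse, not through table metadata calls. The sketch below builds one SQL Statement Execution API request (POST /api/2.0/sql/statements/) for a parameterized INSERT and normalizes each value to precomposed NFC form first. The table name datev_documents, the helper names, and sending every value as STRING are simplifying assumptions. Column names are interpolated into the SQL text, so they must come from a fixed schema, never from source data.

```python
import json
import unicodedata
import urllib.request


def nfc(value: str) -> str:
    """Return the precomposed (NFC) form, e.g. 'u' + U+0308 becomes the single code point 'ü'."""
    return unicodedata.normalize("NFC", value)


def build_insert_request(workspace_host: str, bearer_token: str, warehouse_id: str,
                         catalog: str, schema: str, row: dict) -> urllib.request.Request:
    """Build (but do not send) a SQL Statement Execution API call that inserts one row."""
    columns = list(row)
    # Column names come from a fixed, trusted schema; values travel as named parameters.
    statement = (
        f"INSERT INTO datev_documents ({', '.join(columns)}) "
        f"VALUES ({', '.join(':' + c for c in columns)})"
    )
    payload = {
        "warehouse_id": warehouse_id,
        "catalog": catalog,  # target Unity Catalog catalog
        "schema": schema,    # target schema inside that catalog
        "statement": statement,
        "parameters": [
            # Simplification: every value is sent as an NFC-normalized STRING
            {"name": c, "value": nfc(str(v)), "type": "STRING"} for c, v in row.items()
        ],
        "wait_timeout": "30s",
    }
    return urllib.request.Request(
        url=f"https://{workspace_host}/api/2.0/sql/statements/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {bearer_token}", "Content-Type": "application/json"},
        method="POST",
    )
```

A real loader would batch many rows per statement and map DATEV fields to typed columns; the point here is that values are bound as parameters and arrive in NFC form.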

How ml-connector handles it

ml-connector stores both credential sets encrypted. On the DATEV side it runs the OAuth2 PKCE flow with code_challenge_method S256, a state of at least 20 characters, and a nonce, then refreshes the 15-minute token by sending the client_id only, never the client_secret. On the Databricks side it requests a client-credentials token from the workspace token endpoint with scope all-apis and renews it before the one-hour expiry.

Because DATEV has no webhooks, the connector polls: it lists clients, pulls document metadata, and tracks the IDs of submitted EXTF and DXSO jobs. Polling uses exponential backoff with jitter, starting at roughly 5 seconds, because DATEV is explicitly not built for high-frequency polling.

Loads into Databricks run as Jobs that carry an idempotency_token, so a network retry reuses the existing run rather than loading the data twice. Table rows are written through the SQL Statement Execution API against a running SQL warehouse.

Several edge cases are handled explicitly. The DATEV chart of accounts is not readable through the API, so GL account labels must be supplied or configured up front. EXTF files require precomposed UTF-8, so the staged data is kept in NFC normalization. Unity Catalog must be enabled on the target workspace; otherwise the table writes have no metastore to land in. Failed loads are retried and replayed from the audit trail.
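One way to make Job retries safe, sketched here under assumptions, is to derive the idempotency_token deterministically from the DATEV client and batch being loaded, so every retry of the same batch sends the same token to POST /api/2.1/jobs/run-now. The 64-character limit on idempotency_token comes from the Jobs API; the job_id, the job_parameters names (which assume the job declares matching job-level parameters), and the helper functions are hypothetical.

```python
import hashlib
import json
import urllib.request


def idempotency_token_for(datev_client_id: str, batch_id: str) -> str:
    """Derive a stable token from the source batch so every retry sends the same value."""
    # JSON-encoding the pair avoids ambiguity if an ID itself contains a separator.
    key = json.dumps([datev_client_id, batch_id]).encode("utf-8")
    # A SHA-256 hex digest is exactly 64 characters, the Jobs API maximum for this field.
    return hashlib.sha256(key).hexdigest()


def build_run_now_request(workspace_host: str, bearer_token: str, job_id: int,
                          datev_client_id: str, batch_id: str) -> urllib.request.Request:
    """Build (but do not send) a run-now call for a hypothetical load job."""
    payload = {
        "job_id": job_id,
        "idempotency_token": idempotency_token_for(datev_client_id, batch_id),
        # Hypothetical parameter names; the job must define them as job parameters.
        "job_parameters": {"datev_client_id": datev_client_id, "batch_id": batch_id},
    }
    return urllib.request.Request(
        url=f"https://{workspace_host}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {bearer_token}", "Content-Type": "application/json"},
        method="POST",
    )
```

A random token per attempt would defeat the purpose: if the first request reached Databricks but the response was lost, only a repeated token lets the retry return the existing run instead of starting a second load.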

A real-world example

A mid-sized German manufacturing group with around 300 employees runs its bookkeeping through a DATEV tax advisor and wants a single analytics view of spend across three legal entities. Today its controllers wait for the advisor to close each period in DATEV, then export spreadsheets by hand to build cost reports, which arrive late and never tie cleanly across entities. With DATEV and Databricks connected, ml-connector loads the client metadata, uploaded supplier invoices, and submitted booking batches into Delta Lake on each cycle, so the controllers query current spend in Databricks SQL instead of waiting on manual exports. The finance data lands in one governed warehouse, ready for cross-entity reporting without re-keying.

What you can do

Load DATEV client metadata, uploaded invoice documents, and submitted EXTF and DXSO booking batches into Databricks Delta Lake tables.
Handle both sign-ins: the DATEV OAuth2 PKCE login at login.datev.de and the Databricks service principal client-credentials token.
Poll DATEV on a schedule with backoff and jitter, since DATEV publishes no webhooks and is not built for high-frequency polling.
Run loads as Databricks Jobs with an idempotency_token so a retried run does not duplicate finance data in the warehouse.
Stage data as precomposed (NFC) UTF-8 and load it into a Unity Catalog catalog and schema, respecting DATEV file rules and the Databricks metastore.
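The polling pattern in the list above can be sketched as a small loop: exponential backoff starting around 5 seconds, with random jitter so many clients do not poll DATEV in lockstep. This is an illustrative sketch, not ml-connector's code: check_status stands in for whatever call fetches an EXTF or DXSO job status, the terminal status strings are placeholders rather than DATEV's real values, and the ±20% jitter, 300-second cap and 12-attempt limit are assumptions.

```python
import random
import time
from typing import Callable, Container, Optional


def backoff_delay(attempt: int, base_s: float = 5.0, cap_s: float = 300.0,
                  rng: Optional[random.Random] = None) -> float:
    """Delay after poll `attempt`: base_s * 2**attempt with +/-20% jitter, never above cap_s."""
    jitter = rng.uniform(0.8, 1.2) if rng is not None else random.uniform(0.8, 1.2)
    return min(cap_s, base_s * (2 ** attempt) * jitter)


def poll_until_done(check_status: Callable[[], str], terminal: Container[str],
                    max_attempts: int = 12,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Optional[random.Random] = None) -> str:
    """Call check_status until it returns a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        status = check_status()
        if status in terminal:
            return status
        if attempt + 1 < max_attempts:  # no pointless sleep after the final check
            sleep(backoff_delay(attempt, rng=rng))
    raise TimeoutError(f"job not finished after {max_attempts} polls")
```

Passing sleep and rng as parameters keeps the loop testable without real waiting; in production the defaults use time.sleep and the module-level random generator.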

Questions

Which direction does data move between DATEV and Databricks?

Data moves one way, from DATEV into Databricks. ml-connector reads DATEV client metadata, uploaded documents, and submitted booking batches, then loads them into Delta Lake tables. Nothing is written back to DATEV, because finalized bookings there are write-only and Databricks holds no finance records to return.

Can ml-connector read posted journal entries and the full chart of accounts from DATEV?

No. DATEV bookings submitted as EXTF or DXSO files are write-only and cannot be read back through the API, and the standard DATEV chart of accounts is not exposed programmatically. The connector loads the readable surface, which is client metadata, uploaded invoice and receipt documents, and the booking batch data you submit, while GL account labels are supplied or configured in advance.

How is data actually written into Databricks tables?

Databricks REST calls manage table metadata, not row data, so ml-connector writes records through Databricks Jobs and the SQL Statement Execution API against a running SQL warehouse. Loads carry an idempotency_token so a retried run reuses the existing run instead of duplicating rows. Unity Catalog must be enabled on the target workspace for the catalog and schema to receive the data.

Connect DATEV and Databricks

Free to use. Add your credentials, ping your real systems, and see if we fit.