
DATEV and SFTP / Flat Files integration

DATEV runs accounting and tax for German businesses, split between the on-premise Rechnungswesen engine and the DATEV Unternehmen Online cloud. SFTP / Flat Files is the realistic path for legacy ERP and EDI partners that export data as CSV, fixed-width, or X12 files instead of offering an API. Connecting the two means invoice, vendor, and journal files that arrive in an SFTP folder are converted into DATEV bookings and document uploads without anyone re-keying them. ml-connector handles the very different mechanics on each side: polling files over SSH on one end, and submitting asynchronous EXTF jobs and DUO uploads over OAuth on the other. Because DATEV booking imports are one-way and write-only, the file server stays the system of record for the source data.

How DATEV works

DATEV does not expose a conventional REST API. Its interface is split into separate products: accounting:clients reads the list of client companies, accounting:documents uploads invoices and receipts to DATEV Unternehmen Online (DUO), and accounting:extf-files accepts finalized bookings as EXTF CSV files, which run as asynchronous jobs against the on-premise Rechnungswesen engine. Booking suggestions can also be sent to DUO as DXSO XML jobs. Every cloud call is authorized with OAuth 2.0 Authorization Code with PKCE through Login mit DATEV. This is an interactive user login with no machine-to-machine flow, and access tokens last only 900 seconds. There are no webhooks, so job status has to be polled, and posted journal entries and the full chart of accounts cannot be read back.
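A minimal Python sketch of the PKCE part of that login, following the S256 method from RFC 7636: the connector keeps a random code_verifier and sends only its SHA-256 code_challenge with the authorization request. The `needs_refresh` helper and its 60-second margin are hypothetical illustrations of refreshing ahead of the 900-second expiry; neither is part of a DATEV library.

```python
import base64
import hashlib
import secrets

TOKEN_LIFETIME_S = 900  # DATEV access-token lifetime stated above


def _b64url(data: bytes) -> str:
    # base64url without "=" padding, as RFC 7636 requires.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method."""
    # 32 random bytes encode to 43 characters, the minimum verifier length.
    verifier = _b64url(secrets.token_bytes(32))
    challenge = _b64url(hashlib.sha256(verifier.encode("ascii")).digest())
    return verifier, challenge


def needs_refresh(issued_at: float, now: float,
                  lifetime: float = TOKEN_LIFETIME_S, margin: float = 60.0) -> bool:
    # Hypothetical policy: refresh a little early so no call carries a token
    # that expires mid-request.
    return now >= issued_at + lifetime - margin
```

The challenge travels with the authorization request as `code_challenge` (with `code_challenge_method=S256`), and the verifier is sent later with the token request, so an intercepted authorization code is useless on its own.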

How SFTP / Flat Files works

SFTP / Flat Files has no API. It is an SSH file-transfer transport, usually on port 22, where structured text files (CSV, fixed-width, X12 EDI, or EDIFACT) are exchanged through folders such as inbound, outbound, processed, and acks. Authentication uses an SSH key pair, which is preferred for automation, or a username and password, and the server host key is verified to prevent interception. Because the protocol has no push mechanism, the connector actively polls the inbound directory on a schedule, typically every 5 to 60 minutes. Reading means parsing a downloaded file; writing means producing an outbound file and depositing it in the partner folder. Both directions are supported for invoices, vendors, journal entries, customers, items, and cost centers.
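Fixed-width files have no delimiters, so each field is cut out by character offset. The sketch below assumes a hypothetical four-field invoice layout; real offsets come from each partner's file specification, and the names `LAYOUT`, `parse_fixed_width`, and `parse_file` are illustrative only.

```python
# Hypothetical layout: (field name, start offset, end offset), 0-based, end exclusive.
LAYOUT = [
    ("record_type", 0, 2),
    ("vendor_id", 2, 10),
    ("invoice_no", 10, 22),
    ("amount_cents", 22, 34),
]


def parse_fixed_width(line: str, layout=LAYOUT) -> dict[str, str]:
    """Slice one fixed-width line into named, whitespace-trimmed fields."""
    expected = max(end for _, _, end in layout)
    if len(line.rstrip("\r\n")) < expected:
        raise ValueError(f"line shorter than layout ({expected} chars expected)")
    return {name: line[start:end].strip() for name, start, end in layout}


def parse_file(text: str) -> list[dict[str, str]]:
    # Skip blank lines, such as the empty line after a trailing newline.
    return [parse_fixed_width(line) for line in text.splitlines() if line.strip()]
```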

What moves between them

The main flow runs from SFTP / Flat Files into DATEV. ml-connector polls the inbound folder, picks up new invoice, vendor master, and journal entry files, parses their CSV, fixed-width, or X12 layouts, and maps them to canonical records. Invoice PDFs and e-invoice files are uploaded to DATEV Unternehmen Online as documents under the correct client-specific document type. Journal entries are written into DATEV Rechnungswesen as EXTF CSV booking batches against the right debit and credit accounts, and vendor master data goes in through the same EXTF file import. Because EXTF bookings are write-only and posted journals cannot be read back, no bookings flow from DATEV back to the file server. Where the file server has outbound folders, reference data can still be written out to them as CSV.
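As an illustration of the mapping step, here is a sketch that turns a semicolon-delimited CSV export into canonical booking records. The column names, the semicolon delimiter, and the German-style amount format (`1.234,56`) are assumptions about one hypothetical ERP export, and `CanonicalBooking` is not a documented ml-connector type.

```python
import csv
import io
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CanonicalBooking:
    invoice_no: str
    vendor_id: str
    amount: Decimal
    debit_account: str
    credit_account: str


def parse_german_amount(raw: str) -> Decimal:
    # Assumed format "1.234,56": drop thousands dots, then swap the decimal comma.
    return Decimal(raw.strip().replace(".", "").replace(",", "."))


def map_csv(text: str) -> list[CanonicalBooking]:
    """Map each row of a hypothetical invoice export to a canonical record."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [
        CanonicalBooking(
            invoice_no=row["InvoiceNo"].strip(),
            vendor_id=row["VendorId"].strip(),
            amount=parse_german_amount(row["Amount"]),
            debit_account=row["DebitAccount"].strip(),
            credit_account=row["CreditAccount"].strip(),
        )
        for row in reader
    ]
```

Parsing amounts into `Decimal` rather than `float` keeps cent values exact before they are written into a booking batch.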

How ml-connector handles it

ml-connector stores both credential sets encrypted. On the file side, it connects over SSH with the private key or password, verifies the server host key, and lists the inbound folder on each poll. It skips filenames it has already processed by checking its own database rather than file timestamps, which are unreliable across timezones, and it detects partial uploads by waiting for a sentinel file or a stable file size before processing.

On the DATEV side, it runs the OAuth 2.0 PKCE login once, then refreshes the 900-second access token automatically by sending only the client id. Before any upload it fetches the client-specific document types, because they differ between clients and cannot safely be hardcoded. Journal and vendor files are assembled into EXTF CSV as NFC-normalized UTF-8, since non-precomposed characters are silently rejected. DATEV dedupes by filename plus document type, so ml-connector generates deterministic filenames and treats the duplicate error code as a safe no-op. Each EXTF and DXSO submission is an asynchronous job, so the connector polls the job status with exponential backoff until it reports complete or failed.

Every file carries a BullMQ jobId built from its name and a content hash for queue-level dedup, and every record has a full audit trail and can be replayed if a DATEV submission fails.
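The partial-upload check can be sketched as a small tracker that compares each poll's directory listing with the previous one. This is an illustration, not ml-connector's code: the `.done` sentinel suffix is a hypothetical convention, the listing is assumed to arrive as a filename-to-size mapping from the SFTP client, and the `processed` set stands in for the connector's database.

```python
from dataclasses import dataclass, field


@dataclass
class ReadinessTracker:
    """Decide which inbound files are safe to process on this poll.

    A file is ready if a sentinel "<name>.done" exists (hypothetical convention)
    or if its non-zero size is unchanged since the previous poll.
    """
    sentinel_suffix: str = ".done"
    last_sizes: dict[str, int] = field(default_factory=dict)

    def ready_files(self, listing: dict[str, int], processed: set[str]) -> list[str]:
        ready = []
        for name, size in sorted(listing.items()):
            # Sentinels themselves and already-handled files are never processed.
            if name.endswith(self.sentinel_suffix) or name in processed:
                continue
            has_sentinel = (name + self.sentinel_suffix) in listing
            stable = size > 0 and self.last_sizes.get(name) == size
            if has_sentinel or stable:
                ready.append(name)
        # Remember this poll's sizes for the next comparison.
        self.last_sizes = dict(listing)
        return ready
```

A file seen for the first time is never processed on that poll unless its sentinel is present, which trades one polling interval of latency for not reading a half-written upload.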

A real-world example

A German manufacturing firm with roughly 200 employees keeps its books in DATEV through its tax advisor, but its older on-premise ERP and a key EDI trading partner can only export data as nightly CSV and X12 files to an SFTP server. Before the integration, a bookkeeper downloaded each batch, reformatted the supplier invoices and journal lines by hand, and keyed them into DATEV, which delayed the books by days and introduced posting errors that surfaced at month-end. With DATEV and SFTP / Flat Files connected, each file that lands in the inbound folder is parsed, mapped, and submitted as a DATEV EXTF booking batch or a DUO document upload within the polling window, with the document PDFs filed under the correct type. The manual reformatting is gone, and the tax advisor sees clean, deduplicated batches.

What you can do

  • Poll the SFTP inbound folder and turn dropped CSV, fixed-width, and X12 files into DATEV EXTF booking batches.
  • Upload invoice and receipt files into DATEV Unternehmen Online under the correct client-specific document type.
  • Bridge SSH key or password login on the file server to the OAuth 2.0 PKCE login that DATEV requires.
  • Dedupe by filename and content hash so a re-read file never posts to DATEV twice.
  • Submit each EXTF or DXSO job and poll its status with backoff until it reports complete or failed, with a full audit trail.
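Deduplication by filename and content hash only works if the same source text always produces the same bytes, so NFC normalization has to happen before hashing. The sketch below normalizes, encodes as UTF-8, and derives a deterministic EXTF filename and a queue job id from the source name plus a SHA-256 hash. The `EXTF_<stem>_<hash>.csv` pattern and the job id format are hypothetical, and this sketch does not establish which filename patterns DATEV accepts.

```python
import hashlib
import unicodedata


def to_nfc_utf8(text: str) -> bytes:
    # Compose decomposed sequences (e.g. "u" + U+0308) into precomposed characters.
    return unicodedata.normalize("NFC", text).encode("utf-8")


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def deterministic_extf_name(source_name: str, data: bytes) -> str:
    # Same source file and same content always give the same name, so a
    # resubmission collides with the earlier upload instead of duplicating it.
    stem = source_name.rsplit(".", 1)[0]
    return f"EXTF_{stem}_{content_hash(data)[:12]}.csv"


def queue_job_id(source_name: str, data: bytes) -> str:
    # Hypothetical job id: source name plus full content hash, so a re-read
    # file maps to the same queue entry.
    return f"{source_name}_{content_hash(data)}"
```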

Questions

Which direction does data move between DATEV and SFTP / Flat Files?
The main flow is from SFTP / Flat Files into DATEV. Invoice, vendor, and journal files dropped in the inbound folder are parsed and submitted into DATEV as EXTF booking batches and DUO document uploads. DATEV booking imports are one-way and write-only, so posted journal entries cannot be read back out to the file server, though reference data can be written to outbound folders when needed.
How does the integration handle DATEV's asynchronous jobs and lack of webhooks?
DATEV processes EXTF and DXSO submissions as asynchronous jobs and has no webhooks, so ml-connector submits each file as a job, receives a job id, and polls the status endpoint with exponential backoff until it reports complete or failed. The SFTP side is also pull-only, so the inbound folder is polled on a schedule, typically every 5 to 60 minutes. Nothing is treated as posted until DATEV confirms the job finished.
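A minimal sketch of that polling loop with exponential backoff and a small random jitter. `get_status` is a hypothetical callable standing in for the DATEV job-status request, the `complete`/`failed` strings are placeholders to be mapped from the real response, and injecting `sleep` keeps the loop testable.

```python
import random
import time
from typing import Callable

TERMINAL = {"complete", "failed"}  # placeholder statuses, mapped from the real API


def poll_job(
    get_status: Callable[[str], str],
    job_id: str,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll an asynchronous job until it reaches a terminal status.

    The delay doubles after each non-terminal answer (capped at max_delay),
    plus up to 10% jitter so many jobs do not poll in lockstep.
    """
    delay = base_delay
    for _ in range(max_attempts):
        status = get_status(job_id)
        if status in TERMINAL:
            return status
        sleep(delay + random.uniform(0, delay * 0.1))
        delay = min(delay * 2, max_delay)
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} polls")
```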
What stops the same file from being booked into DATEV twice?
There are two layers of deduplication. The connector records every processed filename in its own database and moves handled files to a processed folder, so a re-listed file is skipped on the next poll. On submission it generates deterministic EXTF filenames, and DATEV itself rejects duplicates by filename plus document type, which ml-connector treats as a safe no-op rather than an error.
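The two layers can be sketched with the standard library's `sqlite3` module as the processed-filename store. `DuplicateUpload` is a hypothetical exception standing in for however a DATEV client wrapper surfaces the duplicate error code, and `submit` is any callable that sends the file.

```python
import sqlite3
from typing import Callable


class DuplicateUpload(Exception):
    """Hypothetical: raised by a client wrapper when DATEV reports a duplicate."""


class ProcessedLedger:
    """Records processed filenames so a re-listed file is skipped on later polls."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS processed (filename TEXT PRIMARY KEY)")

    def seen(self, filename: str) -> bool:
        row = self.db.execute(
            "SELECT 1 FROM processed WHERE filename = ?", (filename,)
        ).fetchone()
        return row is not None

    def mark(self, filename: str) -> None:
        # The connection context manager commits; INSERT OR IGNORE stays idempotent.
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO processed (filename) VALUES (?)", (filename,)
            )


def handle_file(ledger: ProcessedLedger, filename: str,
                submit: Callable[[str], None]) -> str:
    """Layer 1: skip known files. Layer 2: treat a duplicate rejection as success."""
    if ledger.seen(filename):
        return "skipped"
    try:
        submit(filename)
        outcome = "submitted"
    except DuplicateUpload:
        outcome = "duplicate-noop"  # already in DATEV, nothing more to do
    ledger.mark(filename)
    return outcome
```

This sketch marks a file as soon as the submission is accepted; a version aligned with the asynchronous job model would mark it only after the job reports complete.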


Connect DATEV and SFTP / Flat Files

Free to use. Add your credentials, ping your real systems, and see if we fit.
