ml-connector

Xero and Databricks integration

Xero runs accounting for growing businesses. Databricks runs analytics and reporting on data at scale. Connecting the two moves your Xero financial records into Databricks tables where your analytics and ML teams can build reports, dashboards, and predictive models without manual exports. New invoices, payments, and journal entries flow automatically, so Databricks stays current with Xero at the sync cadence you configure.

How Xero works

Xero exposes invoices, payments, contacts, accounts, purchase orders, tracking categories, manual journals, credit notes, and bank transactions through the Xero Accounting API, a REST API at https://api.xero.com/api.xro/2.0/ that returns JSON or XML. Every request requires an OAuth2 bearer token (which expires after 30 minutes) and a Xero-tenant-id header that identifies the organization. Xero offers webhooks for invoices, payments, contacts, purchase orders, and manual journals (CREATE and UPDATE events only), though webhook payloads contain metadata only and require a follow-up GET to fetch the full record. Changes that webhooks do not cover can be picked up by polling with the If-Modified-Since header, which returns only records modified after the given timestamp. Rate limits allow 5 concurrent calls, 60 calls per minute per tenant, and 5000 calls per day per tenant.
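As a rough illustration of the request shape described above, the following Python sketch builds the headers for a polling call and walks pages while backing off on 429 responses. The helper names `build_request` and `fetch_all_pages`, the `get_page` callable, and the timestamp format shown for If-Modified-Since are assumptions made for this example, not ml-connector's code or a Xero SDK. The loop stops at the first empty page, which is a simplification.

```python
import time
from typing import Callable, Optional

XERO_BASE_URL = "https://api.xero.com/api.xro/2.0/"


def build_request(resource: str, access_token: str, tenant_id: str,
                  modified_since: Optional[str] = None, page: int = 1):
    """Return (url, headers) for one page of a Xero Accounting API resource."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Xero-tenant-id": tenant_id,  # selects the organization
        "Accept": "application/json",
    }
    if modified_since is not None:
        # Assumed format: a UTC timestamp such as "2024-01-31T00:00:00".
        headers["If-Modified-Since"] = modified_since
    return f"{XERO_BASE_URL}{resource}?page={page}", headers


def fetch_all_pages(get_page: Callable[[int], tuple],
                    sleep: Callable[[float], None] = time.sleep,
                    max_retries: int = 5) -> list:
    """Collect records page by page, backing off when the API answers 429.

    get_page(page) is a hypothetical callable that performs the HTTP call and
    returns (status_code, records, retry_after_seconds_or_None).
    """
    records, page, retries = [], 1, 0
    while True:
        status, batch, retry_after = get_page(page)
        if status == 429:
            if retries >= max_retries:
                raise RuntimeError("rate limit retries exhausted")
            # Prefer the server's hint, otherwise back off exponentially.
            sleep(retry_after if retry_after else 2 ** retries)
            retries += 1
            continue
        retries = 0
        if not batch:  # an empty page marks the end of the result set
            return records
        records.extend(batch)
        page += 1
```

Injecting `get_page` and `sleep` keeps the paging and backoff logic testable without network access; a real client would also need to respect the concurrent and daily limits.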

How Databricks works

Databricks exposes compute resources, job scheduling, SQL warehouses, and data catalog objects through REST APIs with workspace-specific base URLs and OAuth2 Client Credentials authentication. Tokens expire after 3600 seconds, requiring refresh logic. Databricks webhooks cover only MLflow Model Registry events (CREATE, STAGE_TRANSITION, TAG events), and they start in TEST_MODE and must be activated. Databricks is a data platform without native finance objects; it accepts data via SQL writes and REST metadata APIs. No webhooks exist for cluster state changes, job completion, or table mutations, so polling is required for compute events. Most finance integrations target Databricks as a data destination, writing records into Delta Lake tables for analytics rather than triggering Databricks workflows.
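The refresh logic mentioned above can be sketched as a small cache that renews a token shortly before it expires. This is an illustrative Python sketch only: `TokenCache`, `fetch_token`, and the 60-second safety margin are hypothetical names and values, and `fetch_token` stands in for whatever call actually performs the Client Credentials exchange.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache an access token and renew it shortly before it expires."""

    def __init__(self, fetch_token: Callable[[], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic,
                 safety_margin: float = 60.0):
        # fetch_token is a hypothetical callable returning
        # (access_token, expires_in_seconds), e.g. ("abc", 3600).
        self._fetch_token = fetch_token
        self._clock = clock
        self._margin = safety_margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a token that is valid for at least the safety margin."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

With a 3600-second lifetime and a 60-second margin, the cache reuses one token for roughly 59 minutes and then requests a new one on the next call.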

What moves between them

The main flow runs from Xero into Databricks. ml-connector reads Xero invoices, payments, contacts, accounts, and manual journals via webhooks (when enabled), with polling as a fallback, and writes each record into a corresponding Databricks table in a designated catalog and schema. Tracking categories and items become dimension tables. Deleted records in Xero are not replicated (Xero does not return them by default). Reference data such as contacts and accounts is synced first so invoice and payment line items reference valid dimensions in Databricks. The sync cadence is configurable per record type.
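One way to picture the mapping described above is as a small sync plan: each Xero resource gets a destination table, a flag marking it as a dimension, and its own cadence. The Python sketch below is illustrative only; the table names, the cadence values, and the helpers `qualified_name` and `run_order` are assumptions, not ml-connector's configuration format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SyncTarget:
    xero_resource: str     # Xero Accounting API resource name
    table: str             # destination table (hypothetical name)
    is_dimension: bool     # reference data that fact rows point at
    interval_minutes: int  # sync cadence for this record type


# Illustrative values only; table names and cadences are assumptions.
PLAN = [
    SyncTarget("Invoices", "invoices", False, 60),
    SyncTarget("Payments", "payments", False, 60),
    SyncTarget("ManualJournals", "manual_journals", False, 240),
    SyncTarget("Contacts", "contacts", True, 240),
    SyncTarget("Accounts", "accounts", True, 1440),
    SyncTarget("TrackingCategories", "tracking_categories", True, 1440),
    SyncTarget("Items", "items", True, 1440),
]


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a three-level catalog.schema.table name, backtick-quoting each part."""
    parts = (catalog, schema, table)
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)


def run_order(plan: List[SyncTarget]) -> List[SyncTarget]:
    """Dimensions first, so fact rows always find their reference rows."""
    # sorted() is stable, so the original order is kept within each group.
    return sorted(plan, key=lambda target: not target.is_dimension)
```

For example, `qualified_name("finance", "xero", "invoices")` yields a name that can be dropped into a SQL statement, and `run_order(PLAN)` places contacts, accounts, tracking categories, and items ahead of invoices and payments.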

How ml-connector handles it

ml-connector stores the Xero OAuth2 credentials encrypted, refreshes the bearer token every 25 minutes (ahead of the 30-minute expiry), and includes the required Xero-tenant-id header on every request. It subscribes to Xero webhooks for invoices, payments, and manual journals where the customer has enabled them, and falls back to If-Modified-Since polling for customers without webhook access or for incremental syncs. When a webhook arrives with a resource ID, ml-connector fetches the full record via GET (Xero webhook events are metadata-only) and inserts or updates the matching row in the Databricks table using SQL. Page-based pagination is handled by iterating the page parameter from 1 to the page count. Xero enforces rate limits (60 calls per minute per tenant), so ml-connector batches requests and backs off on 429 responses. Reference data such as accounts, tracking categories, and contacts is synced first so every invoice and payment line references a valid Databricks dimension. The flow stores job IDs and record hashes to prevent duplicate writes.
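The webhook path described above (receive metadata, fetch the full record, upsert the row) might look roughly like the Python sketch below. It is a sketch under stated assumptions, not ml-connector's implementation: the `TABLES` mapping, the `id`/`tenant_id`/`payload` column layout, and the `fetch_record` and `execute` callables are hypothetical, and the MERGE statement assumes the SQL client supports named parameter markers. The signature check assumes Xero signs the raw request body with HMAC-SHA256 and sends the base64-encoded result in a signature header.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable, List

# Hypothetical mapping from webhook event category to destination table.
TABLES = {"INVOICE": "finance.xero.invoices"}


def signature_is_valid(raw_body: bytes, signature_header: str, signing_key: str) -> bool:
    """Compare the header with a base64 HMAC-SHA256 of the raw body."""
    digest = hmac.new(signing_key.encode(), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode()
    return hmac.compare_digest(expected, signature_header)


def merge_statement(table: str, key_column: str, columns: List[str]) -> str:
    """Build a parameterized MERGE that inserts or updates one row."""
    select_list = ", ".join(f":{c} AS {c}" for c in columns)
    updates = ", ".join(f"t.{c} = s.{c}" for c in columns if c != key_column)
    col_list = ", ".join(columns)
    values = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {table} AS t USING (SELECT {select_list}) AS s "
        f"ON t.{key_column} = s.{key_column} "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({values})"
    )


def handle_webhook(raw_body: bytes, signature_header: str, signing_key: str,
                   fetch_record: Callable[[str, str, str], dict],
                   execute: Callable[[str, dict], None]) -> int:
    """Process one webhook delivery and return the number of rows written."""
    if not signature_is_valid(raw_body, signature_header, signing_key):
        raise PermissionError("webhook signature mismatch")
    written = 0
    for event in json.loads(raw_body).get("events", []):
        table = TABLES.get(event["eventCategory"])
        if table is None:
            continue  # category is not part of this sync
        # The event carries metadata only, so fetch the full record by ID.
        record = fetch_record(event["tenantId"], event["eventCategory"], event["resourceId"])
        row = {
            "id": event["resourceId"],
            "tenant_id": event["tenantId"],
            "payload": json.dumps(record, sort_keys=True),
        }
        execute(merge_statement(table, "id", list(row)), row)
        written += 1
    return written
```

Using MERGE keyed on the record ID means a repeated or out-of-order delivery for the same invoice updates the existing row instead of inserting a second one.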

A real-world example

A mid-sized SaaS company runs Xero for accounting across multiple legal entities and uses Databricks for financial analytics, churn prediction, and revenue reporting. Before the integration, the finance team exported daily reports from Xero and loaded them manually into a data warehouse using SQL scripts that took 30 minutes and often failed on schema mismatches. With Xero and Databricks connected, invoices and payments flow automatically into Databricks tables on an hourly schedule, so the analytics team can build dashboards and cohort analyses that are never more than an hour behind, without re-keying or manual intervention. Month-end close is faster because actuals are already loaded and reconciled in the warehouse.

What you can do

  • Sync Xero invoices, payments, contacts, accounts, and manual journals into Databricks tables on a schedule or via webhooks.
  • Refresh Xero OAuth2 bearer tokens, which expire every 30 minutes, using refresh tokens that are valid for 60 days.
  • Handle Xero's page-based pagination and Xero-tenant-id header requirement automatically, so writing to Databricks with SQL needs no custom scripts.
  • Transform Xero tracking categories and items into dimension tables that invoice and payment fact tables reference.
  • Deduplicate records and replay failed writes using record hashes and job ID tracking.
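The deduplication and replay behavior in the last item can be sketched with a content hash per record and a ledger that is updated only after a successful write. This Python sketch is illustrative: `record_hash`, `WriteLedger`, and `write_once` are hypothetical names, and the ledger lives in memory here, whereas a real deployment would need durable storage.

```python
import hashlib
import json
from typing import Callable, Dict, Optional, Tuple


def record_hash(record: dict) -> str:
    """Stable SHA-256 over a canonical JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class WriteLedger:
    """Remember the last hash and job ID written per record ID (in memory)."""

    def __init__(self) -> None:
        self._written: Dict[str, Tuple[str, str]] = {}

    def is_duplicate(self, record_id: str, record: dict) -> bool:
        entry = self._written.get(record_id)
        return entry is not None and entry[0] == record_hash(record)

    def mark_written(self, record_id: str, record: dict, job_id: str) -> None:
        self._written[record_id] = (record_hash(record), job_id)

    def job_for(self, record_id: str) -> Optional[str]:
        entry = self._written.get(record_id)
        return entry[1] if entry else None


def write_once(ledger: WriteLedger, record_id: str, record: dict, job_id: str,
               write: Callable[[dict], None]) -> bool:
    """Write the record unless an identical version was already written."""
    if ledger.is_duplicate(record_id, record):
        return False
    write(record)  # may raise; the ledger is untouched, so a replay retries
    ledger.mark_written(record_id, record, job_id)
    return True
```

Because the ledger entry is recorded after the write succeeds, a failed write leaves no trace and the same record is attempted again on the next run, while an unchanged record delivered twice is skipped.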

Questions

Does the integration support Xero webhooks?
Yes. ml-connector subscribes to Xero webhooks for invoices, payments, manual journals, purchase orders, and credit notes where your Xero administrator has enabled them in the Developer portal. When Xero sends a webhook (which carries event metadata such as the resource ID, not the full record), ml-connector fetches the full record via GET and writes it to Databricks. If webhooks are disabled, ml-connector falls back to polling on a schedule.
How does the integration handle Xero's 30-minute token expiry?
ml-connector stores Xero OAuth2 credentials encrypted and refreshes the bearer token every 25 minutes, ahead of the 30-minute expiry, so every API call carries a valid token. The refresh token is valid for 60 days, so a user needs to re-authenticate only if the connection goes unrefreshed for longer than that, for example after an extended pause.
What happens to deleted records in Xero?
Xero does not return deleted records by default in API responses. ml-connector does not replicate deletions to Databricks, so historical records remain in the data warehouse. If you need to track deleted invoices or payments, the Databricks tables can store a soft-delete flag that you update manually or via downstream ETL.
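A downstream soft-delete job of the kind mentioned above could compare the IDs held in Databricks with a full listing of IDs from Xero and flag the ones that have disappeared. The Python sketch below is one possible approach, not something ml-connector provides: the `is_deleted` and `id` columns, the helper names, and the use of named parameter markers are all assumptions, and the comparison only makes sense against a complete (not incremental) ID listing.

```python
from typing import Dict, Iterable, List, Tuple


def find_missing_ids(warehouse_ids: Iterable[str], xero_ids: Iterable[str]) -> List[str]:
    """IDs present in the warehouse table but absent from a full Xero listing."""
    return sorted(set(warehouse_ids) - set(xero_ids))


def soft_delete_statement(table: str, ids: List[str]) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized UPDATE that flags the given rows as deleted.

    Assumes the table has hypothetical columns `id` and `is_deleted`.
    """
    if not ids:
        raise ValueError("no IDs to flag")
    params = {f"id{i}": value for i, value in enumerate(ids)}
    markers = ", ".join(f":{name}" for name in params)
    sql = f"UPDATE {table} SET is_deleted = TRUE WHERE id IN ({markers})"
    return sql, params
```

For instance, if the warehouse holds invoices A, B, and C and Xero now lists only A and C, `find_missing_ids` returns B, and the resulting statement sets its flag while leaving the historical row in place.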

Connect Xero and Databricks

Free to use. Add your credentials, ping your real systems, and see if we fit.
