ml-connector
XeroGoogle BigQuery

Xero and Google BigQuery integration

Xero runs your accounting. Google BigQuery is the data warehouse where you analyze it. Connecting the two copies Xero invoices, bills, contacts, payments, and journals into BigQuery tables so finance data is queryable in SQL alongside the rest of your business. ml-connector handles the very different APIs on each side, reacting to Xero webhooks where they exist and polling where they do not, and writes the rows into the warehouse on the cadence you set.

How Xero works

Xero exposes contacts, invoices and bills, purchase orders, payments, accounts, manual journals, and tracking categories through the REST Accounting API over HTTPS at api.xero.com. Authentication is OAuth 2.0 authorization code, with a 30-minute access token and a 60-day refresh token, and every call carries an Authorization bearer token plus a Xero-tenant-id header to target one organization. Xero pushes webhooks for Contact, Invoice, CreditNote, and Payment changes, but the payloads carry only the record ID and event type, so a follow-up GET is required to retrieve the full record. List endpoints use page-based pagination at 100 records per page, and the If-Modified-Since header returns only records changed since a given UTC time.

How Google BigQuery works

Google BigQuery is a serverless cloud data warehouse, not an ERP, so it has no native invoice, contact, or account objects. The customer defines tables such as invoices, bills, contacts, and gl_entries inside a dataset within a GCP project, and the connector writes to them. Rows go in through the Streaming Insert API at tabledata.insertAll, with an optional insertId for best-effort deduplication, or through batch load jobs, and queries run as asynchronous jobs that you poll until done. Authentication is a Google service account: the connector signs a JWT with the service account private key and exchanges it for a one-hour access token. BigQuery sends no webhooks, so reads are done by querying on a schedule.

What moves between them

The flow runs from Xero into Google BigQuery. ml-connector reads Xero invoices, bills, contacts, payments, accounts, and journals and writes each record as a row into the matching BigQuery table inside your dataset. Xero invoice, contact, credit note, and payment webhooks trigger a follow-up fetch of the changed record and a near-real-time write into BigQuery, while a scheduled poll using If-Modified-Since covers accounts, items, tracking categories, and anything else Xero does not push. BigQuery is a read-only accounting destination here, so ml-connector loads data into the warehouse for analysis and never writes financial records back into Xero.

How ml-connector handles it

ml-connector stores both credential sets encrypted. On the Xero side it runs the OAuth authorization code flow, refreshes the 30-minute access token before it expires, and sends the Xero-tenant-id header so calls hit the right organization. On the BigQuery side it signs a JWT with the service account private key, preserving the literal newlines in the PEM block so signing does not break, and refreshes the one-hour token before expiry. Xero webhook signatures are verified as Base64 HMAC-SHA256 over the raw request body, and because the payload is metadata only, the connector follows each event with a GET to fetch the full record, then writes it into BigQuery with a stable insertId so a retry does not duplicate the row. Entities Xero does not push, such as accounts and items, are polled with If-Modified-Since and page-based paging at 100 records per page. Xero rate limits of 60 calls per minute and 5,000 per day per tenant return HTTP 429 with a Retry-After header, so the connector backs off and respects it; adding a column to a BigQuery table is safe, but renaming one is destructive and handled deliberately. Every record carries a full audit trail and can be replayed if a write fails.

A real-world example

A 90-person professional services firm keeps its books in Xero and wants finance numbers in the same Google BigQuery warehouse that already holds its project and time-tracking data. Before the integration, an analyst exported invoices and bills from Xero to spreadsheets each week and loaded them by hand, so revenue and cost dashboards were always a few days stale and broke whenever a column moved. With Xero and Google BigQuery connected, invoices, bills, contacts, and payments flow into warehouse tables continuously, webhook changes appear within minutes, and the analyst writes SQL against current finance data joined to project data. The manual export step is gone and the dashboards refresh on their own.

What you can do

  • Load Xero invoices, bills, contacts, payments, accounts, and journals into Google BigQuery tables inside your dataset.
  • React to Xero invoice, contact, credit note, and payment webhooks with a follow-up record fetch and a near-real-time warehouse write.
  • Poll the entities Xero does not push, such as accounts and items, using If-Modified-Since and page-based pagination.
  • Authenticate Xero with OAuth 2.0 refresh tokens and the tenant header, and BigQuery with a signed service account JWT.
  • Write rows with a stable insertId so retries do not create duplicates, with a full audit trail on every record.

Questions

Which direction does data move between Xero and Google BigQuery?
The flow is Xero into Google BigQuery. Invoices, bills, contacts, payments, accounts, and journals move from Xero into BigQuery tables you define. BigQuery is treated as a read-only analytics destination, so ml-connector loads finance data into the warehouse and does not write records back into Xero.
Does the integration use Xero webhooks or polling?
It uses both. Xero pushes webhooks for contacts, invoices, credit notes, and payments, and those payloads carry only the record ID, so ml-connector verifies the signature and then fetches the full record before writing it to BigQuery. For entities Xero does not push, such as accounts and items, it polls on a schedule using the If-Modified-Since header to pull only changed records.
How does the connector avoid duplicate rows in BigQuery?
BigQuery streaming inserts accept an insertId that gives best-effort deduplication within about a one-minute window, so ml-connector sends a stable insertId derived from the Xero record ID on every write. Because that window is short, the connector also keys on the Xero GUID and replays through its audit trail rather than relying on the streaming dedup alone for long-horizon safety.

Related integrations

Connect Xero and Google BigQuery

Free to use. Add your credentials, ping your real systems, and see if we fit.

Get started