Sage 50 and Databricks integration

Sage 50 holds your accounting records. Databricks holds your data warehouse. Connecting the two lets you move your invoices, vendors, customers, general ledger accounts, and employees into Databricks for analytics, reporting, and business intelligence without manual export. ml-connector manages the local Sage 50 SDK connection, polls on a schedule you set, and writes clean accounting data directly to Databricks tables.

How Sage 50 works

Sage 50 is a desktop-installed accounting application available in two editions: Sage 50 US (formerly Peachtree) and Sage 50 UK (formerly Sage Line 50). It exposes vendors, purchase invoices, sales invoices, general ledger accounts, employees, customers, payments, receipts, bank accounts, inventory items, and projects through a local Windows SDK (US edition: .NET SDK or legacy COM/ODBC; UK edition: Sage Data Objects COM/ActiveX). There is no cloud REST API or remote integration surface. Authentication is Windows-local: US edition requires an ApplicationID, CompanyPath, and Windows credentials; UK edition requires DataPath and Windows credentials. Sage 50 offers no webhooks or event stream, so accounting records are read by polling with a filter on LastModifiedDate or TransactionDate, with a recommended minimum poll interval of 5 to 15 minutes.
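The polling pattern described above can be sketched as a watermark filter. This is a minimal illustration, not ml-connector's actual code: the SageRecord type and both function names are hypothetical, and the in-memory list stands in for whatever the local SDK returns.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SageRecord:
    """Hypothetical stand-in for a record read through the local SDK."""
    record_id: str
    last_modified: datetime


def select_changed(records, watermark):
    """Return records modified strictly after the watermark, oldest first."""
    changed = [r for r in records if r.last_modified > watermark]
    return sorted(changed, key=lambda r: r.last_modified)


def advance_watermark(watermark, changed):
    """Move the watermark to the newest timestamp seen, never backwards."""
    return max([watermark] + [r.last_modified for r in changed])


if __name__ == "__main__":
    watermark = datetime(2024, 1, 1, tzinfo=timezone.utc)
    batch = [
        SageRecord("INV-1", datetime(2023, 12, 31, tzinfo=timezone.utc)),
        SageRecord("INV-2", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    ]
    changed = select_changed(batch, watermark)
    print([r.record_id for r in changed], advance_watermark(watermark, changed))
```

The strict comparison can skip a record that shares a timestamp with the previous watermark; a real poller might instead compare with "greater than or equal" and de-duplicate by record ID.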

How Databricks works

Databricks is a cloud data intelligence platform built on Apache Spark and Delta Lake. It exposes clusters, SQL warehouses, catalogs, schemas, tables, experiments, runs, and registered models through REST APIs at workspace-specific URLs (under cloud.databricks.com, azuredatabricks.net, or gcp.databricks.com, depending on the cloud provider hosting the workspace). Authentication is OAuth 2.0 Client Credentials via a Service Principal with client_id and client_secret. Access tokens expire after 3600 seconds, so a new token must be requested before then. Databricks has no native finance or ERP objects; it is a data platform that accepts any structured data written to its tables via REST metadata operations or SQL/Spark. Webhooks are supported only for MLflow Model Registry events, not for compute or data events, so ml-connector polls Databricks state as needed.
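As a rough sketch of the client-credentials exchange, the request below is built with the Python standard library and not sent. The /oidc/v1/token path and the all-apis scope reflect our reading of the Databricks OAuth machine-to-machine documentation and should be checked against the current docs; the function name and the example hostname are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(workspace_host, client_id, client_secret):
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    # Assumed endpoint path for workspace-level service principal tokens.
    url = f"https://{workspace_host}/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # The service principal's credentials travel as HTTP Basic auth.
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


if __name__ == "__main__":
    # Hypothetical workspace host and placeholder credentials.
    req = build_token_request("example.cloud.databricks.com", "id", "secret")
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would perform the exchange; as with any OAuth 2.0 token response, the JSON body is expected to carry the bearer token in access_token and its lifetime in expires_in.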

What moves between them

Accounting records flow one way, from Sage 50 into Databricks. Purchase invoices, sales invoices, vendors, customers, general ledger accounts, employees, payments, and receipts are polled from Sage 50 on a schedule (typically every 15 minutes to hourly, depending on your accounting cycle), and each record is written to a corresponding Databricks table in your warehouse schema. The data model maps Sage 50 field names and types to Databricks columns, preserving audit fields like transaction date, account code, and vendor ID. No data flows back into Sage 50; Databricks is a write-only destination used for analytics and BI.
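To make the field mapping concrete, here is a small sketch of how one record could be renamed and type-normalized before it is written. The Sage 50 field names, column names, and the to_row helper are illustrative assumptions, not ml-connector's actual schema.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping: Sage 50 field -> (Databricks column, converter).
FIELD_MAP = {
    "InvoiceNumber": ("invoice_number", str),
    "VendorID": ("vendor_id", str),
    "TransactionDate": ("transaction_date", datetime.fromisoformat),
    "AccountCode": ("account_code", str),
    # Go through str so float inputs do not leak binary rounding into Decimal.
    "Amount": ("amount", lambda value: Decimal(str(value))),
}


def to_row(sage_record):
    """Rename and convert one Sage 50 record; missing fields become None."""
    row = {}
    for sage_field, (column, convert) in FIELD_MAP.items():
        raw = sage_record.get(sage_field)
        row[column] = None if raw is None else convert(raw)
    return row


if __name__ == "__main__":
    print(to_row({
        "InvoiceNumber": 1001,
        "VendorID": "V-01",
        "TransactionDate": "2024-03-31T00:00:00",
        "AccountCode": "5000",
        "Amount": "1250.50",
    }))
```

Keeping amounts as Decimal rather than float avoids rounding drift when ledger totals are later reconciled against Sage 50.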

How ml-connector handles it

ml-connector runs on a Windows machine or container with Sage 50 installed and the local SDK available. It authenticates to Sage 50 using Windows credentials and the required SDK parameters (ApplicationID and CompanyPath for the US edition, or DataPath for the UK edition), then polls for records modified since the last sync using LastModifiedDate or TransactionDate filters. Because the Sage 50 SDK requires exclusive access and cannot run as a service without workarounds, ml-connector schedules its polling windows to avoid conflicts with interactive Sage 50 sessions. For Databricks, ml-connector exchanges OAuth credentials (client_id and client_secret) at the token endpoint for a bearer token and requests a fresh one every 50 minutes to stay ahead of the 60-minute expiry. Each Sage 50 table is mapped to a Databricks Unity Catalog schema and table, and column types are normalized to Databricks data types (string, long, timestamp, and decimal for GL amounts). If a write to Databricks fails or a token refresh is rejected, ml-connector retries with exponential backoff and logs the full record in an audit trail for manual replay.
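The token refresh and retry behavior described above might look roughly like the following. This is a sketch under stated assumptions: TokenCache, backoff_delays, and write_with_retry are hypothetical names, the delay values are illustrative, and the audit trail is modeled as a plain list.

```python
import time

REFRESH_AFTER_SECONDS = 50 * 60  # refresh ahead of the 3600-second expiry


class TokenCache:
    """Reuse a bearer token until it is 50 minutes old, then fetch a new one."""

    def __init__(self, fetch_token, clock=time.monotonic):
        self._fetch_token = fetch_token
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._token is None or now - self._fetched_at >= REFRESH_AFTER_SECONDS:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token


def backoff_delays(count, base=1.0, cap=60.0):
    """Delays that double each time: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(count)]


def write_with_retry(write, record, audit_log, attempts=5, sleep=time.sleep):
    """Try write(record) up to `attempts` times, sleeping between failures.

    On final failure the full record is appended to audit_log for replay.
    """
    delays = backoff_delays(attempts - 1)
    last_error = None
    for attempt in range(attempts):
        try:
            write(record)
            return True
        except Exception as exc:  # a sketch; real code would narrow this
            last_error = exc
            if attempt < len(delays):
                sleep(delays[attempt])
    audit_log.append({"record": record, "error": str(last_error)})
    return False
```

Injecting the clock and sleep functions keeps the sketch testable without waiting in real time; a production version would probably also add jitter to the delays so that many failed writes do not retry in lockstep.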

A real-world example

A small accounting firm manages five clients on Sage 50 US, handling invoicing, expense tracking, and monthly GL closes. The firm wants to build a consolidated financial dashboard in Databricks to compare performance across all five entities, but exporting GL account balances, aged receivables, and vendor spend from each Sage 50 company file every month is manual and error-prone. With Sage 50 and Databricks connected, the firm's invoices, vendors, GL accounts, and employees sync automatically every hour into a unified Databricks warehouse, where a BI tool can aggregate metrics across all five clients and flag reconciliation issues before month-end.

What you can do

  • Poll Sage 50 invoices, vendors, customers, and general ledger accounts and write them to Databricks tables on a schedule.
  • Authenticate Sage 50 via the Windows SDK with ApplicationID and CompanyPath (US) or DataPath (UK), and Databricks via OAuth service principal credentials.
  • Map Sage 50 field types to Databricks columns, normalize timestamps and decimal amounts, and preserve audit fields for reconciliation.
  • Handle exclusive-access constraints on Sage 50's SDK layer by coordinating polling windows to avoid conflicts with interactive sessions.
  • Retry failed writes to Databricks with exponential backoff, refresh OAuth tokens before expiry, and maintain a full audit trail for every record moved.

Questions

What accounting records move from Sage 50 to Databricks?
Purchase invoices, sales invoices, vendors, customers, general ledger accounts, employees, payments, receipts, bank accounts, and inventory items are all polled from Sage 50 and written to Databricks tables. Employee data includes employer-contribution fields, which may vary by edition. Sage 50 does not expose accounting dimensions as distinct objects, so dimensions appear as project or department codes embedded on transaction line items.
Why does ml-connector need to run on a Windows machine with Sage 50 installed?
Sage 50 has no cloud REST API and exposes data only through a local Windows SDK (US edition: .NET SDK or COM/ODBC) or Sage Data Objects (UK edition: COM/ActiveX). The SDK requires direct access to Sage 50 company data files on the same machine or LAN, and integration requires Windows credentials against the Sage 50 user database. No remote or headless integration is possible without a local SDK session.
How does ml-connector handle Sage 50's exclusive-access requirement and Databricks' token expiry?
Sage 50's SDK requires exclusive access to the company data, so ml-connector schedules its polling windows to avoid conflicts with live interactive accounting sessions. For Databricks, ml-connector requests a fresh OAuth bearer token before the 3600-second expiry, typically every 50 minutes, and retries with exponential backoff when a write fails or a token refresh is rejected. All failures are logged in an audit trail for replay.

Connect Sage 50 and Databricks

Free to use. Add your credentials, ping your real systems, and see if we fit.

Get started