Oracle NetSuite and Databricks integration

Oracle NetSuite runs your financials and operational records. Databricks ingests and transforms data for analytics. Connecting the two moves your AP and GL records from NetSuite into Databricks on a schedule you define, so your finance team can analyze spending, reconcile invoices across systems, and audit transaction flows without manual exports and re-keying. The two platforms expose different APIs and authentication flows, and ml-connector bridges them.

How Oracle NetSuite works

Oracle NetSuite exposes vendors, purchase orders, vendor bills, invoices, GL accounts, inventory items, customers, employees, departments, and locations through SuiteTalk REST Web Services over HTTPS. Authentication uses OAuth 2.0 Client Credentials with a certificate (recommended) or Token-Based Authentication with static tokens (legacy, deprecated 2026.1). NetSuite supports Event Subscriptions as push webhooks for certain record types, such as Sales Orders and Invoices, but these webhooks lack native HMAC signatures, so receivers must rely on IP allowlists instead. For reliable bulk reads and historical data, clients poll NetSuite on demand with SuiteQL queries. OAuth access tokens expire after 60 minutes, and the machine-to-machine (M2M) flow issues no refresh token, so the client must request a new token each time.
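
To make the polling approach concrete, here is a minimal sketch of how one paged SuiteQL call could be assembled. It only builds the request and does not send it. The function name, the example query, and its column names are illustrative assumptions, not ml-connector's actual queries, and `account_id` is assumed to already be in the form NetSuite uses in hostnames.

```python
import json
from datetime import date
from urllib.parse import urlencode


def build_suiteql_request(account_id: str, access_token: str, since: date,
                          limit: int = 1000, offset: int = 0) -> dict:
    """Describe one paged SuiteQL call as a plain dict, without sending it."""
    # Hypothetical incremental query: vendor bills modified after the last sync.
    # Taking a date object (not free text) keeps the interpolated value safe.
    query = (
        "SELECT id, tranid, trandate, entity, lastmodifieddate "
        "FROM transaction "
        "WHERE type = 'VendBill' "
        f"AND lastmodifieddate > TO_DATE('{since.isoformat()}', 'YYYY-MM-DD') "
        "ORDER BY id"
    )
    # Paging is controlled with limit/offset query parameters.
    paging = urlencode({"limit": limit, "offset": offset})
    return {
        "method": "POST",
        "url": (f"https://{account_id}.suitetalk.api.netsuite.com"
                f"/services/rest/query/v1/suiteql?{paging}"),
        "headers": {
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "Prefer": "transient",  # header the SuiteQL endpoint expects
        },
        "body": json.dumps({"q": query}),
    }
```

A caller would typically raise `offset` by `limit` on each call until a page comes back with fewer rows than requested.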

How Databricks works

Databricks is a data platform with workspace-scoped REST APIs over HTTPS serving compute clusters, SQL warehouses, catalogs, schemas, tables, and ML models. It authenticates via OAuth 2.0 Client Credentials at the workspace or account level, returning bearer tokens valid for 3600 seconds. Databricks webhooks support only MLflow Model Registry events (a legacy workspace feature), so there is no webhook path for finance data: writes go through SQL queries, with direct REST calls used for table metadata. Data is loaded via SQL or Spark rather than REST payloads. The client must request a new token once the 3600 seconds elapse, and the all-apis scope grants broad access to all endpoints, with no finer-grained scopes available.
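
As an illustration of the Client Credentials exchange described above, the sketch below builds a workspace-level token request using only the Python standard library. It constructs the request without sending it. The function name and the host value are placeholders, and the exact endpoint should be confirmed against the Databricks OAuth documentation for your deployment.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_databricks_token_request(workspace_host: str, client_id: str,
                                   client_secret: str) -> Request:
    """Build (but do not send) a workspace-level OAuth token request."""
    # Client ID and secret travel as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    # Form-encoded body requesting the broad all-apis scope.
    body = urlencode({"grant_type": "client_credentials",
                      "scope": "all-apis"}).encode()
    return Request(
        url=f"https://{workspace_host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would perform the exchange. The JSON response is expected to carry the access token together with its lifetime in seconds.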

What moves between them

Invoice records, purchase orders, and GL account transactions flow from Oracle NetSuite into Databricks. Vendor bills and invoices are read via SuiteQL queries on a schedule (typically daily or weekly) and loaded into Databricks tables under a finance catalog. GL transactions and account balances are appended to a transactions table for audit and analysis. Account and vendor master records are synced weekly to keep dimension tables current. All data is append-only in Databricks; historical records are preserved for audit trails.

How ml-connector handles it

ml-connector uses OAuth 2.0 Client Credentials to authenticate to NetSuite and retrieves records via SuiteQL queries rather than relying on Event Subscriptions, since NetSuite webhooks lack HMAC signatures and require IP allowlists. The NetSuite OAuth certificate is stored encrypted and presented on each request. Because neither side issues a refresh token, ml-connector requests a new NetSuite token before the 60-minute expiry and a new Databricks token before the 3600-second expiry. On the Databricks side, ml-connector creates and manages tables within a finance catalog, mapping NetSuite GL accounts and vendors to Databricks dimension tables, and appends transaction records to fact tables. Because Databricks loads data through SQL rather than REST payloads, all loads use SQL inserts or batch upserts. ml-connector tracks the last-synced timestamp for each query so repeat runs skip already-ingested records, and every record carries a full audit trail including the source query timestamp, NetSuite internal ID, and sync run date.
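
The proactive token renewal described above can be sketched as a small cache that fetches a new token shortly before the old one expires. This is an illustrative pattern, not ml-connector's internal code. `TokenCache`, the `fetch` callback, and the 300-second safety margin are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Hold one bearer token and renew it before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 margin_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # returns (token, lifetime in seconds)
        self._margin_s = margin_s  # renew this many seconds before expiry
        self._clock = clock        # injectable so the cache can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # No refresh token exists in the M2M flow, so renewal is simply a
        # new client-credentials request made through the fetch callback.
        if self._token is None or now >= self._expires_at - self._margin_s:
            token, ttl_s = self._fetch()
            self._token = token
            self._expires_at = now + ttl_s
        return self._token
```

One cache instance per system would be enough here, since both NetSuite and Databricks tokens last 3600 seconds. With the assumed margin, each token is replaced after roughly 55 minutes of use.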

A real-world example

A mid-sized professional services firm runs Oracle NetSuite for accounting and procurement across multiple offices, and uses Databricks to build a data lake for finance analytics, project costing, and vendor spend analysis. Before the integration, finance analysts exported invoice and GL data from NetSuite monthly in CSV format, transformed it manually in Excel, then loaded it into Databricks for analysis, a process that took two to three days and introduced transcription errors. With NetSuite and Databricks connected, invoice and GL records flow automatically on a weekly schedule into Databricks tables, historical data is preserved for trend analysis, and the finance team can run spend reports and cost allocation analyses the same week payables close. The monthly export-and-transform cycle is eliminated, and reconciliation between NetSuite and Databricks is deterministic.

What you can do

  • Sync vendor bills, invoices, and purchase orders from Oracle NetSuite into Databricks tables on a scheduled interval for spend analysis and audit.
  • Load GL account transactions and balances into Databricks fact and dimension tables for general ledger analytics and reconciliation.
  • Authenticate Oracle NetSuite with OAuth 2.0 Client Credentials and a certificate, and Databricks with OAuth 2.0 at the workspace or account level.
  • Poll SuiteQL queries to retrieve historical and incremental invoice and transaction records, with timestamp tracking to avoid duplicate loads.
  • Preserve full audit trails on every record, including NetSuite internal IDs, sync timestamps, and source query details for compliance and root-cause analysis.

Questions

Why does ml-connector use SuiteQL polling instead of NetSuite Event Subscriptions?
NetSuite Event Subscriptions lack native HMAC signatures, so incoming events can only be protected with IP allowlists, which makes them a poor fit for a multi-tenant platform like ml-connector. Polling via SuiteQL gives ml-connector full control over retry logic, timestamp tracking, and audit trails, and it handles bulk reads and historical backfills without relying on webhook push delivery.
How does ml-connector handle token expiry on both sides?
NetSuite OAuth tokens expire after 60 minutes with no refresh token, so ml-connector requests a new token before the current one expires. Databricks bearer tokens expire after 3600 seconds, and ml-connector renews them automatically as part of each request cycle. Both operations use the stored client credentials and are transparent to the sync schedule.
What happens to historical data when a new invoice is loaded into Databricks?
All data loads are append-only into Databricks tables. ml-connector tracks the last-synced timestamp for each SuiteQL query, so repeat runs load only new and updated records, avoiding duplicates. Historical records remain in the table, giving the finance team a complete audit trail and the ability to run trend reports across multiple sync periods.
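
The timestamp tracking and audit columns described in these answers can be sketched as a pure function that decides which rows to append on a given run. The field names (`lastmodified`, `amount`, and the audit column names) are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def plan_append(records: List[Dict], watermark: datetime,
                run_at: datetime) -> Tuple[List[Dict], datetime]:
    """Return (rows to append, new watermark) for one sync run."""
    # Keep only records modified after the last-synced timestamp.
    fresh = [r for r in records if r["lastmodified"] > watermark]
    rows = [
        {
            "netsuite_internal_id": r["id"],
            "amount": r["amount"],
            "source_modified_at": r["lastmodified"].isoformat(),
            # Audit columns: when the source was queried and the run date.
            "source_query_at": run_at.isoformat(),
            "sync_run_date": run_at.date().isoformat(),
        }
        for r in fresh
    ]
    # Advance the watermark only as far as the newest record actually seen.
    new_watermark = max((r["lastmodified"] for r in fresh), default=watermark)
    return rows, new_watermark


# Example: the second run over the same data appends nothing.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
bills = [{"id": "101", "amount": 50.0, "lastmodified": t0.replace(hour=1)}]
rows, mark = plan_append(bills, t0, t0.replace(hour=3))
again, _ = plan_append(bills, mark, t0.replace(hour=4))
```

In this sketch, a record that changes in NetSuite after it was loaded gets a later `lastmodified` value, so it is appended again as a new row and the earlier version stays in the table.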

Connect Oracle NetSuite and Databricks

Free to use. Add your credentials, ping your real systems, and see if we fit.

Get started