ml-connector

Plex and Databricks integration

Plex runs manufacturing operations and finance. Databricks provides analytics and data intelligence. Connecting the two gets your manufacturing data, supply chain records, and financial transactions flowing into Databricks for reporting, dashboards, and downstream ML workflows without custom scripts or manual exports. ml-connector handles the different APIs and keeps the records synchronized on a schedule you control.

How Plex works

Plex exposes suppliers, purchase orders, purchase order releases, invoices, customers, sales orders, parts, inventory, containers, GL accounts, and payments through REST JSON APIs (Plex Connect) and legacy SOAP datasources. Authentication uses OAuth2 client credentials with a Bearer token against https://cloud.plex.com, or basic auth with username, password, and PCN (company code) for SOAP. Plex cloud offers no native webhooks, so records are read by polling on a configurable interval, typically 5-15 minutes, filtered by modified_date or created_date. Rate limits are not publicly documented, so exponential backoff is required on HTTP 429 responses. Some integration users may have access to scheduled SFTP DataSources extracts for bulk historical loads.
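Because Plex rate limits are undocumented, the retry loop has to be defensive. Below is a minimal Python sketch of exponential backoff with full jitter. It assumes a hypothetical `request` callable that raises a hypothetical `RateLimited` exception when Plex answers HTTP 429. Plex does not document whether it sends a `Retry-After` header, so the code treats that hint as optional:

```python
import random
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical error raised by the request function when Plex returns HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("HTTP 429 Too Many Requests")
        self.retry_after = retry_after  # seconds from a Retry-After header, if any


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt) seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_backoff(request: Callable[[], dict], max_attempts: int = 6,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `request`, retrying on RateLimited; re-raise once max_attempts is used up."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise
            # Prefer the server's hint when present, otherwise back off exponentially.
            delay = exc.retry_after if exc.retry_after is not None else backoff_delay(attempt)
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the loop testable. Full jitter spreads out retries from parallel pollers so they do not hit Plex in lockstep.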

How Databricks works

Databricks accepts data via REST APIs on workspace-specific URLs (https://<workspace-instance>.cloud.databricks.com on AWS, or the Azure and GCP equivalents). Authentication uses OAuth2 client credentials with a Service Principal that holds a client_id and client_secret. Access tokens expire after 3600 seconds, so refresh logic is required. Data is written to Delta Lake tables by submitting SQL statements through the SQL Statement Execution API (/api/2.0/sql/statements), while catalog, schema, and table metadata is managed through the Unity Catalog endpoints under /api/2.1/. Databricks is a data analytics platform without native finance or ERP objects; it serves as the data destination, not an operational system. MLflow webhooks exist, but only for model registry events, not for table writes or cluster state.
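A small cache that refreshes early is the simplest way to handle the one-hour token lifetime. The sketch below assumes a hypothetical `fetch` function that performs the client-credentials exchange and returns the token and its `expires_in`. For workspace-level OAuth M2M, Databricks documents that exchange as a POST to `https://<workspace-host>/oidc/v1/token` with `grant_type=client_credentials` and `scope=all-apis`. A 600-second margin reproduces a refresh at roughly 50 minutes:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# Hypothetical fetch function: performs the client-credentials exchange and
# returns (access_token, expires_in_seconds).
FetchToken = Callable[[], Tuple[str, int]]


@dataclass
class CachedToken:
    fetch: FetchToken
    refresh_margin: float = 600.0  # renew 10 min early, i.e. ~50 min into a 3600 s token
    clock: Callable[[], float] = time.monotonic
    _token: Optional[str] = field(default=None, init=False, repr=False)
    _expires_at: float = field(default=0.0, init=False, repr=False)

    def get(self) -> str:
        """Return the cached token, fetching a new one when none exists or expiry is near."""
        now = self.clock()
        if self._token is None or now >= self._expires_at - self.refresh_margin:
            token, expires_in = self.fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

Calling `get()` before every Databricks request means at most one token exchange per refresh window. A long-running pipeline never presents a token that is about to expire.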

What moves between them

The main flow runs from Plex into Databricks. ml-connector polls Plex on your configured schedule (typically after daily processing or every 5-15 minutes) and retrieves suppliers, purchase orders, invoices, customers, sales orders, parts, inventory, GL accounts, and payments. These records are normalized and written to Databricks tables under a designated Catalog and Schema. Reference data such as parts, suppliers, and GL accounts is written first as dimension tables, so the transaction records (invoices, orders) written afterward can reference them. Plex does not provide webhooks, and Databricks does not push events back to Plex, so the flow is unidirectional: Plex to Databricks, on a polling schedule.
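Loading dimensions before transactions is a dependency-ordering problem. Here is a sketch using Python's standard `graphlib`. The dependency map is a hypothetical illustration of which entities reference which; it is not Plex's actual schema:

```python
from graphlib import TopologicalSorter

# Hypothetical map: each entity lists the entities its rows reference.
# Dimensions (parts, suppliers, customers, gl_accounts) depend on nothing, so they load first.
DEPENDS_ON = {
    "parts": set(),
    "suppliers": set(),
    "customers": set(),
    "gl_accounts": set(),
    "inventory": {"parts"},
    "purchase_orders": {"suppliers", "parts"},
    "sales_orders": {"customers", "parts"},
    "invoices": {"suppliers", "purchase_orders", "gl_accounts"},
    "payments": {"invoices", "gl_accounts"},
}


def load_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every entity comes after the entities it references."""
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(deps).static_order())
```

`static_order()` also raises `graphlib.CycleError` if the map contains a cycle. That catches a misconfigured dependency before any rows are written.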

How ml-connector handles it

ml-connector stores the Plex OAuth2 credentials encrypted and refreshes the Bearer token before it expires. For each polling interval, it queries Plex using the modified_date and created_date filters to fetch only changed records, reducing load on both systems. The company code (PCN) is included in the Plex request so the integration user retrieves data from the correct Plex company partition. Because Plex rate limits are undocumented, ml-connector implements exponential backoff and tracks request counts to avoid throttling.

ml-connector then normalizes the Plex records and constructs Databricks SQL INSERT or MERGE statements to write them to Delta Lake tables. On the Databricks side, it refreshes the Service Principal OAuth2 token every 50 minutes (well before the 3600-second expiry) to prevent mid-pipeline authentication failures. Each record write includes the Plex record ID, modified timestamp, and source system identifier, so Databricks tables maintain a full audit trail and support change-data-capture (CDC) workflows. Failed writes are retried; if a downstream Databricks job fails, the affected Plex records can be identified and replayed.
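One way to express the upsert is a single MERGE per record, submitted to the SQL Statement Execution API with named parameter markers so that values never appear in the SQL text. This is a sketch under several assumptions, all of them illustrative:

- `build_merge_request` and the audit column names (`plex_id`, `plex_modified_date`, `pcn`, `source_system`, `written_at`) are hypothetical.
- The target table's columns match the source row, which `UPDATE SET *` and `INSERT *` require.
- Normalized values are stored as STRING, apart from the two timestamps.

```python
import re
from typing import Any

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_merge_request(warehouse_id: str, table: str, pcn: str,
                        record: dict[str, Any]) -> dict[str, Any]:
    """Request body for POST /api/2.0/sql/statements that upserts one normalized Plex record.

    Hypothetical contract: `record` carries 'plex_id' and an ISO-8601 'plex_modified_date'.
    """
    if not all(_IDENT.fullmatch(part) for part in table.split(".")):
        raise ValueError(f"unsafe table name: {table!r}")
    # Audit fields travel with every row so the Delta table supports CDC and replay.
    row = {**record, "pcn": pcn, "source_system": "plex"}
    bad = [c for c in row if not _IDENT.fullmatch(c)]
    if bad:
        raise ValueError(f"unsafe column names: {bad}")
    # Named parameter markers (:col) keep values out of the SQL text.
    select_list = ", ".join(
        f"CAST(:{c} AS TIMESTAMP) AS {c}" if c == "plex_modified_date" else f":{c} AS {c}"
        for c in row
    )
    statement = (
        f"MERGE INTO {table} AS t "
        f"USING (SELECT {select_list}, current_timestamp() AS written_at) AS s "
        "ON t.plex_id = s.plex_id AND t.pcn = s.pcn "
        # Only overwrite when the incoming version is newer, so replays are idempotent.
        "WHEN MATCHED AND s.plex_modified_date > t.plex_modified_date THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )
    # The API treats an omitted "value" as NULL; everything else is sent as a string.
    parameters = [{"name": c} if v is None else {"name": c, "value": str(v)}
                  for c, v in row.items()]
    return {"warehouse_id": warehouse_id, "statement": statement,
            "parameters": parameters, "wait_timeout": "30s"}
```

The `WHEN MATCHED AND s.plex_modified_date > t.plex_modified_date` guard means that replaying a record leaves a newer or identical row untouched. A production pipeline would more likely batch rows (for example through a staging table) than issue one statement per record.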

A real-world example

A mid-sized discrete manufacturer runs Plex ERP for production, purchasing, and inventory across three plants. Before the integration, analysts manually exported purchase orders, invoices, and production schedules from Plex every week, loaded them into spreadsheets, and built reports for procurement and supply chain visibility. With Plex and Databricks connected, procurement and manufacturing data lands automatically in Databricks tables on the polling schedule. It feeds dashboards that show spend by supplier, order aging, production rates, and inventory turns in near-real time. The export and re-load step is eliminated, and the data is already normalized for cross-system analytics.

What you can do

  • Poll Plex on a configurable schedule (typically every 5-15 minutes) and sync suppliers, purchase orders, invoices, customers, sales orders, parts, inventory, GL accounts, and payments to Databricks.
  • Authenticate Plex with OAuth2 Bearer tokens or SOAP basic auth, routing requests to the correct company code (PCN), and handle rate limit retries.
  • Normalize Plex records and write them to Databricks Delta Lake tables with full audit trail (source ID, modified timestamp, write timestamp).
  • Refresh the Databricks Service Principal OAuth2 token ahead of its 3600-second expiry so long-running pipelines do not fail mid-run.
  • Support change-data-capture (CDC) workflows and record replay by maintaining Plex record IDs, timestamps, and failure tracking in Databricks.
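Running CDC on top of polling has a common pitfall. If several polling results, or replayed records, are combined into one write batch, the same Plex record can appear more than once. Delta Lake's MERGE fails when multiple source rows match the same target row. The minimal sketch below keeps the newest version per `(pcn, plex_id)` and returns the batch's `modified_date` high-water mark for the next poll. The function and field names are hypothetical:

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_per_record(rows: Iterable[dict]) -> tuple[list[dict], Optional[datetime]]:
    """Keep the newest version of each (pcn, plex_id) and return the batch's high-water mark.

    Hypothetical row shape: dicts with 'pcn', 'plex_id' and a datetime 'plex_modified_date'.
    """
    latest: dict[tuple[str, str], dict] = {}
    watermark: Optional[datetime] = None
    for row in rows:
        key = (row["pcn"], row["plex_id"])
        ts = row["plex_modified_date"]
        # Later versions replace earlier ones; ties keep the first row seen.
        if key not in latest or ts > latest[key]["plex_modified_date"]:
            latest[key] = row
        # The next poll can filter on modified_date >= watermark.
        if watermark is None or ts > watermark:
            watermark = ts
    return list(latest.values()), watermark
```

Filtering the next poll with `>=` rather than `>` avoids losing records that share the boundary timestamp. The resulting re-fetch of boundary rows is harmless once batches are deduplicated like this.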

Questions

Which direction does data move between Plex and Databricks?
Data flows from Plex into Databricks only. Plex operational records (suppliers, purchase orders, invoices, GL accounts, inventory) are polled and written to Databricks tables for analytics and reporting. Databricks is the analytics destination in this integration and does not push data back into Plex.
How does ml-connector handle Plex's lack of webhooks and undocumented rate limits?
ml-connector polls Plex on a schedule you configure (typically 5-15 minute intervals) using modified_date and created_date filters to fetch only changed records. It implements exponential backoff when Plex returns HTTP 429, and tracks request counts to avoid throttling. This approach reduces load on Plex while keeping Databricks tables fresh.
How are Plex company codes (PCN) and Databricks workspaces mapped?
Each Plex company code is configured separately in ml-connector with its own OAuth2 credentials and PCN value. When polling, ml-connector includes the PCN in the request to route to the correct Plex company partition. Records are written to Databricks with the company code and source system identifier included, so multi-company setups can segregate data by table or schema as needed.
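The two segregation options can be captured in a small per-company config. This sketch uses hypothetical field names (`segregation`, `shared_schema`) and a hypothetical `plex_pcn_<PCN>` schema naming convention. It is not ml-connector's actual configuration format:

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class CompanyConfig:
    """Hypothetical per-PCN settings; field names are illustrative only."""
    pcn: str                                        # Plex company code sent with every request
    catalog: str = "main"
    segregation: Literal["schema", "shared"] = "schema"
    shared_schema: str = "plex"

    def target_table(self, entity: str) -> str:
        """Fully qualified Delta table for one entity of this company."""
        if self.segregation == "schema":
            # One schema per company, e.g. main.plex_pcn_12345.invoices
            return f"{self.catalog}.plex_pcn_{self.pcn}.{entity}"
        # One shared schema: rows are told apart by their pcn audit column.
        return f"{self.catalog}.{self.shared_schema}.{entity}"
```

A separate schema per company simplifies access grants. A shared schema makes cross-company reporting a plain `GROUP BY pcn`.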


Connect Plex and Databricks

Free to use. Add your credentials, ping your real systems, and see if we fit.
