
Integrations

Five ways your ERP integration silently rots

We've inherited integrations from six other agencies. They all rotted in the same five ways. Here's what to look for in your stack — before it costs you a Q4 weekend.

January 19, 2026 · 7 min · By the founder

Most ERP integrations on Shopify aren't broken. They're rotting. Quietly, in production, in ways that don't trigger pages but produce slow drift you'll only notice during a reconciliation, an audit, or an outage.

We've inherited integrations from six other agencies in the last four years. They all rotted in roughly the same five ways. If you're running a Plus store with a NetSuite, Brightpearl, or Microsoft Dynamics integration, here's what to check before it costs you a weekend.

1. Webhooks without idempotency

Almost every integration we inherit has webhook handlers that assume each event arrives exactly once. Shopify's webhooks don't guarantee that. Delivery is at-least-once: the same event can arrive 2–10 times under network retry conditions, and if your handler doesn't dedupe by event ID, you double-create orders, double-decrement inventory, and double-bill the customer.

What to check: Does every webhook handler check a deduplication key (idempotency key, event ID, or hash) before processing? Is the dedupe store durable across deploys?

The fix: A 30-line Redis-backed dedupe layer at the top of every handler. We retrofit it to inherited integrations as the first item in the audit.
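
A minimal sketch of such a dedupe layer, not our production code. To keep it runnable without a Redis server, it uses a hypothetical in-memory store (`InMemoryDedupeStore`) whose `claim` method imitates the atomic "set if absent, with expiry" behavior of Redis `SET key value NX EX ttl`. The names `handle_webhook`, `claim`, and `release`, the `webhook:` key prefix, and the 7-day TTL are all assumptions for illustration.

```python
import time
from typing import Callable, Dict


class InMemoryDedupeStore:
    """Illustrative stand-in for Redis; claim() imitates SET key 1 NX EX ttl."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expires_at: Dict[str, float] = {}
        self._clock = clock

    def claim(self, key: str, ttl_seconds: int) -> bool:
        """Return True only for the first caller to claim an unexpired key."""
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # already seen inside the TTL window
        self._expires_at[key] = now + ttl_seconds
        return True

    def release(self, key: str) -> None:
        """Forget a key so a later retry can be processed (Redis: DEL key)."""
        self._expires_at.pop(key, None)


def handle_webhook(store, event_id: str, payload: dict,
                   process: Callable[[dict], object],
                   ttl_seconds: int = 7 * 24 * 3600) -> str:
    """Run process(payload) at most once per event ID."""
    key = f"webhook:{event_id}"  # hypothetical key format
    if not store.claim(key, ttl_seconds):
        return "duplicate"  # acknowledge, but cause no side effects
    try:
        process(payload)
    except Exception:
        store.release(key)  # processing failed: allow the sender's retry through
        raise
    return "processed"
```

With a real Redis client the claim would typically be a single call such as redis-py's `client.set(key, 1, nx=True, ex=ttl_seconds)`, which is durable across deploys as long as Redis itself is. Releasing the key on failure is a design choice: without it, a handler that crashes midway would mark the event as seen and swallow the retry.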

2. Retry queues without dead-letter monitoring

The default pattern for "the upstream API failed" is to retry. Most integrations do this. But almost none of them surface the queue depth, the retry count, or the dead-letter queue depth to anyone who's watching.

We've inherited integrations where 4,200 events had been sitting in a dead-letter queue for 11 weeks. Nobody knew. A reconciliation revealed it.

What to check: Where is your retry queue depth visible? Where is dead-letter depth visible? Who is paged when either crosses a threshold?

The fix: Every queue we deploy has depth metrics in Grafana, dead-letter alerts to PagerDuty above thresholds, and a daily Slack post of yesterday's queue health.
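
As a rough illustration of the alerting half, the check below turns a queue snapshot into alert messages. It is a sketch under assumptions: `QueueSnapshot`, `Thresholds`, `evaluate`, and the threshold values are hypothetical, and the step that actually pages on-call or posts to Slack is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueueSnapshot:
    name: str
    retry_depth: int
    dead_letter_depth: int


@dataclass(frozen=True)
class Thresholds:
    retry_depth: int = 500       # hypothetical: tune per queue
    dead_letter_depth: int = 0   # hypothetical: any dead-lettered event alerts


def evaluate(snapshot: QueueSnapshot, limits: Thresholds) -> List[str]:
    """Return one alert message per threshold the snapshot exceeds."""
    alerts: List[str] = []
    if snapshot.retry_depth > limits.retry_depth:
        alerts.append(
            f"{snapshot.name}: retry depth {snapshot.retry_depth} "
            f"exceeds {limits.retry_depth}"
        )
    if snapshot.dead_letter_depth > limits.dead_letter_depth:
        alerts.append(
            f"{snapshot.name}: dead-letter depth {snapshot.dead_letter_depth} "
            f"exceeds {limits.dead_letter_depth}"
        )
    return alerts
```

Run on a schedule against every queue, a check like this would have flagged the 4,200-event dead-letter backlog on the first day rather than in week 11.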

3. Schema drift without versioning

Your ERP changed its API last quarter. The integration didn't. It still mostly works, except for the new tax field on shipping line items, which is now silently dropped. Or the new "fulfillment status" enum value, which matches no case in the integration's switch statement and silently falls through to the catch-all default.

This is the most expensive class of rot because it doesn't fail loudly. It produces wrong data quietly.

What to check: Is there a versioned schema for the data flowing in and out? Is there a contract test that runs against the upstream's actual API? When the upstream changes, what alerts you?

The fix: A typed schema (Zod, Pydantic, or codegen from an OpenAPI spec) on every boundary. A contract test that runs daily against the upstream's actual responses and fails loudly when fields appear or disappear.
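
The two checks at the heart of a contract test can be sketched with the standard library alone. In practice Pydantic or Zod would carry the schema; everything named here (`EXPECTED_SHIPPING_LINE_FIELDS`, the field names, the `FulfillmentStatus` values, `diff_fields`, `parse_status`) is a made-up example rather than any real ERP's contract.

```python
from enum import Enum
from typing import Any, FrozenSet, Mapping, Set, Tuple

# Hypothetical v1 contract for a shipping line item.
EXPECTED_SHIPPING_LINE_FIELDS: FrozenSet[str] = frozenset(
    {"title", "price", "carrier"}
)


class FulfillmentStatus(Enum):
    # Hypothetical set of statuses the integration knows how to handle.
    PENDING = "pending"
    SHIPPED = "shipped"
    DELIVERED = "delivered"


def diff_fields(record: Mapping[str, Any],
                expected: FrozenSet[str]) -> Tuple[Set[str], Set[str]]:
    """Return (appeared, disappeared) field names relative to the contract."""
    actual = set(record)
    return actual - expected, set(expected) - actual


def parse_status(raw: str) -> FulfillmentStatus:
    """Map a raw status to the enum, raising instead of using a catch-all."""
    try:
        return FulfillmentStatus(raw)
    except ValueError:
        raise ValueError(
            f"unknown fulfillment status {raw!r}; the schema needs a new version"
        ) from None
```

A daily job would call `diff_fields` on a live upstream response and fail if either set is non-empty. Raising on an unknown enum value replaces the silent catch-all default with an error someone will see.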

4. Time-bomb credentials

OAuth tokens, API keys, webhook secrets — they all expire eventually, and they don't always tell you when. We've inherited integrations where the token had been silently failing to refresh for weeks because the refresh mechanism was wired to a deprecated OAuth scope, and the only error was a 401 in a log nobody was watching.

What to check: Where are your credentials? When do they rotate? What's the alert path if they fail to refresh?

The fix: Credentials in a vault (1Password, AWS Secrets Manager, Doppler) with a rotation schedule. A health check on every external integration that authenticates and reports green/red. Failed health checks page on-call.
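
One possible shape for that health check, as a sketch: the caller supplies an `authenticate` probe that makes a cheap authenticated request and returns whether it succeeded. `HealthResult`, `check_credential`, the 7-day warning window, and the choice to report a soon-to-expire credential as red are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass(frozen=True)
class HealthResult:
    integration: str
    status: str  # "green" or "red"
    reason: str


def check_credential(integration: str,
                     authenticate: Callable[[], bool],
                     expires_at: Optional[datetime],
                     now: datetime,
                     warn_window: timedelta = timedelta(days=7)) -> HealthResult:
    """Probe the credential and report green/red with a human-readable reason."""
    try:
        accepted = authenticate()
    except Exception as exc:  # network errors, refresh failures, and so on
        return HealthResult(integration, "red",
                            f"auth probe raised {type(exc).__name__}")
    if not accepted:
        return HealthResult(integration, "red",
                            "auth probe was rejected (for example, a 401)")
    if expires_at is not None and expires_at - now <= warn_window:
        return HealthResult(integration, "red",
                            f"credential expires within {warn_window.days} days")
    return HealthResult(integration, "green", "authenticated")
```

Any red result would then be routed to on-call, which turns a 401 buried in an unwatched log into a page.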

5. Missing reconciliation jobs

The most insidious rot is the one where each individual transaction appears to process correctly but the aggregate is wrong. Maybe one in 50,000 orders dropped a line item due to a race condition. The order looked fine. The customer was charged correctly. But the line item never made it to the ERP, so inventory is correct in Shopify but wrong in NetSuite.

Without a reconciliation job, you only find out at month-end close. Or at audit. Or never.

What to check: Is there a reconciliation job that compares order counts, revenue totals, and inventory positions across systems? How often does it run? Where are its results visible?

The fix: A nightly reconciliation that compares Shopify ↔ ERP ↔ 3PL totals across the previous 7 days, alerts on drift greater than 0.1%, and writes results to a dashboard that ops reviews weekly.
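
The drift comparison itself is small. The sketch below assumes the per-system totals for the 7-day window have already been fetched, and treats Shopify as the reference system. The names `reconcile` and `relative_drift`, the metric keys, and the shape of the `totals` dictionary are hypothetical.

```python
from decimal import Decimal
from typing import Dict, List

DRIFT_THRESHOLD = Decimal("0.001")  # 0.1%, as a fraction


def relative_drift(reference: Decimal, other: Decimal) -> Decimal:
    """Absolute difference as a fraction of the reference total."""
    if reference == 0:
        return Decimal(0) if other == 0 else Decimal("Infinity")
    return abs(other - reference) / abs(reference)


def reconcile(totals: Dict[str, Dict[str, Decimal]],
              reference: str = "shopify") -> List[str]:
    """Compare every system's totals to the reference; return alert messages.

    totals maps system name -> metric name -> total over the window.
    """
    alerts: List[str] = []
    baseline = totals[reference]
    for system, metrics in totals.items():
        if system == reference:
            continue
        for metric, expected in baseline.items():
            actual = metrics.get(metric)
            if actual is None:
                alerts.append(f"{system}: {metric} is missing")
                continue
            drift = relative_drift(expected, actual)
            if drift > DRIFT_THRESHOLD:
                alerts.append(
                    f"{system}: {metric} drifts {drift:.4%} from {reference}"
                )
    return alerts
```

`Decimal` is used instead of `float` so that revenue totals compare exactly. A missing metric is reported as an alert rather than skipped, since a system that stopped reporting a total is itself a form of drift.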

What an honest integration audit looks like

When we inherit an integration, we run through a 30-item checklist before we'll touch it. The five above are the ones that have shown up in nearly every audit. Others on the list:

  • Webhook signature verification on every endpoint
  • Rate limit handling that backs off rather than failing
  • Bulk operations isolated from sync paths
  • Replay capability for any event
  • Versioned migrations for schema changes
  • Documented failure modes
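
To make the first item on that list concrete, here is a minimal sketch of webhook signature verification. Shopify signs each webhook with an HMAC-SHA256 of the raw request body, base64-encoded in the `X-Shopify-Hmac-Sha256` header. The function name `verify_signature` and the way the secret is passed in are assumptions; a real handler would load the secret from the vault.

```python
import base64
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, secret: bytes) -> bool:
    """Check a base64-encoded HMAC-SHA256 signature against the raw body."""
    digest = hmac.new(secret, raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of a forged signature were correct.
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

The check has to run on the raw bytes of the request, before any JSON parsing or re-serialization, because even a whitespace change produces a different digest.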

If your integration was built by a marketplace freelancer at $80/hr two years ago, none of these will be in place. That's not a knock on freelancers — they were paid to ship the happy path. The infrastructure work doesn't fit a marketplace ticket.

What to do about it

If you're staring at an inherited integration:

  1. Audit, don't refactor. A two-week audit will tell you what's load-bearing and what's rotting. Then decide refactor vs rebuild.
  2. Stand up monitoring before you change anything. You can't improve what you can't measure.
  3. Don't fix five things at once. Fix the highest-impact one, watch the metrics, then fix the next.
  4. Document. Every integration we ship has a runbook for failure modes. Every integration we inherit gets one written before we close out.

Integrations are infrastructure. They deserve infrastructure-grade attention. Most don't get it because they're handed to whoever's available, then handed off, then handed off again. The result is the rot we keep seeing.
