API Integrations That Don't Break at 2 AM

August 5, 2025

Modern products are mostly glue: payments via RevenueCat or Razorpay, auth via Clerk, push via FCM, AI via OpenAI. Each integration works perfectly in the demo. Production is where they conspire against you. These patterns are the difference.

Wrap every third-party client

Never scatter stripe.charges.create() through your codebase. One module owns each integration:

// payments.ts — the only file that imports the SDK export async function chargeSubscription(userId: string, plan: Plan) { // retries, logging, error normalization live here }

When the provider changes their API (they will), or you switch providers (you might), the blast radius is one file.

Webhooks: verify, dedupe, ack fast

Every webhook handler needs three things before any business logic:

  1. Signature verification — an unverified webhook endpoint is an open door
  2. Idempotency — providers redeliver; store the event ID, skip repeats
  3. Fast acknowledgment — return 200 immediately, process async; slow handlers trigger retries that compound into floods

Most "we got charged twice" and "users got 5 notifications" bugs are a missing #2.

Normalize errors at the boundary

Each provider fails in its own dialect — different status codes, different retry semantics. Translate them once into your own categories: retryable, user_error, provider_down, config_error. Downstream code makes decisions against your vocabulary, not five different ones.

Build the degraded mode on day one

What happens when the embeddings API is down? In a matching product, the right answer might be "serve yesterday's matches." When push fails? Queue and retry. When payments hiccup? Grace period, not instant lockout. Deciding these after the outage means deciding them at 2 AM.

Monitor their uptime, not just yours

Your dashboard says green; users say broken — because the provider is down. Track third-party error rates as first-class metrics and alert on them. "It's not us, it's Razorpay" is only a useful sentence if you can say it in the first five minutes.

Integrations are easy to start and hard to operate. The teams that look calm aren't lucky — they wrapped, verified, deduped, and rehearsed the failure before it happened.