We used to deploy once a week. Fridays. A big batch of everything that was "ready." The deploy took 2 hours of manual checks, someone stayed late to monitor, and about once a month something broke badly enough to require a rollback. Now we deploy 12 times a day and nothing breaks. Not because we got better at testing. Because each deploy is so small that breaking things is nearly impossible.

The small-batch principle

When you deploy once a week, each deploy contains 20-40 changes. If something breaks, you're debugging across 40 possible causes. When you deploy 12 times a day, each deploy contains 1-2 changes. If something breaks, the cause is obvious and the fix is either "revert the last commit" or "fix the one thing that changed." The diagnosis time goes from hours to seconds.

fig. 1 — blast radius: weekly batch (30+ changes, which one broke it?) vs continuous small deploys (1 change = obvious cause)

The pipeline

Our deploy pipeline has exactly 4 stages. No staging environment. No QA team. No deploy approvals. Here's the actual flow:

# The entire deploy pipeline

1. git push
   Developer pushes to main branch
   No PRs for single-developer changes
   PRs only when touching shared infrastructure

2. CI (90 seconds)
   TypeScript type check
   Critical path tests only (not full suite)
   Build verification

3. Auto-deploy (60 seconds)
   Vercel/Cloudflare Pages: automatic on push
   Zero-downtime deployment
   Previous version available for instant rollback

4. Monitor (5 minutes)
   Error rate spike → auto-alert
   Conversion rate drop → auto-alert
   API latency increase → auto-alert

Total time from git push to live in production: under 3 minutes. Total human attention required: checking the alert channel for 5 minutes after deploy. If no alerts, move on.
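The CI stage (step 2) can be sketched as a single fail-fast gate script. This is a minimal sketch, not our actual CI config — the command names (`tsc --noEmit`, `vitest run tests/critical`, `npm run build`) are assumptions about what a setup like ours might run:

```typescript
// Hypothetical CI gate: run each step in order, fail fast, report wall time.
import { execSync } from "node:child_process";

const steps = [
  { name: "type check", cmd: "tsc --noEmit" },
  { name: "critical path tests", cmd: "vitest run tests/critical" },
  { name: "build", cmd: "npm run build" },
];

// The runner is injectable so the gate logic can be exercised without
// a real toolchain installed.
function runGate(
  run: (cmd: string) => void = (cmd) => execSync(cmd, { stdio: "inherit" }),
): number {
  const start = Date.now();
  for (const step of steps) {
    run(step.cmd); // any throw fails the gate, so nothing auto-deploys
  }
  return (Date.now() - start) / 1000; // seconds; budgeted at ~90
}
```

Keeping the whole gate under 90 seconds is what makes pushing straight to main tolerable: a failed check surfaces before the developer has context-switched away.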

Why no staging environment

Staging environments create a false sense of security. "It works in staging" is the most dangerous phrase in software because staging is never identical to production. Different data, different traffic patterns, different race conditions. The bugs you catch in staging are the easy bugs. The ones that matter only show up in production.

Instead of staging, we use feature flags. Every significant change ships behind a flag. The code is in production, but the feature is only visible to internal users. We test in production, with real data and real traffic patterns, but only we see it. When it's ready, we flip the flag. If something breaks, we flip it back. No rollback needed.

// Feature flag pattern we use everywhere
const showNewPaywall = featureFlag('paywall-v3', {
  internal: true,  // team sees it
  percentage: 0,   // users don't, yet
});

// When ready: percentage: 10 → 50 → 100
// If broken: percentage: 0 (instant rollback)
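The `featureFlag` helper above is our own convention, but the core decision logic fits in a few lines. A minimal sketch, assuming a flag is resolved per user and that hashing the user id keeps each user's rollout bucket stable across requests (all names here are illustrative, not a real API):

```typescript
// Sketch of per-user flag resolution: internal override + percentage rollout.
type FlagConfig = { internal: boolean; percentage: number };

// Tiny deterministic hash mapping a user id to a stable bucket in [0, 100).
function bucket(userId: string): number {
  let h = 0;
  for (const ch of userId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % 100;
}

function isEnabled(
  flag: FlagConfig,
  user: { id: string; isInternal: boolean },
): boolean {
  if (flag.internal && user.isInternal) return true; // team always sees it
  return bucket(user.id) < flag.percentage;          // percentage rollout
}
```

Ramping `percentage` from 0 → 10 → 50 → 100 widens the bucket range without moving anyone who is already in, and setting it back to 0 hides the feature without a redeploy.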

What we actually test

We don't aim for high test coverage. Test coverage is a vanity metric that optimizes for the wrong thing — it measures how much code is exercised, not how much risk is mitigated. We test the critical path only: payment flow, authentication, and data integrity. These are the things that, if broken, cause real damage. Everything else is covered by monitoring and fast rollback.

Our test suite runs in 90 seconds. Not because we're lazy — because we aggressively delete tests that don't catch real bugs. Every quarter, we audit our test suite: which tests caught a real bug in the last 90 days? Any test outside the critical path that hasn't caught a bug gets deleted. This keeps the suite fast and relevant.

The purpose of tests isn't to prove your code works. It's to catch the specific failures that would damage your users or your business. Everything else is ceremony.

Monitoring as the safety net

Our monitoring stack is simple and opinionated. We track three things in real-time after every deploy:

Error rate. If the JavaScript error rate increases by more than 2x the baseline within 5 minutes of a deploy, we get an alert. This catches crashes, API failures, and rendering errors. Threshold is deliberately low because each deploy is small — any error spike is almost certainly caused by the last change.

Core action completion rate. If the rate at which users complete the core action drops by more than 10% in the 30 minutes after a deploy, we get an alert. This catches subtle product-breaking bugs that don't throw errors — a button that doesn't respond, a flow that dead-ends, a screen that renders but is unusable.

API p99 latency. If the 99th percentile response time increases by more than 50%, we get an alert. This catches performance regressions — the kind of bug that doesn't break anything but makes the product feel slow enough to increase abandonment.

Three metrics. Three thresholds. Automatic alerts. No dashboards to watch. If none of the three fire within 5 minutes, the deploy is clean and we move on.
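Concretely, the three checks reduce to a single comparison against pre-deploy baselines. A sketch using the thresholds from the text — the metric names and shape are assumptions, not a description of our real monitoring stack:

```typescript
// The three post-deploy checks: 2x error rate, -10% completion, +50% p99.
type Metrics = {
  errorRate: number;      // errors per minute
  completionRate: number; // fraction of users finishing the core action
  p99LatencyMs: number;   // 99th percentile API response time
};

function deployAlerts(baseline: Metrics, current: Metrics): string[] {
  const alerts: string[] = [];
  if (current.errorRate > baseline.errorRate * 2)
    alerts.push("error rate >2x baseline");
  if (current.completionRate < baseline.completionRate * 0.9)
    alerts.push("core action completion down >10%");
  if (current.p99LatencyMs > baseline.p99LatencyMs * 1.5)
    alerts.push("p99 latency up >50%");
  return alerts; // empty array → deploy is clean
}
```

Note the asymmetry: the error-rate window is 5 minutes because crashes show up immediately, while completion rate gets 30 minutes because behavioral signals need enough traffic to be meaningful.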

The culture shift

Deploying 12 times a day requires trusting your team to not break things, and trusting your systems to catch it when they do. This is a culture shift, not a tooling change. The first month we moved to continuous deploy, people were nervous. "Should I really push this without a code review?" "What if I break production?" By the second month, the nervousness was gone — because people had pushed dozens of changes and the system had caught the 2-3 that had issues before any user noticed.

The paradox of frequent deploys: they feel more dangerous but are actually safer. Each individual deploy has almost zero risk because it's tiny. The aggregate risk over a day is lower than a single weekly batch deploy because problems are caught immediately instead of accumulating.


The deploy pipeline is not a technical problem. It's a feedback loop. The faster you can get changes to production, the faster you learn whether they work. The faster you learn, the faster you improve. Every hour your code sits in a branch waiting for review or staging or deploy approval is an hour of delayed learning. Optimize for cycle time above all else, and build the safety nets that let you move fast without breaking things.