
Runbook: Roll Back a Bad Deploy


Overview

This runbook describes how to safely roll back a bad deploy when a regression is detected in production. It is part of the broader Engineering — Incidents & Decisions knowledge base. For an example of this procedure applied in practice, see Postmortem: 2026-05-12 API Outage. (Source: Engineering — Incidents & Decisions.md)


Rollback Procedure

Steps

  1. Confirm the regression: check the dashboard for an elevated error rate or latency. Do not proceed until the regression is confirmed. (Source: Engineering — Incidents & Decisions.md)
  2. Find the last green pipeline: in CI, locate the most recent successful (green) pipeline on the main branch. (Source: Engineering — Incidents & Decisions.md)
  3. Trigger the rollback deploy: run the deploy job pinned to that last-known-good commit. (Source: Engineering — Incidents & Decisions.md)
  4. Verify recovery: confirm that the error rate returns to baseline within 5 minutes of the rollback deploy completing. (Source: Engineering — Incidents & Decisions.md)
  5. Post in #incidents: share the rollback commit hash and a one-line description of the cause. (Source: Engineering — Incidents & Decisions.md)
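Steps 2, 4 and 5 reduce to simple checks that can be scripted. The following is a minimal Python sketch under stated assumptions: the `Pipeline` record, its field names, the `"success"` status string and the 10% tolerance for "back to baseline" are all hypothetical and do not correspond to any particular CI or monitoring API. It also counts a single in-window sample at or under the threshold as recovery, which is a simplification; a stricter check would require the error rate to stay there.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


# Hypothetical pipeline record; field names are assumptions, not a real CI API.
@dataclass
class Pipeline:
    commit: str
    branch: str
    status: str  # assumed values: "success" (green) or anything else
    finished_at: datetime


def last_green_commit(pipelines: Sequence[Pipeline], branch: str = "main") -> Optional[str]:
    """Step 2: commit of the most recent successful pipeline on `branch`, or None."""
    green = [p for p in pipelines if p.branch == branch and p.status == "success"]
    if not green:
        return None
    return max(green, key=lambda p: p.finished_at).commit


def recovered(
    samples: Sequence[tuple[datetime, float]],
    deploy_done: datetime,
    baseline: float,
    tolerance: float = 0.1,  # hypothetical: within 10% of baseline counts as recovered
    window: timedelta = timedelta(minutes=5),
) -> bool:
    """Step 4: True if any error-rate sample taken within `window` after the
    rollback deploy completed is at or below baseline * (1 + tolerance)."""
    limit = baseline * (1 + tolerance)
    return any(
        deploy_done <= ts <= deploy_done + window and rate <= limit
        for ts, rate in samples
    )


def incident_message(commit: str, cause: str) -> str:
    """Step 5: one-line #incidents post with the rollback commit hash and cause."""
    return f"Rolled back to {commit}: {cause}"
```

Treat a `None` result from `last_green_commit` as "no known-good commit found" rather than guessing one.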

⚠️ Database Schema Migrations — Critical Warning

If the deploy included a database schema migration, do NOT roll back the application alone: the previous application version may not be compatible with the migrated schema. Check the migration before taking any rollback action. (Source: Engineering — Incidents & Decisions.md)
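One way to tell whether a deploy included a schema migration is to diff the last-known-good commit against the bad commit and look for added or changed Alembic revision scripts. A minimal Python sketch, assuming `git` is on the PATH and that the scripts live under `alembic/versions/`. That path is hypothetical: Alembic keeps revision scripts in the `versions/` directory under whatever `script_location` is set to in `alembic.ini`, so adjust it to match the repository.

```python
import subprocess
from pathlib import PurePosixPath
from typing import Iterable

# Hypothetical location of Alembic revision scripts; match it to script_location.
MIGRATIONS_DIR = PurePosixPath("alembic/versions")


def changed_files(good_commit: str, bad_commit: str) -> list[str]:
    """Paths that differ between the last-known-good commit and the bad commit."""
    out = subprocess.run(
        ["git", "diff", "--name-only", good_commit, bad_commit],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def migration_files(
    paths: Iterable[str], migrations_dir: PurePosixPath = MIGRATIONS_DIR
) -> list[str]:
    """The subset of `paths` that are Python files directly inside migrations_dir."""
    hits = []
    for p in paths:
        path = PurePosixPath(p)
        if path.suffix == ".py" and path.parent == migrations_dir:
            hits.append(p)
    return hits
```

If `migration_files(changed_files(good_sha, bad_sha))` returns a non-empty list (with `good_sha` and `bad_sha` standing in for the real commit hashes), the deploy touched migrations and the application should not be rolled back alone.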

The primary datastore is Postgres, with schema migrations managed via Alembic. See ADR-014: Postgres as Primary Datastore for context on why Postgres was chosen and on the accepted trade-off of manual schema migrations. (Source: Engineering — Incidents & Decisions.md)
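Each Alembic revision script records its own `revision` id and the `down_revision` it builds on, so the revisions form a chain (the `alembic history` command lists these links). Given that chain, the migrations a bad deploy applied are the ones reachable from the bad commit's head before reaching the head the last-known-good commit expected. A minimal sketch, assuming a linear history with no merge revisions (in a merge, `down_revision` is a tuple, which this does not handle); the function name is hypothetical and it only identifies migrations, it does not decide what to do about them:

```python
from typing import Mapping, Optional


def revisions_between(
    down_revisions: Mapping[str, Optional[str]],
    new_head: str,
    old_head: Optional[str],
) -> list[str]:
    """Walk down_revision links from new_head until old_head is reached.

    `down_revisions` maps each revision id to its down_revision (None for the
    first migration). Returns the revisions applied on top of old_head,
    newest first. Assumes a linear history (no merge revisions).
    """
    chain: list[str] = []
    current: Optional[str] = new_head
    while current != old_head:
        if current is None:
            # Reached the base without meeting old_head.
            raise ValueError(f"{old_head!r} is not an ancestor of {new_head!r}")
        chain.append(current)
        current = down_revisions[current]
    return chain
```

An empty result means the schema has not moved since the last-known-good commit.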

