
CI/CD Runbook

Last reviewed: 2026-04-21
Maintained by: Engineering

This file is the operational guide for GitHub Actions and deploy troubleshooting.

Workflow Overview

The repo currently uses:

  • .github/workflows/ci.yml for validation
  • .github/workflows/deploy.yml for manual deployment promotion

Current delivery model:

  • feature branches merge into main
  • CI runs automatically on pushes to main and feature branches, and on pull requests to main
  • deploys do not happen automatically
  • the Deploy workflow is triggered manually with one of two target environments:
    • staging
    • production

What a Failure in ci.yml Means

Install, Typecheck, Lint

If this job fails, the most common causes are:

  • the lockfile and package.json are out of sync
  • a TypeScript error exists in the API or web app
  • an ESLint error exists

Local check:

pnpm install
pnpm run lint
pnpm run check:api
pnpm run check:manager-desk
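
When several of these commands are run back to back, it can be hard to see which one failed. The following is a minimal sketch of a wrapper that labels each step; the helper name run_step is hypothetical and is not part of the repo.

```shell
# Hypothetical helper (not part of the repo): run one command and
# report under a readable label whether it passed or failed.
run_step() {
  label=$1
  shift
  if "$@"; then
    echo "ok: $label"
  else
    echo "FAILED: $label"
    return 1
  fi
}

# Example usage, mirroring the local check above (stops at the first failure):
#   run_step "install"   pnpm install &&
#   run_step "lint"      pnpm run lint &&
#   run_step "check:api" pnpm run check:api &&
#   run_step "check:manager-desk" pnpm run check:manager-desk
```

Chaining with && keeps the fail-fast behaviour of a CI job, so the first FAILED line points at the step to investigate.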

Tests

If this job fails, the most common causes are:

  • a pure utility or configuration contract changed without updating tests
  • a new test file is failing under the Node test runner
  • a workspace test command was broken or removed

Local check:

pnpm test

Database Smoke Test

If this job fails, the problem is usually:

  • a new migration does not work on an empty database
  • the seed script is not aligned with the current schema
  • migrations are not idempotent when db:migrate is run again

Local check:

docker compose up -d postgres
pnpm run db:migrate
pnpm run seed:demo
pnpm run db:migrate

Migration Drift Check

If this job fails, it means the committed schema snapshot no longer matches the schema produced by the current migrations.

Typical causes:

  • a new migration was added but database/schema.snapshot.sql was not refreshed
  • a migration was edited after the snapshot was generated
  • the schema dump normalization logic needs to account for a new deterministic pg_dump output line

Local check:

docker compose up -d postgres
pnpm run db:migrate
pnpm run db:schema:check
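
The normalization logic itself is not shown in this runbook. As a rough illustration of what such a step might do, the sketch below drops SQL comment lines (where pg_dump writes its version headers) and blank lines before two dumps are compared. The function name normalize_dump and the choice of filters are assumptions, not the repo's implementation.

```shell
# Illustrative only: the repo's real normalization may differ.
# Reads a schema dump on stdin and drops lines that tend to vary
# between environments without reflecting a schema change:
#   - SQL comment lines (e.g. "-- Dumped by pg_dump version ...")
#   - blank lines
normalize_dump() {
  grep -v -e '^--' -e '^[[:space:]]*$'
}

# Example usage (file names are placeholders):
#   normalize_dump < database/schema.snapshot.sql > /tmp/committed.sql
#   normalize_dump < /tmp/fresh_dump.sql          > /tmp/fresh.sql
#   diff /tmp/committed.sql /tmp/fresh.sql
```

If the drift check fails on a line that is not a real schema change, that line is a candidate for an extra filter of this kind.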

What a Failure in deploy.yml Means

Staging Deploy

If the manual staging deploy fails:

  • the Vercel token or project config may be wrong
  • the Render service or environment configuration may be wrong
  • the target environment may be missing required secrets or vars
  • the environment contract may fail validation before the deploy steps run

Production Deploy

If the manual production deploy fails:

  • the same causes as staging apply
  • production may also have stricter values or approval requirements
  • the environment contract may fail validation before the deploy steps run

Manual Troubleshooting Order

When a deploy fails, work through these checks in order:

  1. check whether the CI workflow was green
  2. check the Validate environment contract step inside the deploy job
  3. check whether the DB smoke test passed
  4. check GitHub Environment vars and secrets
  5. check Vercel project IDs and current API host configuration
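
Steps 1 and 4 can also be checked from a terminal with the GitHub CLI. The sketch below assumes gh is installed and authenticated against this repo; the function names are hypothetical, and the environment name must match the GitHub Environment exactly as it is configured.

```shell
# Sketch, assuming the GitHub CLI (gh) is installed and authenticated.
# GH can be overridden (e.g. GH=echo) to preview a command without running it.

# Step 1: print the conclusion of the most recent ci.yml run on main.
latest_ci_conclusion() {
  "${GH:-gh}" run list --workflow ci.yml --branch main --limit 1 \
    --json conclusion --jq '.[0].conclusion'
}

# Step 4: list the variable and secret names defined for one
# GitHub Environment (secret values are never shown).
list_env_config() {
  env_name=$1
  "${GH:-gh}" variable list --env "$env_name" &&
    "${GH:-gh}" secret list --env "$env_name"
}
```

For example, list_env_config followed by the environment name shows at a glance whether an expected variable or secret is missing.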

QA Checklist Before Merging Into main

At minimum:

  • Install, Typecheck, Lint is green
  • Tests is green
  • Database Smoke Test is green
  • Migration Drift Check is green
  • if there are schema changes, migration and seed were tested
  • if there are schema changes, database/schema.snapshot.sql was refreshed
  • if there are deploy-affecting env changes, Staging and Production GitHub Environments were updated

Manual Promotion Flow

Use this order:

  1. finish work on a feature branch
  2. merge into main
  3. wait for CI to go green
  4. manually run Deploy with target staging
  5. test in staging
  6. manually run Deploy with target production
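
Steps 4 and 6 can be run from the GitHub Actions UI or, as sketched below, from the GitHub CLI. This is a sketch under assumptions: the workflow_dispatch input is assumed to be named target, and the helper name run_deploy is hypothetical. Check deploy.yml for the real input name before relying on it.

```shell
# Sketch only. Assumptions: deploy.yml accepts a workflow_dispatch input
# named "target" (not confirmed by this runbook) and deploys run from main.
run_deploy() {
  target=$1
  case "$target" in
    staging|production) ;;
    *)
      echo "unknown target: $target (expected staging or production)" >&2
      return 2
      ;;
  esac
  # GH can be overridden (e.g. GH=echo) to preview the command without running it.
  "${GH:-gh}" workflow run deploy.yml --ref main -f "target=$target"
}

# Example usage:
#   GH=echo run_deploy staging    # preview the command only
#   run_deploy staging            # trigger the staging deploy
```

Rejecting any other target value up front avoids dispatching a run that can only fail later in the workflow.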

Deploy Assumptions

Vercel

The deploy workflow assumes:

  • valid VERCEL_TOKEN
  • valid VERCEL_ORG_ID
  • valid VERCEL_PROJECT_ID_MANAGER_DESK
  • apps/manager-desk is connected to the Manager Desk Vercel project
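
A quick way to confirm the first three assumptions in a local shell or a debug step is to check that each variable is set and non-empty. The sketch below only tests presence, not whether the values are valid; the function name check_vercel_env is hypothetical.

```shell
# Hypothetical helper: verify that the Vercel settings named above are
# present in the current environment. It cannot tell whether a token or
# ID is actually valid, only that it is set and non-empty.
check_vercel_env() {
  missing=0
  for name in VERCEL_TOKEN VERCEL_ORG_ID VERCEL_PROJECT_ID_MANAGER_DESK; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

The function returns non-zero if any of the three is missing and names each missing variable on stderr.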

Render

The deploy workflow assumes:

  • Render is the intended API hosting target, as documented in the docs portal
  • hosted API configuration is present in the correct environment
  • any legacy Railway-specific workflow wiring that remains is not the current hosting standard

When You Must Update Documentation

If you change:

  • CI jobs
  • branch flow
  • deployment targets
  • required environment fields

then also update: