# ADR-0007: Background-jobs runtime is pg-boss on the existing Postgres
- Status: Accepted
- Date: 2026-04-28
- Deciders: MASTER
- Related: #18, RFC-0002, #34, tech-stack, ADR-0006
## Context
RFC-0002 compared three runtimes for scheduled + on-demand background work: BullMQ on Redis, pg-boss on the existing Postgres, and GitHub Actions cron. The current state was a code smell: `bullmq ^5.76.2` and `ioredis ^5.10.1` listed in `package.json`, zero imports in `src/`, and a Redis container running in `docker-compose.yml` only for that queue use.
Workload at M1: ~24 scrape jobs/day; M2 estimate: ~120 jobs/day with future on-demand work (price-drop alerts in #14, image fetcher, LLM extraction in #21).
## Decision
Use pg-boss on the existing Postgres for all scheduled and on-demand background work. Drop bullmq, ioredis, and the Redis container.
The runtime works under all three ADR-0006 hosting shapes (Cloudflare Cron Triggers + scheduled drain, Vercel Cron + drain, Hetzner systemd unit) — pg-boss is just SQL.
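To make the decision concrete, here is a sketch of the queue glue this implies. The queue names, cron expression, and handler body are illustrative placeholders, not committed API of this repo, and the `work` handler signature varies across pg-boss major versions (v10 delivers jobs in batches):

```typescript
// Sketch only — queue names, cron, and handler bodies are placeholders.
import PgBoss from "pg-boss";

export async function startJobs(databaseUrl: string): Promise<PgBoss> {
  const boss = new PgBoss(databaseUrl); // reuses the existing Postgres, no Redis
  await boss.start();

  // Scheduled work: pg-boss persists the cron schedule in Postgres itself.
  await boss.schedule("scrape", "0 * * * *"); // hourly — illustrative cadence

  // Worker: polls pgboss.job (default ~2s interval).
  // pg-boss v10 hands the handler an array of jobs; older versions, a single job.
  await boss.work("scrape", async (jobs) => {
    // ...run the scrape; throwing marks the job failed and eligible for retry
  });

  return boss;
}

// On-demand work (e.g. a price-drop alert from #14) is just a send:
//   await boss.send("price-drop-alert", { listingId });
```

The same module runs unchanged under any of the ADR-0006 hosting shapes, since pg-boss only needs a Postgres connection string.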
## Consequences
### Positive
- No new stateful infra. We already operate Postgres; we don't add Redis as production-critical infrastructure.
- Transactionally consistent with data writes. A scrape can `UPDATE listings ...` and mark its job complete in one transaction, eliminating "data written, job not marked complete" failure modes.
- Right-shaped for our scale. 24–120 jobs/day fits poll-based queueing trivially. Default 2s poll latency is invisible.
- Cost. Zero. No Upstash dependency, no Redis container, one less prod secret (`REDIS_URL` drops from env-vars.md).
- Migration target if scale grows — re-implementing the queue glue (`src/lib/jobs.ts`) for BullMQ is ~1 day of work; the job bodies don't change.
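The transactional-consistency point is the structural win, and it is worth spelling out. Because queue state and application data live in one Postgres, a data write and a job-state change can commit atomically — something a Redis-backed queue cannot offer. A conceptual illustration (in practice pg-boss manages job state through its own API; the literal job `id` is elided):

```sql
-- Conceptual illustration only: queue table and data table share one database,
-- so both writes commit or roll back together.
BEGIN;
UPDATE listings
   SET price = 99
 WHERE retailer_id = 'r1' AND url = 'https://example.com/item/1';
UPDATE pgboss.job
   SET state = 'completed'
 WHERE id = '...';  -- job id elided; pg-boss normally does this for you
COMMIT;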
### Negative
- No shipped dashboard. We write a small `/admin/jobs` page reading `pgboss.job` (M2 polish) or live with `psql` queries until then.
- Polling load on Postgres. Default 2s poll. Invisible at our scale; would matter at 1000s of jobs/sec.
- Smaller ecosystem than BullMQ. Fewer Stack Overflow answers, fewer plugins. API surface is small (`boss.send`, `boss.work`, `boss.schedule`) — manageable.
- Polling latency up to 2s from enqueue to start. Acceptable for daily scrapes and price-drop dispatch; would matter for a streaming pipeline (we don't have any).
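Until the `/admin/jobs` page lands, the `psql` fallback is a couple of queries against pg-boss's internal job table. Column names have shifted between pg-boss major versions (`createdon` vs. `created_on`), so verify with `\d pgboss.job` against the installed version:

```sql
-- Queue depth by state (pgboss.job is pg-boss's internal job table)
SELECT name, state, count(*)
  FROM pgboss.job
 GROUP BY name, state
 ORDER BY name, state;

-- Recent failures (timestamp column name depends on pg-boss version)
SELECT id, name, data
  FROM pgboss.job
 WHERE state = 'failed'
 LIMIT 20;
```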
### Neutral
- Queue contention on the same DB. At 1000s of jobs/sec this is real. Not real at 24–120/day. Migration trigger documented below.
## Migration triggers — pre-recorded
Revisit (and probably switch to BullMQ) if any of these fire:
- Scrape volume or fanout grows by 10× (>1,200 jobs/day).
- A new feature wants sub-second per-listing handoff (e.g. #21 LLM-extraction-on-scrape with concurrency).
- A new feature wants user-triggered async work with sub-100ms latency expectations (none in M1/M2 scope).
We record these in advance so we don't drift past the thresholds without noticing.
## Alternatives considered
### BullMQ on existing Redis — rejected for now
The pre-installed answer; mature; sub-second pub/sub latency; a `@bull-board` UI ships. Loses on: it introduces Redis as production-critical infra for a workload that doesn't need it. Visibility-timeout / stalled-job semantics are subtle (they bite people who write long-running jobs without heartbeats). At 24 jobs/day this is a sledgehammer. Stays as the named migration target.
### GitHub Actions scheduled workflows — rejected
Zero infra to operate; free on public repos. Loses on the deal-breaker: no queue. Price-drop alerts and image-fetcher are app-triggered, not cron-only. GH Actions also has cron drift, silent skip after 60-day inactivity, and whole-job retries only. Worth keeping as a belt-and-braces watcher (heartbeat cron alerting if no recent scrape ran) — separate ticket, not primary runtime.
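The belt-and-braces watcher could be as small as the workflow below. Everything here is illustrative — the schedule, the `STATUS_URL` secret, and the `/api/health/last-scrape` endpoint are hypothetical names to be settled in that separate ticket:

```yaml
# Illustrative sketch only — endpoint, secret name, and schedule are placeholders.
name: scrape-heartbeat
on:
  schedule:
    - cron: "0 6 * * *"   # daily, after the scheduled scrapes should have run
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - name: Fail (and alert) if no scrape completed recently
        run: |
          # e.g. a status endpoint that reads pgboss.job; exact check TBD
          curl --fail "$STATUS_URL/api/health/last-scrape"
        env:
          STATUS_URL: ${{ secrets.STATUS_URL }}
```

A failing run emails the repo watchers for free, which is all the alerting this needs.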
## Decisions on the open questions from RFC-0002
- Semantics for scrape jobs: at-least-once. Listing upsert is keyed by `(retailerId, url)` (per Prisma models), so duplicated scrapes are idempotent at the DB level.
- Job state retention: pg-boss defaults — 7 days for completed, 14 for failed. Tunable later.
- Admin dashboard timing: M2 polish. Until then, `psql` queries against `pgboss.job` are documented in a runbook (filed as a separate ticket).
- Drop deps timing: in #18's implementation PR — keeps the audit + RFCs on this commit, code change on its own commit.
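The at-least-once choice leans entirely on the upsert key. A self-contained simulation of why duplicate deliveries are harmless — an in-memory `Map` stands in for the DB-level unique constraint on `(retailerId, url)`; the real code uses a Prisma upsert:

```typescript
// In-memory stand-in for the unique key on (retailerId, url).
// Demonstrates why at-least-once redelivery of a scrape job is safe:
// the same key overwrites, it never creates a second row.
type Listing = { retailerId: string; url: string; price: number };

const listings = new Map<string, Listing>();

function upsertListing(l: Listing): void {
  listings.set(`${l.retailerId}:${l.url}`, l); // upsert semantics
}

// Simulate pg-boss delivering the same job twice.
const job: Listing = { retailerId: "r1", url: "https://x/item/1", price: 99 };
upsertListing(job);
upsertListing(job); // duplicate delivery

console.log(listings.size); // one row despite two deliveries
```

Retries that carry a *newer* price simply win via the same overwrite, which is exactly the behavior we want from a scrape.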
## References
- RFC-0002 — Background-jobs runtime
- Tech stack reference
- pg-boss
- ADR-0006 — Hosting target (worker process model)