# RFC-0001: Hosting target for the web app and workers
- Status: Accepted — locked by ADR-0006 on 2026-04-28
- Author: MASTER (drafted by Claude as part of #34)
- Date: 2026-04-28
- Related: #19, ADR-0006, Architecture → Deployment, tech-stack reference
Decision recorded. Cloudflare Pages + Workers (OpenNext) + Neon + Upstash. See ADR-0006 for the locked decision; this RFC remains as the comparative analysis that produced it.
## Summary
961tech needs a place for the Next.js 16 web app + the scraper workers to run in production. The docs site is already on Cloudflare Pages and we have a Cloudflare account; nothing else is decided. Recommendation: Cloudflare Pages (Workers + OpenNext adapter) for the web app, with workers running off Cron Triggers — best Beirut latency, cheapest managed path, account consolidation with the existing docs site, traded against OpenNext-adapter lag on bleeding-edge Next 16 features. Vercel is the safe fallback if the OpenNext adapter blocks a feature we actually need. Hetzner stays as the deferred-cost option for when margins matter.
## Motivation
Architecture → Deployment is a stub page — the diagram and notes there are placeholders. #19 is the gating issue for the production hosting decision and is currently blocking:
- the M2 deploy timeline,
- prod-shaped env vars (every "Production" row in env-vars.md is "set via the hosting provider's UI (#19)"),
- the observability story (which provider's logs and metrics we lean on),
- the worker process model (where do scrape jobs actually run? — pairs with RFC-0002).
Not deciding has a real cost: every feature ticket carries an implicit "and here's how it'd run in prod" footnote that nobody resolves.
## Proposal
Three options were researched in depth (Vercel + Neon + Upstash; Cloudflare Pages + Workers + Neon + Upstash; Hetzner CX22 self-managed). The proposal is option 2.
### Recommendation: Cloudflare Pages + Workers (OpenNext adapter) + Neon Postgres + Upstash Redis
| What | Where |
|---|---|
| Next.js 16 web app | Cloudflare Pages, deployed via the `@opennextjs/cloudflare` adapter (GA as of late 2025) |
| Postgres | Neon (Frankfurt), accessed via Cloudflare Hyperdrive for connection pooling + edge cache |
| Redis | Upstash (REST API — `ioredis` does not work in Workers and would have to be replaced anyway) |
| Background workers | Cloudflare Cron Triggers invoking a Worker that drains the queue (see RFC-0002 for the runtime choice) |
| Docs site | Cloudflare Pages — already live, unchanged |
| DNS / TLS / WAF | Cloudflare — same zone, same dashboard |
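To make the Hyperdrive row concrete, here is a minimal sketch of Worker-side data access under this shape; the `HYPERDRIVE` binding name, the `postgres` driver, and the `products` table are illustrative assumptions, not settled choices.

```ts
import postgres from "postgres";

// Hyperdrive's ambient type comes from @cloudflare/workers-types;
// the binding name is an assumption made for this sketch.
interface Env {
  HYPERDRIVE: Hyperdrive;
}

export async function listProducts(env: Env) {
  // Hyperdrive hands the Worker a pooled connection string that fronts Neon (Frankfurt)
  // and caches reads near the edge, so the app never manages raw connections itself.
  const sql = postgres(env.HYPERDRIVE.connectionString);
  try {
    // "products" is an illustrative table, not the real schema.
    return await sql`select slug, name from products order by name limit 20`;
  } finally {
    await sql.end();
  }
}
```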
### Why
- Lowest Lebanon latency. Cloudflare has a Beirut PoP (BEY) — RTT ~10–30 ms. Vercel's nearest is Frankfurt (~80–120 ms). Hetzner is in Falkenstein/Nuremberg (~70–90 ms).
- Account consolidation. The docs site, app, DNS, and TLS all live in the same Cloudflare dashboard. One vendor, one bill, one set of credentials to rotate.
- Cost. Workers Paid is $5/mo + Neon free tier ($0 at M1) + Upstash free tier ($0). Total ~$5/mo at M1, scaling cheaply. Vercel Pro is $20/mo (Hobby is non-commercial — disqualifying for an affiliate-monetised site). Hetzner is ~$6/mo before the ops tax.
- Zero ops on TLS, DNS, observability basics. Same as Vercel, unlike Hetzner.
### What changes from today
- Docs site already on Cloudflare Pages — no change.
- App moves from "no prod" to a Pages project under the same account.
- Postgres moves from local Docker to Neon Frankfurt (or any managed Postgres reachable from Workers — Neon is the cheapest entry).
- Redis moves from local Docker to Upstash. Note: this forces the worker layer off `ioredis` — see Trade-offs.
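A minimal sketch of the swap at a call site, assuming the stock `@upstash/redis` client and its conventional env var names; the key shape is made up for illustration.

```ts
// Before (local Docker, Node runtime): import Redis from "ioredis"; new Redis(REDIS_URL)
// After (Workers-compatible): every command is an HTTPS request, no TCP socket to keep alive.
import { Redis } from "@upstash/redis";

export function makeRedis(env: { UPSTASH_REDIS_REST_URL: string; UPSTASH_REDIS_REST_TOKEN: string }) {
  // In a Worker the URL/token arrive on the env binding rather than process.env.
  return new Redis({
    url: env.UPSTASH_REDIS_REST_URL,
    token: env.UPSTASH_REDIS_REST_TOKEN,
  });
}

export async function cacheProduct(redis: Redis, slug: string, payload: unknown) {
  // Hypothetical key shape; the client serialises the value for us.
  await redis.set(`product:${slug}`, payload, { ex: 3600 });
  return redis.get(`product:${slug}`);
}
```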
### Behaviour

```mermaid
graph TB
  User[Visitor / Builder<br/>Lebanon] -->|HTTPS| CF[Cloudflare Edge<br/>BEY PoP]
  CF -->|Worker invocation| App[Next.js on Workers<br/>OpenNext adapter]
  App -->|Hyperdrive| PG[(Neon Postgres<br/>Frankfurt)]
  App -->|REST| RD[(Upstash Redis)]
  Cron[CF Cron Triggers<br/>scheduled UTC] -->|invokes| Worker[Scraper Worker<br/>same project, scheduled handler]
  Worker --> PG
  Worker --> Internet[Retailer sites]
  DocsUser[Visitor] -.->|HTTPS| Docs[Docs site<br/>961tech.pages.dev]
```
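The Cron path in the diagram maps onto a `scheduled` handler. A minimal sketch, assuming the queue is an Upstash list and that a handful of jobs per invocation fits the budget; the queue key, batch size, and headers are all assumptions.

```ts
import { Redis } from "@upstash/redis";

// ScheduledController / ExecutionContext are ambient types from @cloudflare/workers-types.
export interface Env {
  UPSTASH_REDIS_REST_URL: string;
  UPSTASH_REDIS_REST_TOKEN: string;
}

export default {
  // Cron Trigger entry point: Cloudflare invokes this on the schedule declared in the Worker config.
  async scheduled(_event: ScheduledController, env: Env, ctx: ExecutionContext) {
    const redis = new Redis({ url: env.UPSTASH_REDIS_REST_URL, token: env.UPSTASH_REDIS_REST_TOKEN });

    // Drain a bounded batch so one invocation stays inside the CPU/subrequest budget.
    for (let i = 0; i < 5; i++) {
      const url = await redis.lpop<string>("scrape:queue"); // hypothetical queue key
      if (!url) break;

      // waitUntil lets the fetch-and-record work outlive the handler's return.
      ctx.waitUntil(
        fetch(url, { headers: { "user-agent": "961tech-scraper" } }).then((res) =>
          redis.set(`scrape:last-status:${url}`, res.status, { ex: 86_400 })
        )
      );
    }
  },
};
```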
### Trade-offs
| Cost | What it buys |
|---|---|
| OpenNext-adapter lag. Next.js 16's bleeding-edge features land on Vercel first; the OpenNext Cloudflare adapter follows by weeks to months. New caching directives, PPR refinements, and image optimization may require shims. | Beirut latency, cost, account consolidation. |
| Code change to drop `ioredis`. Workers don't reliably support the raw TCP connections `ioredis` needs; Upstash Redis must be accessed via REST. Any BullMQ wiring would also need a Workers-compatible client. | Forces the RFC-0002 decision earlier — but that was happening anyway. |
| Workers runtime quirks. The `nodejs_compat` flag is opt-in; some Node APIs need shims; `next/cache` semantics on KV/R2 have eventual-consistency nuances. | Cheapest way to run RSC + server actions at our scale. |
| No long-lived processes. No 24/7 BullMQ daemon. Workers are request-scoped; Cron Triggers are scheduled. Scrapers must complete within a Worker invocation budget (30s CPU on paid; total wall-clock time governed by subrequest limits). | Right-shaped for our 24 jobs/day. Wrong for streaming pipelines (we don't have any). |
| Higher lock-in than Vercel. Worker bindings (`env.X`), KV usage, and OpenNext build artefacts mean migrating away requires testing against vanilla Node again. | Cheapest platform to migrate into (we're already on Cloudflare for docs); DB egress out is trivially small. |
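For a feel of the lock-in, this is roughly what the binding surface looks like. Every name and type below is illustrative, but the pattern (capabilities hang off `env`, not `process.env`) is the part that does not port back to vanilla Node unchanged.

```ts
// Ambient types (Hyperdrive, KVNamespace) come from @cloudflare/workers-types.
interface Env {
  HYPERDRIVE: Hyperdrive;          // pooled route to Neon (Cloudflare-specific)
  CACHE_KV: KVNamespace;           // OpenNext incremental cache store (Cloudflare-specific; name assumed)
  UPSTASH_REDIS_REST_URL: string;  // plain secret, portable
  IP_HASH_SECRET: string;          // plain secret, portable
}
```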
## Alternatives
### Vercel + Neon + Upstash
Fully managed. First-class Next.js 16 support — every new feature lights up on day one. Fluid Compute (long-lived Node functions, 800s on Pro). No adapter friction. Background work runs via Vercel Cron plus an external worker host (Railway/Fly/tiny VPS) for any non-cron job.
- Cost at M1: ~$20/mo (Pro tier is required because Hobby disallows commercial use, and the affiliate model counts as commercial). Next cost bend: bandwidth over 1 TB at $0.15/GB; Neon Launch at $19/mo once you outgrow the free tier.
- Beirut latency: ~80–120 ms via FRA/CDG/LHR.
- When this wins: if MASTER's evening time and "Next.js features ship the day Vercel ships them" outweigh $15/mo. Also wins if a specific Next 16 feature (e.g., a future PPR refinement) doesn't yet work on OpenNext.
### Hetzner CX22 self-managed (~$6/mo)
Single VPS, Docker compose: Next.js + Postgres 17 + Redis 7. Caddy/Traefik fronting for TLS. Cloudflare in front for CDN + WAF. Every Next.js feature works — it's just Node.
- Ops burden: highest. OS patches, Docker upgrades, PG backups (`pg_dump` to S3/B2 on a cron), Redis persistence, log rotation, uptime monitoring, incident response on weekends. Realistically 2–4 hrs/mo steady-state, more after any kernel CVE.
- Lock-in: none. Pure portability.
- When this wins: if cost and control matter more than evenings, or if margins demand sub-$10 infra after the affiliate revenue picks up.
### bits.lb beta
Lebanese hosting, politically aligned with the product. Beta — uptime SLO unknown, public docs sparse, runtime feature support unverified for Next 16.
- Treated as unknown in this RFC because public information is insufficient to evaluate. If MASTER has direct contact with bits.lb operators and wants to validate it as a production target, that's a separate research pass.
## Open questions
These need MASTER input before this RFC moves to Accepted:
- Cost ceiling. Is the M1 hosting budget closer to $5, $20, or "whatever — pick the right tool"? The recommendation assumes $5–10/mo is a real constraint.
- Lebanese-host political weight. Is "Lebanese tool, Lebanese host" a marketing/principles requirement that vetoes Cloudflare/Vercel/Hetzner regardless of latency? If yes, this RFC restarts with bits.lb as the leading candidate.
- Ops appetite. Realistic question: are weekends-on-call acceptable as a fallback for prod ops, or is "I do not want to patch a VPS" a hard line? This decides whether Hetzner is even on the menu.
- Bleeding-edge feature dependency. Is there a Next 16 feature we know we'll need in M2 that's blocked on the OpenNext Cloudflare adapter? (Unknown today; if MASTER is planning streaming server actions or aggressive PPR, validate adapter support before signing off on Cloudflare.)
- Worker process model. Are we OK with all background work being scheduled (Cron Triggers + queue drain) rather than a long-lived consumer? Pairs with RFC-0002. At 24 jobs/day this is fine; if M3 introduces user-triggered async work (image fetcher, alert dispatch on save) that wants sub-minute latency, revisit.
## Implementation plan
Once MASTER picks an option, the implementation work for #19 is roughly:
- Lock the hosting decision as an ADR (numbered `0004` in sequence)
- Update Architecture → Deployment — promote from `stub` to `active`, replace the candidate table with the real shape
- Provision the chosen environments (one production, one preview/staging)
- Move `DATABASE_URL` and `REDIS_URL` to provider-managed equivalents (Neon connection string; Upstash REST URL or self-host endpoint)
- Wire prod env vars (`IP_HASH_SECRET` rotated to a real value; M2 vars per env-vars.md § Future); a hypothetical validation sketch follows this list
- Set up the docs+app GitHub Action (a `ci.yml` that runs `vitest` + `tsc --noEmit` + `eslint`; the docs workflow stays separate per current shape)
- First deploy + smoke test (golden path: landing → /products → /products/[slug] → /build → click /api/go)
- Cutover docs site: no change (already on Cloudflare Pages)
- Update README + Local setup guide with the new prod URL
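For the env-var items above, a hypothetical fail-fast guard; the variable names follow env-vars.md, but the file, the zod dependency, and the exact constraints are assumptions.

```ts
import { z } from "zod";

// Hypothetical env.ts: fail the boot/deploy loudly if a prod var is missing or malformed,
// instead of discovering it on the first query.
const ProdEnv = z.object({
  DATABASE_URL: z.string().url(),     // Neon connection string (via Hyperdrive in Workers)
  REDIS_URL: z.string().url(),        // Upstash REST URL or self-hosted endpoint
  IP_HASH_SECRET: z.string().min(32), // rotated to a real value at cutover
});

export type ProdEnvVars = z.infer<typeof ProdEnv>;

export function readProdEnv(raw: Record<string, string | undefined>): ProdEnvVars {
  // parse() throws a readable error listing every missing/invalid variable at once.
  return ProdEnv.parse(raw);
}
```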
The migration is effectively one-way per provider — picking Cloudflare and then later moving to Vercel is doable but has a real cost (OpenNext-specific code paths, Workers bindings). Best to pick deliberately.
## Out of scope
- bits.lb deep-dive. Treated as unknown per Open Questions.
- Multi-region / failover. At Lebanese-market scale we have one region.
- CDN strategy for product images. Decoupled from app hosting; lives in its own future RFC alongside the image pipeline.
- Auth provider. #11 — separate decision.
- Email delivery. #14 — separate decision.
- Background-jobs runtime choice. RFC-0002. Hosting and runtime interact (Workers can't run BullMQ as a long-lived consumer), but the runtime is its own RFC.
- DR / RTO / RPO targets. Pulled along by the hosting choice; ADR after this RFC closes.