RFC-0001: Hosting target for the web app and workers

Decision recorded. Cloudflare Pages + Workers (OpenNext) + Neon + Upstash. See ADR-0006 for the locked decision; this RFC remains as the comparative analysis that produced it.

Summary

961tech needs a production home for the Next.js 16 web app and the scraper workers. The docs site is already on Cloudflare Pages and we have a Cloudflare account; nothing else is decided. Recommendation: Cloudflare Pages (Workers + OpenNext adapter) for the web app, with workers running off Cron Triggers — best Beirut latency, cheapest managed path, and account consolidation with the existing docs site, traded against OpenNext-adapter lag on bleeding-edge Next 16 features. Vercel is the safe fallback if the OpenNext adapter blocks a feature we actually need. Hetzner stays as the deferred-cost option for when margins matter.

Motivation

Architecture → Deployment is a stub page — the diagram and notes there are placeholders. #19 is the gating issue for the production hosting decision and is currently blocking:

  • the M2 deploy timeline,
  • prod-shaped env vars (every "Production" row in env-vars.md is "set via the hosting provider's UI (#19)"),
  • the observability story (which provider's logs and metrics we lean on),
  • the worker process model (where do scrape jobs actually run? — pairs with RFC-0002).

Not deciding has a real cost: every feature ticket carries an implicit "and here's how it'd run in prod" footnote that nobody resolves.

Proposal

Three options were researched in depth (Vercel + Neon + Upstash; Cloudflare Pages + Workers + Neon + Upstash; Hetzner CX22 self-managed). The proposal is option 2.

Recommendation: Cloudflare Pages + Workers (OpenNext adapter) + Neon Postgres + Upstash Redis

| What | Where |
| --- | --- |
| Next.js 16 web app | Cloudflare Pages, deployed via the @opennextjs/cloudflare adapter (GA as of late 2025) |
| Postgres | Neon (Frankfurt), accessed via Cloudflare Hyperdrive for connection pooling + edge cache |
| Redis | Upstash (REST API — ioredis does not work in Workers and would have to be replaced anyway) |
| Background workers | Cloudflare Cron Triggers invoking a Worker that drains the queue (see RFC-0002 for the runtime choice) |
| Docs site | Cloudflare Pages — already live, unchanged |
| DNS / TLS / WAF | Cloudflare — same zone, same dashboard |
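For a feel of the data path, here is a minimal sketch of a Worker querying Neon through Hyperdrive. The HYPERDRIVE binding name, the postgres.js client, and the products table are illustrative assumptions; in the real app this sits behind the OpenNext adapter rather than a raw fetch handler.

```typescript
// Sketch: querying Neon through a Hyperdrive binding from a Worker.
// Types (Hyperdrive) come from @cloudflare/workers-types; the binding name
// HYPERDRIVE, the postgres.js client, and the products table are assumptions.
import postgres from "postgres";

interface Env {
  // Configured in the wrangler config; Hyperdrive pools connections to Neon (Frankfurt).
  HYPERDRIVE: Hyperdrive;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Hyperdrive hands the Worker a local connection string and owns the
    // actual pooled connections to Neon.
    const sql = postgres(env.HYPERDRIVE.connectionString, { max: 5 });
    try {
      const [row] = await sql`select count(*)::int as products from products`;
      return Response.json(row);
    } finally {
      await sql.end();
    }
  },
};
```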

Why

  • Lowest Lebanon latency. Cloudflare has a Beirut PoP (BEY) — RTT ~10–30 ms. Vercel's nearest is Frankfurt (~80–120 ms). Hetzner is in Falkenstein/Nuremberg (~70–90 ms).
  • Account consolidation. The docs site, app, DNS, and TLS all live in the same Cloudflare dashboard. One vendor, one bill, one set of credentials to rotate.
  • Cost. Workers Paid is $5/mo + Neon free tier ($0 at M1) + Upstash free tier ($0). Total ~$5/mo at M1, scaling cheaply. Vercel Pro is $20/mo (Hobby is non-commercial — disqualifying for an affiliate-monetised site). Hetzner is ~$6/mo before the ops tax.
  • Zero ops on TLS, DNS, observability basics. Same as Vercel, unlike Hetzner.

What changes from today

  • Docs site already on Cloudflare Pages — no change.
  • App moves from "no prod" to a Pages project under the same account.
  • Postgres moves from local Docker to Neon Frankfurt (or any managed Postgres reachable from Workers — Neon is the cheapest entry).
  • Redis moves from local Docker to Upstash. Note: this forces the worker layer off ioredis — see Trade-offs.
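To make that last point concrete, here is a sketch of the Upstash REST client inside a Worker. The env var names follow Upstash's documented defaults; the counter key is made up.

```typescript
// Sketch of the ioredis replacement: @upstash/redis speaks HTTP/REST,
// which works inside Workers where ioredis's raw TCP does not.
import { Redis } from "@upstash/redis/cloudflare";

interface Env {
  UPSTASH_REDIS_REST_URL: string;
  UPSTASH_REDIS_REST_TOKEN: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const redis = Redis.fromEnv(env); // reads the two env vars above
    // Same logical commands as ioredis, different transport.
    const hits = await redis.incr("counter:landing-hits"); // illustrative key
    return new Response(`hits: ${hits}`);
  },
};
```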

Behaviour

```mermaid
graph TB
    User[Visitor / Builder<br/>Lebanon] -->|HTTPS| CF[Cloudflare Edge<br/>BEY PoP]
    CF -->|Worker invocation| App[Next.js on Workers<br/>OpenNext adapter]
    App -->|Hyperdrive| PG[(Neon Postgres<br/>Frankfurt)]
    App -->|REST| RD[(Upstash Redis)]

    Cron[CF Cron Triggers<br/>scheduled UTC] -->|invokes| Worker[Scraper Worker<br/>same project, scheduled handler]
    Worker --> PG
    Worker --> Internet[Retailer sites]

    DocsUser[Visitor] -.->|HTTPS| Docs[Docs site<br/>961tech.pages.dev]
```
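To ground the Cron path in the diagram, a hedged sketch of the scheduled handler draining the queue. The scrape_jobs schema, the claim query, and the 25 s budget are placeholders pending RFC-0002.

```typescript
// Sketch: Cron Trigger entry point that drains queued scrape jobs within a
// wall-clock budget. Types come from @cloudflare/workers-types; the job table
// and claiming logic are placeholders for RFC-0002.
import postgres from "postgres";

interface Env {
  HYPERDRIVE: Hyperdrive;
}

const BUDGET_MS = 25_000; // assumed budget; stay well under invocation limits

export default {
  async scheduled(controller: ScheduledController, env: Env, ctx: ExecutionContext) {
    const sql = postgres(env.HYPERDRIVE.connectionString);
    const deadline = Date.now() + BUDGET_MS;
    try {
      while (Date.now() < deadline) {
        // Hypothetical claim query; the real schema comes from RFC-0002.
        const [job] = await sql`
          update scrape_jobs set status = 'running'
          where id = (select id from scrape_jobs where status = 'queued'
                      order by created_at limit 1 for update skip locked)
          returning id, url`;
        if (!job) break; // queue drained
        const res = await fetch(job.url); // hit the retailer site
        await sql`update scrape_jobs
                  set status = ${res.ok ? "done" : "failed"} where id = ${job.id}`;
      }
    } finally {
      ctx.waitUntil(sql.end());
    }
  },
};
```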

Trade-offs

| Cost | What it buys |
| --- | --- |
| OpenNext-adapter lag. Next.js 16's bleeding-edge features land on Vercel first; the OpenNext Cloudflare adapter follows by weeks-to-months. New caching directives, PPR refinements, and image optimization may require shims. | Beirut latency, cost, account consolidation. |
| Code change to drop ioredis. Workers don't reliably support the raw TCP connections ioredis needs; Upstash Redis must be accessed via REST. Any BullMQ wiring would also need a Workers-compatible client. | Forces the RFC-0002 decision earlier — but that was happening anyway. |
| Workers runtime quirks. The nodejs_compat flag is opt-in; some Node APIs need shims; next/cache semantics on KV/R2 have eventual-consistency nuances. | Cheapest way to run RSC + server actions at our scale. |
| No long-lived processes. No 24/7 BullMQ daemon. Workers are request-scoped; Cron Triggers are scheduled. Scrapers must complete within a Worker invocation budget (30 s of CPU on the paid plan; total wall-clock time governed by subrequest limits). | Right-shaped for our 24 jobs/day. Wrong for streaming pipelines (we don't have any). |
| Higher lock-in than Vercel. Worker bindings (env.X), KV usage, and OpenNext build artefacts mean migrating away requires testing against vanilla Node again. | Cheapest to migrate into (we're already on Cloudflare for docs). DB egress is trivially small. |

Alternatives

Vercel + Neon + Upstash

Fully managed. First-class Next.js 16 — every new feature lights up day-one. Fluid Compute (long-lived Node functions, 800s on Pro). No adapter friction. Workers run via Vercel Cron + an external worker host (Railway/Fly/tiny VPS) for any non-cron job.

  • Cost at M1: ~$20/mo (Pro tier is required because Hobby disallows commercial use; the affiliate model qualifies as commercial). Next bend: bandwidth over 1 TB at $0.15/GB; Neon Launch at $19/mo when you outgrow free.
  • Beirut latency: ~80–120 ms via FRA/CDG/LHR.
  • When this wins: if MASTER's evening time and "Next.js features ship the day Vercel ships them" outweigh $15/mo. Also wins if a specific Next 16 feature (e.g., a future PPR refinement) doesn't yet work on OpenNext.

Hetzner CX22 self-managed (~$6/mo)

Single VPS running Docker Compose: Next.js + Postgres 17 + Redis 7, with Caddy or Traefik terminating TLS and Cloudflare in front for CDN + WAF. Every Next.js feature works — it's just Node.

  • Ops burden: highest. OS patches, Docker upgrades, PG backups (pg_dump to S3/B2 cron), Redis persistence, log rotation, uptime monitoring, incident response on weekends. Realistic 2–4 hrs/mo steady-state, more after any kernel CVE.
  • Lock-in: none. Pure portability.
  • When this wins: if cost and control matter more than evenings, or if margins demand sub-$10 infra after the affiliate revenue picks up.

bits.lb beta

Lebanese hosting, politically aligned with the product. Beta — uptime SLO unknown, public docs sparse, runtime feature support unverified for Next 16.

  • Treated as unknown in this RFC because public information is insufficient to evaluate. If MASTER has direct contact with bits.lb operators and wants to validate it as a production target, that's a separate research pass.

Open questions

These need MASTER input before this RFC moves to Accepted:

  1. Cost ceiling. Is the M1 hosting budget closer to $5, $20, or "whatever — pick the right tool"? The recommendation assumes $5–10/mo is a real constraint.
  2. Lebanese-host political weight. Is "Lebanese tool, Lebanese host" a marketing/principles requirement that vetoes Cloudflare/Vercel/Hetzner regardless of latency? If yes, this RFC restarts with bits.lb as the leading candidate.
  3. Ops appetite. Realistically: is weekend on-call acceptable as a fallback for prod ops, or is "I do not want to patch a VPS" a hard line? This decides whether Hetzner is even on the menu.
  4. Bleeding-edge feature dependency. Is there a Next 16 feature we know we'll need in M2 that's blocked on the OpenNext Cloudflare adapter? (Unknown today; if MASTER is planning streaming server actions or aggressive PPR, validate adapter support before signing off on Cloudflare.)
  5. Worker process model. Are we OK with all background work being scheduled (Cron Triggers + queue drain) rather than a long-lived consumer? Pairs with RFC-0002. At 24 jobs/day this is fine; if M3 introduces user-triggered async work (image fetcher, alert dispatch on save) that wants sub-minute latency, revisit.

Implementation plan

Once MASTER picks an option, the implementation work for #19 is roughly:

  • Lock the hosting decision as an ADR (recorded as ADR-0006, per the decision note at the top of this RFC)
  • Update Architecture → Deployment — promote from stub to active, replace the candidate table with the real shape
  • Provision the chosen environments (one production, one preview/staging)
  • Move DATABASE_URL and REDIS_URL to provider-managed equivalents (Neon connection string; Upstash REST URL or self-host endpoint)
  • Wire prod env vars (IP_HASH_SECRET rotated to a real value; M2 vars per env-vars.md § Future)
  • Set up the docs+app GitHub Action (a ci.yml runs vitest + tsc --noEmit + eslint; the docs workflow stays separate per current shape)
  • First deploy + smoke test (golden path: landing → /products → /products/[slug] → /build → click /api/go; a smoke-test sketch follows this list)
  • Cutover docs site: no change (already on Cloudflare Pages)
  • Update README + Local setup guide with the new prod URL
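A throwaway smoke-test sketch for that golden path. The base URL is a placeholder, /products/[slug] needs a real slug, and the /api/go query shape is a guess.

```typescript
// smoke.ts — run with: npx tsx smoke.ts
// Walks the golden path against prod. BASE is a placeholder URL.
const BASE = "https://example-prod-url.example"; // replace with the real prod URL

const paths = ["/", "/products", "/build"]; // /products/[slug] needs a real slug

for (const path of paths) {
  const res = await fetch(new URL(path, BASE));
  if (!res.ok) throw new Error(`${path} returned ${res.status}`);
  console.log(`ok ${path} (${res.status})`);
}

// /api/go should redirect out to the retailer, not 200.
// The ?id= query param shape is a guess at the affiliate-redirect API.
const go = await fetch(new URL("/api/go?id=some-product", BASE), { redirect: "manual" });
if (go.status < 300 || go.status >= 400) throw new Error(`/api/go returned ${go.status}`);
console.log(`ok /api/go redirects (${go.status})`);
```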

The migration is sticky per provider — picking Cloudflare and then later moving to Vercel is doable, but has a real cost (OpenNext-specific code paths, Workers bindings). Best to pick deliberately.

Out of scope

  • bits.lb deep-dive. Treated as unknown per Open Questions.
  • Multi-region / failover. At Lebanese-market scale we have one region.
  • CDN strategy for product images. Decoupled from app hosting; lives in its own future RFC alongside the image pipeline.
  • Auth provider. #11 — separate decision.
  • Email delivery. #14 — separate decision.
  • Background-jobs runtime choice. RFC-0002. Hosting and runtime interact (Workers can't run BullMQ as a long-lived consumer), but the runtime is its own RFC.
  • DR / RTO / RPO targets. Pulled along by the hosting choice; ADR after this RFC closes.