RFC-0006: Security controls

Decision gate. This RFC enumerates the security decisions that need MASTER's call before #11 (Auth.js) and #16 (admin) can be scoped. Once MASTER picks, the choices land as one or more ADRs (likely 0006, 0007, …) and the relevant baseline rows flip from proposed to decided. The threat model and security baseline ship alongside this RFC as the foundation.

Summary

The threat model surfaces concrete controls, but five of them require a product-level call MASTER hasn't made: (1) how aggressively to harden Auth.js v5 beyond canonical defaults; (2) the CSP style-src posture given shadcn/ui's Radix portals inject inline styles; (3) whether rate limiting is Cloudflare WAF only or also application-side per-(ipHash, listingId); (4) how long to keep audit_log rows; (5) vulnerability disclosure surface (security.txt + email vs Bugcrowd vs nothing). A sixth implicit decision — accepting any Cloudflare paid-tier features (Super Bot Fight Mode at Pro, Bot Management at Enterprise, additional Rate Limiting Rules at Pro) — runs through several of the questions and is called out separately. Recommendation across the slate: take the cheap, low-friction default everywhere except CSP style-src (where the trade-off depends on Radix component picks RFC-0004 hasn't locked) and rate-limit topology (where the #17 CPC payout decision changes the answer). Each open question below has options, a recommendation, and the reasoning.

Motivation

Foundation security ticket #44 produced a STRIDE-per-surface threat model and a baseline list of ~50 controls. Most of them are unambiguous: rotate IP_HASH_SECRET, use Workers Secrets, switch from XFF to CF-Connecting-IP, gate npm audit in CI. A handful are contested because they trade off cost, UX friction, ops burden, or product surface against security strength. Those decisions must be MASTER's, not Claude's. This RFC lists them.

Not deciding has a real cost: every M2 ticket that touches auth or admin will re-ask "what's the auth.js stance?" and "what does our CSP allow?" — better to land the decisions once.

Open questions

Q1 — Auth.js v5: stick with canonical defaults, or harden specific flows?

Context. Auth.js v5 (the renamed-from-NextAuth.js project — verified at authjs.dev) ships secure defaults: allowDangerousEmailAccountLinking: false, checks: ["pkce", "state"] for OAuth, JWE-encrypted JWT in __Secure- HttpOnly cookies, SameSite=Lax, automatic new-session-ID-on-sign-in. The threat model's Surface C identifies the email-collision merge attack as the dominant Auth.js threat — already closed by the v5 default plus an explicit callbacks.signIn rejecting unverified emails (SB-040, SB-041).

Options:

  • Option A — Canonical only. Use Auth.js v5 defaults; no extra hardening. Lowest dev burden; fastest to ship. Residual risk: a verified-email-on-attacker-provider attack (rare); JWT non-revocation (mitigated only by AUTH_SECRET rotation = invalidate-all).
  • Option B — Canonical + targeted hardening. Defaults plus: explicit allowDangerousEmailAccountLinking: false per provider (belt-and-braces); callbacks.signIn rejects unverified emails; events.signIn/signOut/linkAccount wired to audit_log; least-privilege scopes (openid email profile / read:user user:email); custom pages.error to suppress verbose errors; session.maxAge 7 days (vs 30-day default); admin authz defence-in-depth (middleware and auth() check inside every admin server component).
  • Option C — Maximal hardening. Option B plus: DB-strategy sessions (allows per-user revocation, but adds DB roundtrip per request — friction on Cloudflare Workers); short JWT TTL with rotating refresh tokens; mandatory 2FA at the OAuth provider for admins.

Recommendation: Option B. The added burden over A is small (~1 day of config + audit-log wiring) and closes the meaningful residual risks (E2 admin authz drift, S2 email-collision) without the JWT-vs-DB-strategy debate. Option C's DB sessions add a round-trip per request that makes Cloudflare Workers economics worse and isn't worth it at hobby scale. Mandatory 2FA at provider is unenforceable from our side anyway.
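As a rough illustration of the callbacks.signIn gate in Option B, the sketch below isolates the email-verified check as a pure function. The names allowSignIn, SignInAttempt, and SESSION_MAX_AGE_SECONDS are hypothetical, and the sketch assumes the provider profile carries a boolean email_verified claim (as OIDC profiles typically do). A provider whose profile omits the claim is rejected here and would need provider-specific handling.

```typescript
// Hypothetical shape of what a callbacks.signIn handler would inspect.
// Only the fields this sketch needs are modelled.
interface SignInAttempt {
  provider: string;
  profile: { email?: string | null; email_verified?: boolean } | null;
}

// Returns true only when the provider asserts a verified email.
// Fails closed: a missing profile, email, or claim means "reject".
function allowSignIn(attempt: SignInAttempt): boolean {
  const profile = attempt.profile;
  if (profile === null || !profile.email) {
    return false;
  }
  return profile.email_verified === true;
}

// Option B shortens the session lifetime from the 30-day default to 7 days.
const SESSION_MAX_AGE_SECONDS = 7 * 24 * 60 * 60;
```

In an Auth.js config, the function would presumably be called from callbacks.signIn with the account's provider and profile, and the constant passed as session.maxAge. The exact wiring is left to #11.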

Q2 — CSP: which directives, and what's the style-src posture?

Context. The threat model's Surface A identifies CSP as the dominant XSS containment lever. The script-src directive is straightforward — per-request nonce + 'strict-dynamic' is the production-grade pattern, generated in middleware and read in the root layout. The style-src directive is the trade-off: Radix primitives (used by shadcn/ui per RFC-0004) inject inline style="..." for Popover, Tooltip, Dialog positioning via @floating-ui. Radix does not currently expose a nonce prop on its portals. So style-src is either 'unsafe-inline' or per-render style-nonce that Radix portals can't honour today.

Options:

  • Option A — style-src 'self' 'unsafe-inline'. Accept inline styles. script-src stays nonce-locked. XSS-via-style is bounded — an attacker who lands an XSS can change colours / layout / expose hidden elements but cannot exfiltrate cookies the way inline scripts can.
  • Option B — style-src 'self' 'nonce-{NONCE}' plus monkey-patching Radix to apply nonces. Stronger; significant maintenance debt; brittle across Radix upgrades. Not recommended for a solo project.
  • Option C — Skip the Radix Popover/Tooltip/Dialog primitives that need positioning, build our own with CSS-only. Loses shadcn's value-add. Forfeits accessibility wins. Not recommended.
  • Option D — Wait until Radix ships a nonce API. Track upstream. Land Option A as the interim, flip to nonce-only when a Radix release supports it.

Recommendation: Option D (Option A as interim). Land style-src 'self' 'unsafe-inline' with a documented residual risk and a "// TODO(security): tighten style-src once Radix exposes a nonce prop" comment in next.config.ts. Track upstream Radix issues; revisit at the next quarterly stack review (SB-082). script-src stays nonce-locked the whole time.

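Concrete starting CSP (style-src allows 'unsafe-inline' because of the Radix portal limitation, to be tightened later; the note sits here because a CSP header value has no comment syntax):

default-src 'self';
script-src 'self' 'strict-dynamic' 'nonce-{REQUEST_NONCE}';
style-src 'self' 'unsafe-inline';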
img-src 'self' data: <retailer-cdn-allow-list>;
font-src 'self';
connect-src 'self';
frame-ancestors 'none';
form-action 'self';
base-uri 'none';
object-src 'none';
upgrade-insecure-requests;

Run through the Google CSP Evaluator before committing the final shape; the directive set above is the starting point, not the locked version.
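One way to assemble that starting policy per request is sketched below. The function name buildCsp and the imgHosts parameter (standing in for the retailer CDN allow-list) are assumptions for illustration. Nonce generation and the middleware wiring are deliberately left out, and the Radix note is kept as a code comment rather than inside the header value.

```typescript
// Builds the starting policy as a single header value.
// `nonce` is the per-request nonce; `imgHosts` is a hypothetical stand-in
// for the retailer CDN allow-list.
function buildCsp(nonce: string, imgHosts: string[]): string {
  const directives: string[] = [
    "default-src 'self'",
    `script-src 'self' 'strict-dynamic' 'nonce-${nonce}'`,
    "style-src 'self' 'unsafe-inline'", // Radix portal limitation; tighten later
    ["img-src 'self' data:", ...imgHosts].join(" "),
    "font-src 'self'",
    "connect-src 'self'",
    "frame-ancestors 'none'",
    "form-action 'self'",
    "base-uri 'none'",
    "object-src 'none'",
    "upgrade-insecure-requests",
  ];
  // Directives are separated by semicolons in the header value.
  return directives.join("; ");
}
```

Keeping the directive list in one function should make it easier to diff the policy against the CSP Evaluator output and to flip style-src later in a single place.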

Q3 — Rate limiting: Cloudflare WAF only, or also application-side?

Context. Cloudflare Rate Limiting Rules free tier permits 1 rule (IP-based, 10-second window) — verified Apr 2026. Pro permits 2 rules with broader fields. Application-side per-(ipHash, listingId) token bucket addresses targeted-listing click inflation that gross IP-bucket misses but adds DB writes, ops complexity, and a counter-storage decision (Postgres / Workers KV / Durable Objects).

Options:

  • Option A — CF WAF Rate Limiting only. One free rule scoped to /api/* POST + /api/go/* + /api/auth/* at 60 req/min/IP, action = managed challenge. Plus Bot Fight Mode (free, zone-wide). No application-side bucket. Lowest complexity; lowest friction.
  • Option B — CF WAF + app-side per-IP-per-listing token bucket on Postgres. Adds a counter row + 1-min decay on pg-boss. Defends targeted-inflation. Adds DB write per click.
  • Option C — CF WAF + Workers Durable Objects token bucket. Sub-second consistency; cleaner architectural fit; but Durable Objects is paid past free tier and adds CF-binding complexity.

Recommendation: Option A for M1; revisit at the #17 implementation moment. Without CPC payouts attached to clicks, "metric corruption" is the worst case — annoying, not stealing money. Option A is sufficient for that. The moment third-party CPC payouts attach to clicks, the threat upgrades to direct theft and Option B (Postgres-backed bucket — cheapest of the two) becomes mandatory. Coupling the upgrade to #17 keeps the M1 surface lean.
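For reference when #17 reopens this question, the per-(ipHash, listingId) token bucket from Option B can be sketched in memory as below. The class name ClickBucket and its parameters are hypothetical, and the Map stands in for whichever counter store is eventually picked (Postgres, Workers KV, or Durable Objects), so this illustrates the algorithm only, not the storage design.

```typescript
// In-memory token bucket keyed by (ipHash, listingId).
// A real deployment would persist the state in the chosen counter store.
class ClickBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly buckets = new Map<string, { tokens: number; updatedMs: number }>();

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // Returns true and consumes one token if the click is allowed.
  // `nowMs` is injected so the logic can be tested without a real clock.
  allow(ipHash: string, listingId: string, nowMs: number): boolean {
    const key = `${ipHash}:${listingId}`;
    const existing = this.buckets.get(key);
    // A key seen for the first time starts with a full bucket.
    const prev = existing === undefined ? { tokens: this.capacity, updatedMs: nowMs } : existing;
    const elapsedSec = Math.max(0, nowMs - prev.updatedMs) / 1000;
    // Refill proportionally to elapsed time, capped at capacity.
    const tokens = Math.min(this.capacity, prev.tokens + elapsedSec * this.refillPerSecond);
    const allowed = tokens >= 1;
    this.buckets.set(key, { tokens: allowed ? tokens - 1 : tokens, updatedMs: nowMs });
    return allowed;
  }
}
```

Because the key includes listingId, one IP hammering a single listing exhausts only that listing's bucket. This is the targeted-inflation case that a gross per-IP rule at the WAF does not see.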

Q4 — Audit-log retention: how long, where?

Context. SB-018 writes one audit_log row per admin action (vendor CRUD, listing edit, override) and per auth event (signIn, linkAccount). At hobby scale, the table stays small. At M2+ scale with co-admins and frequent listing edits, it grows. Retention is a privacy trade-off (longer = better forensics; shorter = lower data-breach blast radius) and a cost trade-off (Postgres storage on Neon).

Options:

  • Option A — Forever. No retention policy. Simplest; smallest at M1; eventually a cost item.
  • Option B — 1 year hot (Postgres) + cold archive to R2 indefinitely. Operational rows queryable; older rows archived for forensics. Adds an archival job to pg-boss.
  • Option C — 90 days hot, no archive. Aggressive deletion. Forensic capacity limited to 90 days of timeline.
  • Option D — Tiered by event class. Auth events 1 year; admin mutations forever; routine queries 90 days.

Recommendation: Option A for M1 (forever, in Postgres). The table is genuinely tiny at hobby scale (10s of rows/day). When the table crosses ~1 GB or when #40 legal/privacy lands a retention obligation, transition to Option B. Don't pre-engineer archival before there's data to archive.

Q5 — Vulnerability disclosure: how do we want to be reported to?

Context. A public-facing aggregator that handles affiliate clicks and (M2) user accounts will eventually receive security reports — either responsibly via a researcher with security.txt, or irresponsibly via a public tweet / Lebanon-tech-Twitter callout, or via a bug-bounty platform if we sign up for one.

Options:

  • Option A — Nothing. Reports come in via random channels (GitHub issue, Twitter DM, MASTER's email). No SLA, no triage. Realistic for hobby budget; bad PR exposure on the day a real CVE lands.
  • Option B — security.txt with email. /.well-known/security.txt lists security@961.tech (or MASTER's existing email) and a 90-day expiry. Cheapest responsible-disclosure surface; one mailbox to monitor.
  • Option C — Bugcrowd / HackerOne. Managed triage. Costs money (HackerOne Lite is $99/mo+; Bugcrowd's hobby tier varies) and attracts low-quality reports unless we set scope tightly.
  • Option D — security.txt + GPG key + minimal hall-of-fame. Option B plus a public acknowledgement page. Encourages responsible reporters; still hobby-budget.

Recommendation: Option D. security.txt + GPG-key-on-keyserver + a one-line credits page on the public site. Cost ≈ $0; responsible disclosure surface present from launch. Skip Bugcrowd until the affiliate revenue model produces actual money worth defending. SB-081 tracks the implementation.
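A minimal sketch of generating the file for Option D follows. The field names (Contact, Expires, Encryption, Acknowledgments) come from RFC 9116, which requires Contact and Expires. The function name renderSecurityTxt, the options shape, and any URLs passed for the GPG key or credits page are assumptions for illustration.

```typescript
// Hypothetical options; only `contact` and `expiresInDays` are required.
interface SecurityTxtOptions {
  contact: string; // e.g. "mailto:security@961.tech"
  expiresInDays: number; // the RFC proposes a 90-day expiry
  encryptionUrl?: string; // assumed location of the public GPG key
  acknowledgmentsUrl?: string; // assumed location of the credits page
}

// Renders the body of /.well-known/security.txt.
// `now` is injected so the Expires value is deterministic in tests.
function renderSecurityTxt(opts: SecurityTxtOptions, now: Date): string {
  const dayMs = 24 * 60 * 60 * 1000;
  const expires = new Date(now.getTime() + opts.expiresInDays * dayMs);
  const lines: string[] = [
    `Contact: ${opts.contact}`,
    `Expires: ${expires.toISOString()}`,
  ];
  if (opts.encryptionUrl) {
    lines.push(`Encryption: ${opts.encryptionUrl}`);
  }
  if (opts.acknowledgmentsUrl) {
    lines.push(`Acknowledgments: ${opts.acknowledgmentsUrl}`);
  }
  return lines.join("\n") + "\n";
}
```

Generating the file at build time rather than hand-editing it is one way to keep the 90-day Expires value from silently lapsing. Whether that is worth the extra step is a judgment call for SB-081.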

Q6 (implicit) — Cloudflare paid features: which costs do we accept?

Context. Several controls reference Cloudflare paid tiers. The free plan covers most of what we need; the upgrades are real but not urgent.

| Feature | Tier | Cost | Buys |
| --- | --- | --- | --- |
| Bot Fight Mode | Free | $0 | Basic bot blocking, zone-wide on/off |
| Rate Limiting Rules (1 rule, 10s IP) | Free | $0 | Coarse rate limit on one path-shape |
| Rate Limiting Rules (2 rules, broader fields) | Pro | $20/mo | Separate /api/auth/* from /api/go/* |
| Super Bot Fight Mode | Pro/Business | $20/mo | Better headless / "likely automated" classification |
| Bot Management | Enterprise | quote | ML-based bot scoring; not for hobby budgets |
| Turnstile | Free | $0 | CAPTCHA-replacement on forms; fine on Free |
| Workers Paid | $5/mo | $5/mo | Higher invocation limits (separately covered by RFC-0001) |
| Hyperdrive | Free + Paid usage | varies | Connection pooling (separately covered by RFC-0001) |

Recommendation: stay on Free for security-specific features at M1. Bot Fight Mode + the single Rate Limiting Rule + Turnstile cover the ground. Upgrade to Pro ($20/mo) when traffic justifies the second rate-limit rule or when Super Bot Fight Mode's better classification is needed. Bot Management is Enterprise-only and out of scope. If a security control discovered during implementation requires a Pro+ tier, that triggers a stop-and-ask.

Trade-offs

| Cost | What it buys |
| --- | --- |
| Q1 Option B: ~1 day of dev work to wire events.signIn to audit_log, custom error page, callbacks.signIn email-verified gate. | Closes the email-collision merge attack and the "OAuth callback didn't sign in but we don't know who tried" repudiation gap. |
| Q2 Option D interim Option A: residual XSS-via-style if any inline-style XSS landing surface exists. | Ships shadcn/Radix without forking the library; tightens later via Radix upstream. |
| Q3 Option A: targeted-listing inflation possible at M1. | Avoids per-click DB write cost and the counter-storage decision until #17 makes it mandatory. |
| Q4 Option A: indefinite audit_log growth. | No archival code to write at M1; flip to Option B when the table actually grows. |
| Q5 Option D: small ongoing inbox burden (rare emails; some spam). | Responsible-disclosure surface exists from launch. |
| Q6 stay on Free: residual headless-bot click inflation (closes only with Super Bot Fight Mode at Pro). | Saves $20/mo until traffic justifies the upgrade. |

Alternatives

The Q-by-Q sections above carry the alternatives. The systemic alternative, "do nothing until an incident forces our hand", is rejected because (a) IP_HASH_SECRET defaulting to a literal in source is not a "wait for incident" item, (b) the Auth.js + admin work is M2 and security cannot be backfilled into auth flows after they ship, and (c) security.txt is one file and zero ongoing cost.

Open questions (consolidated)

  1. Q1 — Auth.js scope. Recommendation: Option B (canonical + targeted hardening).
  2. Q2 — CSP style-src. Recommendation: Option D — interim 'unsafe-inline', tighten when Radix supports nonces.
  3. Q3 — Rate-limit topology. Recommendation: Option A (CF WAF only) for M1; Option B at #17 implementation.
  4. Q4 — Audit-log retention. Recommendation: Option A (forever) for M1; flip to B when table grows or #40 lands an obligation.
  5. Q5 — Vulnerability disclosure. Recommendation: Option D (security.txt + GPG + credits page).
  6. Q6 — Cloudflare paid features. Recommendation: stay on Free for security-specific features at M1; revisit on traffic.

Plus three meta-questions:

  • Should this RFC produce one ADR (0006: security controls posture) or one per question? Recommendation: one combined ADR, since the questions are closely related and reading them together is easier than navigating six separate decisions.
  • What's the cadence for re-evaluating these decisions? Recommendation: pair with the quarterly stack review (SB-082).
  • Does #40 legal/privacy work change any of these? Recommendation: re-open Q4 and Q5 once #40 lands data-breach-notification obligations. Q1 (Auth.js) and Q2 (CSP) are independent.

Implementation plan

Once MASTER picks (or amends) the recommendations, the implementation work splits across the existing tickets that own each surface — this RFC does NOT spawn an "implement security baseline" mega-ticket. The baseline rows have tiers; let the tier drive the ticket:

  • Lock the decisions as ADR-0006: Security controls posture (single combined ADR per recommendation above).
  • M1 must-haves (file as discrete issues if not already on the board):
      • SB-002: rotate IP_HASH_SECRET (1-line change + secret rotation in CF dashboard; depends on #19 hosting pick).
      • SB-021: switch click-redirect to CF-Connecting-IP (one-file change; not blocked).
      • SB-003 + SB-004: CSP + headers in next.config.ts (one-file change; nonce middleware adds a second file).
      • SB-016: Cloudflare Rate Limiting Rule (CF dashboard config; documented in a runbook).
      • SB-080: npm audit in ci.yml (paired with the ci.yml work flagged in tech-stack.md § Gaps).
      • SB-081: security.txt (one file under public/.well-known/).
      • SB-067: soft-stale state on Listing (already partially in scrapers; verify).
  • M2 must-haves — fold into #11 (Auth.js), #16 (admin), #14 (alerts), #21 (LLM), #17 (affiliate). Each ticket's spec must reference the relevant baseline row IDs in its acceptance criteria.
  • Update Reference → Security baseline statuses from proposed to decided once ADR-0006 is locked.
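Of the M1 items above, SB-021 is small enough to sketch. The idea is to read the client address only from CF-Connecting-IP, which Cloudflare sets at the edge, and to ignore X-Forwarded-For, which a client can pre-populate. The HeaderReader interface and the clientIp function are hypothetical names, and the sketch assumes all traffic reaches the app through Cloudflare.

```typescript
// Minimal read-only view of request headers; the standard Headers object
// satisfies this shape.
interface HeaderReader {
  get(name: string): string | null;
}

// Returns the client IP from CF-Connecting-IP, or null when it is absent.
// X-Forwarded-For is deliberately never consulted.
function clientIp(headers: HeaderReader): string | null {
  const raw = headers.get("cf-connecting-ip");
  if (raw === null) {
    return null;
  }
  const trimmed = raw.trim();
  return trimmed === "" ? null : trimmed;
}
```

Returning null instead of falling back to X-Forwarded-For forces the caller to decide what to do with a request that did not come through Cloudflare. Rejecting it is probably the conservative choice.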

Out of scope

  • Per-CVE response runbook (#43 observability).
  • DR / RTO / RPO (RFC-0001 hosting decision pulls these).
  • 24/7 SOC, paid pentesting, dedicated security team — explicitly rejected at hobby budget.
  • M3 retailer self-onboarding (UC-K) threat surface — deferred until #36 re-opens it.
  • Legal data-breach notification flow — owned by #40; this RFC's Q4 and Q5 are coupled but the obligation itself is upstream.
  • Bug-bounty platform sign-up (Bugcrowd / HackerOne) — Q5 Option C is out of scope for hobby budget; revisit when affiliate revenue is real.