Issue #43: KPIs, observability, performance budget
- Issue: #43
- Started: 2026-04-28
- Completed: 2026-04-28
Goal
Three new docs that lock the M1 measurement story:
- A KPI reference at `docs/reference/kpis.md`, ordering every metric by tier (north-star → secondary → health), each with its source, target, and cadence.
- A decision-time RFC at `docs/rfc/0007-observability-stack.md`, picking the log shipper, error reporter, metrics layer, alerting destination, and dashboard surface under a $20/mo M1 ceiling, constrained by RFC-0001 Cloudflare hosting.
- A performance budget reference at `docs/reference/performance-budget.md`, with per-surface (frontend, backend, scraper, cost) numeric budgets, each anchored in a persona, retailer, or RFC fact.
After this lands, #28 page design, #19 hosting, #18 scraper queue, #14 alerts, #29 DB, and #44 security all have a measurement contract to design against.
Out of scope
- Implementation. No code, no `package.json` changes, no wrangler config. The RFC describes the chosen stack; wiring it up is its own ticket after #19 lands.
- Telegram bridge. MASTER's existing `telegram-notify` MCP is intentionally not wired in by default; alerts default to Cloudflare email Notifications. Telegram is an opt-in MASTER decision, surfaced as an open question.
- Long-form analytics product. No funnels, retention cohorts, or segmentation tooling at M1. Cloudflare Web Analytics + Workers Analytics Engine cover the north-star metric without a dedicated product-analytics SaaS.
- APM / distributed tracing. Deferred to M2+. In Cloudflare Workers, `performance.now()` does not advance during CPU-bound work, so CPU-bound spans read as zero and server-side traces stay low-signal until tracing matures.
Approach
Decision-time RFC, not exploratory. The constraints — Cloudflare hosting per RFC-0001, $5–20/mo budget, mobile-Lebanese audience, solo evening project, "ship M1 now" mindset — collapse the option space tightly. Five vendor options were researched in parallel via superpowers:dispatching-parallel-agents; verdicts converged on a Cloudflare-native primary stack with Sentry Free + Axiom Free as zero-cost augments. KPI tiering follows the standard north-star → secondary → health pyramid, anchored in personas §5.6 wedge pains and §6.3 LTV signals. Performance budget anchors every number in a persona, retailer fact, or RFC constraint — no Lighthouse-default cargo-culting.
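As a rough illustration of how the frontend targets listed in Step 5 could be checked against field data, the sketch below computes a nearest-rank 75th percentile over RUM samples and flags any metric at or over budget. The dictionary layout, function names, and sample values are hypothetical; only the LCP and INP thresholds come from this plan.

```python
import math

# Frontend p75 targets from Step 5 of this plan (LCP < 2.5s, INP < 200ms).
# The dict layout and key names are assumptions made for illustration.
BUDGET_P75_MS = {"lcp_ms": 2500, "inp_ms": 200}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def budget_violations(samples_by_metric, budget=BUDGET_P75_MS, pct=75):
    """Return {metric: observed} for every metric that is not under budget."""
    violations = {}
    for metric, limit in budget.items():
        observed = percentile(samples_by_metric[metric], pct)
        if observed >= limit:  # the budget is a strict "<" target
            violations[metric] = observed
    return violations
```

Called with `{"lcp_ms": [1800, 2100, 2400, 3200], "inp_ms": [90, 120, 150, 260]}`, the function returns an empty dict, because the p75 values (2400 and 150) sit under both limits. Nearest-rank is one of several percentile conventions; the RUM source may use another.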
Steps
- Step 1: Read context.
    - Issue #43 body (via local docs since `gh` is unauthenticated on this box), `personas.md` §5.2 + §5.6 + §7, `competitive-landscape.md` §3.7 (craft baseline) + §4.4 (Lebanese retailers), `tech-stack.md` (no observability today), `retailers.md` (scraper context), `RFC-0001` (Cloudflare Pages + Workers), `RFC-0002` (pg-boss), `architecture/deployment.md`, `architecture/ingest-pipeline.md`.
    - Verification: scope crystallized into the three deliverables above.
- Step 2: Dispatch parallel vendor research.
    - Five subagents in one message, each researching one option with current 2026 pricing pages:
        - Sentry: error tracking, Cloudflare Workers SDK, free-tier specifics
        - Highlight.io vs PostHog: all-in-one observability + product analytics
        - Plausible vs Cloudflare Web Analytics: pageviews + RUM Web Vitals
        - Cloudflare-native primitives: Workers Observability, Logs, Analytics Engine, Logpush, Web Analytics, Notifications
        - Log shipping destinations: Axiom, Better Stack, Datadog, R2 + grep, Baselime/Cloudflare-acquired, Grafana Loki
    - Verification: all five returned with verified-from-live-page pricing + flags for stale claims.
- Step 3: Pick the stack and write the RFC.
    - Convergence: Cloudflare-native primary (Web Analytics + Workers Observability + Analytics Engine) + Sentry Free for error grouping + Axiom Free for 30-day log retention. Total incremental observability spend = $0 above the $5/mo Workers Paid baseline already in RFC-0001.
    - Verification: `docs/rfc/0007-observability-stack.md` drafted with full alternatives table, trade-offs, and open questions.
- Step 4: Write the KPI reference.
    - North-star: Weekly Qualified Outbound Clicks (WQOC), defined as distinct visitors per ISO week who clicked through to a retailer via `/api/go/...`. Defensible because the click is the moment the aggregator delivers value; everything before is intent.
    - Secondary tracks (Builder, Casual, Aggregator-core, Health) per persona-track and ticket scope.
    - Verification: `docs/reference/kpis.md` reads top-to-bottom; every metric has source + target + cadence.
- Step 5: Write the performance budget.
    - Lebanese mobile reality (personas §5.2): mobile-first non-negotiable, 4G + occasional generator outages, image weight matters acutely.
    - Per-surface targets:
        - Frontend Core Web Vitals: LCP < 2.5s p75, INP < 200ms p75, CLS < 0.1, TTFB < 600ms
        - Bundle: initial JS < 100KB gz, hero img < 200KB
        - Backend: product-list < 100ms p95, search < 300ms p95, clickout 302 < 50ms p95
        - Scraper: full-roster < 30min
        - Cost: < $20/mo all-in M1
    - Verification: in `docs/reference/performance-budget.md`, every number has a rationale citation.
- Step 6: Cross-link.
    - Update `docs/reference/index.md` (add kpis + performance-budget rows).
    - Update `docs/rfc/index.md` (add RFC-0007 row).
    - Update `docs/plans/index.md` (add this plan row).
    - Verification: all three index updates land in the same commit.
- Step 7: Build docs strict + commit.
    - `mkdocs build --strict` to catch broken refs.
    - One atomic commit `docs(foundation): kpis + observability stack + perf budget (#43)`. No push, no PR.
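
The north-star definition in Step 4 (distinct visitors per ISO week who hit a `/api/go/` clickout) can be sketched as a small aggregation. This is a minimal sketch under stated assumptions, not the project's implementation: the `(visitor_id, path, timestamp)` event shape, the function name, and the choice to bucket weeks in UTC are all hypothetical, and how a "visitor" is identified is left open.

```python
from collections import defaultdict
from datetime import datetime, timezone


def weekly_qualified_outbound_clicks(events):
    """Count distinct clickout visitors per ISO week.

    `events` is an iterable of (visitor_id, path, timestamp) tuples with
    timezone-aware timestamps. This shape is assumed for illustration.
    Returns {(iso_year, iso_week): distinct_visitor_count}.
    """
    visitors_by_week = defaultdict(set)
    for visitor_id, path, ts in events:
        if not path.startswith("/api/go/"):
            continue  # only retailer click-throughs qualify
        iso = ts.astimezone(timezone.utc).isocalendar()
        # A set per week means repeat clicks by one visitor count once.
        visitors_by_week[(iso[0], iso[1])].add(visitor_id)
    return {week: len(ids) for week, ids in visitors_by_week.items()}
```

Using a set per week is what makes the metric "distinct visitors" instead of raw clicks: a visitor who clicks out five times in one week contributes one to that week's WQOC.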
Risks
| Risk | Likelihood | Mitigation |
|---|---|---|
| RFC numbering collision (parallel worktrees may also be authoring 0005/0006) | medium | Used 0007 per ticket prompt instruction. If a collision surfaces, renumber in a follow-up commit before merge — the slug is unique. |
| Vendor pricing drifts before MASTER reads the RFC | low | Every pricing claim flagged with verification date (2026-04-28) and source URL. Re-spot-check before signoff if delayed >30 days. |
| Cloudflare Analytics Engine billing flips on (currently "you will not be billed" disclaimer in docs) | low | Cost trajectory analyzed under the published $0.25/M-write rate; even at full billing, M1 stays under $1/mo extra. |
| Sentry free-tier email-only alerts feel anaemic | medium | Wired only as a backstop; primary alerting is Cloudflare email Notifications + (optional) Tail Worker → Discord/Slack webhook. Telegram bridge surfaced as an open question for MASTER. |
| Performance budget numbers feel pulled from the air | low | Each anchored to personas §5.2 / a retailer fact / an RFC constraint / a verified Cloudflare PoP datum. Rationale column makes the source explicit. |
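
The Analytics Engine row's cost claim can be checked with simple arithmetic. The sketch below takes the $0.25 per million writes quoted in the table at face value and ignores any free allowance, which is a simplifying assumption; the function names are hypothetical.

```python
# Rate quoted in the risk table: $0.25 per million Analytics Engine writes.
RATE_USD_PER_MILLION_WRITES = 0.25


def analytics_engine_monthly_cost(writes_per_month):
    """Monthly cost in USD at the quoted rate, with no free allowance assumed."""
    return writes_per_month / 1_000_000 * RATE_USD_PER_MILLION_WRITES


def max_writes_for_budget(budget_usd):
    """Number of monthly writes that would cost exactly `budget_usd`."""
    return budget_usd / RATE_USD_PER_MILLION_WRITES * 1_000_000
```

At this rate, staying under $1/mo extra is equivalent to staying under 4 million writes per month; 500,000 writes per month would cost $0.125.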
Tests
- `mkdocs build --strict` passes (catches broken cross-refs, missing files).
- All four new files render with no link warnings.

No code, no vitest. Foundation ticket.
Doc updates
Per Contributing → what needs updating:
- Reference: `docs/reference/kpis.md` (new), `docs/reference/performance-budget.md` (new), `docs/reference/index.md` (rows added)
- RFC: `docs/rfc/0007-observability-stack.md` (new), `docs/rfc/index.md` (row added)
- Plan: this file + `docs/plans/index.md` (row added)
- Architecture: `architecture/deployment.md` § Observability is currently a stub; leave it to the implementation ticket that follows RFC-0007 acceptance, not this foundation work
- Glossary: no new terms emerged; defer
- Issue body: not updating from this branch; closes via PR
Rollback
Pure docs. Run `git revert <sha>` and re-run `mkdocs build --strict`. No infra to unwind.