# RFC-0003: Search backend for product browse and build picker
- Status: Accepted — locked by ADR-0008 on 2026-04-28
- Author: MASTER (drafted by Claude as part of #34)
- Date: 2026-04-28
- Related: #39, ADR-0008, Architecture → Overview, Reference → Personas, Reference → Competitive landscape, tech-stack reference
Decision recorded. Postgres FTS + pg_trgm + unaccent + curated synonym map, behind src/lib/search.ts. See ADR-0008 for the locked decision; this RFC remains as the comparative analysis that produced it.
## Summary
961tech browse and the build picker need search — typo-tolerant, faceted, and able to handle Arabic + English queries with common transliterations ("كيس" / "case", "معالج" / "CPU"). Recommendation: Postgres FTS + pg_trgm + unaccent + a hand-curated synonym/transliteration map, with the implementation hidden behind a src/lib/search.ts interface so the migration to Meilisearch (the most likely upgrade path) is bounded. The migration trigger is one of: catalog crosses ~25k listings, instant-search-as-you-type becomes a real product requirement, or Arabic-query CTR data shows the trigram fallback is failing real users.
## Motivation
Today, browse is a category-filtered list rendered server-side with Prisma findMany (see Architecture → Overview § Routes). There is no search. #39 is the search ticket and is currently unscoped.
Search has to be designed before it's built because retrofitting is expensive: schema changes (extra tsvector columns or index sync pipelines), query patterns, the layer that owns query parsing, and — most importantly — the synonym/transliteration map are all decisions that a "just bolt search on later" approach gets wrong. The Arabic dimension makes that especially true.
The Lebanese market context comes from Reference → Personas and Reference → Competitive landscape: users land via Google with Arabic SERP queries, brand-name typos are common ("ryzen" / "رايزن"), and the AIB-vs-reference GPU naming problem ("ASUS ROG STRIX RTX 4070" → "RTX 4070") needs to be addressed at the matching layer, not the search layer.
## Proposal
Three options were researched (Postgres FTS + pg_trgm + unaccent; self-hosted Meilisearch; Typesense). The proposal is option 1, with a portability harness that makes option 2 cheap to migrate to later.
### Recommendation: Postgres FTS + pg_trgm + unaccent + synonym map
**Schema additions.**

- Add a generated `tsvector` column on `Product` covering `brand` and `model`, weighted via `setweight()` for title vs body, with the dictionary configured as `simple` (so we don't lose Arabic — the `arabic` snowball stemmer is too lossy) plus a normalization step described below.
- Add a generated `tsvector` column on `Listing` covering `titleRaw`.
- GIN index on each `tsvector`.
- GIN trigram index (`pg_trgm`) on `Product.brand || ' ' || Product.model` and `Listing.titleRaw` for typo / partial-match fallback.
- `unaccent` extension enabled (handles Latin diacritics, plus we'll wrap it for Arabic — see below).
**Arabic + English normalization.**

Postgres's built-in `arabic` stemmer is too coarse and `unaccent` is Latin-only. Both languages get a normalization pass in app code before insert / before query:
- Strip Arabic tashkeel (diacritics).
- Normalize alef forms: ا/أ/إ/آ → ا.
- Normalize ya: ي/ى → ي.
- Normalize ta marbuta: ة → ه.
- Lowercase Latin.
- Strip Latin diacritics via `unaccent()` at query time.
This lives in `src/lib/search.ts` so it's symmetric (same normalization for indexed text and query text).
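A minimal sketch of that pass (the function name is illustrative; `unaccent()` still handles Latin diacritics on the Postgres side):

```typescript
// Shared normalization applied to both indexed text and query text.
const TASHKEEL = /[\u064B-\u0652\u0670]/g; // fathatan..sukun, plus dagger alef

export function normalizeSearchText(input: string): string {
  return input
    .replace(TASHKEEL, "")   // strip Arabic diacritics (tashkeel)
    .replace(/[أإآ]/g, "ا")  // unify alef forms
    .replace(/ى/g, "ي")      // unify alef maqsura -> ya
    .replace(/ة/g, "ه")      // unify ta marbuta
    .toLowerCase()            // lowercase Latin
    .trim();
}
```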
**Synonym / transliteration map.**

A plain JSON file at `src/lib/search-synonyms.json` mapping:

```json
{
  "كيس": ["case", "كيسه"],
  "معالج": ["cpu", "processor"],
  "كرت شاشة": ["gpu", "graphics card"],
  "ذاكرة": ["ram", "memory"],
  "مزود": ["psu", "power supply"]
}
```
Query expansion happens at query time — for each token, look up synonyms, OR them into the tsquery. The map is the actual product — it's portable across all three backend choices and lives in the repo.
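The expansion step can be sketched like this. It assumes tokens are already normalized and, for brevity, only handles single-token keys — multi-word entries like "كرت شاشة" would need phrase-level lookup; proper escaping of `tsquery` metacharacters is also omitted:

```typescript
// Illustrative subset of src/lib/search-synonyms.json
const synonyms: Record<string, string[]> = {
  "معالج": ["cpu", "processor"],
  "كيس": ["case", "كيسه"],
};

// For each token, OR its synonyms into a group; AND the groups together.
// e.g. "معالج ryzen" -> "(معالج | cpu | processor) & (ryzen)"
export function buildTsQuery(query: string): string {
  return query
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => {
      const expanded = [token, ...(synonyms[token] ?? [])];
      return `(${expanded.join(" | ")})`;
    })
    .join(" & ");
}
```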
**Faceting.**

Standard `GROUP BY ... FILTER (WHERE ...)` queries against the same indexes. Categories, brands, retailers, in-stock, price-range — all cheap at our scale.
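For illustration, a facet query under assumed column names (a `tsvector` column `searchVector`, plus `brand` / `inStock` / `retailerId`):

```sql
-- Brand counts over the matching set
SELECT "brand", count(*) AS count
FROM "Product"
WHERE "searchVector" @@ to_tsquery('simple', '(معالج | cpu) & ryzen')
GROUP BY "brand"
ORDER BY count DESC;

-- Conditional counts via FILTER, e.g. in-stock split per retailer
SELECT "retailerId",
       count(*) FILTER (WHERE "inStock") AS in_stock,
       count(*) AS total
FROM "Listing"
GROUP BY "retailerId";
```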
**Autocomplete.**

`pg_trgm` GIN + `ILIKE 'prefix%'`, single-digit milliseconds at our scale.
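Sketch of the suggestion query, assuming the trigram index on `Listing.titleRaw` (ranking by `similarity()` is one reasonable choice, not a decided detail):

```sql
SELECT "titleRaw"
FROM "Listing"
WHERE "titleRaw" ILIKE 'rtx 40%'
ORDER BY similarity("titleRaw", 'rtx 40') DESC
LIMIT 10;
```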
### Public interface — the portability harness
Every search call in `src/` goes through one module:

```ts
// src/lib/search.ts
export interface SearchHits {
  hits: Array<{ kind: 'product' | 'listing'; id: string; score: number; highlight?: string }>;
  facets: Record<string, Array<{ value: string; count: number }>>;
  total: number;
  tookMs: number;
}

export async function search(
  query: string,
  filters: {
    category?: Category;
    brand?: string;
    retailerId?: string;
    inStock?: boolean;
    priceUsdMin?: number;
    priceUsdMax?: number;
  },
  facets: Array<'category' | 'brand' | 'retailer' | 'inStock'>,
): Promise<SearchHits> { ... }

export async function suggest(
  prefix: string,
  limit?: number,
): Promise<Array<{ value: string; kind: 'product' | 'listing' }>> { ... }
```
Implementation detail (Postgres queries, normalization, synonym expansion) lives behind this interface. The migration to Meilisearch later is "swap the implementation, keep the signature."
### What this proposal does NOT do
- Does not introduce ML reranking, vector search, or LLM-assisted query understanding. All three are deferrable to specific future RFCs once we have query data.
- Does not solve AIB-vs-reference GPU naming — that's a matching problem (`src/lib/matching.ts`), not a search problem.
## Trade-offs
| Cost | What it buys |
|---|---|
| Arabic morphology is shallow. Postgres has no proper Arabic root-form unification ("كاتب / كتاب / مكتوب" don't unify). The same caveat applies (less severely) to Meili and Typesense. | The synonym map covers the practical pain points (transliterations, common variants); deep morphology isn't a real M1 user need. |
| Trigram fallback latency starts to feel laggy around 10k rows. | At our M1 scale (~2k listings, ~5–20k within a year), trigram is fine. |
| Sync is implicit (no separate pipeline). Indexes update inside the same Postgres transaction as the scrape upsert. | Zero "out-of-sync" failure mode. No backup story for a separate index. No second container. |
| Faceting via `GROUP BY ... FILTER` gets verbose at high cardinality. | Cheap at our cardinality. Worth a `lib/search.ts` helper to normalize. |
| No instant-as-you-type at scale. If we want sub-100ms typo-tolerant autocomplete on 50k+ items, Postgres becomes a fight. | At M1 we're not promising that UX. The migration trigger handles this case. |
| Synonym map is hand-curated. Someone has to maintain `src/lib/search-synonyms.json`. | The map is the actual value — it lives in the repo, it's reviewable, and it's portable to any backend choice. |
## Alternatives
### Meilisearch (self-hosted Docker)
Purpose-built fuzzy search engine. Best-in-class typo tolerance (configurable typo budget per word length, prefix search, sub-50ms p95). Arabic supported via the charabia tokenizer (Arabic segmentation, alef/ya normalization, tashkeel stripping) since v1.3+ — better than raw Postgres but still no deep morphology. Faceting first-class. Self-hosted: one container, ~200MB–1GB RAM. Meili Cloud entry tier ~$30/mo.
- Where it wins: above ~10–20k documents, the moment instant-search-as-you-type becomes a product requirement, or when Arabic handling becomes a user complaint.
- Where it loses for us today: extra container (and its own backup story, sync pipeline, failure mode) for a problem we don't have yet. Sync is eventual — not transactional with Postgres.
- Migration cost from FTS: bounded if we hide search behind `lib/search.ts`. Mostly a re-implementation of the module's body plus a sync handler in the scrape pipeline.
### Typesense
Comparable to Meilisearch — typo tolerance, Arabic tokenizer (Unicode segmentation, normalization, similar caveats), faceting, curation rules (pinned/hidden results per query — Meili needs more work for this), vector search. Self-hosted ~300MB–1GB RAM; Typesense Cloud entry tier $19/mo.
- Where it wins over Meili (marginally): richer faceting (range facets out-of-box), curation rules, slightly more mature multi-node story.
- Where it loses: same as Meili for our workload — extra infra + sync pipeline. Marginal feature differences don't pay for the operational delta over Postgres FTS today.
## Open questions
- Is Arabic-query SERP a hard requirement at M1? Reference → Personas and Reference → Competitive landscape suggest yes — but is it required to launch or required by M2? If launch-required, the synonym map needs investment up front; if M2-required, ship with English-first FTS and accept thin Arabic coverage at first.
- Who owns the synonym map? Recommend MASTER. The map is small (10s to low-100s of entries) and reflects real user vocabulary that an automated extractor won't get right. It can be expanded over time from search-log analysis.
- Migration trigger thresholds. Above ~25k listings, or when a specific UX requirement (instant typo-tolerant autocomplete) is committed to. Worth writing the trigger into the future ADR so we don't drift past it without noticing.
- How is search latency observed? Decided once observability lands (RFC-0001). For now, log `tookMs` in `lib/search.ts` and surface it in the same place we surface other slow-query telemetry.
- Do we want unified search across products + listings, or two separate indexes / endpoints? The interface above models them as a single result set; this is fine for product-card-style results, harder for power-user queries that want listing-level prices. Default to unified; revisit if M2 introduces an "advanced search" UI.
## Implementation plan
Once MASTER picks Postgres FTS, the implementation work for #39 is roughly:
- Lock decision as ADR 0006 (or whichever number is next)
- Migration: enable `pg_trgm` + `unaccent` extensions; add `tsvector` columns + GIN indexes on `Product` and `Listing`
- Create `src/lib/search.ts` with the interface above; implement `search()` and `suggest()` against Postgres
- Create `src/lib/search-synonyms.json` with the initial transliteration / synonym map (MASTER-curated)
- Wire normalisation into the scrape pipeline so `tsvector` columns stay current on upsert
- Surface search in the UI: (a) global search box in `TopNav`, (b) the `/build/choose/[category]` picker page, (c) `/products?q=...` query
- Tests: golden-path queries (English, Arabic, transliterated), facet correctness, prefix autocomplete latency
- Architecture page update: add a `docs/architecture/search.md` page narrating the indexing + query path
## Out of scope
- Vector / embedding search. Future RFC. Worth waiting for query data to know whether we even need it.
- LLM-assisted query understanding (e.g., "best AM5 build under $2000" → filter set). Future RFC.
- Search-result personalisation / ranking models. Way out of scope at M1 traffic.
- Search log / analytics pipeline. Pulled along by the observability decision. For now, server logs of `tookMs` and the `q=` query string are the input to "is this working" intuition.
- Spell correction / "did you mean" suggestions. Trigram fallback gives us approximate-match scoring already; an explicit "did you mean" UI is future polish.
- Indexing build guides / blog posts. When #19 eventually adds editorial guides (Architecture → Overview § Routes), they get their own search story — out of scope here.