ADR-0008: Search backend is Postgres FTS + pg_trgm + unaccent + curated synonym/transliteration map¶
- Status: Accepted
- Date: 2026-04-28
- Deciders: MASTER
- Related: #39, RFC-0003, #34, ADR-0004 (i18n English-only UI), ADR-0006 (hosting), Personas, Competitive landscape
Context¶
RFC-0003 compared three search backends for product browse + the build picker: Postgres FTS (pg_trgm + unaccent), self-hosted Meilisearch, and Typesense.
961tech needs typo-tolerant, faceted search that handles Arabic + English queries despite the UI being English-only (ADR-0004). Lebanese SERP traffic arrives partly in Arabic ("كرت شاشة" / "GPU"), and brand-name typos / transliterations are common. Catalog scale: ~2k listings at M1, ~5–20k expected within a year.
Decision¶
Use Postgres full-text search with pg_trgm (trigram fuzzy fallback), unaccent (Latin diacritics), and a hand-curated JSON synonym/transliteration map, all hidden behind a src/lib/search.ts interface so a future Meilisearch migration is bounded.
Schema additions (when #39 implements)¶
- Generated
tsvectorcolumn onProductcoveringbrand,model, with weightedsetweight()for title vs body. Dictionary configured assimple(thearabicsnowball stemmer is too lossy). - Generated
tsvectorcolumn onListingcoveringtitleRaw. - GIN index on each
tsvector. - GIN trigram index (
pg_trgm) onProduct.brand || ' ' || Product.modelandListing.titleRawfor typo / partial-match fallback. unaccentextension enabled.
Arabic + English normalization (in app code, applied symmetrically at index time and query time)¶
- Strip Arabic tashkeel (diacritics)
- Normalize alef forms: ا/أ/إ/آ → ا
- Normalize ya: ي/ى → ي
- Normalize ta marbuta: ة → ه
- Lowercase Latin
- Strip Latin diacritics via
unaccent()at query time
Lives in src/lib/search.ts.
Synonym / transliteration map¶
A plain JSON file at src/lib/search-synonyms.json mapping Arabic ↔ English: كيس ↔ case, معالج ↔ cpu, كرت شاشة ↔ gpu, ذاكرة ↔ ram, مزود ↔ psu, etc. Query-time expansion ORs synonyms into the tsquery. The map is curated, lives in the repo, is portable to any future backend.
Public interface (src/lib/search.ts)¶
Exposed signatures: search(query, filters, facets) → SearchHits, suggest(prefix, limit) → string[]. All call sites use this; the implementation is swappable.
Consequences¶
Positive¶
- Sync is implicit. Indexes update inside the same Postgres transaction as the scrape upsert. No separate pipeline; no out-of-sync failure mode; no second backup story.
- No new infrastructure. Same managed Postgres (ADR-0006). One less container, one less vendor-cost tier.
- Synonym map is the actual product — portable, reviewable, expandable from search-log analysis (when #43 observability lands).
- Faceting via
GROUP BY ... FILTERis cheap at our cardinality. - Right-sized for M1/M2. ~2k listings now → ~5–20k within a year. Postgres FTS comfortably handles that.
Negative¶
- Arabic morphology is shallow. No root-form unification (كاتب / كتاب / مكتوب don't unify). Same caveat applies to Meili and Typesense, less severely.
- Trigram fallback latency starts to feel laggy ~10k rows. We're not there yet.
- No instant-search-as-you-type at scale. If we want sub-100ms typo-tolerant autocomplete on 50k+ items, Postgres becomes a fight. Migration trigger documented below.
- Faceting at high cardinality gets verbose. Wrap in a
lib/search.tshelper. - Synonym map is hand-curated. Someone maintains
src/lib/search-synonyms.json. Curated by MASTER; expanded over time from search-log analysis.
Neutral¶
- The portability harness costs nothing at runtime. The interface is just TypeScript types + a single import path. Migration to Meili later is "swap the implementation, keep the signature."
Migration triggers — pre-recorded¶
Switch to Meilisearch (likely) when any fires:
- Catalog crosses ~25k listings.
- Instant-search-as-you-type becomes a real product requirement.
- Arabic-query CTR data shows trigram fallback failing real users (i.e. queries with no English-equivalent in the synonym map are losing buyers — observed via #43 telemetry).
Documenting in advance so we don't drift past the threshold without noticing.
Alternatives considered¶
Meilisearch (self-hosted Docker) — rejected for now¶
Best-in-class fuzzy search; Arabic via charabia tokenizer (better than raw Postgres). Loses today on: extra container + sync pipeline + backup story. Sync becomes eventual rather than transactional with Postgres.
Typesense — rejected for now¶
Comparable to Meilisearch. Marginally better for facet ranges + curation rules. Loses for the same reasons as Meili: extra infra delta vs Postgres FTS. The marginal feature differences don't pay for the operational cost at our M1 scale.
Decisions on RFC-0003 open questions¶
- Arabic-query SERP at launch: ship light — bootstrap the synonym map with ~30 high-value entries (top Arabic equivalents for each component category) for M1; expand over time from search-log analysis once #43 lands. Don't gate launch on a comprehensive map.
- Synonym map ownership: MASTER curates. Initial bootstrap can be AI-assisted with MASTER review; ongoing expansion driven by real search logs.
- Migration trigger thresholds: ~25k listings + instant-search-UX requirement OR Arabic CTR failure (see above). Recorded.
- Search latency observability: decided when ADR-0006 / #43 observability lands. For now, log
tookMsinlib/search.tsand surface in slow-query telemetry. - Unified vs separate (products + listings): unified result set in M1 (single API, simpler UX). Revisit if M2 introduces an "advanced search" UI.
References¶
- RFC-0003 — Search backend
- ADR-0004 — i18n English-only — UI is English; this ADR handles Arabic search input
- ADR-0006 — Hosting — Postgres on Neon
- Personas — Code-switching context
- Competitive landscape — Arabic SERP §4.6