ADR-0008: Search backend is Postgres FTS + pg_trgm + unaccent + curated synonym/transliteration map¶

Status: Accepted
Date: 2026-04-28
Deciders: MASTER
Related: #39, RFC-0003, #34, ADR-0004 (i18n English-only UI), ADR-0006 (hosting), Personas, Competitive landscape

Context¶

RFC-0003 compared three search backends for product browse + the build picker: Postgres FTS (pg_trgm + unaccent), self-hosted Meilisearch, and Typesense.

961tech needs typo-tolerant, faceted search that handles Arabic + English queries despite the UI being English-only (ADR-0004). Lebanese SERP traffic arrives partly in Arabic ("كرت شاشة" / "GPU"), and brand-name typos / transliterations are common. Catalog scale: ~2k listings at M1, ~5–20k expected within a year.

Decision¶

Use Postgres full-text search with pg_trgm (trigram fuzzy fallback), unaccent (Latin diacritics), and a hand-curated JSON synonym/transliteration map, all hidden behind a src/lib/search.ts interface so a future Meilisearch migration is bounded.

Schema additions (when #39 implements)¶

Generated tsvector column on Product covering brand, model, with weighted setweight() for title vs body. Dictionary configured as simple (the arabic snowball stemmer is too lossy).
Generated tsvector column on Listing covering titleRaw.
GIN index on each tsvector.
GIN trigram index (pg_trgm) on Product.brand || ' ' || Product.model and Listing.titleRaw for typo / partial-match fallback.
unaccent extension enabled.

Arabic + English normalization (in app code, applied symmetrically at index time and query time)¶

Strip Arabic tashkeel (diacritics)
Normalize alef forms: ا/أ/إ/آ → ا
Normalize ya: ي/ى → ي
Normalize ta marbuta: ة → ه
Lowercase Latin
Strip Latin diacritics via unaccent() at query time

Lives in src/lib/search.ts.

Synonym / transliteration map¶

A plain JSON file at src/lib/search-synonyms.json mapping Arabic ↔ English: كيس ↔ case, معالج ↔ cpu, كرت شاشة ↔ gpu, ذاكرة ↔ ram, مزود ↔ psu, etc. Query-time expansion ORs synonyms into the tsquery. The map is curated, lives in the repo, is portable to any future backend.

Public interface (`src/lib/search.ts`)¶

Exposed signatures: search(query, filters, facets) → SearchHits, suggest(prefix, limit) → string[]. All call sites use this; the implementation is swappable.

Consequences¶

Positive¶

Sync is implicit. Indexes update inside the same Postgres transaction as the scrape upsert. No separate pipeline; no out-of-sync failure mode; no second backup story.
No new infrastructure. Same managed Postgres (ADR-0006). One less container, one less vendor-cost tier.
Synonym map is the actual product — portable, reviewable, expandable from search-log analysis (when #43 observability lands).
Faceting via GROUP BY ... FILTER is cheap at our cardinality.
Right-sized for M1/M2. ~2k listings now → ~5–20k within a year. Postgres FTS comfortably handles that.

Negative¶

Arabic morphology is shallow. No root-form unification (كاتب / كتاب / مكتوب don't unify). Same caveat applies to Meili and Typesense, less severely.
Trigram fallback latency starts to feel laggy ~10k rows. We're not there yet.
No instant-search-as-you-type at scale. If we want sub-100ms typo-tolerant autocomplete on 50k+ items, Postgres becomes a fight. Migration trigger documented below.
Faceting at high cardinality gets verbose. Wrap in a lib/search.ts helper.
Synonym map is hand-curated. Someone maintains src/lib/search-synonyms.json. Curated by MASTER; expanded over time from search-log analysis.

Neutral¶

The portability harness costs nothing at runtime. The interface is just TypeScript types + a single import path. Migration to Meili later is "swap the implementation, keep the signature."

Migration triggers — pre-recorded¶

Switch to Meilisearch (likely) when any fires:

Catalog crosses ~25k listings.
Instant-search-as-you-type becomes a real product requirement.
Arabic-query CTR data shows trigram fallback failing real users (i.e. queries with no English-equivalent in the synonym map are losing buyers — observed via #43 telemetry).

Documenting in advance so we don't drift past the threshold without noticing.

Alternatives considered¶

Meilisearch (self-hosted Docker) — rejected for now¶

Best-in-class fuzzy search; Arabic via charabia tokenizer (better than raw Postgres). Loses today on: extra container + sync pipeline + backup story. Sync becomes eventual rather than transactional with Postgres.

Typesense — rejected for now¶

Comparable to Meilisearch. Marginally better for facet ranges + curation rules. Loses for the same reasons as Meili: extra infra delta vs Postgres FTS. The marginal feature differences don't pay for the operational cost at our M1 scale.

Decisions on RFC-0003 open questions¶

Arabic-query SERP at launch: ship light — bootstrap the synonym map with ~30 high-value entries (top Arabic equivalents for each component category) for M1; expand over time from search-log analysis once #43 lands. Don't gate launch on a comprehensive map.
Synonym map ownership: MASTER curates. Initial bootstrap can be AI-assisted with MASTER review; ongoing expansion driven by real search logs.
Migration trigger thresholds: ~25k listings + instant-search-UX requirement OR Arabic CTR failure (see above). Recorded.
Search latency observability: decided when ADR-0006 / #43 observability lands. For now, log tookMs in lib/search.ts and surface in slow-query telemetry.
Unified vs separate (products + listings): unified result set in M1 (single API, simpler UX). Revisit if M2 introduces an "advanced search" UI.

References¶

RFC-0003 — Search backend
ADR-0004 — i18n English-only — UI is English; this ADR handles Arabic search input
ADR-0006 — Hosting — Postgres on Neon
Personas — Code-switching context
Competitive landscape — Arabic SERP §4.6