ADR-0013: AI discoverability posture

Context

RFC-0009 called for five decisions about how 961tech becomes citable by ChatGPT / Claude / Perplexity / Google AI Overviews / Apple Intelligence, plus two genuinely open questions: (a) does an open robots.txt foreclose #41 monetisation's future B2B AI-data-licensing path, and (b) what schema.org availability value best represents the ~78% of 961Souq CPU listings that are "Call For Price."

961tech has zero Lebanese aggregator competitors in MENA AI surfaces (competitive-landscape.md §4.1, §4.6). Every Lebanese user who asks an AI assistant "where's the cheapest RTX 4070 in Beirut" should land on a 961tech citation; this ADR is what makes that citation possible.

The work has to land before #28 page design starts because page design inherits constraints from the decisions: first-paragraph-as-citation pattern, Last updated <Nh ago> per listing row, homepage stat block, retailer profile LocalBusiness scaffolding (M2).

Decision

Decision 1 — robots.txt posture: fully open for M1. Allow every AI UA class — training (GPTBot, ClaudeBot, anthropic-ai, Google-Extended, Applebot-Extended, meta-externalagent), AI-search index (OAI-SearchBot, Claude-SearchBot, PerplexityBot, meta-webindexer), on-demand (ChatGPT-User, Claude-User, Perplexity-User, DuckAssistBot, meta-externalfetcher, facebookexternalhit), conventional search (Googlebot, Bingbot, Applebot). Disallow /api/go/ (universal genre pattern for click-out redirectors). Reference the Sitemap: directive from the file.
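Because every UA class gets identical rules at M1, the file collapses to a single wildcard group. A minimal sketch (the host name is illustrative):

```txt
# M1 posture: fully open. Training, AI-search, on-demand, and conventional
# crawlers all get the same rules, so one wildcard group suffices; named
# per-vendor groups only become necessary if Decision 1 is reopened.
User-agent: *
Disallow: /api/go/

# Hypothetical host; the Sitemap: reference is required by Decision 5.
Sitemap: https://961tech.com/sitemap.xml
```

Keeping a single group also keeps the reversal path cheap: a future Skroutz-style tiered posture is one edit adding named UA groups above the wildcard.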

Decision 2 — /llms.txt: ship one in M1. Curated ≤5KB markdown index at /llms.txt: H1 + blockquote summary + three real H2 sections (## Browse, ## Reference, ## Project) + ## Optional. Skip /llms-full.txt and .md shadow URLs at M1; revisit when docs stabilize.
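The shape Decision 2 specifies could look like the sketch below; every path, description, and link is placeholder copy, not final content:

```markdown
# 961tech

> Lebanese PC-parts price aggregator: tracked SKUs, per-retailer prices, and
> compatibility specs across Lebanese retailers.

## Browse

- [Categories](https://961tech.com/categories): CPU, GPU, and other part categories
- [Products](https://961tech.com/products): product pages with per-retailer price ranges

## Reference

- [Docs](https://961tech.com/docs): how matching and price tracking work

## Project

- [About](https://961tech.com/about): what 961tech is and who runs it

## Optional

- [Changelog](https://961tech.com/changelog): release notes
```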

Decision 3 — schema.org coverage scope.

  • M1 — every product detail page: Product + AggregateOffer (with nested Offer[]) + Brand + BreadcrumbList + additionalProperty for compat-relevant specs (socket, TDP, VRAM, etc.). Per-Offer.availability mapped to the canonical ItemAvailability enum. Emit JSON-LD only for products with matchStatus = 'matched' (which already requires > 0.7 confidence per data-model.md Invariant 3); unmatched/weak listings emit no structured data and surface as plain HTML.
  • M1 — every category page + product detail + build detail: BreadcrumbList.
  • M2 — every retailer profile page (depends on #10): LocalBusiness + PostalAddress + OpeningHoursSpecification + sameAs social links.
  • M2 — homepage: WebSite + Organization + SearchAction.
  • Review + AggregateRating — permanently deferred until 961tech has first-party reviews. Google policy explicitly forbids aggregating reviews from other sites; faking the count gets a manual action. The decision is not "we don't yet" — it's "Google says we may not."
  • Dataset markup for the homepage stat block — skip entirely. Wrong type per Google docs (Dataset is for downloadable datasets, not editorial prose summaries).

Decision 4 — Homepage "as of <date>" stat block: ship M1 as plain prose with <time> tag. Three to five Lebanese-specific time-stamped assertions in the visible top-of-homepage band, refreshed at scrape-window cadence — for example: "961tech tracks 1,759 SKUs across 3 Lebanese retailers as of 2026-04-28. RTX 4070 Super in Lebanon ranges from $799 to $865 across 3 listings. Last refreshed 2 hours ago." Concrete copy lives in #28 page design; the constraint set by this ADR is "specific, verifiable, Lebanese-specific, time-stamped, ≤500 tokens of citable prose at the top of the homepage." Mark up the timestamp with <time datetime="…">. Do not use Dataset schema — Dataset is for downloadable datasets and mis-marking gets the page flagged.
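Rendered, the constraint could look like the sketch below; all figures and wording are placeholders for #28, and only the <time datetime> pattern is what this ADR fixes:

```html
<!-- Plain prose, no Dataset schema. Numbers and copy are placeholders. -->
<section class="stats">
  <p>
    961tech tracks 1,759 SKUs across 3 Lebanese retailers as of
    <time datetime="2026-04-28">2026-04-28</time>.
    RTX 4070 Super in Lebanon ranges from $799 to $865 across 3 listings.
    Last refreshed <time datetime="2026-04-28T14:00:00+03:00">2 hours ago</time>.
  </p>
</section>
```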

Decision 5 — Machine-readable feeds.

  • M1 — /sitemap.xml covering homepage + all category pages + all product detail pages + all docs pages, referenced from robots.txt via the Sitemap: directive.
  • M2 — /feed/price-drops.rss alongside #14: the last 50 price drops with product link, old/new price, retailer, and timestamp.
  • Through M2 — no public REST API. No /api/v1/products, no JSON catalog endpoint, no Facebook Product Catalog feed. The matcher output (canonical Product DB) is the moat per competitive-landscape.md §3.1; we are not handing it out for free.
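One /feed/price-drops.rss item carrying the four required fields could look like the sketch below; field placement (prices in the title, retailer in the description) is an assumption, not a decided format, and stays within plain RSS 2.0 elements:

```xml
<item>
  <title>RTX 4070 Super: $865 → $799</title>
  <link>https://961tech.com/products/rtx-4070-super</link>
  <description>Price drop at Retailer A (was $865, now $799).</description>
  <pubDate>Tue, 28 Apr 2026 11:00:00 +0300</pubDate>
  <guid isPermaLink="false">price-drop-rtx-4070-super-2026-04-28</guid>
</item>
```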

OQ-(a) — open robots.txt vs future B2B data-licensing. Resolved per RFC-0009 OQ-(a) Option 1 (recommended): the licensable B2B product, when/if it materialises, lives behind a gated API with provenance + real-time guarantees. HTML stays open for citation. The two surfaces don't conflict — vendors do not pay for data they could have scraped; they pay for data they cannot. Any future shift in policy (e.g. the Skroutz tiered model) is reversible by editing robots.txt; nothing in this decision forecloses it.

OQ-(b) — schema.org availability for "Call For Price" listings. Resolved per RFC-0009 OQ-(b) Option 1 (recommended): availability: https://schema.org/MadeToOrder, omit price and priceCurrency, keep url + seller. Closest canonical semantics for "produced/quoted on demand." AI assistants surface "available, contact retailer for pricing" cleanly.
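Under this resolution, a single "Call For Price" Offer reduces to the fragment below; the URL and seller name are illustrative:

```json
{
  "@type": "Offer",
  "url": "https://961souq.example/products/some-cpu",
  "seller": { "@type": "Organization", "name": "961Souq" },
  "availability": "https://schema.org/MadeToOrder"
}
```

Note what is absent: no price, no priceCurrency — the omission is the signal, not an oversight.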

Re-evaluation. Decision 1 reopens if (i) #41 monetisation opens an actively-pursued B2B data-licensing path, (ii) Cloudflare-observable scraping abuse emerges and the "Block AI training" managed rule becomes operationally relevant, or (iii) a vendor's behaviour conflates training-class blocking with citation-class blocking in a way that costs us measurable citation traffic. Decisions 2–5 reopen at the quarterly stack review (SB-082 per ADR-0012).

Consequences

Positive

  • D1: open citation pathways across every major AI assistant. A plurality of genre peers (5/12) run the same posture; only 5M+ MAU sites can afford to assert rights against scraping.
  • D2: first-mover in the price-aggregator vertical. Asymmetric — even one agentic scaffold probing /llms.txt for "compare GPU prices in Lebanon" gives us a cleaner card than competitors. Free signaling that the project is technically thoughtful.
  • D3: Google Product-snippet eligibility on every matched product. AI-assistant grounding payload (price range + retailer count + availability + brand). additionalProperty surfaces compat specs (socket, TDP, VRAM) — exactly what makes 961tech different and what no other Lebanese aggregator exposes structurally.
  • D4: homepage carries the most cite-worthy assertions on any Lebanese PC-parts page. AI Overview boxes love specific time-stamped facts.
  • D5: sitemap is genre baseline. RSS for price drops serves a power-user audience without exposing the catalog (the 50-row sliding window doesn't reveal the matcher output). No public API protects the moat.
  • OQ-(a) resolution: keeps citation-traffic upside while preserving the gated-API path as a separate product when/if monetisation work surfaces it.
  • OQ-(b) resolution: 961Souq's "Call For Price" cohort stays in our schema.org coverage rather than disappearing — citation surface for "is X available in Lebanon" stays intact for the largest catalog we scrape.

Negative

  • D1: free training data for OpenAI / Anthropic / Google / Meta / Apple. We get nothing back if they monetise it. Reversibility is real but not costless — once the data is scraped, blocking later doesn't unscrape it.
  • D2: ongoing maintenance surface. A broken/stale llms.txt is worse than no llms.txt (wrong message). Format is informal (no IETF/W3C track) and could be obsolete in 18 months — sunk ~2 hours.
  • D3: ~1 day of code work plus tests for the JSON-LD composers. LocalBusiness schema for retailer pages encodes a third-party's data; Google may not show rich results on our domain (but AI assistants ground from the markup regardless).
  • D3 (Review/AggregateRating absence): we cannot ship review snippets on Google search results until 961tech has first-party reviews. Could be a year+. The cost is real visibility loss against retailers that do aggregate (and risk Google enforcement).
  • D4: constrains #28 homepage layout — reserves ~80px mobile / ~120px desktop top-of-page for the stats. Live data computation cost on every homepage render (count queries on Product + Retailer + max(Listing.lastSeenAt)) — small but non-zero, will need short-window caching.
  • D5: disappoints power users + agentic tools that want JSON of the catalog. They have to scrape HTML / JSON-LD instead. Asymmetric protection costs us their goodwill.

Neutral

  • The implementation work lands in a single follow-up code ticket (feat: AI-discoverability surface (M1)) that is not part of this ADR. The ADR locks the contract; the ticket implements it.
  • This ADR does not address: classic Google-search SEO (#38, interacts with D3), AI-citation telemetry pipeline (#43 KPIs), per-page editorial copy (#28 page design + #42 brand voice), or i18n SEO (deferred per ADR-0004 through M2).

Alternatives considered

D1 — block all AI training, allow AI-search + on-demand (Skroutz tiered model)

Rejected for M1. Vendor docs distinguish training from citation, but vendor behaviour sometimes conflates them — blocking GPTBot may also reduce ChatGPT search citations even though OAI-SearchBot is allowed. At our scale, the citation upside outweighs the licensing-rights principle. The OQ-(a) resolution preserves this path as a future-revisitable option.

D1 — strict allow-list (only the named search/citation bots)

Rejected. Cuts off meta-webindexer (Meta AI / WhatsApp citations are Lebanese-relevant per personas.md §5.5) and creates ambiguity for legacy Anthropic UAs (Claude-Web, anthropic-ai).

D1 — Cloudflare-managed "Block AI bots" rule

Rejected for M1 — same reasoning as Skroutz tiered model, plus the toggle being at the edge makes reversal slower.

D1 — paid-only access (require API key)

Rejected for M1 — kills citation entirely. Possible M3+ Stage-4 partner offering (#41) for retailer integration, not for AI assistants.

D2 — skip llms.txt (zero genre adoption)

Rejected. Genre adoption being zero is the first-mover argument, not the skip argument. Cost to ship is one PR; cost to skip is the same; only one direction has any upside.

D2 — ship full /llms-full.txt with concatenated docs

Rejected for M1 — high maintenance burden, only worth it once docs stabilize. M2 candidate.

D2 — .md shadow URLs for every page

Rejected for M1 — doable in Next.js 16 but real engineering work. M2 candidate; defer until we observe a documented consumer fetching them.

D3 — Review / AggregateRating with default values or aggregated retailer reviews

Rejected. Google policy explicitly forbids both. Manual action risk.

D3 — confidence-only (only matchConfidence > 0.85)

Acceptable refinement; chose the lower bar of matchStatus = 'matched' (already requires > 0.7 per data-model Invariant 3) so we don't double-define thresholds. If matcher noise causes false-positive JSON-LD, raise the bar.

D3 — AggregateOffer only, no nested Offer[]

Rejected per ai-discoverability.md §2.3: emitting both shapes together maximizes compatibility with Google + AI-assistant grounding.

D3 — individual Offer[] only, no AggregateOffer wrapper

Rejected. Schema.org explicitly endorses AggregateOffer for the multi-merchant case; we lose lowPrice/highPrice/offerCount summary fields that AI assistants quote directly.

D4 — defer the stat block to #28 page design

Rejected. Defining the constraint in M1 means #28 inherits it as a content requirement. Deferring it means we ship a homepage and re-litigate the constraint.

D4 — Dataset schema markup

Rejected. Wrong type per Google docs.

D4 — WebPage + mainEntity of QuantitativeValue

Rejected as over-engineering. AI assistants ground from visible prose more reliably than from speculative property paths.

D5 — public REST API in M1

Rejected. Asymmetric scraping risk on the matcher output (the moat).

D5 — Facebook Product Catalog feed

Rejected. Only useful with paid ads, which is M3+.

D5 — JSON Feed instead of RSS

Acceptable refinement; both serve the same purpose. RSS has wider tooling support.

OQ-(a) — block training UAs, allow AI-search + on-demand

Rejected. See D1 alternatives — vendor-behavior risk on conflating training with citation.

OQ-(a) — block everything not on allow-list

Rejected. Maximally protective; minimal citation. Wrong for our scale.

OQ-(b) — availability: LimitedAvailability, omit price

Rejected. Weaker semantic match for "Call For Price" (LimitedAvailability implies stock is bounded; the listing's actual state is "quoted on demand").

OQ-(b) — omit the Offer entirely

Rejected. Worst for citation surface. An AI assistant asking "is X available in Lebanon" gets no signal for the ~78% of 961Souq CPU listings; we lose the largest-catalog cohort.

References