AI discoverability — what makes 961tech an LLM-citable surface

Reference for the surfaces 961tech ships so AI assistants cite us when a Lebanese user asks ChatGPT / Claude / Perplexity / Google AI Overviews / Apple Intelligence "where is the cheapest RTX 4070 in Beirut" or "what laptops are available under $800 in Lebanon." Produced for Foundation: AI discoverability (#47); pairs with RFC-0009 which carries the actual decisions.

1. Scope & method

What this is. A per-surface reference covering six surfaces — robots.txt, llms.txt, schema.org / JSON-LD, OpenGraph + Twitter Card, page-content shape, machine-readable feeds — with M1 / M2 / deferred status per recommendation. Grounded in current AI-assistant behavior verified against vendor documentation, not aspirational SEO folklore.

What this isn't. Not a Google-search SEO strategy — that's #38. Not a security or anti-scraping policy — that's #44. Not a KPI definition — that's #43. Not implementation — no app/robots.ts, no app/sitemap.ts, no JSON-LD components in this work; that's a follow-up code ticket.

Method. Facts verified against vendor crawler documentation and live probes of each site's robots.txt and /llms.txt (sources listed in §7).

Confidence taxonomy. Same three buckets as personas.md §1.3:

Mark Meaning
Vendor-stated Direct quote or paraphrase from the vendor's own documentation
Hypothesis Reasoned from vendor docs + observed behavior; defensible but not a vendor commitment
Untested Speculation included for completeness; flagged

Inline tags follow the persona-doc convention (silence = vendor-stated; (hypothesis) / (untested) otherwise).

Honest limits.

  • No major AI assistant publishes a complete grounding pipeline doc. OpenAI, Anthropic, Perplexity, Google, and Apple all document their crawlers; none documents exactly which signals (HTML text vs JSON-LD vs OpenGraph vs llms.txt) feed citations. Recommendations on content shape are Hypothesis-grade based on observed behavior + the structural fact that all crawlers ingest HTML as text.
  • "Honors robots.txt" is a vendor claim. Multiple 2024–2025 third-party audits found Perplexity and others fetching via undeclared user agents. We follow vendor-stated behavior for policy; we don't pretend it's enforced.
  • The genre has no consensus posture. Cross-aggregator survey (§5) shows everything from blanket-block (PCPartPicker, BuildMyPC) to fully-open (Newegg, LDLC, Pricena). There is no "industry baseline" we'd be deviating from.
  • llms.txt has zero documented LLM consumers as of April 2026. Producers exist (Anthropic, Stripe, Vercel, Supabase, Cloudflare); no major AI vendor has stated they read it from third-party sites.

2. Per-surface recommendations

2.1 robots.txt — AI crawler posture

The single highest-leverage surface. Three classes of crawler need separate decisions:

Class What it does Effect of blocking Examples
A. Training crawlers Bulk-crawl the open web; data feeds future model weights Opts you out of future training. Already-trained models unchanged. No effect on whether you're cited today. GPTBot, Google-Extended, Applebot-Extended, meta-externalagent (training portion), ClaudeBot
B. AI-search index crawlers Build a fresh index used to surface citations when a user runs an AI search query Blocking removes you from AI-assisted search results. This is the one that kills citations. OAI-SearchBot, Claude-SearchBot, PerplexityBot, meta-webindexer
C. On-demand assistant fetchers Fetch a specific URL right now because a user asked the assistant a question Blocking prevents the assistant from reading your page in response to a direct user question. Many bypass robots.txt anyway because the request is user-initiated. ChatGPT-User, Claude-User, Perplexity-User, meta-externalfetcher, DuckAssistBot

Per-bot reference (vendor-stated)

Vendor UA Class Honors robots.txt? Notes
OpenAI GPTBot Training Yes
OpenAI ChatGPT-User On-demand "Rules may not apply" — user-initiated
OpenAI OAI-SearchBot AI-search index Yes
OpenAI OAI-AdsBot Ads landing-page Yes Only fires if you submit ads
Anthropic ClaudeBot Training Yes Earlier framing was broader; current doc treats as training
Anthropic Claude-User On-demand Yes (Anthropic states all bots respect)
Anthropic Claude-SearchBot AI-search index Yes
Anthropic Claude-Web / anthropic-ai Legacy Unclear — not in current doc List defensively; harmless
Perplexity PerplexityBot AI-search index Yes Perplexity is a major shopping/comparison citation source
Perplexity Perplexity-User On-demand "Generally ignores" — user-initiated
Google Googlebot Search index (also feeds AI Overviews) Yes AI Overviews has no separate UA
Google Google-Extended Training opt-out (Gemini) Yes Does NOT affect Search ranking
Google GoogleOther R&D one-offs Yes
Google Google-CloudVertexBot Vertex AI agent build Yes Only fires if site owner builds an agent
Apple Applebot Search index (Spotlight, Siri, Safari Suggestions) Yes Data may feed Apple foundation models unless Applebot-Extended is disallowed
Apple Applebot-Extended Training opt-out Yes Does NOT crawl itself; governs reuse of Applebot data
Meta meta-externalagent Training + product indexing (bundled) Yes Blocking costs Meta AI indexing too
Meta meta-webindexer AI-search index Yes "Helps us cite and link to your content in Meta AI's responses"
Meta meta-externalfetcher On-demand / agentic "May bypass"
Meta facebookexternalhit Link previews "Might bypass" for security Drives WhatsApp/FB/IG share-card grounding
DuckDuckGo DuckAssistBot On-demand Yes (~72h propagation) "Explicitly NOT used to train AI models"
Microsoft Bingbot Search index (also feeds Copilot) Yes No separate Copilot training UA documented

961tech recommendation

961tech's strategic position: small Lebanese aggregator with near-zero brand recognition in MENA AI surfaces, competing for citation traffic against effectively-no-one (Pricena explicitly skips Lebanon per competitive-landscape.md §4.1). Citation traffic is the entire monetisation funnel for the AI-search era — every user who lands on us via "ChatGPT recommended 961tech" is a click-through to retailer affiliate.

Posture: Allow everything in M1. Block nothing. Preserve every citation pathway. If scraping abuse emerges later (which only happens once we're worth scraping — see #44), narrow then.

This is the opposite of PCPartPicker/BuildMyPC's blanket Cloudflare-managed block (5M+ MAU sites that can afford to assert rights), and lighter than Skroutz's tiered model (allow assistants on HTML pages, deny on parametric search). 961tech is too small for Skroutz's nuance to matter yet; the simpler "allow everything, hide the click-out redirector" pattern is the genre's plurality (5/12 peers do nothing about AI bots — Newegg, LDLC, Pricena, EG-PC, EGPrices) and lets us focus on being citable, not being protected.

Surface M1 M2 Deferred
Allow all training UAs (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, meta-externalagent) Ship
Allow all AI-search index UAs (OAI-SearchBot, Claude-SearchBot, PerplexityBot, meta-webindexer, Applebot) Ship
Allow all on-demand UAs (ChatGPT-User, Claude-User, Perplexity-User, DuckAssistBot, meta-externalfetcher, facebookexternalhit) Ship
Disallow: /api/go/ (the click-out redirector — universal pattern across peers) Ship
Sitemap: directive pointing at /sitemap.xml Ship
Crawl-delay: 60 for known-aggressive UAs M2 if observed
Cloudflare "Block AI training" managed rule Defer until specific abuse

Implementation note. robots.txt itself is one file at the site root. In Next.js 16, the canonical implementation is src/app/robots.ts exporting a MetadataRoute.Robots (verify against current Next.js 16 docs in node_modules/next/dist/docs/ per AGENTS.md before writing).
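Under those constraints, the M1 posture could be sketched as follows. This is a sketch, not the shipped file: the local Robots/RobotsRule types stand in for Next's MetadataRoute.Robots, and the exact field shape must be verified against the Next.js 16 docs per the note above.

```typescript
// Sketch of src/app/robots.ts for the M1 allow-everything posture.
// Local types approximate Next's MetadataRoute.Robots — verify the real
// shape against the Next.js 16 docs before shipping.
type RobotsRule = {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};
type Robots = { rules: RobotsRule | RobotsRule[]; sitemap?: string };

export default function robots(): Robots {
  return {
    rules: [
      // One wildcard rule: allow every crawler class (training, AI-search
      // index, on-demand) and hide only the click-out redirector.
      { userAgent: "*", allow: "/", disallow: "/api/go/" },
    ],
    sitemap: "https://961tech.pages.dev/sitemap.xml",
  };
}
```

A single wildcard rule is deliberate: since every UA class is allowed, enumerating per-bot rules would add maintenance burden with no behavioral difference.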

2.2 llms.txt — curated markdown index for LLMs

What it is. A markdown file at /llms.txt with a curated index of the site's most-citable URLs. Proposed by Jeremy Howard (Answer.AI), 2024-09-03. Spec at llmstxt.org. Informal — not on any RFC/IETF/W3C track but stable since proposal. The Markdown structure is strict: H1 (project name, required), optional blockquote summary, optional non-heading prose, zero-or-more H2 sections each containing a markdown list of - [name](url): notes, plus a special ## Optional section for low-priority links.

What it isn't. Not a sitemap (no exhaustive URL list — curated, ~5KB). Not robots.txt (no access policy). Not for training (Howard explicitly notes inference-time grounding, not training).

Adoption (verified live 2026-04-28):

Site /llms.txt Notes
docs.anthropic.com 200 (134 KB) Massive Claude/Anthropic docs index
docs.stripe.com 200 (93 KB) Thorough docs index
vercel.com 200 (355 KB) Full docs tree
nextjs.org 200 (7.7 KB) Curated; also documents .md-suffix convention
supabase.com 200 (1.2 KB) Textbook curated TOC + /llms-full.txt companion
cloudflare.com 200 Marketing-oriented
hono.dev 200 + /llms-full.txt and /llms-small.txt variants
docs.perplexity.ai 200
docs.cursor.com 200
openai.com / platform.openai.com 404 OpenAI ships none
ai.google.dev 404 Google ships none
Aggregators
pcpartpicker.com 403 (bot block) None
geizhals.de 403 None
skroutz.gr 403 None
idealo.de 503 (Cloudflare) None
pricena.com 404 None
egprices.com 403 None
eg-pc.com 404 None
pcprices.vercel.app SPA fallback None
buildmypc.net 404 None
newegg.com 404 None

Genre adoption: zero. No PC-parts aggregator or general-comparison aggregator ships one as of April 2026. Strong in dev-tools/docs sites; absent in commerce/comparison.

Honest assessment. Hypothesis-grade on impact: no major AI vendor has published a doc stating they read /llms.txt from third-party sites. Anthropic publishes one for its own docs but doesn't say Claude consumes it externally. Documented consumers today are coding-assistant scaffolds (Cursor, Continue, Cline, Aider) that probe the file when a user names a library. Cost to ship is low (one curated markdown file, no JS, no schema); upside is asymmetric — we're a first-mover in the price-aggregator vertical, and any agentic assistant doing "compare GPU prices in Lebanon" via tool-use that does probe /llms.txt lands on a clean structured index of our value proposition.

961tech recommendation: ship one in M1. The ongoing cost is effectively zero (one curated ~5KB markdown file, no maintenance burden); the cost-to-ship is one PR. Curated index of: build flow, all-parts catalog, retailer audit, compatibility-rules reference, project repo. Skip product/listing pages (an exhaustive URL list defeats the curation purpose; those belong in sitemap.xml).

Suggested initial content (final wording in the implementation ticket):

# 961tech

> Lebanon-specific PC parts price comparison and compatibility-checked
> builder. Aggregates real-time prices from Lebanese retailers, normalises
> listings, and lets users build a PC with automatic compatibility checks.

961tech is a solo project covering the Lebanese PC-parts market — a market
no global aggregator (PCPartPicker, Geizhals, Idealo, Pricena) covers.
Prices in USD; Lebanon-only retailers; Lebanon-only delivery realities.
Source code is public.

## Browse
- [All parts](https://961tech.pages.dev/parts): Faceted catalog across CPU, GPU, motherboard, RAM, storage, PSU, case, cooler.
- [Build a PC](https://961tech.pages.dev/build): All-slots-at-once builder UI with live compatibility checks.
- [Retailer coverage](https://961tech.pages.dev/about/retailers): Per-retailer reference for the Lebanese tech-retail surface we index.

## Reference
- [Architecture overview](https://961tech.pages.dev/architecture/overview): How the system is built.
- [Compatibility rules](https://961tech.pages.dev/about/compatibility-rules): What we check and don't.
- [Glossary](https://961tech.pages.dev/glossary): Domain terms (Call For Price, Fresh USD, etc.).
- [Principles](https://961tech.pages.dev/principles): Engineering values that shape decisions.

## Project
- [GitHub repo](https://github.com/Amine32/961tech)
- [Public roadmap](https://github.com/users/Amine32/projects/2)

## Optional
- [ADRs](https://961tech.pages.dev/adr/): Locked architectural decisions.
- [RFCs](https://961tech.pages.dev/rfc/): Proposals under review.

Surface M1 M2 Deferred
/llms.txt curated index, ≤5KB Ship
.md shadow URLs (/foo/foo.md returning rendered MDX as text/markdown) for docs pages M2 candidate
/llms-full.txt (concatenated full docs) Defer until docs are stable
/llms-ctx.txt / /llms-ctx-full.txt (XML-wrapped for llms_txt2ctx CLI) Skip unless we see a documented consumer

2.3 Schema.org / JSON-LD

Verified against schema.org V30.0 (2026-03-19) and Google Search Central docs.

Per-page-type coverage

Page type Type(s) M1 M2 Deferred
Product detail Product + AggregateOffer (containing Offer[]) + Brand + BreadcrumbList Ship
Category listing BreadcrumbList + (optional) ItemList of Product references M1 (breadcrumb); ItemList M2
Retailer profile LocalBusiness (subtype of Organization) + PostalAddress Ship
Build detail (saved/shared) BreadcrumbList only M1 (breadcrumb)
Homepage WebSite + Organization (+ SearchAction if global search ships) Ship
Review / AggregateRating anywhere Defer permanently until 961tech has first-party reviews

Product properties (schema.org/Product)

Property Status Notes
name Required
image Required for merchant listing eligibility ≥1 URL
offers Required Use the AggregateOffer + nested Offer[] pattern below
description Recommended Plain text. Useful for AI grounding
brand Recommended {"@type": "Brand", "name": "..."}
sku Recommended 961tech internal ID
gtin / gtin8/12/13/14 Recommended Pass through if retailer publishes EAN/UPC
mpn Recommended (alt to gtin) Manufacturer Part Number — useful for PC parts where GTIN is patchy
category Recommended E.g. "Computers > Components > GPU"
additionalProperty Recommended for compat specs Array of PropertyValue for socket, tdp, vramGB, coreCount, etc.

Offer vs AggregateOffer — the aggregator decision

Schema.org explicitly endorses AggregateOffer for the multi-retailer case (https://schema.org/AggregateOffer): "When a single product is associated with multiple offers (for example, the same pair of shoes is offered by different merchants)."

Google's guidance differs: merchant-listing rich-result eligibility requires Offer, not AggregateOffer, because "the merchant has to be the seller of the product" (Google's merchant-listing docs). 961tech is an aggregator, not a merchant, so the merchant-listing rich result is unreachable regardless. We remain eligible for the lighter Product-snippet rich result.

Recommended pattern: emit BOTH shapes. Product.offers is an AggregateOffer with lowPrice/highPrice/offerCount/priceCurrency AND a nested offers: Offer[] array of individual retailer offers. This satisfies schema.org, satisfies Google's Product-snippet requirements, and gives AI assistants a clean structure they can quote ("$229–$275 across 3 Lebanese retailers").

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "ASUS ROG STRIX RTX 4070 Super",
  "image": ["https://961tech.pages.dev/img/asus-rog-strix-rtx4070s.jpg"],
  "brand": { "@type": "Brand", "name": "ASUS" },
  "sku": "961-prd-rtx4070s-asus-strix",
  "mpn": "ROG-STRIX-RTX4070S-O12G-GAMING",
  "category": "Computers > Components > GPU",
  "additionalProperty": [
    { "@type": "PropertyValue", "name": "vramGB", "value": 12 },
    { "@type": "PropertyValue", "name": "tdpWatts", "value": 220 }
  ],
  "offers": {
    "@type": "AggregateOffer",
    "priceCurrency": "USD",
    "lowPrice": "799.00",
    "highPrice": "865.00",
    "offerCount": 3,
    "offers": [
      {
        "@type": "Offer",
        "url": "https://pcandparts.com/...",
        "price": "799.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "itemCondition": "https://schema.org/NewCondition",
        "seller": { "@type": "Organization", "name": "PCAndParts" },
        "priceValidUntil": "2026-05-05"
      }
    ]
  }
}
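A minimal sketch of how the AggregateOffer wrapper could be derived from the individual retailer offers, matching the JSON-LD shape above. The RetailerOffer type and buildAggregateOffer name are hypothetical, not existing 961tech code.

```typescript
// Hypothetical helper: derive the AggregateOffer wrapper (lowPrice,
// highPrice, offerCount) from the per-retailer Offer[] array, keeping
// the nested offers for AI grounding.
type RetailerOffer = {
  "@type": "Offer";
  url: string;
  price: string; // schema.org accepts string prices; keep 2 decimals
  priceCurrency: "USD";
  availability: string;
  seller: { "@type": "Organization"; name: string };
};

function buildAggregateOffer(offers: RetailerOffer[]) {
  const prices = offers.map((o) => Number(o.price));
  return {
    "@type": "AggregateOffer" as const,
    priceCurrency: "USD" as const,
    lowPrice: Math.min(...prices).toFixed(2),
    highPrice: Math.max(...prices).toFixed(2),
    offerCount: offers.length,
    offers, // nested Offer[] preserves per-retailer detail
  };
}
```

Deriving the aggregate fields from the nested array (rather than storing them separately) guarantees lowPrice/highPrice/offerCount can never drift out of sync with the individual offers.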

Offer.availability — the "Call For Price" question

Canonical ItemAvailability enum (https://schema.org/ItemAvailability): BackOrder, Discontinued, InStock, InStoreOnly, LimitedAvailability, MadeToOrder, OnlineOnly, OutOfStock, PreOrder, PreSale, Reserved, SoldOut. There is no QuoteOnRequest.

961Souq has ~78% of CPU listings as "Call For Price" (retailers.md §2.2) — this is structural, not edge-case. Decision goes to RFC-0009; preferred path: emit Offer with availability: https://schema.org/MadeToOrder, omit price and priceCurrency, keep url and seller. Honest, valid schema.org, lets AI assistants surface "available, contact retailer for pricing." Disqualifies the listing from price-bearing rich results (correct — there's no price to show), keeps it in the JSON-LD payload for grounding.
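The preferred path can be sketched as a small mapping function, pending the RFC-0009 decision. The Listing shape and toOffer name are hypothetical; only the availability URLs and the omit-price rule come from the text above.

```typescript
// Sketch of the preferred "Call For Price" mapping (pending RFC-0009):
// priced listings get a price-bearing Offer; Call-For-Price listings get
// MadeToOrder with price/priceCurrency omitted entirely.
type Listing = {
  url: string;
  retailer: string;
  priceUsd?: number; // undefined => "Call For Price"
  inStock: boolean;
};

function toOffer(l: Listing) {
  const base = {
    "@type": "Offer" as const,
    url: l.url,
    seller: { "@type": "Organization" as const, name: l.retailer },
  };
  if (l.priceUsd === undefined) {
    // Call For Price: honest availability, no price fields at all.
    return { ...base, availability: "https://schema.org/MadeToOrder" };
  }
  return {
    ...base,
    price: l.priceUsd.toFixed(2),
    priceCurrency: "USD",
    availability: l.inStock
      ? "https://schema.org/InStock"
      : "https://schema.org/OutOfStock",
  };
}
```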

Review and AggregateRating — do not ship

Google's review-snippet policy (https://developers.google.com/search/docs/appearance/structured-data/review-snippet) explicitly forbids self-serving aggregation:

  • "Don't aggregate reviews or ratings from other websites."
  • "Don't rely on human editors to create, curate, or compile ratings information for local businesses."

961tech does not have first-party reviews in M1/M2. Faking AggregateRating (e.g. defaulting to 5 stars or aggregating retailer reviews) gets a manual action and is forbidden. Decision: omit Review and AggregateRating markup entirely until 961tech itself collects reviews (post-M3, gated on a real review submission flow).

LocalBusiness for retailer profiles (M2)

For retailer profile pages (/r/[slug] per #10 retailer profile pages):

{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "PCAndParts",
  "url": "https://pcandparts.com",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "...",
    "addressLocality": "Beirut",
    "addressRegion": "Beirut",
    "addressCountry": "LB"
  },
  "telephone": "+96101...",
  "logo": "https://961tech.pages.dev/r/pcandparts/logo.png",
  "sameAs": [
    "https://www.facebook.com/pcandparts",
    "https://www.instagram.com/pcandparts"
  ],
  "priceRange": "$$",
  "openingHoursSpecification": [
    { "@type": "OpeningHoursSpecification", "dayOfWeek": ["Monday","Tuesday","Wednesday","Thursday","Friday"], "opens": "09:00", "closes": "19:00" }
  ]
}

Important nuance. This is 961tech describing a third-party retailer. Google may not award rich results on our domain (canonical authority belongs to the retailer's own site). Treat as AI-assistant grounding payload, not a Google ranking play.

BreadcrumbList

Standard format (https://schema.org/BreadcrumbList):

{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Components", "item": "https://961tech.pages.dev/parts" },
    { "@type": "ListItem", "position": 2, "name": "GPUs", "item": "https://961tech.pages.dev/parts/gpu" },
    { "@type": "ListItem", "position": 3, "name": "RTX 4070 Super" }
  ]
}

Last item omits item; position is 1-indexed.
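Both rules can be encoded in one small generator so no page ever emits them wrong. The Crumb type and breadcrumbJsonLd name are hypothetical, not existing 961tech code.

```typescript
// Hypothetical helper: build BreadcrumbList JSON-LD from an ordered crumb
// array. Positions are 1-indexed; a crumb without a URL (canonically the
// last one, the current page) omits the `item` property.
type Crumb = { name: string; url?: string };

function breadcrumbJsonLd(crumbs: Crumb[]) {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((c, i) => ({
      "@type": "ListItem",
      position: i + 1, // 1-indexed per the spec
      name: c.name,
      ...(c.url !== undefined ? { item: c.url } : {}), // omit, not null
    })),
  };
}
```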

Format + placement

  • JSON-LD only. Google's stated preference. Microdata/RDFa accepted but inferior.
  • Server-rendered. AI crawlers (GPTBot, ClaudeBot, PerplexityBot) often skip JS. Render JSON-LD inside the server component (Next.js 16 app/.../page.tsx).
  • Placement: <script type="application/ld+json"> in <head> or end of <body> — both accepted by Google.
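One implementation detail worth pinning down in the follow-up ticket: serializing JSON-LD into a script tag needs `<` escaped, otherwise a product name containing `</script>` would terminate the tag early. A minimal sketch (jsonLdScriptBody is a hypothetical helper name; the escaping pattern itself is standard):

```typescript
// Serialize a JSON-LD object for embedding in a server-rendered
// <script type="application/ld+json"> tag. Escaping "<" as \u003c (still
// valid JSON) prevents any "</script>" in the data from breaking out.
function jsonLdScriptBody(data: unknown): string {
  return JSON.stringify(data).replace(/</g, "\\u003c");
}
```

In a server component this would feed something like `<script type="application/ld+json" dangerouslySetInnerHTML={{ __html: jsonLdScriptBody(productJsonLd) }} />` (shape to be confirmed against the Next.js 16 docs).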

Validation tooling:

  • Google Rich Results Test — Google-specific eligibility; will warn on AggregateOffer-only.
  • Schema Markup Validator — generic schema.org validation.

2.4 OpenGraph + Twitter Card

Universal across all pages — cheap, helps social previews, helps AI scrapers ground titles/descriptions/images consistently.

Minimum set for product detail:

<meta property="og:title" content="ASUS ROG STRIX RTX 4070 Super — Lebanese price comparison | 961tech" />
<meta property="og:description" content="Compare ASUS ROG STRIX RTX 4070 Super prices across 3 Lebanese retailers. $799–$865 USD. Real-time stock, last updated 2 hours ago." />
<meta property="og:type" content="product" />
<meta property="og:image" content="https://961tech.pages.dev/img/.../share.png" />
<meta property="og:url" content="https://961tech.pages.dev/p/..." />
<meta property="og:site_name" content="961tech" />
<meta property="og:locale" content="en_US" />
<meta name="twitter:card" content="summary_large_image" />

(og:locale = en_US for now per ADR-0004 English-only. When i18n revisits, this becomes per-page.)

Page types covered:

  • Product detail — full set including og:type: product and product-specific fields if Facebook Product Catalog is ever wired up
  • Build detail (saved/shared) — full set with og:type: article; image is the build's hero render once that ships (#9)
  • Retailer profile — full set with og:type: profile or business.business
  • Homepage + category landing — full set with og:type: website

Surface M1 M2 Deferred
OG title/description/url/image/site_name on every public page Ship
Twitter Card summary_large_image Ship
og:type: product on product detail Ship
Per-build social-share image render (#9) M2
Facebook Product Catalog feed Defer until paid ads

2.5 Page-content shape — citability beyond markup

This is where 961tech earns citation vs. just qualifying for it. Schema.org tells crawlers "this page is about X"; the first 500 tokens of prose tell the LLM why it should quote this page over the competing ten. Grounded in current AI-assistant behavior:

  • Crawlers ingest HTML as text. JSON-LD is text inside a <script> tag from their perspective; it helps because property names are self-explanatory, but the headline/lead/list structure of the visible page matters more for which sentence gets cited.
  • AI assistants prefer short, factual, time-stamped, attributable claims. (Hypothesis — observed pattern in citation outputs across Perplexity / Claude / Google AI Overview; no vendor doc states this explicitly.)
  • Lebanese-specific framing matters because the competing pages are global (PCPartPicker US prices, Geizhals EU prices, Pricena MENA-but-not-Lebanon); a query like "RTX 4070 in Beirut" gets unambiguously matched by a page that says "RTX 4070 in Lebanon ranges from $799 to $865 across 3 Beirut retailers as of 2026-04-28".

Patterns to ship

  1. First-paragraph-as-citation on every product detail page. First sentence answers what is this (canonical product name + brand + category). Second sentence answers what does it cost in Lebanon (price range + retailer count + currency). Third sentence answers where to buy it (retailer names, "in stock" affordances). All within ~500 visible tokens before any UI chrome. The product detail page redesign in #28 inherits this constraint.

  2. "As of <date>" stat block on the homepage. Visible prose, not just JSON-LD. Examples of the kind of cite-worthy assertion AI overview boxes quote:

    • "961tech tracks 1,759 SKUs across 3 Lebanese retailers as of 2026-04-28."
    • "Lebanese-market PC parts are predominantly USD-priced; RTX 4070 in Lebanon ranges from $799 to $865 across N listings on 2026-04-28."
    • "Macrotronics is the only major Lebanese retailer that displays prices VAT-inclusive; PCAndParts and 961Souq display VAT-exclusive."

Specific, verifiable, time-stamped, Lebanese-specific. These exist because they're the kind of factual claim an AI assistant grounds against — and no other page on the open web makes them. (See RFC-0009 decision 4 for whether the homepage actually surfaces this in M1.)

  3. Last updated <Nh ago> per listing row — visible, not buried in tooltip. Lets a user (and a citation engine) verify freshness without round-tripping. Stale-data warning when timestamp is >24h. Aligns with competitive-landscape.md §3.6 freshness pattern (Geizhals + PCPartPicker baseline) and counters Pricena's known "outdated price data" weakness.

  4. Lebanese-specific framing in prose — every category / retailer / build page mentions Lebanon explicitly in the first paragraph. Not "best PC parts comparison" but "Lebanon-specific PC parts price comparison covering Beirut retailers including PCAndParts, 961Souq, and Macrotronics." This is the phrase Perplexity/Claude/ChatGPT match against user queries containing "lebanon" / "beirut".

  5. Retailer attribution per price — "Source: PCAndParts, updated 2h ago" inline with each listing row, not in a tooltip. Per competitive-landscape.md §3.6.
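The first-paragraph pattern is mechanical enough to template. A sketch, assuming a hypothetical ProductSummary shape fed from canonical product + listing data (all names here are illustrative, not existing 961tech code):

```typescript
// Sketch: render the three-sentence first-paragraph citation pattern
// (what it is / what it costs in Lebanon / where to buy) from canonical
// product data. ProductSummary and its fields are hypothetical.
type ProductSummary = {
  name: string;
  brand: string;
  category: string;
  lowUsd: number;
  highUsd: number;
  retailers: string[];
  asOf: string; // ISO date of last scrape window
};

function firstParagraph(p: ProductSummary): string {
  const range =
    p.lowUsd === p.highUsd ? `$${p.lowUsd}` : `$${p.lowUsd} to $${p.highUsd}`;
  return (
    // Sentence 1: what is this (name + brand + category).
    `The ${p.brand} ${p.name} is a ${p.category} tracked by 961tech. ` +
    // Sentence 2: what does it cost in Lebanon (range + count + date).
    `In Lebanon it ranges from ${range} USD across ${p.retailers.length} retailers as of ${p.asOf}. ` +
    // Sentence 3: where to buy it.
    `Available from ${p.retailers.join(", ")}.`
  );
}
```

Templating it keeps the claim specific, time-stamped, and Lebanese-framed on every product page without relying on per-page editorial discipline.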

Surface M1 M2 Deferred
First-paragraph-as-citation on product detail Ship (constraint into #28)
Last updated <Nh ago> per listing row Ship
Lebanese-specific framing in prose on every category/landing Ship
Retailer attribution per price (visible) Ship
Homepage "as of <date>" stat block (RFC-0009 decision)
Per-product 1-paragraph editorial intro generated from canonical specs M2 candidate
Per-build summary prose ("This $1,200 1080p build covers...") M2 candidate (links to #9, #13)

2.6 Machine-readable feeds

sitemap.xml — universal

Genre baseline: every working aggregator surveyed has one. Newegg has 8 (including a ProductListKeywords_USA.xml for search terms). PCPrices' SPA-fallback /sitemap.xml is an unforced error.

Recommendation: ship one in M1. Single /sitemap.xml (Next.js 16 app/sitemap.ts) covering homepage, all category pages, all retailer profile pages once they exist, all product detail pages, all docs pages. Reference it from robots.txt.
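A sketch of that file under the same caveat as robots.ts: the local types approximate Next's MetadataRoute.Sitemap and must be verified against the Next.js 16 docs, and the slug arrays are placeholders standing in for the real catalog query.

```typescript
// Sketch of src/app/sitemap.ts. Local SitemapEntry type approximates
// Next's MetadataRoute.Sitemap — verify the real shape before shipping.
// Category slugs are placeholders; the shipped file would pull slugs
// for categories, products, retailers, and docs from the catalog.
type SitemapEntry = { url: string; lastModified?: Date };

const BASE = "https://961tech.pages.dev";

export default function sitemap(): SitemapEntry[] {
  const categories = ["cpu", "gpu", "motherboard"]; // placeholder slugs
  return [
    { url: `${BASE}/` },
    { url: `${BASE}/parts` },
    ...categories.map((c) => ({ url: `${BASE}/parts/${c}` })),
    // ...product detail, retailer profile, and docs pages appended
    // from the real catalog at build/request time.
  ];
}
```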

RSS / JSON Feed / public REST API

Genre survey: none of 12 peers expose RSS or any public machine-readable feed. Every /feed, /rss, /feed.xml, /rss.xml probed returned 404 / 403 / SPA HTML.

961tech recommendation:

  • No public API in M1/M2. Inviting machine-readable scraping of the entire catalog before we have a monetisation hedge is asymmetric — competitors (if any emerge) can pull our data wholesale. Scraping retailer feeds is our differentiation; we don't hand the same to a hypothetical follower.
  • RSS for price drops — natural fit once #14 price drop alerts ships. A /feed/price-drops.rss carrying the last 50 price drops is cheap, useful for power users (and bots), and doesn't expose the whole catalog. Defer to M2 alongside #14.
  • Per-product structured-data shadow URLs (/p/[slug].md returning the rendered product page as text/markdown) — interesting for llms.txt consumers but defer until we see a documented consumer.

Surface M1 M2 Deferred
/sitemap.xml Ship
Sitemap: reference in robots.txt Ship
/feed/price-drops.rss M2 (with #14)
/p/[slug].md shadow URLs Defer
Public REST API (/api/v1/products, etc.) Defer to M3+ once monetisation is settled
Facebook Product Catalog feed Defer to paid ads era

3. What AI assistants actually do — grounded in vendor docs

This section is all Hypothesis-grade unless explicitly tagged otherwise. No major AI assistant publishes a complete grounding pipeline. Recommendations on content shape follow from observed citation behavior + the structural fact that all crawlers ingest HTML as text.

Assistant Crawler stack What is documented What's Hypothesis
ChatGPT (browse) ChatGPT-User for live fetch; OAI-SearchBot for index; GPTBot for training Vendor-stated: respects robots.txt for GPTBot + OAI-SearchBot. ChatGPT-User "rules may not apply" because user-initiated. Citation source = the live-fetched page text including JSON-LD as text. No public statement that JSON-LD is parsed structurally.
Claude (web tool) Claude-User for live fetch; Claude-SearchBot for index; ClaudeBot for training Vendor-stated: all bots respect robots.txt. Same — text ingestion; structured-data parsing not documented.
Perplexity PerplexityBot for index; Perplexity-User for live fetch Vendor-stated: index respects robots.txt; user-initiated "generally ignores." Explicitly NOT used for training. Major shopping/comparison citation surface. (Hypothesis — observed citation pattern; no vendor commitment.)
Google AI Overviews Googlebot only — no separate UA Vendor-stated: AI Overviews layer on top of standard Search index. Google-Extended controls Gemini training but NOT AI Overviews. Schema.org rich-result eligibility transitively benefits AI Overview eligibility. (Hypothesis — Google doesn't publish AI Overview ranking signals.)
Apple Intelligence Applebot for index; Applebot-Extended for training opt-out Vendor-stated: data may feed Apple foundation models unless Applebot-Extended is disallowed. Lebanese iPhone share is meaningful; Siri/Spotlight grounding flows through Applebot. (Hypothesis — adoption data not public.)
Meta AI / WhatsApp AI meta-webindexer for AI search; meta-externalfetcher for on-demand; facebookexternalhit for link previews Vendor-stated: "allowing Meta-WebIndexer helps us cite and link to your content in Meta AI's responses." WhatsApp link previews via facebookexternalhit are a Lebanese-specific channel — Lebanese commerce is WhatsApp-heavy (personas.md §5.5). (Hypothesis on volume.)
DuckDuckGo (DuckAssist) DuckAssistBot Vendor-stated: respects robots.txt, ~72h propagation, NOT used for training. Smaller share; relevant for privacy-conscious cohort.

The honest summary. Every AI assistant's crawler stack is documented. Every assistant's grounding behavior (what makes a page get cited) is not. We optimize for the textually-obvious things — crawlers can read the page, the page is fast, the first paragraph is factual and time-stamped, JSON-LD is present and well-formed, and the URL is stable — and let the rest follow. Recommendations in §2 are calibrated to this honest uncertainty.

4. M1 / M2 / deferred summary

Single source-of-truth table. Every recommendation in §2 surfaced here.

M1 (this milestone — implementation ticket pending)

  • robots.txt allowing all AI UAs (training, AI-search index, on-demand) + disallowing /api/go/ + Sitemap: directive
  • /sitemap.xml covering homepage, category pages, product detail pages, docs pages
  • /llms.txt curated index (≤5KB)
  • Schema.org JSON-LD on product detail: Product + AggregateOffer (with nested Offer[]) + Brand + BreadcrumbList + additionalProperty for compat-relevant specs
  • Offer.availability mapping including MadeToOrder for "Call For Price" listings
  • Offer.priceValidUntil set to next-scrape-window-end
  • BreadcrumbList JSON-LD on category + product + build detail pages
  • OpenGraph + Twitter Card on every public page
  • First-paragraph-as-citation prose pattern on product detail (constraint into #28)
  • Last updated <Nh ago> visible per listing row
  • Lebanese-specific framing in prose on every category/landing
  • Retailer attribution per price (visible)

M2

  • LocalBusiness JSON-LD on retailer profile pages (depends on #10)
  • Homepage WebSite + Organization + SearchAction JSON-LD
  • ItemList JSON-LD on category listing pages
  • Per-build social-share image render (depends on #9)
  • /feed/price-drops.rss (depends on #14)
  • .md shadow URLs for docs pages (candidate, not committed)
  • Per-product 1-paragraph editorial intro from canonical specs (candidate)
  • Per-build summary prose (candidate, depends on #9, #13)

Deferred

  • Review / AggregateRating markup — until 961tech has first-party reviews (M3+, gated on review submission flow)
  • /llms-full.txt — until docs are stable
  • Public REST API — until monetisation is settled (M3+)
  • Facebook Product Catalog feed — paid ads era
  • Cloudflare "Block AI training" managed rule — until specific scraping abuse
  • Per-product /p/[slug].md shadow URLs — until we observe a documented consumer
  • Schema.org Dataset type for homepage stats — never (wrong type per Google docs)

5. Comparable aggregators' posture (April 2026 snapshot)

Cross-reference table for competitive-landscape.md §3.6. Verbatim from each site's live robots.txt (or Wayback for CF-walled sites). Fetched 2026-04-28.

Site AI training UAs AI-search/assistant UAs llms.txt Sitemap RSS
PCPartPicker All major training UAs blocked (CF-managed: GPTBot, ClaudeBot, CCBot, Google-Extended, Bytespider, Amazonbot, Applebot-Extended, meta-externalagent) + Content-Signal: search=yes,ai-train=no None blocked 404 yes (403 direct) none
Geizhals Only meta-externalagent blocked None blocked 403 yes none
Idealo Applebot-Extended blocked with surgical path-allowlist (/unternehmen, /legal/, /magazin); omgilibot same None of GPTBot/ClaudeBot in fetched range 503 (CF) not checked not checked
Skroutz All major training UAs blocked (ClaudeBot, anthropic-ai, CCBot, Bytespider, Amazonbot, PetalBot) Tiered: OAI-SearchBot, GPTBot, ChatGPT-User, Google-Extended, PerplexityBot get Allow:/$ + HTML-only allowlist (/c/*.html$, /s/*/*.html$); parameterized URLs disallowed 403 yes none
Pricena None None 404 yes (points at HTML page, not XML) none
EG-PC None None 404 yes none
EGPrices None None 403 yes none
PCPrices (SPA fallback — no real robots.txt) (SPA fallback) (SPA fallback) (SPA fallback) none
BuildMyPC All major training UAs blocked (CF-managed, byte-identical to PCPartPicker) None 404 yes none
Logical Increments (No real robots.txt) 403 unknown unknown
Newegg None — zero AI directives None 404 yes (8 sitemaps) none
LDLC None — zero AI directives None 404 7 sitemaps (one per locale) none real

Cross-cutting findings (informs RFC-0009 robots.txt decision):

  1. No consensus posture. Plurality (5/12) does nothing about AI bots. 2/12 blanket-block via Cloudflare-managed rule. Only Skroutz hand-tiers.
  2. Skroutz's tiered model is the most thoughtful — block training, allow AI-search/assistant on HTML-only, deny on parametric search. Worth revisiting once 961tech is large enough for nuance to matter.
  3. llms.txt adoption in genre is zero. First-mover opportunity (or tells us the format is genre-irrelevant — both possible).
  4. RSS / public feeds in genre is zero. Sitemaps carry the load; Newegg's 8-sitemap fan-out is the most ambitious.
  5. Universal pattern: hide the click-out redirector. Every aggregator disallows it (/redir/ Geizhals, /preisvergleich/Relocate/ Idealo, /partenaire/ LDLC, /m/...ajax/...storageApi Newegg). 961tech's /api/go/ follows the same logic.

6. Open questions

Surfaced for RFC-0009 decision; not resolved here.

  1. robots.txt posture conflict with #41 monetisation. Blocking AI training crawlers (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended) closes a future B2B data-licensing revenue stream. Allowing them gives data away free for training. Today, neither path is monetised; the citation path is what matters. Surfaced as RFC-0009 Open Question.
  2. Schema.org "Call For Price" mapping. MadeToOrder is the closest canonical; LimitedAvailability is a fallback; omitting the Offer is the strict-honest path. ~78% of one retailer's CPU listings are in this state. RFC-0009 decision.
  3. Homepage "as of <date>" stat block — does it ship in M1, or wait for #28 page design? Cross-cuts page design. RFC-0009 decision.
  4. /sitemap.xml — should retailer profile pages be in M1 even though #10 hasn't shipped? Or do we ship sitemap.xml with what exists today (homepage + product pages + docs) and extend?
  5. Cloudflare Bot Management. #44 security inherits this. The relevant question for this doc: do we expect Cloudflare's "Block AI training" managed rule to become part of our posture, and if so, when does that trigger?
  6. English-only constraint vs. Arabic SERP grounding. Per ADR-0004, 961tech ships English-only through M2. AI assistants asked Arabic queries ("قطع كمبيوتر بيروت") may have less to ground from on our pages. Mitigated by the search-input Arabic-tolerance per RFC-0003, but the grounding text is English. Worth probing in M2 telemetry per personas.md §7 Arabic-cohort drift signal.

7. See also

Sources cited (canonical):