AI discoverability — what makes 961tech an LLM-citable surface¶
Reference for the surfaces 961tech ships so AI assistants cite us when a Lebanese user asks ChatGPT / Claude / Perplexity / Google AI Overviews / Apple Intelligence "where is the cheapest RTX 4070 in Beirut" or "what laptops are available under $800 in Lebanon." Produced for Foundation: AI discoverability (#47); pairs with RFC-0009 which carries the actual decisions.
1. Scope & method¶
What this is. A per-surface reference covering six surfaces — robots.txt, llms.txt, schema.org / JSON-LD, OpenGraph + Twitter Card, page-content shape, machine-readable feeds — with M1 / M2 / deferred status per recommendation. Grounded in current AI-assistant behavior verified against vendor documentation, not aspirational SEO folklore.
What this isn't. Not a Google-search SEO strategy — that's #38. Not a security or anti-scraping policy — that's #44. Not a KPI definition — that's #43. Not implementation — no app/robots.ts, no app/sitemap.ts, no JSON-LD components in this work; that's a follow-up code ticket.
Method. Verified facts against:
- Official vendor docs for AI crawlers — OpenAI (platform.openai.com/docs/bots), Anthropic (support.claude.com), Perplexity (docs.perplexity.ai/guides/bots), Google (developers.google.com/search/docs/crawling-indexing/google-common-crawlers), Apple (support.apple.com/en-us/119829), Meta (developers.facebook.com/docs/sharing/bot/), DuckDuckGo (duckduckgo.com/duckduckgo-help-pages/results/duckassistbot).
- The `llms.txt` spec source at llmstxt.org, backed by AnswerDotAI/llms-txt.
- Schema.org V30.0 (2026-03-19) — every type page at `https://schema.org/<Type>` directly.
- Google Search Central structured-data docs at developers.google.com/search/docs/appearance/structured-data/.
- Live + Wayback fetches of comparable aggregators' `robots.txt` (PCPartPicker, Geizhals, Idealo, Skroutz, Pricena, EGPrices, EG-PC, BuildMyPC, Logical Increments, Newegg, LDLC, PCPrices) on 2026-04-28.
Confidence taxonomy. Same three buckets as personas.md §1.3:
| Mark | Meaning |
|---|---|
| Vendor-stated | Direct quote or paraphrase from the vendor's own documentation |
| Hypothesis | Reasoned from vendor docs + observed behavior; defensible but not a vendor commitment |
| Untested | Speculation included for completeness; flagged |
Inline tags follow the persona-doc convention (silence = vendor-stated; (hypothesis) / (untested) otherwise).
Honest limits.
- No major AI assistant publishes a complete grounding-pipeline doc. OpenAI, Anthropic, Perplexity, Google, and Apple all document their crawlers; none documents exactly which signals (HTML text vs JSON-LD vs OpenGraph vs `llms.txt`) feed citations. Recommendations on content shape are Hypothesis-grade, based on observed behavior plus the structural fact that all crawlers ingest HTML as text.
- "Honors robots.txt" is a vendor claim. Multiple 2024–2025 third-party audits found Perplexity and others fetching via undeclared user agents. We follow vendor-stated behavior for policy; we don't pretend it's enforced.
- The genre has no consensus posture. Cross-aggregator survey (§5) shows everything from blanket-block (PCPartPicker, BuildMyPC) to fully-open (Newegg, LDLC, Pricena). There is no "industry baseline" we'd be deviating from.
- `llms.txt` has zero documented LLM consumers as of April 2026. Producers exist (Anthropic, Stripe, Vercel, Supabase, Cloudflare); no major AI vendor has stated they read it from third-party sites.
2. Per-surface recommendations¶
2.1 robots.txt — AI crawler posture¶
The single highest-leverage surface. Three classes of crawler need separate decisions:
| Class | What it does | Effect of blocking | Examples |
|---|---|---|---|
| A. Training crawlers | Bulk-crawl the open web; data feeds future model weights | Opts you out of future training. Already-trained models unchanged. No effect on whether you're cited today. | GPTBot, Google-Extended, Applebot-Extended, meta-externalagent (training portion), ClaudeBot |
| B. AI-search index crawlers | Build a fresh index used to surface citations when a user runs an AI search query | Blocking removes you from AI-assisted search results. This is the one that kills citations. | OAI-SearchBot, Claude-SearchBot, PerplexityBot, meta-webindexer |
| C. On-demand assistant fetchers | Fetch a specific URL right now because a user asked the assistant a question | Blocking prevents the assistant from reading your page in response to a direct user question. Many bypass robots.txt anyway because the request is user-initiated. | ChatGPT-User, Claude-User, Perplexity-User, meta-externalfetcher, DuckAssistBot |
Per-bot reference (vendor-stated)¶
| Vendor | UA | Class | Honors robots.txt? | Notes |
|---|---|---|---|---|
| OpenAI | `GPTBot` | Training | Yes | |
| OpenAI | `ChatGPT-User` | On-demand | "Rules may not apply" — user-initiated | |
| OpenAI | `OAI-SearchBot` | AI-search index | Yes | |
| OpenAI | `OAI-AdsBot` | Ads landing-page | Yes | Only fires if you submit ads |
| Anthropic | `ClaudeBot` | Training | Yes | Earlier framing was broader; current doc treats as training |
| Anthropic | `Claude-User` | On-demand | Yes (Anthropic states all bots respect) | |
| Anthropic | `Claude-SearchBot` | AI-search index | Yes | |
| Anthropic | `Claude-Web` / `anthropic-ai` | Legacy | Unclear — not in current doc | List defensively; harmless |
| Perplexity | `PerplexityBot` | AI-search index | Yes | Perplexity is a major shopping/comparison citation source |
| Perplexity | `Perplexity-User` | On-demand | "Generally ignores" — user-initiated | |
| Google | `Googlebot` | Search index (also feeds AI Overviews) | Yes | AI Overviews has no separate UA |
| Google | `Google-Extended` | Training opt-out (Gemini) | Yes | Does NOT affect Search ranking |
| Google | `GoogleOther` | R&D one-offs | Yes | |
| Google | `Google-CloudVertexBot` | Vertex AI agent build | Yes | Only fires if site owner builds an agent |
| Apple | `Applebot` | Search index (Spotlight, Siri, Safari Suggestions) | Yes | Data may feed Apple foundation models unless Applebot-Extended is disallowed |
| Apple | `Applebot-Extended` | Training opt-out | Yes | Does NOT crawl itself; governs reuse of Applebot data |
| Meta | `meta-externalagent` | Training + product indexing (bundled) | Yes | Blocking costs Meta AI indexing too |
| Meta | `meta-webindexer` | AI-search index | Yes | "Helps us cite and link to your content in Meta AI's responses" |
| Meta | `meta-externalfetcher` | On-demand / agentic | "May bypass" | |
| Meta | `facebookexternalhit` | Link previews | "Might bypass" for security | Drives WhatsApp/FB/IG share-card grounding |
| DuckDuckGo | `DuckAssistBot` | On-demand | Yes (~72h propagation) | "Explicitly NOT used to train AI models" |
| Microsoft | `Bingbot` | Search index (also feeds Copilot) | Yes | No separate Copilot training UA documented |
961tech recommendation¶
961tech's strategic position: small Lebanese aggregator with near-zero brand recognition in MENA AI surfaces, competing for citation traffic against effectively-no-one (Pricena explicitly skips Lebanon per competitive-landscape.md §4.1). Citation traffic is the entire monetisation funnel for the AI-search era — every user who lands on us via "ChatGPT recommended 961tech" is a click-through to retailer affiliate.
Posture: Allow everything in M1. Block nothing. Preserve every citation pathway. If scraping abuse emerges later (which only happens once we're worth scraping — see #44), narrow then.
This is the opposite of PCPartPicker/BuildMyPC's blanket Cloudflare-managed block (5M+ MAU sites that can afford to assert rights), and lighter than Skroutz's tiered model (allow assistants on HTML pages, deny on parametric search). 961tech is too small for Skroutz's nuance to matter yet; the simpler "allow everything, hide the click-out redirector" pattern is the genre's plurality (5/12 peers do nothing about AI bots — Newegg, LDLC, Pricena, EG-PC, EGPrices) and lets us focus on being citable, not being protected.
| Surface | M1 | M2 | Deferred |
|---|---|---|---|
| Allow all training UAs (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, meta-externalagent) | Ship | — | — |
| Allow all AI-search index UAs (OAI-SearchBot, Claude-SearchBot, PerplexityBot, meta-webindexer, Applebot) | Ship | — | — |
| Allow all on-demand UAs (ChatGPT-User, Claude-User, Perplexity-User, DuckAssistBot, meta-externalfetcher, facebookexternalhit) | Ship | — | — |
| `Disallow: /api/go/` (the click-out redirector — universal pattern across peers) | Ship | — | — |
| `Sitemap:` directive pointing at `/sitemap.xml` | Ship | — | — |
| `Crawl-delay: 60` for known-aggressive UAs | — | M2 if observed | — |
| Cloudflare "Block AI training" managed rule | — | — | Defer until specific abuse |
Implementation note. robots.txt itself is one file at the site root. In Next.js 16, the canonical implementation is `src/app/robots.ts` exporting a `MetadataRoute.Robots` (verify against current Next.js 16 docs in `node_modules/next/dist/docs/` per AGENTS.md before writing).
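The posture above compresses into a very small file. A minimal sketch of what that route could look like under the allow-everything posture; the `MetadataRoute.Robots` shape is inlined as a local type so the sketch stands alone, whereas the real file would import it from `next` and should be checked against the current docs as noted:

```typescript
// Sketch of src/app/robots.ts. Local stand-in for Next.js's
// MetadataRoute.Robots so this compiles on its own; the real file
// uses `import type { MetadataRoute } from "next"`.
type Robots = {
  rules: Array<{
    userAgent: string | string[];
    allow?: string | string[];
    disallow?: string | string[];
  }>;
  sitemap?: string;
};

export default function robots(): Robots {
  return {
    rules: [
      {
        // One wildcard rule covers every crawler class (training,
        // AI-search index, on-demand): fetch anything except the
        // click-out redirector, which carries no citable content.
        userAgent: "*",
        allow: "/",
        disallow: "/api/go/",
      },
    ],
    sitemap: "https://961tech.pages.dev/sitemap.xml",
  };
}
```

A single `*` rule keeps the file auditable; per-UA rules only become worth the maintenance once we actually want divergent behavior per crawler class.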
2.2 llms.txt — curated markdown index for LLMs¶
What it is. A markdown file at `/llms.txt` with a curated index of the site's most-citable URLs. Proposed by Jeremy Howard (Answer.AI), 2024-09-03. Spec at llmstxt.org. Informal — not on any RFC/IETF/W3C track but stable since proposal. The Markdown structure is strict: H1 (project name, required), optional blockquote summary, optional non-heading prose, zero-or-more H2 sections each containing a markdown list of `- [name](url): notes`, plus a special `## Optional` section for low-priority links.
What it isn't. Not a sitemap (no exhaustive URL list — curated, ~5KB). Not robots.txt (no access policy). Not for training (Howard explicitly notes inference-time grounding, not training).
Adoption (verified live 2026-04-28):
| Site | `/llms.txt` | Notes |
|---|---|---|
| docs.anthropic.com | 200 (134 KB) | Massive Claude/Anthropic docs index |
| docs.stripe.com | 200 (93 KB) | Thorough docs index |
| vercel.com | 200 (355 KB) | Full docs tree |
| nextjs.org | 200 (7.7 KB) | Curated; also documents .md-suffix convention |
| supabase.com | 200 (1.2 KB) | Textbook curated TOC + /llms-full.txt companion |
| cloudflare.com | 200 | Marketing-oriented |
| hono.dev | 200 | + /llms-full.txt and /llms-small.txt variants |
| docs.perplexity.ai | 200 | |
| docs.cursor.com | 200 | |
| openai.com / platform.openai.com | 404 | OpenAI ships none |
| ai.google.dev | 404 | Google ships none |
| **Aggregators** | | |
| pcpartpicker.com | 403 (bot block) | None |
| geizhals.de | 403 | None |
| skroutz.gr | 403 | None |
| idealo.de | 503 (Cloudflare) | None |
| pricena.com | 404 | None |
| egprices.com | 403 | None |
| eg-pc.com | 404 | None |
| pcprices.vercel.app | SPA fallback | None |
| buildmypc.net | 404 | None |
| newegg.com | 404 | None |
Genre adoption: zero. No PC-parts aggregator or general-comparison aggregator ships one as of April 2026. Strong in dev-tools/docs sites; absent in commerce/comparison.
Honest assessment. Hypothesis-grade on impact: no major AI vendor has published a doc stating they read /llms.txt from third-party sites. Anthropic publishes one for its own docs but doesn't say Claude consumes it externally. Documented consumers today are coding-assistant scaffolds (Cursor, Continue, Cline, Aider) that probe the file when a user names a library. Cost to ship is low (one curated markdown file, no JS, no schema); upside is asymmetric — we're a first-mover in the price-aggregator vertical, and any agentic assistant doing "compare GPU prices in Lebanon" via tool-use that does probe /llms.txt lands on a clean structured index of our value proposition.
961tech recommendation: ship one in M1. The cost-to-skip is identical (zero ongoing maintenance for a 5KB file); the cost-to-ship is one PR. Curated index of: build flow, all-parts catalog, retailer audit, compatibility-rules reference, project repo. Skip product/listing pages (defeats curation purpose — those belong in sitemap.xml).
Suggested initial content (final wording in the implementation ticket):

```markdown
# 961tech

> Lebanon-specific PC parts price comparison and compatibility-checked
> builder. Aggregates real-time prices from Lebanese retailers, normalises
> listings, and lets users build a PC with automatic compatibility checks.

961tech is a solo project covering the Lebanese PC-parts market — a market
no global aggregator (PCPartPicker, Geizhals, Idealo, Pricena) covers.
Prices in USD; Lebanon-only retailers; Lebanon-only delivery realities.
Source code is public.

## Browse

- [All parts](https://961tech.pages.dev/parts): Faceted catalog across CPU, GPU, motherboard, RAM, storage, PSU, case, cooler.
- [Build a PC](https://961tech.pages.dev/build): All-slots-at-once builder UI with live compatibility checks.
- [Retailer coverage](https://961tech.pages.dev/about/retailers): Per-retailer reference for the Lebanese tech-retail surface we index.

## Reference

- [Architecture overview](https://961tech.pages.dev/architecture/overview): How the system is built.
- [Compatibility rules](https://961tech.pages.dev/about/compatibility-rules): What we check and don't.
- [Glossary](https://961tech.pages.dev/glossary): Domain terms (Call For Price, Fresh USD, etc.).
- [Principles](https://961tech.pages.dev/principles): Engineering values that shape decisions.

## Project

- [GitHub repo](https://github.com/Amine32/961tech)
- [Public roadmap](https://github.com/users/Amine32/projects/2)

## Optional

- [ADRs](https://961tech.pages.dev/adr/): Locked architectural decisions.
- [RFCs](https://961tech.pages.dev/rfc/): Proposals under review.
```
| Surface | M1 | M2 | Deferred |
|---|---|---|---|
| `/llms.txt` curated index, ≤5KB | Ship | — | — |
| `.md` shadow URLs (`/foo` ↔ `/foo.md` returning rendered MDX as text/markdown) for docs pages | — | M2 candidate | — |
| `/llms-full.txt` (concatenated full docs) | — | — | Defer until docs are stable |
| `/llms-ctx.txt` / `/llms-ctx-full.txt` (XML-wrapped for the `llms_txt2ctx` CLI) | — | — | Skip unless we see a documented consumer |
2.3 Schema.org / JSON-LD¶
Verified against schema.org V30.0 (2026-03-19) and Google Search Central docs.
Per-page-type coverage¶
| Page type | Type(s) | M1 | M2 | Deferred |
|---|---|---|---|---|
| Product detail | `Product` + `AggregateOffer` (containing `Offer[]`) + `Brand` + `BreadcrumbList` | Ship | — | — |
| Category listing | `BreadcrumbList` + (optional) `ItemList` of `Product` references | Ship (breadcrumb) | `ItemList` | — |
| Retailer profile | `LocalBusiness` (subtype of `Organization`) + `PostalAddress` | — | Ship | — |
| Build detail (saved/shared) | `BreadcrumbList` only | Ship (breadcrumb) | — | — |
| Homepage | `WebSite` + `Organization` (+ `SearchAction` if global search ships) | — | Ship | — |
| `Review` / `AggregateRating` anywhere | — | — | — | Defer permanently until 961tech has first-party reviews |
Product properties (schema.org/Product)¶
| Property | Status | Notes |
|---|---|---|
| `name` | Required | |
| `image` | Required for merchant-listing eligibility | ≥1 URL |
| `offers` | Required | Use the `AggregateOffer` + nested `Offer[]` pattern below |
| `description` | Recommended | Plain text. Useful for AI grounding |
| `brand` | Recommended | `{"@type": "Brand", "name": "..."}` |
| `sku` | Recommended | 961tech internal ID |
| `gtin` / `gtin8/12/13/14` | Recommended | Pass through if retailer publishes EAN/UPC |
| `mpn` | Recommended (alt to `gtin`) | Manufacturer Part Number — useful for PC parts where GTIN is patchy |
| `category` | Recommended | E.g. "Computers > Components > GPU" |
| `additionalProperty` | Recommended for compat specs | Array of `PropertyValue` for socket, tdp, vramGB, coreCount, etc. |
Offer vs AggregateOffer — the aggregator decision¶
Schema.org explicitly endorses AggregateOffer for the multi-retailer case (https://schema.org/AggregateOffer): "When a single product is associated with multiple offers (for example, the same pair of shoes is offered by different merchants)."
Google's separate guidance: merchant-listing rich-result eligibility requires Offer, not AggregateOffer — "the merchant has to be the seller of the product" (Google's merchant-listing docs). 961tech is an aggregator, not a merchant, so the merchant-listing rich result is unreachable regardless. We remain eligible for the lighter Product snippet rich result.
Recommended pattern: emit BOTH shapes. Product.offers is an AggregateOffer with lowPrice/highPrice/offerCount/priceCurrency AND a nested offers: Offer[] array of individual retailer offers. This satisfies schema.org, satisfies Google Product-snippet, and gives AI assistants a clean structure they can quote ("\(229–\)275 across 3 Lebanese retailers").
```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "ASUS ROG STRIX RTX 4070 Super",
  "image": ["https://961tech.pages.dev/img/asus-rog-strix-rtx4070s.jpg"],
  "brand": { "@type": "Brand", "name": "ASUS" },
  "sku": "961-prd-rtx4070s-asus-strix",
  "mpn": "ROG-STRIX-RTX4070S-O12G-GAMING",
  "category": "Computers > Components > GPU",
  "additionalProperty": [
    { "@type": "PropertyValue", "name": "vramGB", "value": 12 },
    { "@type": "PropertyValue", "name": "tdpWatts", "value": 220 }
  ],
  "offers": {
    "@type": "AggregateOffer",
    "priceCurrency": "USD",
    "lowPrice": "799.00",
    "highPrice": "865.00",
    "offerCount": 3,
    "offers": [
      {
        "@type": "Offer",
        "url": "https://pcandparts.com/...",
        "price": "799.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "itemCondition": "https://schema.org/NewCondition",
        "seller": { "@type": "Organization", "name": "PCAndParts" },
        "priceValidUntil": "2026-05-05"
      }
    ]
  }
}
```
Offer.availability — the "Call For Price" question¶
Canonical ItemAvailability enum (https://schema.org/ItemAvailability): BackOrder, Discontinued, InStock, InStoreOnly, LimitedAvailability, MadeToOrder, OnlineOnly, OutOfStock, PreOrder, PreSale, Reserved, SoldOut. There is no QuoteOnRequest.
961Souq has ~78% of CPU listings as "Call For Price" (retailers.md §2.2) — this is structural, not edge-case. Decision goes to RFC-0009; preferred path: emit Offer with availability: https://schema.org/MadeToOrder, omit price and priceCurrency, keep url and seller. Honest, valid schema.org, lets AI assistants surface "available, contact retailer for pricing." Disqualifies the listing from price-bearing rich results (correct — there's no price to show), keeps it in the JSON-LD payload for grounding.
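A sketch of that mapping, assuming a hypothetical internal listing shape — `Listing`, `offerForListing`, and every field name here are illustrative, not the real 961tech data model:

```typescript
// Hypothetical listing shape: priceUsd === null encodes "Call For Price".
type Listing = {
  url: string;
  retailer: string;
  priceUsd: number | null;
  inStock: boolean;
};

type OfferLd = {
  "@type": "Offer";
  url: string;
  seller: { "@type": "Organization"; name: string };
  availability: string;
  price?: string;
  priceCurrency?: string;
};

function offerForListing(l: Listing): OfferLd {
  const offer: OfferLd = {
    "@type": "Offer",
    url: l.url,
    seller: { "@type": "Organization", name: l.retailer },
    availability:
      l.priceUsd === null
        ? "https://schema.org/MadeToOrder" // Call For Price: honest, valid enum member
        : l.inStock
          ? "https://schema.org/InStock"
          : "https://schema.org/OutOfStock",
  };
  // Only price-bearing listings get price fields; "Call For Price"
  // omits both price and priceCurrency, per the preferred path above.
  if (l.priceUsd !== null) {
    offer.price = l.priceUsd.toFixed(2);
    offer.priceCurrency = "USD";
  }
  return offer;
}
```

Keeping the branch in one helper means the RFC-0009 decision, whichever way it lands, changes exactly one expression.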
Review and AggregateRating — do not ship¶
Google's review-snippet policy (https://developers.google.com/search/docs/appearance/structured-data/review-snippet) explicitly forbids self-serving aggregation:
- "Don't aggregate reviews or ratings from other websites."
- "Don't rely on human editors to create, curate, or compile ratings information for local businesses."
961tech does not have first-party reviews in M1/M2. Faking AggregateRating (e.g. defaulting to 5 stars or aggregating retailer reviews) gets a manual action and is forbidden. Decision: omit Review and AggregateRating markup entirely until 961tech itself collects reviews (post-M3, gated on a real review submission flow).
LocalBusiness for retailer profiles (M2)¶
For retailer profile pages (/r/[slug] per #10 retailer profile pages):
```json
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "PCAndParts",
  "url": "https://pcandparts.com",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "...",
    "addressLocality": "Beirut",
    "addressRegion": "Beirut",
    "addressCountry": "LB"
  },
  "telephone": "+96101...",
  "logo": "https://961tech.pages.dev/r/pcandparts/logo.png",
  "sameAs": [
    "https://www.facebook.com/pcandparts",
    "https://www.instagram.com/pcandparts"
  ],
  "priceRange": "$$",
  "openingHoursSpecification": [
    { "@type": "OpeningHoursSpecification", "dayOfWeek": ["Monday","Tuesday","Wednesday","Thursday","Friday"], "opens": "09:00", "closes": "19:00" }
  ]
}
```
Important nuance. This is 961tech describing a third-party retailer. Google may not award rich results on our domain (canonical authority belongs to the retailer's own site). Treat as AI-assistant grounding payload, not a Google ranking play.
BreadcrumbList¶
Standard format (https://schema.org/BreadcrumbList):
```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Components", "item": "https://961tech.pages.dev/parts" },
    { "@type": "ListItem", "position": 2, "name": "GPUs", "item": "https://961tech.pages.dev/parts/gpu" },
    { "@type": "ListItem", "position": 3, "name": "RTX 4070 Super" }
  ]
}
```
The last item omits `item`; `position` is 1-indexed.
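Those two rules are mechanical enough to centralize in one helper rather than hand-write per page. A sketch, with `Crumb` and `breadcrumbLd` as illustrative names:

```typescript
// A breadcrumb trail entry; the final crumb (the current page)
// intentionally carries no URL.
type Crumb = { name: string; url?: string };

type ListItemLd = {
  "@type": "ListItem";
  position: number;
  name: string;
  item?: string;
};

function breadcrumbLd(trail: Crumb[]) {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: trail.map((c, i): ListItemLd => {
      // schema.org positions are 1-indexed
      const li: ListItemLd = { "@type": "ListItem", position: i + 1, name: c.name };
      if (c.url) li.item = c.url; // omit `item` entirely when there is no URL
      return li;
    }),
  };
}
```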
Format + placement¶
- JSON-LD only. Google's stated preference. Microdata/RDFa accepted but inferior.
- Server-rendered. AI crawlers (GPTBot, ClaudeBot, PerplexityBot) often skip JS. Render JSON-LD inside the server component (Next.js 16 `app/.../page.tsx`).
- Placement: `<script type="application/ld+json">` in `<head>` or at the end of `<body>` — both accepted by Google.

Validation tooling:
- Google Rich Results Test — Google-specific eligibility; will warn on AggregateOffer-only.
- Schema Markup Validator — generic schema.org validation.
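One serialization detail worth pinning down when rendering JSON-LD server-side: escape `<` in the serialized payload so a crafted product name containing `</script>` cannot terminate the script tag early. The helper name below is ours; the escaping approach follows the pattern shown in Next.js's own JSON-LD guidance:

```typescript
// Serialize a JSON-LD object for embedding in a <script type="application/ld+json">
// tag. Replacing "<" with its \u003c escape keeps the JSON semantically
// identical while preventing an early "</script>" breakout (XSS hardening).
function jsonLdScript(data: object): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```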
2.4 OpenGraph + Twitter Card¶
Universal across all pages — cheap, helps social previews, helps AI scrapers ground titles/descriptions/images consistently.
Minimum set for product detail:
```html
<meta property="og:title" content="ASUS ROG STRIX RTX 4070 Super — Lebanese price comparison | 961tech" />
<meta property="og:description" content="Compare ASUS ROG STRIX RTX 4070 Super prices across 3 Lebanese retailers. $799–$865 USD. Real-time stock, last updated 2 hours ago." />
<meta property="og:type" content="product" />
<meta property="og:image" content="https://961tech.pages.dev/img/.../share.png" />
<meta property="og:url" content="https://961tech.pages.dev/p/..." />
<meta property="og:site_name" content="961tech" />
<meta property="og:locale" content="en_US" />
<meta name="twitter:card" content="summary_large_image" />
```
(`og:locale` = `en_US` for now per ADR-0004 English-only. When i18n revisits, this becomes per-page.)
Page types covered:
- Product detail — full set including og:type: product and product-specific fields if Facebook Product Catalog is ever wired up
- Build detail (saved/shared) — full set with og:type: article, image is the build's hero render once that ships (#9)
- Retailer profile — full set with og:type: profile or business.business
- Homepage + category landing — full set with og:type: website
| Surface | M1 | M2 | Deferred |
|---|---|---|---|
| OG title/description/url/image/site_name on every public page | Ship | — | — |
| Twitter Card `summary_large_image` | Ship | — | — |
| `og:type: product` on product detail | Ship | — | — |
| Per-build social-share image render (#9) | — | M2 | — |
| Facebook Product Catalog feed | — | — | Defer until paid ads |
2.5 Page-content shape — citability beyond markup¶
This is where 961tech earns citation vs. just qualifying for it. Schema.org tells crawlers "this page is about X"; the first 500 tokens of prose tell the LLM why it should quote this page over the competing ten. Grounded in current AI-assistant behavior:
- Crawlers ingest HTML as text. JSON-LD is text inside a `<script>` tag from their perspective; it helps because property names are self-explanatory, but the headline/lead/list structure of the visible page matters more for which sentence gets cited.
- AI assistants prefer short, factual, time-stamped, attributable claims. (Hypothesis — observed pattern in citation outputs across Perplexity / Claude / Google AI Overviews; no vendor doc states this explicitly.)
- Lebanese-specific framing matters because the competing pages are global (PCPartPicker US prices, Geizhals EU prices, Pricena MENA-but-not-Lebanon); a query like "RTX 4070 in Beirut" gets unambiguously matched by a page that says "RTX 4070 in Lebanon ranges from $799 to $865 across 3 Beirut retailers as of 2026-04-28".
Patterns to ship¶
- **First-paragraph-as-citation** on every product detail page. First sentence answers what is this (canonical product name + brand + category). Second sentence answers what does it cost in Lebanon (price range + retailer count + currency). Third sentence answers where to buy it (retailer names, "in stock" affordances). All within ~500 visible tokens before any UI chrome. The product detail page redesign in #28 inherits this constraint.
- **"As of \<date\>" stat block on the homepage.** Visible prose, not just JSON-LD. Examples of the kind of cite-worthy assertion AI overview boxes quote:
  - "961tech tracks 1,759 SKUs across 3 Lebanese retailers as of 2026-04-28."
  - "Lebanese-market PC parts are predominantly USD-priced; RTX 4070 in Lebanon ranges from $799 to $865 across N listings on 2026-04-28."
  - "Macrotronics is the only major Lebanese retailer that displays prices VAT-inclusive; PCAndParts and 961Souq display VAT-exclusive."

  Specific, verifiable, time-stamped, Lebanese-specific. These exist because they're the kind of factual claim an AI assistant grounds against — and no other page on the open web makes them. (See RFC-0009 decision 4 for whether the homepage actually surfaces this in M1.)
- **`Last updated <Nh ago>` per listing row** — visible, not buried in a tooltip. Lets a user (and a citation engine) verify freshness without round-tripping. Stale-data warning when the timestamp is >24h. Aligns with competitive-landscape.md §3.6 freshness pattern (Geizhals + PCPartPicker baseline) and counters Pricena's known "outdated price data" weakness.
- **Lebanese-specific framing in prose** — every category / retailer / build page mentions Lebanon explicitly in the first paragraph. Not "best PC parts comparison" but "Lebanon-specific PC parts price comparison covering Beirut retailers including PCAndParts, 961Souq, and Macrotronics." This is the phrase Perplexity/Claude/ChatGPT match against user queries containing "lebanon" / "beirut".
- **Retailer attribution per price** — "Source: PCAndParts, updated 2h ago" inline with each listing row, not in a tooltip. Per competitive-landscape.md §3.6.
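The first-paragraph pattern is deterministic enough to generate from canonical product data rather than hand-write. A sketch — every field and function name here is illustrative, and the real wording belongs to #28:

```typescript
// Hypothetical summary shape for the citation lead; illustrative only.
type ProductSummary = {
  brand: string;
  name: string;
  category: string;
  lowUsd: number;
  highUsd: number;
  retailers: string[];
  asOf: string; // ISO date of the last scrape
};

// Three sentences: what it is / what it costs in Lebanon / where to buy it.
function citationLead(p: ProductSummary): string {
  return [
    `The ${p.brand} ${p.name} is a ${p.category} tracked by 961tech.`,
    `In Lebanon it ranges from $${p.lowUsd} to $${p.highUsd} USD across ` +
      `${p.retailers.length} retailers as of ${p.asOf}.`,
    `It is listed at ${p.retailers.join(", ")}.`,
  ].join(" ");
}
```

Generating the paragraph keeps it factual and time-stamped by construction, which is exactly the shape the Hypothesis-grade observations above favor.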
| Surface | M1 | M2 | Deferred |
|---|---|---|---|
| First-paragraph-as-citation on product detail | Ship (constraint into #28) | — | — |
| `Last updated <Nh ago>` per listing row | Ship | — | — |
| Lebanese-specific framing in prose on every category/landing | Ship | — | — |
| Retailer attribution per price (visible) | Ship | — | — |
| Homepage "as of \<date\>" stat block | (RFC-0009 decision) | — | — |
| Per-product 1-paragraph editorial intro generated from canonical specs | — | M2 candidate | — |
| Per-build summary prose ("This $1,200 1080p build covers...") | — | M2 candidate (links to #9, #13) | — |
2.6 Machine-readable feeds¶
sitemap.xml — universal¶
Genre baseline: every working aggregator surveyed has one. Newegg has 8 (including a ProductListKeywords_USA.xml for search terms). PCPrices' SPA-fallback /sitemap.xml is an unforced error.
Recommendation: ship one in M1. Single `/sitemap.xml` (Next.js 16 `app/sitemap.ts`) covering homepage, all category pages, all retailer profile pages once they exist, all product detail pages, all docs pages. Reference it from robots.txt.
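A minimal sketch of that route under this recommendation. As with robots.ts, the entry shape is inlined as a local type so the sketch stands alone (the real file imports `MetadataRoute` from `next`); the category slugs are illustrative, and the product/retailer/docs URL lists would come from the database:

```typescript
// Sketch of src/app/sitemap.ts. Local stand-in for Next.js's
// MetadataRoute.Sitemap entry shape.
type SitemapEntry = { url: string; lastModified?: Date };

const BASE = "https://961tech.pages.dev";

export default function sitemap(): SitemapEntry[] {
  // Illustrative category list; the real one comes from the catalog schema.
  const categories = [
    "cpu", "gpu", "motherboard", "ram",
    "storage", "psu", "case", "cooler",
  ];
  return [
    { url: `${BASE}/` },
    { url: `${BASE}/parts` },
    ...categories.map((c) => ({ url: `${BASE}/parts/${c}` })),
    // Product detail, retailer profile, and docs URLs get appended here
    // from the database once those route lists exist.
  ];
}
```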
RSS / JSON Feed / public REST API¶
Genre survey: none of 12 peers expose RSS or any public machine-readable feed. Every /feed, /rss, /feed.xml, /rss.xml probed returned 404 / 403 / SPA HTML.
961tech recommendation:
- No public API in M1/M2. Inviting machine-readable scraping of the entire catalog before we have a monetisation hedge is asymmetric — competitors (if any emerge) can pull our data wholesale. Scraping retailer feeds is our differentiation; we don't hand the same to a hypothetical follower.
- RSS for price drops — natural fit once #14 price drop alerts ships. A /feed/price-drops.rss carrying the last 50 price drops is cheap, useful for power users (and bots), and doesn't expose the whole catalog. Defer to M2 alongside #14.
- Per-product structured-data shadow URLs (/p/[slug].md returning rendered product page as text/markdown) — interesting for llms.txt consumers but defer until we see a documented consumer.
| Surface | M1 | M2 | Deferred |
|---|---|---|---|
| `/sitemap.xml` | Ship | — | — |
| `Sitemap:` reference in robots.txt | Ship | — | — |
| `/feed/price-drops.rss` | — | M2 (with #14) | — |
| `/p/[slug].md` shadow URLs | — | — | Defer |
| Public REST API (`/api/v1/products`, etc.) | — | — | Defer to M3+ once monetisation is settled |
| Facebook Product Catalog feed | — | — | Defer to paid ads era |
3. What AI assistants actually do — grounded in vendor docs¶
This section is all Hypothesis-grade unless explicitly tagged otherwise. No major AI assistant publishes a complete grounding pipeline. Recommendations on content shape follow from observed citation behavior + the structural fact that all crawlers ingest HTML as text.
| Assistant | Crawler stack | What is documented | What's Hypothesis |
|---|---|---|---|
| ChatGPT (browse) | `ChatGPT-User` for live fetch; `OAI-SearchBot` for index; `GPTBot` for training | Vendor-stated: respects robots.txt for GPTBot + OAI-SearchBot. ChatGPT-User "rules may not apply" because user-initiated. | Citation source = the live-fetched page text, including JSON-LD as text. No public statement that JSON-LD is parsed structurally. |
| Claude (web tool) | `Claude-User` for live fetch; `Claude-SearchBot` for index; `ClaudeBot` for training | Vendor-stated: all bots respect robots.txt. | Same — text ingestion; structured-data parsing not documented. |
| Perplexity | `PerplexityBot` for index; `Perplexity-User` for live fetch | Vendor-stated: index respects robots.txt; user-initiated "generally ignores." Explicitly NOT used for training. | Major shopping/comparison citation surface. (Hypothesis — observed citation pattern; no vendor commitment.) |
| Google AI Overviews | `Googlebot` only — no separate UA | Vendor-stated: AI Overviews layer on top of the standard Search index. Google-Extended controls Gemini training but NOT AI Overviews. | Schema.org rich-result eligibility transitively benefits AI Overview eligibility. (Hypothesis — Google doesn't publish AI Overview ranking signals.) |
| Apple Intelligence | `Applebot` for index; `Applebot-Extended` for training opt-out | Vendor-stated: data may feed Apple foundation models unless Applebot-Extended is disallowed. | Lebanese iPhone share is meaningful; Siri/Spotlight grounding flows through Applebot. (Hypothesis — adoption data not public.) |
| Meta AI / WhatsApp AI | `meta-webindexer` for AI search; `meta-externalfetcher` for on-demand; `facebookexternalhit` for link previews | Vendor-stated: "allowing Meta-WebIndexer helps us cite and link to your content in Meta AI's responses." | WhatsApp link previews via facebookexternalhit are a Lebanese-specific channel — Lebanese commerce is WhatsApp-heavy (personas.md §5.5). (Hypothesis on volume.) |
| DuckDuckGo (DuckAssist) | `DuckAssistBot` | Vendor-stated: respects robots.txt, ~72h propagation, NOT used for training. | Smaller share; relevant for the privacy-conscious cohort. |
The honest summary. Every AI assistant's crawler stack is documented. Every assistant's grounding behavior (what makes a page get cited) is not. We optimize for the textually-obvious things — crawlers can read the page, the page is fast, the first paragraph is factual and time-stamped, JSON-LD is present and well-formed, and the URL is stable — and let the rest follow. Recommendations in §2 are calibrated to this honest uncertainty.
4. M1 / M2 / deferred summary¶
Single source-of-truth table. Every recommendation in §2 surfaced here.
M1 (this milestone — implementation ticket pending)¶
- `robots.txt` allowing all AI UAs (training, AI-search index, on-demand) + disallowing `/api/go/` + `Sitemap:` directive
- `/sitemap.xml` covering homepage, category pages, product detail pages, docs pages
- `/llms.txt` curated index (≤5KB)
- Schema.org JSON-LD on product detail: `Product` + `AggregateOffer` (with nested `Offer[]`) + `Brand` + `BreadcrumbList` + `additionalProperty` for compat-relevant specs
- `Offer.availability` mapping including `MadeToOrder` for "Call For Price" listings
- `Offer.priceValidUntil` set to next-scrape-window-end
- `BreadcrumbList` JSON-LD on category + product + build detail pages
- OpenGraph + Twitter Card on every public page
- First-paragraph-as-citation prose pattern on product detail (constraint into #28)
- `Last updated <Nh ago>` visible per listing row
- Lebanese-specific framing in prose on every category/landing page
- Retailer attribution per price (visible)
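Implementation belongs to the follow-up code ticket, but the product-detail JSON-LD items above can be sketched in one place. This is a minimal illustration, not the real data model: `ListingInput` and its field names are invented for this sketch, and the `InStock` default for priced listings is an assumption; only the `MadeToOrder` mapping for "Call For Price" and `priceValidUntil` = next-scrape-window-end come from the list above.

```typescript
// Illustrative input shape — field names are assumptions, not the 961tech data model.
interface ListingInput {
  priceUsd: number | null; // null models a "Call For Price" listing
  retailer: string;
  url: string;
}

// "Call For Price" maps to MadeToOrder per the M1 recommendation above;
// the InStock default for priced listings is an assumption for this sketch.
function availabilityFor(l: ListingInput): string {
  return l.priceUsd === null
    ? "https://schema.org/MadeToOrder"
    : "https://schema.org/InStock";
}

// Product + AggregateOffer (nested Offer[]) + Brand for a product detail page.
// nextScrapeWindowEnd feeds Offer.priceValidUntil, as recommended above.
// Assumes at least one priced listing exists for lowPrice/highPrice.
function productJsonLd(
  name: string,
  brand: string,
  listings: ListingInput[],
  nextScrapeWindowEnd: Date,
): Record<string, any> {
  const priced = listings
    .filter((l) => l.priceUsd !== null)
    .map((l) => l.priceUsd as number);
  return {
    "@context": "https://schema.org",
    "@type": "Product",
    name,
    brand: { "@type": "Brand", name: brand },
    offers: {
      "@type": "AggregateOffer",
      priceCurrency: "USD",
      lowPrice: Math.min(...priced),
      highPrice: Math.max(...priced),
      offerCount: listings.length,
      offers: listings.map((l) => ({
        "@type": "Offer",
        ...(l.priceUsd !== null ? { price: l.priceUsd } : {}),
        priceCurrency: "USD",
        availability: availabilityFor(l),
        priceValidUntil: nextScrapeWindowEnd.toISOString(),
        seller: { "@type": "Organization", name: l.retailer },
        url: l.url,
      })),
    },
  };
}
```

The output would be serialized into a `<script type="application/ld+json">` tag by whatever component the code ticket lands on.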
M2¶
- `LocalBusiness` JSON-LD on retailer profile pages (depends on #10)
- Homepage `WebSite` + `Organization` + `SearchAction` JSON-LD
- `ItemList` JSON-LD on category listing pages
- Per-build social-share image render (depends on #9)
- `/feed/price-drops.rss` (depends on #14)
- `.md` shadow URLs for docs pages (candidate, not committed)
- Per-product 1-paragraph editorial intro from canonical specs (candidate)
- Per-build summary prose (candidate, depends on #9, #13)
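As a shape reference for the homepage item above, the `WebSite` + `SearchAction` node could look like the sketch below — the domain and search path are placeholders, and the sibling `Organization` node is omitted for brevity:

```json
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "961tech",
  "url": "https://961tech.example/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "https://961tech.example/search?q={search_term_string}"
    },
    "query-input": "required name=search_term_string"
  }
}
```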
Deferred¶
- `Review` / `AggregateRating` markup — until 961tech has first-party reviews (M3+, gated on the review submission flow)
- `/llms-full.txt` — until docs are stable
- Public REST API — until monetisation is settled (M3+)
- Facebook Product Catalog feed — paid ads era
- Cloudflare "Block AI training" managed rule — until specific scraping abuse
- Per-product `/p/[slug].md` shadow URLs — until we observe a documented consumer
- Schema.org `Dataset` type for homepage stats — never (wrong type per Google docs)
5. Comparable aggregators' posture (April 2026 snapshot)¶
Cross-reference table for competitive-landscape.md §3.6. Verbatim from each site's live robots.txt (or Wayback for CF-walled sites). Fetched 2026-04-28.
| Site | AI training UAs | AI-search/assistant UAs | `llms.txt` | Sitemap | RSS |
|---|---|---|---|---|---|
| PCPartPicker | All major training UAs blocked (CF-managed: GPTBot, ClaudeBot, CCBot, Google-Extended, Bytespider, Amazonbot, Applebot-Extended, meta-externalagent) + `Content-Signal: search=yes,ai-train=no` | None blocked | 404 | yes (403 direct) | none |
| Geizhals | Only meta-externalagent blocked | None blocked | 403 | yes | none |
| Idealo | Applebot-Extended blocked with surgical path-allowlist (`/unternehmen`, `/legal/`, `/magazin`); omgilibot same | None of GPTBot/ClaudeBot in fetched range | 503 (CF) | not checked | not checked |
| Skroutz | All major training UAs blocked (ClaudeBot, anthropic-ai, CCBot, Bytespider, Amazonbot, PetalBot) | Tiered — OAI-SearchBot, GPTBot, ChatGPT-User, Google-Extended, PerplexityBot get `Allow: /$` + HTML-only allowlist (`/c/*.html$`, `/s/*/*.html$`); parameterized URLs disallowed | 403 | yes | none |
| Pricena | None | None | 404 | yes (points at an HTML page, not XML) | none |
| EG-PC | None | None | 404 | yes | none |
| EGPrices | None | None | 403 | yes | none |
| PCPrices | (SPA fallback — no real robots.txt) | (SPA fallback) | (SPA fallback) | (SPA fallback) | none |
| BuildMyPC | All major training UAs blocked (CF-managed, byte-identical to PCPartPicker) | None | 404 | yes | none |
| Logical Increments | (No real robots.txt) | — | 403 | unknown | unknown |
| Newegg | None — zero AI directives | None | 404 | yes (8 sitemaps) | none |
| LDLC | None — zero AI directives | None | 404 | 7 sitemaps (one per locale) | none real |
Cross-cutting findings (informs RFC-0009 robots.txt decision):
- No consensus posture. A plurality (5/12) does nothing about AI bots. 2/12 blanket-block via a Cloudflare-managed rule. Only Skroutz hand-tiers.
- Skroutz's tiered model is the most thoughtful — block training, allow AI-search/assistant on HTML-only, deny on parametric search. Worth revisiting once 961tech is large enough for nuance to matter.
- `llms.txt` adoption in the genre is zero. First-mover opportunity (or it tells us the format is genre-irrelevant — both are possible).
- RSS / public feeds in the genre are zero. Sitemaps carry the load; Newegg's 8-sitemap fan-out is the most ambitious.
- Universal pattern: hide the click-out redirector. Every aggregator disallows it (`/redir/` on Geizhals, `/preisvergleich/Relocate/` on Idealo, `/partenaire/` on LDLC, `/m/...ajax/...storageApi` on Newegg). 961tech's `/api/go/` follows the same logic.
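Tying these findings back to our own posture: the allow-everything-except-the-redirector stance from §4 reduces to a very small `robots.txt`. A sketch only — the domain is a placeholder, and the exact UA treatment remains an RFC-0009 decision:

```text
# Sketch: allow every UA (training, AI-search index, on-demand),
# hide the click-out redirector, advertise the sitemap.
User-agent: *
Disallow: /api/go/

Sitemap: https://961tech.example/sitemap.xml
```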
6. Open questions¶
Surfaced for RFC-0009 decision; not resolved here.
- `robots.txt` posture conflict with #41 monetisation. Blocking AI training crawlers (`GPTBot`, `ClaudeBot`, `Google-Extended`, `Applebot-Extended`) closes a future B2B data-licensing revenue stream. Allowing them gives data away free for training. Today, neither path is monetised; the citation path is what matters. Surfaced as an RFC-0009 Open Question.
- Schema.org "Call For Price" mapping. `MadeToOrder` is the closest canonical value; `LimitedAvailability` is a fallback; omitting the `Offer` is the strict-honest path. ~78% of one retailer's CPU listings are in this state. RFC-0009 decision.
- Homepage "as of <date>" stat block — does it ship in M1, or wait for #28 page design? Cross-cuts page design. RFC-0009 decision.
- `/sitemap.xml` — should retailer profile pages be in M1 even though #10 hasn't shipped? Or do we ship `sitemap.xml` with what exists today (homepage + product pages + docs) and extend?
- Cloudflare Bot Management. #44 security inherits this. The relevant question for this doc: do we expect Cloudflare's "Block AI training" managed rule to become part of our posture, and if so, when does that trigger?
- English-only constraint vs. Arabic SERP grounding. Per ADR-0004, 961tech ships English-only through M2. AI assistants asked Arabic queries ("قطع كمبيوتر بيروت" — "computer parts Beirut") may have less to ground on from our pages. Mitigated by the search-input Arabic tolerance per RFC-0003, but the grounding text is English. Worth probing in M2 telemetry per `personas.md` §7 Arabic-cohort drift signal.
7. See also¶
- RFC-0009 — AI discoverability — the decisions this reference doc informs
- `competitive-landscape.md` §3.6 Trust + transparency patterns — peer transparency posture
- `competitive-landscape.md` §4.1 Pricena's deliberate Lebanon skip — the strategic gap citation traffic exploits
- `competitive-landscape.md` §4.6 Search-engine surface — uncontested SEO (and by extension AI-grounding) territory across EN/FR/AR
- `personas.md` §5.5 Where they shop today — entry-channel mix; AI-citation-driven entry would be additive to the FB-group + WhatsApp-group + direct-retailer mix
- `retailers.md` — per-retailer "Call For Price" volume informs the schema.org `availability` decision
- ADR-0004 English-only language scope — constrains `og:locale` and grounding-text language
- ADR-0005 Casual flow parallel to Builder — UC-13 single-product pages are the Casual track's primary citation surface; UC-9 builds are the Builder track's
- RFC-0003 Search backend — Arabic-tolerant search input + Lebanese transliteration synonyms; orthogonal to AI grounding but interacts on Arabic-query coverage
- #38 SEO strategy — Google-search-cited surface; structured data overlaps; settle whichever lands second against whichever lands first
- #28 page design — inherits the first-paragraph-as-citation constraint and the `Last updated` per-row pattern
- #43 KPIs / observability / perf budget — AI-citation rate as a KPI candidate (referrer = `chat.openai.com`, `claude.ai`, `perplexity.ai`, `gemini.google.com`, etc.)
- #44 security — rate-limit + Cloudflare WAF; enforces what `robots.txt` declares
- #29 DB / ingest pipeline — `Listing.lastSeenAt` already exists per `data-model.md`; surface it as the `Last updated` UI signal and the `priceValidUntil` JSON-LD value
- Issue #47 — the foundation ticket this doc closes alongside RFC-0009
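The #43 referrer-based KPI candidate can be sketched as a tiny classifier. The host list starts from the hostnames named in the list above; anything beyond those (e.g. `chatgpt.com`) is an assumption to verify against real logs:

```typescript
// Hostnames from the #43 KPI note, plus assumed variants (marked).
const AI_REFERRER_HOSTS = new Set([
  "chat.openai.com",
  "chatgpt.com", // assumption: current ChatGPT host
  "claude.ai",
  "perplexity.ai",
  "www.perplexity.ai", // assumption: www variant
  "gemini.google.com",
]);

// Classify a request's Referer header as AI-assistant citation traffic.
function isAiCitationReferrer(referrer: string | null): boolean {
  if (!referrer) return false;
  try {
    return AI_REFERRER_HOSTS.has(new URL(referrer).hostname);
  } catch {
    return false; // malformed Referer header — not classifiable
  }
}
```

In practice Referer is often stripped or reduced to an origin by `Referrer-Policy`, so this undercounts; it is a lower bound, not a measurement.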
Sources cited (canonical):
- OpenAI bots: platform.openai.com/docs/bots
- Anthropic bots: support.claude.com/en/articles/8896518
- Perplexity bots: docs.perplexity.ai/guides/bots
- Google bots: developers.google.com/search/docs/crawling-indexing/google-common-crawlers
- Apple bots: support.apple.com/en-us/119829
- Meta bots: developers.facebook.com/docs/sharing/bot/
- DuckDuckGo bots: duckduckgo.com/duckduckgo-help-pages/results/duckassistbot
- llms.txt spec: llmstxt.org — backed by github.com/AnswerDotAI/llms-txt
- Schema.org Product: schema.org/Product
- Schema.org Offer: schema.org/Offer
- Schema.org AggregateOffer: schema.org/AggregateOffer
- Schema.org ItemAvailability: schema.org/ItemAvailability
- Schema.org LocalBusiness: schema.org/LocalBusiness
- Schema.org BreadcrumbList: schema.org/BreadcrumbList
- Google merchant listing: developers.google.com/search/docs/appearance/structured-data/merchant-listing
- Google review snippet (self-serving policy): developers.google.com/search/docs/appearance/structured-data/review-snippet
- Google intro to structured data: developers.google.com/search/docs/appearance/structured-data/intro-structured-data
- Google Rich Results Test: search.google.com/test/rich-results
- Schema Markup Validator: validator.schema.org