Issue #47: AI discoverability (LLM-citable surface)

  • Issue: #47
  • Started: 2026-04-28
  • Status: in progress

Goal

Ship two artifacts that together define what 961tech does, so that AI assistants cite us when Lebanese users ask ChatGPT / Claude / Perplexity / Google AI Overviews / Apple Intelligence for PC-part recommendations or price comparisons:

  1. docs/reference/ai-discoverability.md — per-surface (schema.org, llms.txt, robots.txt, page-content shape, OpenGraph, machine-readable feeds) recommendations with M1 / M2 / deferred status. Grounded in current AI-assistant behavior (verified against official vendor docs), not aspirational SEO folklore.
  2. docs/rfc/0009-ai-discoverability.md — five decisions to take to MASTER: robots.txt posture, llms.txt adopt-or-skip, schema.org coverage scope, homepage "as of <date>" assertion block, machine-readable feed (RSS / JSON / API) yes/no.

Both land in one atomic commit on feat/issue-47. No code. No push. No PR.

Out of scope

  • No code. No next.config.ts directives, no JSON-LD components, no app/robots.ts, no app/sitemap.ts. Implementation is a follow-up code ticket — file when surfaced.
  • No SEO strategy in the classic Google-search-rank sense. That's #38. The two interact (structured data + page content overlap) and we cross-reference, but we don't litigate Google-search positioning here.
  • No KPI definition. #43 owns metrics. We surface "AI-citation rate as a KPI candidate" as a cross-reference, not a definition.
  • No security / WAF design. #44 owns rate-limiting + Cloudflare Bot Management. We cross-reference at the boundary (robots.txt is policy, WAF is enforcement).
  • No new schema columns beyond noting that lastUpdated / scrapedAt need to be exposable on product detail pages — surfaced as a cross-reference to #29.

Approach

Foundation-research workflow: read context → dispatch parallel research → synthesize reference doc → distill decisions into RFC → cross-reference → mkdocs strict → one commit.

Why this shape: AI-assistant scraping behavior is sparsely documented and changes fast. A reference doc that summarizes vendor truth (with citations to official docs) is the durable artifact; an RFC is the decision artifact MASTER signs off on. Splitting them lets the reference doc stay updatable as vendor docs evolve, without the RFC re-opening every time GPTBot's docs URL moves.

Parallel research per superpowers:dispatching-parallel-agents (already dispatched at plan time):

  1. AI scraper user-agents — official OpenAI / Anthropic / Perplexity / Google / Apple / Meta docs for GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, GoogleOther, Applebot, Applebot-Extended. Distinction between training-time crawlers (blocking forfeits presence in future models) and on-demand fetchers (blocking forfeits the citation); a posture sketch follows this list.
  2. llms.txt spec + adoption — current spec at llmstxt.org, adoption survey across our peer aggregators + a sample of well-known docs sites (Anthropic, Vercel, Stripe), honest assessment of "is this real or aspirational."
  3. Schema.org Product / Offer / AggregateOffer / Review / Organization / LocalBusiness — verified against the official schema.org docs (NOT a 2023 SEO blog), including the subtle Google-vs-schema.org property name distinctions and the availability enum (specifically: is there a clean "Call For Price" value? We have ~50% of one retailer's catalog in this state).
  4. Comparable aggregators' robots.txt + AI stance — actual fetched content from PCPartPicker, Geizhals, Idealo, Skroutz, Pricena, EGPrices, EG-PC, BuildMyPC, Logical Increments, Newegg, LDLC. Synthesis of "what the genre's posture is."
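
To make item 1's training-vs-on-demand distinction concrete, a minimal robots.txt posture sketch. The UA names are the documented ones listed above; the Allow assignments are illustrative placeholders pending research agent #1's findings and RFC-0009 decision 1, not recommendations:

    # Illustrative posture only; final per-UA policy is RFC-0009 decision 1.
    User-agent: ChatGPT-User       # on-demand fetcher: blocking forfeits citations
    Allow: /

    User-agent: Claude-User        # on-demand fetcher
    Allow: /

    User-agent: GPTBot             # training crawler: blocking forfeits future-model presence
    Allow: /

    User-agent: Google-Extended    # product token controlling AI-training use, not a separate crawler
    Allow: /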

Steps

  • Step 1: Stub the plan + reference doc + RFC files
  • Plan: this file.
  • Reference doc: docs/reference/ai-discoverability.md with frontmatter (title, description, status: active, tags: [reference, seo, ai, foundation]) and a section skeleton (Scope & method, Per-surface recommendations §2.1–§2.6, What AI assistants actually do, M1 / M2 / deferred summary, Open questions, See also).
  • RFC: docs/rfc/0009-ai-discoverability.md with frontmatter (title, description, status: new, tags: [rfc, seo, ai]) and the standard RFC sections (Summary, Motivation, Proposal, Trade-offs, Alternatives, Open questions, Implementation plan, Out of scope).
  • Add row to docs/reference/index.md linking the new reference doc.
  • Add row to docs/rfc/index.md linking the new RFC.
  • Add row to docs/plans/index.md linking this plan.
  • Verification: mkdocs build --strict after stubbing — a failure at this stage can only mean a cross-link target is wrong.

  • Step 2: Wait for research subagents to return

  • All four agents launched in parallel via superpowers:dispatching-parallel-agents.
  • When each returns, capture the structured findings into a working buffer (the synthesis happens in-line in steps 3-4, not a separate notes file).
  • If any agent's research is sparse / blocked / contradictory: write the limitation explicitly into the reference doc's §1 Honest Limits, do not paper over with confident-sounding fiction.
  • Verification: each agent's response either contains usable findings or an explicit "couldn't verify because X" — both are valid inputs to the doc.

  • Step 3: Fill docs/reference/ai-discoverability.md

  • §1 Scope & method — what this doc is, what it isn't, sources, honest limits (especially: AI-assistant indexing behavior is underspecified by vendors, claims tagged as Vendor-stated / Hypothesis / Untested per personas.md confidence taxonomy).
  • §2 Per-surface recommendations — one subsection per surface:
    • §2.1 robots.txt — table of every AI UA we've identified (training vs on-demand), per-UA recommendation (Allow / Block / Allow-with-rate-limit), with M1 / M2 / deferred status.
    • §2.2 llms.txt — what the spec is (verified against llmstxt.org), what to put in it for an aggregator (a curated index of high-citability URLs: per-category pages, retailer profiles, the "as-of" stat block, build-guide pages once they exist), and where to host it (/llms.txt at root). Adoption table from research agent #2; a content sketch follows this step.
    • §2.3 Schema.org / JSON-LD — per-page-type recommendations: product detail (Product + AggregateOffer + Brand + BreadcrumbList), retailer profile (Organization + LocalBusiness + PostalAddress), homepage (Organization + optional Dataset for the stat block — verify the latter is meaningful or skip). Per-property required / recommended status. JSON-LD placement (page-bottom inside Next.js <Script type="application/ld+json">). Tooling references (Rich Results Test URL, Schema Markup Validator URL). A component sketch follows this step.
    • §2.4 OpenGraph + Twitter Card — minimal set (og:title, og:description, og:image, og:url, og:type=product, twitter:card=summary_large_image). Universal across all pages; cheap; helps social and AI scrapers.
    • §2.5 Page-content shape — first-paragraph-as-citation pattern, "as of <date>" assertion block on the homepage and on each product detail page, lastUpdated timestamp display per listing row, Lebanese-specific framing in prose. Grounded in personas + competitive findings (the search input tolerates Lebanese-Arabic text but the UI is English-only per ADR-0004, so the cite-worthy assertions live in English).
    • §2.6 Machine-readable feeds — RSS (price drops? new listings?), JSON Feed, public API. Decision deferred to RFC; reference doc surveys what peers do.
  • §3 What AI assistants actually do with this — grounded in research agent #1 findings: which assistants honor robots.txt and which don't, which fetch on-demand vs train-only, what's documented vs unknown. Honest about gaps.
  • §4 M1 / M2 / deferred summary — single table, every recommendation tagged.
  • §5 Open questions — cross-reference into the RFC's open questions where appropriate.
  • See also — links to RFC-0009, #38 SEO, #28 page design, #43 KPIs, #44 security, #29 DB, personas.md, competitive-landscape.md, retailers.md, ADR-0004, ADR-0005, RFC-0003.
  • Verification: every recommendation has an M1 / M2 / deferred tag; every external claim has a citation (vendor doc URL); every Hypothesis / Untested claim is marked.
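
For §2.2, a minimal llms.txt sketch in the llmstxt.org shape (H1 name, blockquote summary, H2 link sections). The domain, URLs, and wording are hypothetical placeholders, not final content:

    # 961tech
    > PC-part price comparison across Lebanese retailers, with a last-scraped timestamp on every listing.

    ## Categories
    - [Graphics cards](https://961tech.example/c/gpus): current Lebanon prices per card
    - [Processors](https://961tech.example/c/cpus): current Lebanon prices per chip

    ## Retailers
    - [Retailer profiles](https://961tech.example/retailers): who we track and how often

And for §2.3, a .tsx sketch of the recommended JSON-LD placement: a Next.js component injecting Product + AggregateOffer at page bottom. The component name, props, and values are hypothetical; the real component belongs to the follow-up code ticket, not this docs-only one:

    // Hypothetical sketch of the §2.3 shape; not shipped by this ticket.
    import Script from 'next/script';

    type ProductJsonLdProps = {
      name: string;
      brand: string;
      lowPrice: number;
      highPrice: number;
      offerCount: number;
    };

    export function ProductJsonLd(props: ProductJsonLdProps) {
      const jsonLd = {
        '@context': 'https://schema.org',
        '@type': 'Product',
        name: props.name,
        brand: { '@type': 'Brand', name: props.brand },
        offers: {
          '@type': 'AggregateOffer',
          priceCurrency: 'USD',
          lowPrice: props.lowPrice,
          highPrice: props.highPrice,
          offerCount: props.offerCount,
        },
      };
      return (
        <Script
          id="product-jsonld"
          type="application/ld+json"
          // JSON-LD must be serialized into the script body, not passed as children
          dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
        />
      );
    }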

  • Step 4: Fill docs/rfc/0009-ai-discoverability.md

  • Five decisions per the ticket scope:
    1. robots.txt posture — open / training-blocked / strict-allow-list / paid-only? Recommendation grounded in the genre survey (research agent #4) + Lebanon-specific posture (we're trying to grow citation, not gate competitors yet).
    2. llms.txt adopt-or-skip — yes / no, at what URL, what to put in it. Recommendation grounded in research agent #2.
    3. Schema.org coverage scope — all product pages + retailer pages, or only confidence-high data? Specifically: what to do about the ~50% "Call For Price" 961Souq listings — omit the Offer entirely, ship with availability=PreOrder, or define a custom priceSpecification? Recommendation grounded in schema.org's availability enum; the three candidate shapes are sketched after this step.
    4. Homepage "as of <date>" stat block — yes / no? Specific assertions: SKU count, retailer count, last-scraped timestamp, price-range examples. Worth marking up as a Dataset schema, or just plain prose? Grounded in the pattern that AI assistants favor verifiable, time-stamped claims.
    5. Machine-readable feed — RSS / JSON Feed / public REST API / nothing? Lean toward minimal (RSS for price drops once #14 ships; no public API in M1/M2 — would invite scraping).
  • Trade-offs — every decision has a trade-off; surface them honestly.
  • Alternatives — for each decision, what was considered and why set aside.
  • Open questions — surface anything that needs MASTER's call before implementation:
    • Does blocking AI training crawlers conflict with #41 monetization? Specifically: a future B2B data-licensing revenue stream for AI training datasets.
    • The schema.org "Call For Price" question above.
  • Implementation plan — checklist that maps to a future code ticket. Not implemented in this RFC; surfaced for traceability.
  • Out of scope — explicit list (matches plan's Out of scope).
  • Verification: every section is non-empty; every decision has an explicit recommendation (not "TBD"); every open question has options for MASTER to pick from (multiple choice, not open-ended).
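
The three candidate shapes decision 3 weighs, sketched as TypeScript object literals. All values are illustrative, pending research agent #3's verification of the availability enum:

    // (a) Omit the Offer entirely: the Product ships with no offers node.
    const optionA = { '@type': 'Product', name: 'Example SKU' };

    // (b) Ship an Offer with an availability value but no numeric price.
    const optionB = {
      '@type': 'Offer',
      // PreOrder is a stand-in until agent #3 confirms whether a closer value exists
      availability: 'https://schema.org/PreOrder',
    };

    // (c) Carry a "call for price" marker in a custom priceSpecification.
    const optionC = {
      '@type': 'Offer',
      priceSpecification: {
        '@type': 'PriceSpecification',
        description: 'Call for price',
      },
    };

Decision 4's stat block, if adopted as plain prose, would read along these lines (placeholders, not real numbers):

    As of <date>, 961tech tracks <N> SKUs across <M> Lebanese retailers; prices last scraped <timestamp>.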

  • Step 5: Cross-reference

  • Update docs/reference/index.md row for ai-discoverability.md (already added in Step 1, verify final wording).
  • Update docs/rfc/index.md row for RFC-0009 (already added, verify final wording).
  • Add a See also entry to docs/reference/competitive-landscape.md if anti-scraping / AI stance is a meaningful new dimension. (Likely yes — competitive doc §3.6 "Trust + transparency" surveys peer aggregators' transparency, and AI-citation posture sits in that family.)
  • No glossary updates (no new domain terms — llms.txt, robots.txt, JSON-LD are external and don't belong in our glossary).
  • Verification: mkdocs build --strict from clean state passes with no broken refs.

  • Step 6: Verify + commit

  • Run mkdocs build --strict once more.
  • git status — should show only the new files (plan, reference doc, RFC) plus updates to docs/reference/index.md, docs/rfc/index.md, docs/plans/index.md, and possibly docs/reference/competitive-landscape.md See also.
  • One atomic commit on feat/issue-47 with conventional message:
    docs: AI-discoverability foundation (#47)
    
    Adds reference doc + RFC-0009 covering robots.txt posture for AI
    crawlers, llms.txt adoption, schema.org Product/Offer/AggregateOffer
    coverage, page-content citability, and machine-readable feed posture.
    
    Refs #47
    
    (Use Refs #47, not Closes #47 — the issue closes when the RFC is accepted by MASTER, which is a separate event.)
  • No push. No PR. Stop here per ticket workflow.
  • Verification: git log -1 --stat shows the expected files; git status is clean; branch is feat/issue-47; remote is unchanged.

Risks

| Risk | Likelihood | Mitigation |
| --- | --- | --- |
| AI-assistant indexing behavior is poorly documented; recommendations land on shaky evidence | High | Tag every claim per the confidence taxonomy; flag gaps explicitly in §1 Honest limits; cite vendor docs by URL where they exist; refuse to fabricate |
| llms.txt is a fad and won't matter in 12 months | Medium | The spec is cheap to ship (one curated markdown file at root); surface this as an open question in the RFC and let MASTER decide; even if it dies, the curated index has SEO value as plain /curated-index.md |
| Blocking AI training closes a future B2B data-licensing revenue door | Medium | Surface as an explicit open question in the RFC tied to #41 monetization; recommend the open posture by default and let MASTER override |
| Schema.org's availability enum has no clean "Call For Price" value, forcing an awkward choice | High | Research agent #3 was specifically asked to verify this; if there is no clean value, surface the omit vs. PreOrder vs. custom decision in the RFC's open questions |
| Recommendations conflict with #38 SEO once that ticket lands | Low | Cross-reference both ways; the AI surface is a subset of the SEO surface (structured data overlaps), so #38 inherits these decisions if it lands second; if #38 lands first, we adopt its choices and reduce to the AI-specific delta |
| The plan ships before the research subagents return useful findings | Low | Step 2 explicitly waits; if any agent comes back empty, the doc says "couldn't verify because X" — that's a valid output |

Tests

No automated tests — docs-only ticket. Verification is mkdocs build --strict (catches broken refs) plus manual readthrough of every new page in mkdocs serve.

If a future code ticket implements any of these (app/robots.ts, app/sitemap.ts, JSON-LD components), that ticket lands its own vitest specs; a sketch of what one might look like follows.
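
A minimal sketch, assuming the Next.js metadata-route shape for app/robots.ts; the import path and the expected user agents are hypothetical:

    import { describe, expect, it } from 'vitest';
    import robots from '../app/robots'; // hypothetical path

    describe('robots.txt AI-crawler posture', () => {
      it('declares an explicit rule for every tracked AI user agent', () => {
        // MetadataRoute.Robots allows rules as a single object or an array
        const rules = [robots().rules].flat();
        const userAgents = rules.flatMap((rule) => [rule.userAgent ?? []].flat());
        for (const ua of ['GPTBot', 'ChatGPT-User', 'ClaudeBot', 'PerplexityBot']) {
          expect(userAgents).toContain(ua);
        }
      });
    });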

Doc updates

  • Reference: docs/reference/ai-discoverability.md (new)
  • RFC: docs/rfc/0009-ai-discoverability.md (new)
  • Plan: docs/plans/2026-04-28-issue-47-ai-discoverability.md (this file)
  • Index: docs/reference/index.md (add row)
  • Index: docs/rfc/index.md (add row)
  • Index: docs/plans/index.md (add row)
  • Cross-reference: docs/reference/competitive-landscape.md See also (optional, only if it adds non-redundant context)
  • Architecture: none — no architecture changes (no code, no schema)
  • ADR: none yet — ADR is the output of this RFC once MASTER accepts it
  • Glossary: none — llms.txt, robots.txt, JSON-LD are external standards
  • Issue body: surface any follow-up code tickets discovered during writing

Rollback

git revert <commit> on the single atomic commit. No external state, no schema changes, no published artifacts — rollback is trivial.