Issue #47: AI discoverability (LLM-citable surface)

  • Issue: #47
  • Started: 2026-04-28
  • Status: in progress

Goal

Ship two artifacts that together define what 961tech does, so that AI assistants cite us when Lebanese users ask ChatGPT / Claude / Perplexity / Google AI Overviews / Apple Intelligence for PC-part recommendations or price comparisons:

  1. docs/reference/ai-discoverability.md — per-surface (schema.org, llms.txt, robots.txt, page-content shape, OpenGraph, machine-readable feeds) recommendations with M1 / M2 / deferred status. Grounded in current AI-assistant behavior (verified against official vendor docs), not aspirational SEO folklore.
  2. docs/rfc/0009-ai-discoverability.md — five decisions to take to MASTER: robots.txt posture, llms.txt adopt-or-skip, schema.org coverage scope, homepage "as of <date>" assertion block, machine-readable feed (RSS / JSON / API) yes/no.

Both land in one atomic commit on feat/issue-47. No code. No push. No PR.

Out of scope

  • No code. No next.config.ts directives, no JSON-LD components, no app/robots.ts, no app/sitemap.ts. Implementation is a follow-up code ticket — file when surfaced.
  • No SEO strategy in the classic Google-search-rank sense. That's #38. The two interact (structured data + page content overlap) and we cross-reference, but we don't litigate Google-search positioning here.
  • No KPI definition. #43 owns metrics. We surface "AI-citation rate as a KPI candidate" as a cross-reference, not a definition.
  • No security / WAF design. #44 owns rate-limiting + Cloudflare Bot Management. We cross-reference at the boundary (robots.txt is policy, WAF is enforcement).
  • No new schema columns beyond noting that lastUpdated / scrapedAt need to be exposable on product detail pages — surfaced as a cross-reference to #29.

Approach

Foundation-research workflow: read context → dispatch parallel research → synthesize reference doc → distill decisions into RFC → cross-reference → mkdocs strict → one commit.

Why this shape: AI-assistant scraping behavior is sparsely documented and changes fast. A reference doc that summarizes vendor truth (with citations to official docs) is the durable artifact; an RFC is the decision artifact MASTER signs off on. Splitting them lets the reference doc stay updatable as vendor docs evolve, without the RFC re-opening every time GPTBot's docs URL moves.

Parallel research per superpowers:dispatching-parallel-agents (already dispatched at plan time):

  1. AI scraper user-agents — official OpenAI / Anthropic / Perplexity / Google / Apple / Meta docs for GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, GoogleOther, Applebot, Applebot-Extended. Distinction between training-time crawlers (blocking forfeits presence in future models) and on-demand fetchers (blocking forfeits the citation); a posture sketch follows this list.
  2. llms.txt spec + adoption — current spec at llmstxt.org, adoption survey across our peer aggregators + a sample of well-known docs sites (Anthropic, Vercel, Stripe), honest assessment of "is this real or aspirational."
  3. Schema.org Product / Offer / AggregateOffer / Review / Organization / LocalBusiness — verified against the official schema.org docs (NOT a 2023 SEO blog), including the subtle Google-vs-schema.org property name distinctions and the availability enum (specifically: is there a clean "Call For Price" value? We have ~50% of one retailer's catalog in this state).
  4. Comparable aggregators' robots.txt + AI stance — actual fetched content from PCPartPicker, Geizhals, Idealo, Skroutz, Pricena, EGPrices, EG-PC, BuildMyPC, Logical Increments, Newegg, LDLC. Synthesis of "what the genre's posture is."
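
To make item 1's training-vs-on-demand distinction concrete, a minimal robots.txt posture sketch. The UA names are the documented ones listed above; the Allow assignments are illustrative placeholders pending research agent #1's findings and RFC-0009 decision 1, not recommendations:

    # Illustrative posture only; final per-UA policy is RFC-0009 decision 1.
    User-agent: ChatGPT-User       # on-demand fetcher: blocking forfeits citations
    Allow: /

    User-agent: Claude-User        # on-demand fetcher
    Allow: /

    User-agent: GPTBot             # training crawler: blocking forfeits future-model presence
    Allow: /

    User-agent: Google-Extended    # product token controlling AI-training use, not a separate crawler
    Allow: /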

Steps

  • Step 1: Stub the plan + reference doc + RFC files
  • Plan: this file.
  • Reference doc: docs/reference/ai-discoverability.md with frontmatter (title, description, status: active, tags: [reference, seo, ai, foundation]) and a section skeleton (Scope & method, Per-surface recommendations §2.1–§2.6, What AI assistants actually do, M1 / M2 / deferred summary, Open questions, See also).
  • RFC: docs/rfc/0009-ai-discoverability.md with frontmatter (title, description, status: new, tags: [rfc, seo, ai]) and the standard RFC sections (Summary, Motivation, Proposal, Trade-offs, Alternatives, Open questions, Implementation plan, Out of scope).
  • Add row to docs/reference/index.md linking the new reference doc.
  • Add row to docs/rfc/index.md linking the new RFC.
  • Add row to docs/plans/index.md linking this plan.
  • Verification: mkdocs build --strict after stubbing — a failure at this stage can only mean a cross-link target is wrong.

  • Step 2: Wait for research subagents to return

  • All four agents launched in parallel via superpowers:dispatching-parallel-agents.
  • When each returns, capture the structured findings into a working buffer (the synthesis happens in-line in steps 3-4, not a separate notes file).
  • If any agent's research is sparse / blocked / contradictory: write the limitation explicitly into the reference doc's §1 Honest Limits, do not paper over with confident-sounding fiction.
  • Verification: each agent's response either contains usable findings or an explicit "couldn't verify because X" — both are valid inputs to the doc.

  • Step 3: Fill docs/reference/ai-discoverability.md

  • §1 Scope & method — what this doc is, what it isn't, sources, honest limits (especially: AI-assistant indexing behavior is underspecified by vendors, claims tagged as Vendor-stated / Hypothesis / Untested per personas.md confidence taxonomy).
  • §2 Per-surface recommendations — one subsection per surface:
    • §2.1 robots.txt — table of every AI UA we've identified (training vs on-demand), per-UA recommendation (Allow / Block / Allow-with-rate-limit), with M1 / M2 / deferred status.
    • §2.2 llms.txt — what the spec is (verified against llmstxt.org), what to put in it for an aggregator (a curated index of high-citability URLs: per-category pages, retailer profiles, the "as-of" stat block, build-guide pages once they exist), and where to host it (/llms.txt at root). Adoption table from research agent #2; a content sketch follows this step.
    • §2.3 Schema.org / JSON-LD — per-page-type recommendations: product detail (Product + AggregateOffer + Brand + BreadcrumbList), retailer profile (Organization + LocalBusiness + PostalAddress), homepage (Organization + optional Dataset for the stat block — verify the latter is meaningful or skip). Per-property required / recommended status. JSON-LD placement (page-bottom inside Next.js <Script type="application/ld+json">). Tooling references (Rich Results Test URL, Schema Markup Validator URL). A component sketch follows this step.
    • §2.4 OpenGraph + Twitter Card — minimal set (og:title, og:description, og:image, og:url, og:type=product, twitter:card=summary_large_image). Universal across all pages; cheap; helps social and AI scrapers.
    • §2.5 Page-content shape — first-paragraph-as-citation pattern, "as of <date>" assertion block on the homepage and on each product detail page, lastUpdated timestamp display per listing row, Lebanese-specific framing in prose. Grounded in personas + competitive findings (the search input tolerates Lebanese-Arabic text but the UI is English-only per ADR-0004, so the cite-worthy assertions live in English).
    • §2.6 Machine-readable feeds — RSS (price drops? new listings?), JSON Feed, public API. Decision deferred to RFC; reference doc surveys what peers do.
  • §3 What AI assistants actually do with this — grounded in research agent #1 findings: which assistants honor robots.txt and which don't, which fetch on-demand vs train-only, what's documented vs unknown. Honest about gaps.
  • §4 M1 / M2 / deferred summary — single table, every recommendation tagged.
  • §5 Open questions — cross-reference into the RFC's open questions where appropriate.
  • See also — links to RFC-0009, #38 SEO, #28 page design, #43 KPIs, #44 security, #29 DB, personas.md, competitive-landscape.md, retailers.md, ADR-0004, ADR-0005, RFC-0003.
  • Verification: every recommendation has an M1 / M2 / deferred tag; every external claim has a citation (vendor doc URL); every Hypothesis / Untested claim is marked.
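
For §2.2, a minimal llms.txt sketch in the llmstxt.org shape (H1 name, blockquote summary, H2 link sections). The domain, URLs, and wording are hypothetical placeholders, not final content:

    # 961tech
    > PC-part price comparison across Lebanese retailers, with a last-scraped timestamp on every listing.

    ## Categories
    - [Graphics cards](https://961tech.example/c/gpus): current Lebanon prices per card
    - [Processors](https://961tech.example/c/cpus): current Lebanon prices per chip

    ## Retailers
    - [Retailer profiles](https://961tech.example/retailers): who we track and how often

And for §2.3, a .tsx sketch of the recommended JSON-LD placement: a Next.js component injecting Product + AggregateOffer at page bottom. The component name, props, and values are hypothetical; the real component belongs to the follow-up code ticket, not this docs-only one:

    // Hypothetical sketch of the §2.3 shape; not shipped by this ticket.
    import Script from 'next/script';

    type ProductJsonLdProps = {
      name: string;
      brand: string;
      lowPrice: number;
      highPrice: number;
      offerCount: number;
    };

    export function ProductJsonLd(props: ProductJsonLdProps) {
      const jsonLd = {
        '@context': 'https://schema.org',
        '@type': 'Product',
        name: props.name,
        brand: { '@type': 'Brand', name: props.brand },
        offers: {
          '@type': 'AggregateOffer',
          priceCurrency: 'USD',
          lowPrice: props.lowPrice,
          highPrice: props.highPrice,
          offerCount: props.offerCount,
        },
      };
      return (
        <Script
          id="product-jsonld"
          type="application/ld+json"
          // JSON-LD must be serialized into the script body, not passed as children
          dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
        />
      );
    }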

  • Step 4: Fill docs/rfc/0009-ai-discoverability.md

  • Five decisions per the ticket scope:
    1. robots.txt posture — open / training-blocked / strict-allow-list / paid-only? Recommendation grounded in the genre survey (research agent #4) + Lebanon-specific posture (we're trying to grow citation, not gate competitors yet).
    2. llms.txt adopt-or-skip — yes / no, at what URL, what to put in it. Recommendation grounded in research agent #2.
    3. Schema.org coverage scope — all product pages + retailer pages, or only confidence-high data? Specifically: what to do about the ~50% "Call For Price" 961Souq listings — omit the Offer entirely, ship with availability=PreOrder, or define a custom priceSpecification? Recommendation grounded in schema.org's availability enum; the three candidate shapes are sketched after this step.
    4. Homepage "as of <date>" stat block — yes / no? Specific assertions: SKU count, retailer count, last-scraped timestamp, price-range examples. Worth marking up as a Dataset schema, or just plain prose? Grounded in the pattern that AI assistants favor verifiable, time-stamped claims.
    5. Machine-readable feed — RSS / JSON Feed / public REST API / nothing? Lean toward minimal (RSS for price drops once #14 ships; no public API in M1/M2 — would invite scraping).
  • Trade-offs — every decision has a trade-off; surface them honestly.
  • Alternatives — for each decision, what was considered and why set aside.
  • Open questions — surface anything that needs MASTER's call before implementation:
    • Does blocking AI training crawlers conflict with #41 monetization? Specifically: a future B2B data-licensing revenue stream for AI training datasets.
    • The schema.org "Call For Price" question above.
  • Implementation plan — checklist that maps to a future code ticket. Not implemented in this RFC; surfaced for traceability.
  • Out of scope — explicit list (matches plan's Out of scope).
  • Verification: every section is non-empty; every decision has an explicit recommendation (not "TBD"); every open question has options for MASTER to pick from (multiple choice, not open-ended).
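
The three candidate shapes decision 3 weighs, sketched as TypeScript object literals. All values are illustrative, pending research agent #3's verification of the availability enum:

    // (a) Omit the Offer entirely: the Product ships with no offers node.
    const optionA = { '@type': 'Product', name: 'Example SKU' };

    // (b) Ship an Offer with an availability value but no numeric price.
    const optionB = {
      '@type': 'Offer',
      // PreOrder is a stand-in until agent #3 confirms whether a closer value exists
      availability: 'https://schema.org/PreOrder',
    };

    // (c) Carry a "call for price" marker in a custom priceSpecification.
    const optionC = {
      '@type': 'Offer',
      priceSpecification: {
        '@type': 'PriceSpecification',
        description: 'Call for price',
      },
    };

Decision 4's stat block, if adopted as plain prose, would read along these lines (placeholders, not real numbers):

    As of <date>, 961tech tracks <N> SKUs across <M> Lebanese retailers; prices last scraped <timestamp>.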

  • Step 5: Cross-reference

  • Update docs/reference/index.md row for ai-discoverability.md (already added in Step 1, verify final wording).
  • Update docs/rfc/index.md row for RFC-0009 (already added, verify final wording).
  • Add a See also entry to docs/reference/competitive-landscape.md if anti-scraping / AI stance is a meaningful new dimension. (Likely yes — competitive doc §3.6 "Trust + transparency" surveys peer aggregators' transparency, and AI-citation posture sits in that family.)
  • No glossary updates (no new domain terms — llms.txt, robots.txt, JSON-LD are external and don't belong in our glossary).
  • Verification: mkdocs build --strict from clean state passes with no broken refs.

  • Step 6: Verify + commit

  • Run mkdocs build --strict once more.
  • git status — should show only the new files (plan, reference doc, RFC) plus updates to docs/reference/index.md, docs/rfc/index.md, docs/plans/index.md, and possibly docs/reference/competitive-landscape.md See also.
  • One atomic commit on feat/issue-47 with conventional message:
    docs: AI-discoverability foundation (#47)
    
    Adds reference doc + RFC-0009 covering robots.txt posture for AI
    crawlers, llms.txt adoption, schema.org Product/Offer/AggregateOffer
    coverage, page-content citability, and machine-readable feed posture.
    
    Refs #47
    
    (Use Refs #47, not Closes #47 — the issue closes when the RFC is accepted by MASTER, which is a separate event.)
  • No push. No PR. Stop here per ticket workflow.
  • Verification: git log -1 --stat shows the expected files; git status is clean; branch is feat/issue-47; remote is unchanged.

Risks

| Risk | Likelihood | Mitigation |
| --- | --- | --- |
| AI-assistant indexing behavior is poorly documented; recommendations land on shaky evidence | High | Tag every claim per the confidence taxonomy; flag gaps explicitly in §1 Honest limits; cite vendor docs by URL where they exist; refuse to fabricate |
| llms.txt is a fad and won't matter in 12 months | Medium | The spec is cheap to ship (one curated markdown file at root); surface this as an open question in the RFC and let MASTER decide; even if it dies, the curated index has SEO value as plain /curated-index.md |
| Blocking AI training closes a future B2B data-licensing revenue door | Medium | Surface as an explicit open question in the RFC tied to #41 monetization; recommend the open posture by default and let MASTER override |
| Schema.org's availability enum has no clean "Call For Price" value, forcing an awkward choice | High | Research agent #3 was specifically asked to verify this; if there is no clean value, surface the omit vs. PreOrder vs. custom decision in the RFC's open questions |
| Recommendations conflict with #38 SEO once that ticket lands | Low | Cross-reference both ways; the AI surface is a subset of the SEO surface (structured data overlaps), so #38 inherits these decisions if it lands second; if #38 lands first, we adopt its choices and reduce to the AI-specific delta |
| The plan ships before the research subagents return useful findings | Low | Step 2 explicitly waits; if any agent comes back empty, the doc says "couldn't verify because X" — that's a valid output |

Tests

No automated tests — docs-only ticket. Verification is mkdocs build --strict (catches broken refs) plus manual readthrough of every new page in mkdocs serve.

If a future code ticket implements any of these (app/robots.ts, app/sitemap.ts, JSON-LD components), that ticket lands its own vitest specs; a sketch of what one might look like follows.
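
A minimal sketch, assuming the Next.js metadata-route shape for app/robots.ts; the import path and the expected user agents are hypothetical:

    import { describe, expect, it } from 'vitest';
    import robots from '../app/robots'; // hypothetical path

    describe('robots.txt AI-crawler posture', () => {
      it('declares an explicit rule for every tracked AI user agent', () => {
        // MetadataRoute.Robots allows rules as a single object or an array
        const rules = [robots().rules].flat();
        const userAgents = rules.flatMap((rule) => [rule.userAgent ?? []].flat());
        for (const ua of ['GPTBot', 'ChatGPT-User', 'ClaudeBot', 'PerplexityBot']) {
          expect(userAgents).toContain(ua);
        }
      });
    });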

Doc updates

  • Reference: docs/reference/ai-discoverability.md (new)
  • RFC: docs/rfc/0009-ai-discoverability.md (new)
  • Plan: docs/plans/2026-04-28-issue-47-ai-discoverability.md (this file)
  • Index: docs/reference/index.md (add row)
  • Index: docs/rfc/index.md (add row)
  • Index: docs/plans/index.md (add row)
  • Cross-reference: docs/reference/competitive-landscape.md See also (optional, only if it adds non-redundant context)
  • Architecture: none — no architecture changes (no code, no schema)
  • ADR: none yet — ADR is the output of this RFC once MASTER accepts it
  • Glossary: none — llms.txt, robots.txt, JSON-LD are external standards
  • Issue body: surface any follow-up code tickets discovered during writing

Rollback

git revert <commit> on the single atomic commit. No external state, no schema changes, no published artifacts — rollback is trivial.