# Issue #47: AI discoverability (LLM-citable surface)
- Issue: #47
- Started: 2026-04-28
- Completed: in progress
## Goal

Ship two artifacts that, together, define what 961tech does so AI assistants cite us when Lebanese users ask ChatGPT / Claude / Perplexity / Google AI Overviews / Apple Intelligence for PC parts recommendations or price comparisons:

- `docs/reference/ai-discoverability.md` — per-surface (schema.org, `llms.txt`, `robots.txt`, page-content shape, OpenGraph, machine-readable feeds) recommendations with M1 / M2 / deferred status. Grounded in current AI-assistant behavior (verified against official vendor docs), not aspirational SEO folklore.
- `docs/rfc/0009-ai-discoverability.md` — five decisions to take to MASTER: `robots.txt` posture, `llms.txt` adopt-or-skip, schema.org coverage scope, homepage "as of" assertion block, machine-readable feed (RSS / JSON / API) yes/no.

Both land in one atomic commit on `feat/issue-47`. No code. No push. No PR.
## Out of scope

- No code. No `next.config.ts` directives, no JSON-LD components, no `app/robots.ts`, no `app/sitemap.ts`. Implementation is a follow-up code ticket — file when surfaced.
- No SEO strategy in the classic Google-search-rank sense. That's #38. The two interact (structured data + page content overlap) and we cross-reference, but we don't litigate Google-search positioning here.
- No KPI definition. #43 owns metrics. We surface "AI-citation rate as a KPI candidate" as a cross-reference, not a definition.
- No security / WAF design. #44 owns rate-limiting + Cloudflare Bot Management. We cross-reference at the boundary (`robots.txt` is policy, WAF is enforcement).
- No new schema columns beyond noting that `lastUpdated` / `scrapedAt` need to be exposable on product detail pages — surfaced as a cross-reference to #29.
## Approach
Foundation-research workflow: read context → dispatch parallel research → synthesise reference doc → distill decisions into RFC → cross-reference → mkdocs strict → one commit.
Why this shape: AI-assistant scraping behavior is sparsely documented and changes fast. A reference doc that summarizes vendor truth (with citations to official docs) is the durable artifact; an RFC is the decision artifact MASTER signs off on. Splitting them lets the reference doc stay updatable as vendor docs evolve, without the RFC re-opening every time GPTBot's docs URL moves.
Parallel research per `superpowers:dispatching-parallel-agents` (already dispatched at plan time):

- AI scraper user-agents — official OpenAI / Anthropic / Perplexity / Google / Apple / Meta docs for `GPTBot`, `ChatGPT-User`, `OAI-SearchBot`, `ClaudeBot`, `Claude-User`, `PerplexityBot`, `Perplexity-User`, `Google-Extended`, `GoogleOther`, `Applebot`, `Applebot-Extended`. Distinction between training-time crawlers (block-once-lose-future) and on-demand fetchers (block = no citation).
- `llms.txt` spec + adoption — current spec at llmstxt.org, adoption survey across our peer aggregators + a sample of well-known docs sites (Anthropic, Vercel, Stripe), honest assessment of "is this real or aspirational."
- Schema.org Product / Offer / AggregateOffer / Review / Organization / LocalBusiness — verified against the official schema.org docs (NOT a 2023 SEO blog), including the subtle Google-vs-schema.org property-name distinctions and the `availability` enum (specifically: is there a clean "Call For Price" value? We have ~50% of one retailer's catalog in this state).
- Comparable aggregators' `robots.txt` + AI stance — actual fetched content from PCPartPicker, Geizhals, Idealo, Skroutz, Pricena, EGPrices, EG-PC, BuildMyPC, Logical Increments, Newegg, LDLC. Synthesis of what the genre's posture is.
## Steps

- Step 1: Stub the plan + reference doc + RFC files
    - Plan: this file.
    - Reference doc: `docs/reference/ai-discoverability.md` with frontmatter (`title`, `description`, `status: active`, `tags: [reference, seo, ai, foundation]`) and a section skeleton (Scope & method, Per-surface recommendations §2.1–§2.6, What AI assistants actually do, M1 / M2 / deferred summary, Open questions, See also).
    - RFC: `docs/rfc/0009-ai-discoverability.md` with frontmatter (`title`, `description`, `status: new`, `tags: [rfc, seo, ai]`) and the standard RFC sections (Summary, Motivation, Proposal, Trade-offs, Alternatives, Open questions, Implementation plan, Out of scope).
    - Add row to `docs/reference/index.md` linking the new reference doc.
    - Add row to `docs/rfc/index.md` linking the new RFC.
    - Add row to `docs/plans/index.md` linking this plan.
    - Verification: `mkdocs build --strict` after stubbing — fails on broken refs only if cross-link targets are wrong.
- Step 2: Wait for research subagents to return
    - All four agents launched in parallel via `superpowers:dispatching-parallel-agents`.
    - When each returns, capture the structured findings into a working buffer (the synthesis happens in-line in steps 3–4, not in a separate notes file).
    - If any agent's research is sparse / blocked / contradictory: write the limitation explicitly into the reference doc's §1 Honest limits; do not paper over with confident-sounding fiction.
    - Verification: each agent's response either contains usable findings or an explicit "couldn't verify because X" — both are valid inputs to the doc.
- Step 3: Fill `docs/reference/ai-discoverability.md`
    - §1 Scope & method — what this doc is, what it isn't, sources, honest limits (especially: AI-assistant indexing behavior is underspecified by vendors; claims tagged as `Vendor-stated` / `Hypothesis` / `Untested` per the personas.md confidence taxonomy).
    - §2 Per-surface recommendations — one subsection per surface (§2.1–§2.6):
    - §2.1 `robots.txt` — table of every AI UA we've identified (training vs on-demand), per-UA recommendation (Allow / Block / Allow-with-rate-limit), with M1 / M2 / deferred status. Implementation sketch below.
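      Out of scope to implement in this ticket, but to ground the recommendation: a minimal sketch of what a follow-up `app/robots.ts` could return, assuming the Next.js App Router convention. The UA groupings and the open posture are placeholders pending RFC-0009 decision 1; the domain is illustrative.

      ```ts
      // Sketch only: allow-vs-block posture is an RFC-0009 decision, not settled here.
      import type { MetadataRoute } from "next";

      export default function robots(): MetadataRoute.Robots {
        return {
          rules: [
            // On-demand fetchers: blocking these forfeits citations in answers.
            {
              userAgent: ["ChatGPT-User", "OAI-SearchBot", "Claude-User", "Perplexity-User"],
              allow: "/",
            },
            // Training-time crawlers: block-once-lose-future; shown open by default.
            {
              userAgent: ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended"],
              allow: "/",
            },
          ],
          sitemap: "https://961tech.example/sitemap.xml", // illustrative domain
        };
      }
      ```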
    - §2.2 `llms.txt` — what the spec is (verified against llmstxt.org), what to put in it for an aggregator (curated index of high-citability URLs: per-category pages, retailer profiles, "as-of" stat block, build-guide pages once they exist), where to host (`/llms.txt` at root). Adoption table from research agent #2. Hosting sketch below.
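      A static `public/llms.txt` file is the simplest hosting option; if the curated index should stay generated, a route handler works too. A sketch under that assumption, with placeholder content, not the curated index itself:

      ```ts
      // app/llms.txt/route.ts: serves GET /llms.txt as plain text (sketch).
      const LLMS_TXT = `# 961tech

      > Price comparison for PC parts across Lebanese retailers.

      ## Key pages
      - [GPU category](https://961tech.example/c/gpus): all GPU listings with current prices
      `;

      export function GET(): Response {
        return new Response(LLMS_TXT, {
          headers: { "Content-Type": "text/plain; charset=utf-8" },
        });
      }
      ```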
    - §2.3 Schema.org / JSON-LD — per-page-type recommendations: Product detail (`Product` + `AggregateOffer` + `Brand` + `BreadcrumbList`), retailer profile (`Organization` + `LocalBusiness` + `PostalAddress`), homepage (`Organization` + optional `Dataset` for the stat block — verify the latter is meaningful or skip). Per-property required / recommended status. JSON-LD placement (page-bottom inside a Next.js `<Script type="application/ld+json">`). Tooling references (Rich Results Test URL, Schema Markup Validator URL). Shape sketch below.
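      A sketch of the product-detail shape named above (`Product` + `AggregateOffer`); names and prices are invented. For static JSON-LD a plain `<script>` tag in a server component also works, since `next/script` is aimed at executable scripts:

      ```tsx
      // Hypothetical fragment of a product detail page; values are illustrative.
      const productJsonLd = {
        "@context": "https://schema.org",
        "@type": "Product",
        name: "GeForce RTX 4070 SUPER",
        brand: { "@type": "Brand", name: "NVIDIA" },
        offers: {
          "@type": "AggregateOffer",
          priceCurrency: "USD",
          lowPrice: 639, // cheapest current listing across retailers
          highPrice: 720,
          offerCount: 4,
        },
      };

      export function ProductJsonLd() {
        // JSON-LD is inert data, so injecting a serialized object is safe here.
        return (
          <script
            type="application/ld+json"
            dangerouslySetInnerHTML={{ __html: JSON.stringify(productJsonLd) }}
          />
        );
      }
      ```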
    - §2.4 OpenGraph + Twitter Card — minimal set (`og:title`, `og:description`, `og:image`, `og:url`, `og:type=product`, `twitter:card=summary_large_image`). Universal across all pages; cheap; helps social and AI scrapers. Sketch below.
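      Most of the minimal set maps onto Next.js's typed Metadata API; a sketch with invented values. One caveat, flagged as an assumption: `og:type=product` is not in the typed `openGraph.type` union, so it likely needs a manually rendered meta tag.

      ```ts
      // Hypothetical metadata export for a product detail page (App Router).
      import type { Metadata } from "next";

      export const metadata: Metadata = {
        openGraph: {
          title: "GeForce RTX 4070 SUPER prices in Lebanon",
          description: "Live prices from 4 Lebanese retailers, updated daily.",
          url: "https://961tech.example/p/rtx-4070-super",
          images: ["https://961tech.example/og/rtx-4070-super.png"],
          // og:type=product is absent from Next's typed union; emit it as a raw
          // <meta property="og:type" content="product"> tag instead.
        },
        twitter: { card: "summary_large_image" },
      };
      ```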
" assertion block on homepage and per-product detail, lastUpdatedtimestamp display per listing row, Lebanese-specific framing in prose. Grounded in personas + competitive findings (Lebanese-Arabic-friendly text via tolerant search input but English-only UI per ADR-0004, so the cite-worthy assertions live in English). - §2.6 Machine-readable feeds — RSS (price drops? new listings?), JSON Feed, public API. Decision deferred to RFC; reference doc surveys what peers do.
- §2.1
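      For the §2.5 "as of" pattern, a hypothetical fragment; the component name and `scrapedAt` prop are illustrative (the underlying field is the #29 cross-reference above):

      ```tsx
      // Hypothetical listing fragment: a time-stamped, quotable assertion.
      export function AsOfAssertion({ scrapedAt }: { scrapedAt: Date }) {
        const day = scrapedAt.toISOString().slice(0, 10); // e.g. "2026-04-28"
        return (
          <p>
            As of {day}, these prices were scraped directly from Lebanese retailers.
          </p>
        );
      }
      ```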
    - §3 What AI assistants actually do with this — grounded in research agent #1 findings: which assistants honor `robots.txt` and which don't, which fetch on-demand vs train-only, what's documented vs unknown. Honest about gaps.
    - §4 M1 / M2 / deferred summary — single table, every recommendation tagged.
    - §5 Open questions — cross-reference into the RFC's open questions where appropriate.
    - See also — links to RFC-0009, #38 SEO, #28 page design, #43 KPIs, #44 security, #29 DB, personas.md, competitive-landscape.md, retailers.md, ADR-0004, ADR-0005, RFC-0003.
    - Verification: every recommendation has an M1 / M2 / deferred tag; every external claim has a citation (vendor doc URL); every `Hypothesis` / `Untested` claim is marked.
- Step 4: Fill `docs/rfc/0009-ai-discoverability.md`
    - Five decisions per the ticket scope:
    - `robots.txt` posture — open / training-blocked / strict-allow-list / paid-only? Recommendation grounded in the genre survey (research agent #4) + Lebanon-specific posture (we're trying to grow citation, not gate competitors yet).
    - `llms.txt` adopt-or-skip — yes / no, at what URL, what to put in it. Recommendation grounded in research agent #2.
    - Schema.org coverage scope — all product pages + retailer pages? Or only confidence-high data? Specifically: what to do about the ~50% "Call For Price" 961Souq listings — omit `Offer` entirely, ship with `availability=PreOrder`, or define a custom `priceSpecification`? Recommendation grounded in schema.org's `availability` enum. Candidate shapes sketched after this list.
    - Homepage "as of" stat block — yes / no? Specific assertions: SKU count, retailer count, last-scraped timestamp, price-range examples. Worth marking as `Dataset` schema or just plain prose? Grounded in the "AI assistants love verifiable, time-stamped claims" pattern.
    - Machine-readable feed — RSS / JSON Feed / public REST API / nothing? Lean toward minimal (RSS for price drops once #14 ships; no public API in M1/M2 — would invite scraping).
    - Trade-offs — every decision has a trade-off; surface them honestly.
    - Alternatives — for each decision, what was considered and why it was set aside.
    - Open questions — surface anything that needs MASTER's call before implementation:
        - Does blocking AI training crawlers conflict with #41 monetisation? Specifically: a future B2B data-licensing revenue stream for AI training datasets.
        - The schema.org "Call For Price" question above.
    - Implementation plan — checklist that maps to a future code ticket. Not implemented in this RFC; surfaced for traceability.
    - Out of scope — explicit list (matches this plan's Out of scope).
    - Verification: every section is non-empty; every decision has an explicit recommendation (not "TBD"); every open question has options for MASTER to pick from (multiple choice, not open-ended).
- Step 5: Cross-reference
    - Update the `docs/reference/index.md` row for `ai-discoverability.md` (already added in Step 1; verify final wording).
    - Update the `docs/rfc/index.md` row for RFC-0009 (already added; verify final wording).
    - Add a See also entry to `docs/reference/competitive-landscape.md` if anti-scraping / AI stance is a meaningful new dimension. (Likely yes — competitive doc §3.6 "Trust + transparency" surveys peer aggregators' transparency, and AI-citation posture sits in that family.)
    - No glossary updates (no new domain terms — `llms.txt`, `robots.txt`, and JSON-LD are external and don't belong in our glossary).
    - Verification: `mkdocs build --strict` from a clean state passes with no broken refs.
- Step 6: Verify + commit
    - Run `mkdocs build --strict` once more.
    - `git status` — should show only the new files (plan, reference doc, RFC) plus updates to `docs/reference/index.md`, `docs/rfc/index.md`, `docs/plans/index.md`, and possibly the `docs/reference/competitive-landscape.md` See also.
    - One atomic commit on `feat/issue-47` with a conventional message:

      ```
      docs: AI-discoverability foundation (#47)

      Adds reference doc + RFC-0009 covering robots.txt posture for AI
      crawlers, llms.txt adoption, schema.org Product/Offer/AggregateOffer
      coverage, page-content citability, and machine-readable feed posture.

      Refs #47
      ```

      (Use `Refs #47`, not `Closes #47` — the issue closes when the RFC is accepted by MASTER, which is a separate event.)
    - No push. No PR. Stop here per ticket workflow.
    - Verification: `git log -1 --stat` shows the expected files; `git status` is clean; branch is `feat/issue-47`; remote is unchanged.
## Risks

| Risk | Likelihood | Mitigation |
|---|---|---|
| AI-assistant indexing behavior is poorly documented; recommendations land on shaky evidence | High | Tag every claim per confidence taxonomy; flag gaps explicitly in §1 Honest limits; cite vendor docs by URL where they exist; refuse to fabricate |
| `llms.txt` is a fad and won't matter in 12 months | Medium | Spec is cheap to ship (one curated markdown file at root); surface this as an Open question in the RFC and let MASTER decide; even if it dies, the curated index has SEO value as plain `/curated-index.md` |
| Blocking AI training closes a future B2B data-licensing revenue door | Medium | Surface as an explicit Open question in the RFC tied to #41 monetisation; recommend the open posture by default and let MASTER override |
| Schema.org's `availability` enum has no clean "Call For Price" value, forcing an awkward choice | High | Research agent #3 specifically asked to verify; if no clean value, surface the omit-vs-PreOrder-vs-custom decision in the RFC's open questions |
| Recommendations conflict with #38 SEO once that ticket lands | Low | Cross-reference both ways; the AI surface is a subset of the SEO surface (structured data overlaps), so #38 inherits these decisions if it lands second; if #38 lands first, we adopt its choices and reduce to the AI-specific delta |
| The plan ships before the research subagents return useful findings | Low | Step 2 explicitly waits; if any agent comes back empty, the doc says "couldn't verify because X" — that's a valid output |
## Tests

No automated tests — docs-only ticket. Verification is `mkdocs build --strict` (catches broken refs) plus a manual readthrough of every new page in `mkdocs serve`.

If a future code ticket implements any of these (`app/robots.ts`, `app/sitemap.ts`, JSON-LD components), that ticket lands its own vitest specs.
## Doc updates

- Reference: `docs/reference/ai-discoverability.md` (new)
- RFC: `docs/rfc/0009-ai-discoverability.md` (new)
- Plan: `docs/plans/2026-04-28-issue-47-ai-discoverability.md` (this file)
- Index: `docs/reference/index.md` (add row)
- Index: `docs/rfc/index.md` (add row)
- Index: `docs/plans/index.md` (add row)
- Cross-reference: `docs/reference/competitive-landscape.md` See also (optional, only if it adds non-redundant context)
- Architecture: none — no architecture changes (no code, no schema)
- ADR: none yet — an ADR is the output of this RFC once MASTER accepts it
- Glossary: none — `llms.txt`, `robots.txt`, JSON-LD are external standards
- Issue body: surface any follow-up code tickets discovered during writing
## Rollback

`git revert <commit>` on the single atomic commit. No external state, no schema changes, no published artifacts — rollback is trivial.