
Issue #30: Lebanese retailer audit + scraper roadmap

  • Issue: #30
  • Started: 2026-04-28
  • Status: in progress

Goal

Ship a single reference doc — docs/reference/retailers.md — that profiles 6-8 Lebanese tech retailers (the 3 currently scraped + 3-5 prioritized-next) on category coverage, pricing model, SKU scale, scraper feasibility (page structure, anti-bot, pagination), and affiliate program. Closes with a scraper roadmap that makes #20 (add 3-5 more retailers) trivial to plan.

Out of scope

  • No scraper code. This is a docs-only ticket. Touching src/scrapers/ is forbidden (issue scope).
  • No new ADRs/RFCs. Decisions are recommendations per retailer; the actual "we will add X" is finalized in #20.
  • No re-research of global competitors. docs/reference/competitive-landscape.md (#35) is the source for global context. This doc is Lebanon-only.
  • No persona work. docs/reference/personas.md (#36) already covers buyer behavior.
  • No category-scope decisions. That's #32. We describe what categories each retailer carries, we don't prescribe what 961tech indexes.

Approach

Two research phases, then a one-pass compile and a build gate:

  1. Currently-scraped (3 retailers) — info already lives in src/scrapers/sites/*.ts (page structure, sold-out signals) and in competitive-landscape §4.4 (UX + density teardowns). One WebFetch per retailer for current category-page structure and rough SKU count; cross-check against the scraper code.
  2. Prioritized-next (4-5 candidates) — picked from competitive-landscape §4.4 candidate list (CompuOne, Mojitech, Ayoub Computers, Multitech, PCBuildingLeb, with PcMacLB / Gamma / Microcity as fallbacks). Dispatched as parallel subagents per superpowers:dispatching-parallel-agents — each subagent visits one retailer's homepage + at least one category page, returns a structured report against the scope checklist in the ticket.
  3. Compile retailers.md with a fixed per-retailer template (matches the ticket's "Per retailer, include" list), order: currently-scraped first, then prioritized-next. Add the Scraper roadmap section (top 3 to add, rationale).
  4. Verify with mkdocs build --strict.

Why this approach: parallel subagents give 4-5 retailer reads in roughly the wall-clock time of one. The competitive-landscape doc already did the cross-cutting Lebanese analysis — we don't repeat that work, we link to it.

Per-retailer template (locked before research)

Every retailer entry uses exactly these fields, matching the ticket's scope list:

### N. <Retailer Name>

| | |
|---|---|
| **URL** | <url> |
| **Languages** | <EN/FR/AR coverage> |
| **Categories** | <CPU/GPU/MB/RAM/storage/PSU/case/cooler/peripherals/laptops/prebuilt — what's carried> |
| **Pricing model** | <USD-only / LBP-only / dual / USD with cash-rate calc / call-for-price> |
| **SKU scale** | <small: under 100 / medium: 100-500 / large: 500+> + how counted |
| **Page structure** | <server-rendered HTML / SPA / API-backed> + platform if known (Shopify/Woo/custom) |
| **Pagination** | <numbered pages / infinite scroll / load-more / no pagination> |
| **Anti-bot signals** | <none observed / UA gating / Cloudflare / CAPTCHA> |
| **Affiliate program** | <yes/no/unknown> + link if known |
| **Notes** | <contact / business model quirks> |
| **Recommendation** | **H / M / L / Skip** + one-sentence rationale |
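The later verification steps ("every field has a value", "no TODO/TBD") can be checked mechanically if the template is mirrored as a record. A hypothetical Python sketch; the field names simply track the table above and are not part of the ticket:

```python
from dataclasses import dataclass, fields


@dataclass
class RetailerEntry:
    """One per-retailer entry; fields mirror the template table above."""
    url: str
    languages: str
    categories: str
    pricing_model: str
    sku_scale: str
    page_structure: str
    pagination: str
    anti_bot_signals: str
    affiliate_program: str
    notes: str
    recommendation: str  # "H" / "M" / "L" / "Skip"


def incomplete_fields(entry: RetailerEntry) -> list[str]:
    """Return template fields that are empty or still placeholders."""
    bad = {"", "TODO", "TBD"}
    return [f.name for f in fields(entry)
            if str(getattr(entry, f.name)).strip() in bad]
```

Note that an explicit "Unknown — flagged" value passes this check by design: stating the gap is allowed, leaving the field blank is not.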

Steps

  • Step 1: Stub the doc with frontmatter + section skeleton
  • Create docs/reference/retailers.md with frontmatter (title, description, status: active, tags: [reference, scraping, foundation]), section headers (Scope & method, Currently scraped, Prioritized next, Scraper roadmap, Open questions, See also), and the per-retailer template embedded as a comment for consistency.
  • Add a row in docs/reference/index.md linking to it (mirror the pattern of competitive-landscape.md and personas.md rows).
  • Verification: mkdocs build --strict from a clean state succeeds (no broken refs introduced).
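One plausible shape for the Step-1 stub, using the frontmatter keys and section headers the step names. The description string is illustrative, not ticket text:

```markdown
---
title: Lebanese retailer audit
description: Profiles of currently-scraped and candidate Lebanese tech retailers, plus a scraper roadmap.
status: active
tags:
  - reference
  - scraping
  - foundation
---

## Scope & method
## Currently scraped
## Prioritized next
## Scraper roadmap
## Open questions
## See also

<!-- Per-retailer template lives in the plan doc; every entry uses the same field table. -->
```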

  • Step 2: Currently-scraped retailers — fill 3 entries

  • PCAndParts — read src/scrapers/sites/pcandparts.ts for category URLs + WooCommerce/Flatsome confirmation; WebFetch https://pcandparts.com/product-category/cpu/ to get rough listing count + pagination model + check for anti-bot; cross-reference competitive-landscape §4.4 for UX context (5/10 era 2015-2018 Woo/Flatsome). Fill the template.
  • 961Souq — read src/scrapers/sites/souq961.ts for selectors + Call-For-Price handling; WebFetch https://961souq.com/collections/cpus; cross-reference §4.4 (5/10 Shopify, "Call For Price" without inquiry mechanism). Fill the template.
  • Macrotronics — read src/scrapers/sites/macrotronics.ts for selectors; WebFetch https://www.macrotronics.net/collections/processors-cpu; cross-reference §4.4 (6.5/10 Shopify, includes 10% VAT in displayed prices). Fill the template.
  • For these three, the Recommendation field is not an H/M/L rating; it reads "already scraped (continue); keep maintained".
  • Verification: 3 entries filled, every template field has a value or an explicit "Unknown — flagged" note. No TODO/TBD strings.
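The "No TODO/TBD strings" check is easy to make mechanical. A minimal sketch, assuming the doc is read in as plain markdown text:

```python
import re


def placeholder_lines(markdown_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs still containing TODO/TBD markers."""
    pattern = re.compile(r"\b(TODO|TBD)\b")
    return [(i, line)
            for i, line in enumerate(markdown_text.splitlines(), start=1)
            if pattern.search(line)]
```

Run over docs/reference/retailers.md before calling Step 2 (and later Step 4) done; an empty result means the verification passes.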

  • Step 3: Prioritized-next — dispatch parallel subagents for 5 candidates

  • Use superpowers:dispatching-parallel-agents to launch 5 Explore-type agents in a single message. Per agent: fetch the homepage + the most relevant category page (CPU or "components") + look for anti-bot, pagination, "call for price", language switcher, affiliate program disclosure. Each agent returns a structured report keyed to the per-retailer template.
  • Targets, in priority order:
    1. CompuOne <https://compuonelb.com> (most cited in spec §6.2 Wave-1; PC parts focus)
    2. Mojitech <https://mojitech.net> (distributor + retail; signals catalog scale)
    3. Ayoub Computers <https://ayoubcomputers.com> (components + retail)
    4. PCBuildingLeb <https://pcbuildingleb.com> (custom builds + accessories — interesting niche; tests whether they expose a parts catalog or only services)
    5. Multitech <https://multitech-lb.com> (Apple + PC, retail/wholesale; scope-fit risk to flag)
  • Fallbacks if any of the above are unreachable: PcMacLB (pcmaclb.com), Gamma Computers (gammalb.com), Microcity (gomicrocity.com).
  • Verification: 5 subagent reports received; each report covers all template fields or explicitly says "Unable to verify because <reason>".
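The dispatch in this step is a plain fan-out/fan-in pattern. A sketch of the idea only (not the actual superpowers subagent tool) using Python's concurrent.futures, with a stubbed audit function standing in for each subagent's page visits:

```python
from concurrent.futures import ThreadPoolExecutor


def audit_retailer(name: str) -> dict:
    """Hypothetical stand-in for one subagent: in the real run this fetches
    the homepage + one category page and returns a template-keyed report."""
    return {"retailer": name, "anti_bot_signals": "none observed"}


def dispatch_parallel(names: list[str]) -> list[dict]:
    """Fan out one audit per retailer; collect reports in input order."""
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(audit_retailer, names))
```

pool.map preserves input order, so the compiled reports come back in the same priority order the targets were listed in.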

  • Step 4: Compile prioritized-next entries (4-5 total)

  • Transcribe each subagent report into the doc, in priority order. Maintain the same template. Where an agent flagged "Unable to verify", state that explicitly in the doc — do not invent data (per ticket constraint).
  • For each, set Recommendation = H / M / L / Skip with rationale tied to: catalog scale × scraper feasibility × strategic fit (does it bring a category 961tech is weak in, e.g. peripherals/prebuilt?).
  • Totals of 7 entries (3 scraped + 4 next) or 8 (3 + 5) both fall within the 6-8 ticket target; prefer 8 unless one candidate is clearly a Skip.
  • Verification: every prioritized-next entry has a recommendation; no Recommendation field reads "TBD".
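The recommendation logic (catalog scale x scraper feasibility x strategic fit) can be sketched as a small heuristic. The thresholds below are illustrative assumptions, not ticket policy:

```python
def recommend(scale: str, feasible: bool, strategic_fit: bool) -> str:
    """Illustrative H/M/L/Skip heuristic; thresholds are assumptions."""
    if not feasible:
        return "Skip"  # an unscrapable site overrides everything else
    if scale == "large" and strategic_fit:
        return "H"
    if scale == "medium" or strategic_fit:
        return "M"
    return "L"
```

The real field also carries a one-sentence rationale; the point of the sketch is only that feasibility gates the rating before scale and fit are weighed.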

  • Step 5: Scraper roadmap section

  • Append a "Scraper roadmap" section: prioritized list of next 3-5 retailers to add (subset of prioritized-next entries with H or M recommendations), with one-paragraph rationale per pick, and one combined "what makes #20 trivial to plan now" close.
  • Call out any retailer flagged strategically-important-but-infeasible (heavy SPA, anti-bot, login-walled) at the top of this section, separate from the recommended-add list. Do not bury it.
  • Verification: at least 3 retailers listed; all picks are present in the audit table above; infeasibility callouts (if any) are separate from the recommended list.
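The verification here (picks drawn only from the audit table, only H/M entries, at least 3) reduces to a filter. A minimal sketch, assuming the audit is a name-to-recommendation mapping:

```python
def roadmap_picks(audit: dict[str, str], minimum: int = 3) -> list[str]:
    """Return audit entries rated H or M, H first; fail below `minimum`."""
    picks = sorted((name for name, rec in audit.items() if rec in {"H", "M"}),
                   key=lambda name: audit[name] != "H")  # False (H) sorts first
    if len(picks) < minimum:
        raise ValueError(f"only {len(picks)} H/M picks; need at least {minimum}")
    return picks
```

Because Python's sort is stable, ties within H (and within M) keep the audit table's priority order.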

  • Step 6: Open questions + cross-links

  • Add an "Open questions" subsection capturing anything that needs reviewer input or future research (e.g. Lebanese IG-only retailers per competitive-landscape §5.4 #4; brand-overlap with 961gamers.com).
  • Cross-link from competitive-landscape.md §5.4 #4 (was an open question deferred to #30) only as a See also line in retailers.md — do not edit competitive-landscape.md (it is status: active and out-of-scope here).
  • Verification: see-also block lists competitive-landscape, personas, writing-a-scraper guide.

  • Step 7: Gate — mkdocs build --strict

  • Run mkdocs build --strict from project root.
  • Expected: builds without errors. Any broken cross-link fails the build (the --strict flag).
  • If failure: fix the offending link/anchor and re-run. Do not commit until clean.

  • Step 8: Atomic commit

  • git add docs/reference/retailers.md docs/reference/index.md docs/plans/2026-04-28-issue-30-retailer-audit.md
  • Commit message: "docs(reference): add Lebanese retailer audit and scraper roadmap" (matches the ticket spec exactly; no --no-verify, no push, no PR).
  • Verification: git status shows clean tree post-commit; git log -1 --stat shows only the three docs files touched.

Risks

| Risk | Likelihood | Mitigation |
|---|---|---|
| WebFetch returns 403 / anti-bot for one of the candidate retailers | Medium | Subagent flags "anti-bot signals: 403 on direct fetch" — that is a finding (informs the feasibility recommendation). Don't substitute for the data; state the gap. |
| Subagent invents data when a field isn't visible (e.g. SKU count, affiliate program) | Medium | Per-retailer template requires an explicit "Unknown — flagged" for unverifiable fields. The reviewer sees what was guessed vs. observed. |
| Candidate retailer turns out to be a heavy SPA (no useful HTML to scrape) | Low-medium | Ticket "Stop and ask if" clause #1 — pause and surface to MASTER for the strategic call (do we accept a heavier scraper). |
| One of the 5 candidates is dead/redirected | Low | Fallback list (PcMacLB, Gamma, Microcity) is pre-staged in Step 3. |
| Doc grows beyond 6-8 retailers and dilutes the recommendation signal | Low | Template enforces the Recommendation field; cap at 8. If a 9th is interesting, add it to "Open questions" instead of the table. |

Tests

No code changes → no test additions. The doc passes if mkdocs build --strict succeeds (Step 7 gate).

Doc updates

  • Reference: new file docs/reference/retailers.md
  • Reference index: row added in docs/reference/index.md linking to retailers.md
  • Architecture: not changed (ingest-pipeline.md already covers the per-retailer scraper pattern)
  • ADR: none — no new design decision; recommendations are advisory pending #20
  • Glossary: only if a new term is coined; not anticipated
  • Issue body: closing comment on #30 with commit SHA + summary

Rollback

git revert <sha> removes the doc. No code/state to undo.