# Listing lifecycle
What this answers: what states does a single retailer Listing move through, from first scrape to disappearance?
## State machine
```mermaid
stateDiagram-v2
    [*] --> Discovered : first scrape finds URL
    Discovered --> Unmatched : matcher couldn't resolve
    Discovered --> Matched : matched to canonical Product
    Unmatched --> Matched : later run matches (or operator override)
    Matched --> Active : has price + in stock
    Matched --> QuoteOnly : retailer lists no public price
    Active --> OutOfStock : stock indicator flips
    OutOfStock --> Active : retailer restocks
    Active --> QuoteOnly : retailer removes price
    QuoteOnly --> Active : retailer adds price
    Active --> Stale : not seen in N days
    OutOfStock --> Stale : not seen in N days
    QuoteOnly --> Stale : not seen in N days
    Stale --> Active : reappears
    Stale --> Removed : URL 404 confirmed
    Removed --> [*]
```
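Code that needs to reject impossible moves can encode the diagram as a transition table. The following is a minimal TypeScript sketch, not code from the repo: `ListingState`, `TRANSITIONS`, `canTransition`, and `assertTransition` are hypothetical names, and the table lists exactly the arrows drawn in the diagram (operator overrides are folded into Unmatched -> Matched).

```typescript
// Hypothetical encoding of the Listing lifecycle diagram.
type ListingState =
  | "Discovered"
  | "Unmatched"
  | "Matched"
  | "Active"
  | "QuoteOnly"
  | "OutOfStock"
  | "Stale"
  | "Removed";

// One entry per arrow in the diagram; Removed is terminal as drawn.
const TRANSITIONS: Record<ListingState, ListingState[]> = {
  Discovered: ["Unmatched", "Matched"],
  Unmatched: ["Matched"], // later matcher run or operator override
  Matched: ["Active", "QuoteOnly"],
  Active: ["OutOfStock", "QuoteOnly", "Stale"],
  OutOfStock: ["Active", "Stale"],
  QuoteOnly: ["Active", "Stale"],
  Stale: ["Active", "Removed"],
  Removed: [],
};

// True only if the diagram draws an arrow from `from` to `to`.
function canTransition(from: ListingState, to: ListingState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Throws when asked to follow an arrow the diagram does not draw.
function assertTransition(from: ListingState, to: ListingState): void {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal Listing transition: ${from} -> ${to}`);
  }
}
```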
## States explained
| State | Meaning | Driven by |
|---|---|---|
| Discovered | Scraper just inserted the row. No match attempted yet. | Scraper run |
| Unmatched | Matcher ran but couldn't resolve to a canonical Product. `productId` is null. | Matcher |
| Matched | Linked to a Product. `productId` set, `matchConfidence` > threshold. | Matcher |
| Active | Has a `priceUsd` and the latest `ListingPrice.inStock = true`. | Latest `ListingPrice` row |
| QuoteOnly | `priceUsd` is null on the latest `ListingPrice`. Common on 961Souq. See #3. | Latest `ListingPrice` row |
| OutOfStock | `inStock = false` on the latest `ListingPrice`. | Latest `ListingPrice` row |
| Stale | `lastSeenAt` is more than N days ago. Data is still shown but flagged as old. | Cron / staleness guard |
| Removed | Retailer URL no longer returns a product. Soft-deleted to preserve `Click` history integrity. | Scraper 404 detection |
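Since most of these states are derived rather than stored (see the implementation notes), the table can be read as a pure function of a `Listing` row and its latest `ListingPrice`. This is a hedged sketch: the row interfaces, `deriveState`, and the `staleDays` parameter (standing in for N, defaulting to the 7 days mentioned in the notes) are assumptions. The precedence (Removed, then stored match status, then Stale, then QuoteOnly before the stock check) is one reasonable reading of the table, not the app's confirmed logic, and it lets any priced state go Stale, as the diagram's arrows do.

```typescript
// Hypothetical row shapes; only the field names come from this page.
type MatchStatus = "Discovered" | "Unmatched" | "Matched";

interface ListingPriceRow {
  priceUsd: number | null;
  inStock: boolean;
}

interface ListingRow {
  matchStatus: MatchStatus;
  productId: string | null;
  lastSeenAt: Date;
  deletedAt: Date | null;
}

type DerivedState =
  | MatchStatus
  | "Active"
  | "QuoteOnly"
  | "OutOfStock"
  | "Stale"
  | "Removed";

const DAY_MS = 24 * 60 * 60 * 1000;

// staleDays stands in for the page's "N days" (assumed default: 7).
function deriveState(
  listing: ListingRow,
  latestPrice: ListingPriceRow | null,
  now: Date,
  staleDays: number = 7
): DerivedState {
  // Soft-deleted rows are Removed regardless of price history.
  if (listing.deletedAt !== null) return "Removed";
  // Until matched and priced, the stored matchStatus is the state.
  if (listing.matchStatus !== "Matched" || latestPrice === null) {
    return listing.matchStatus;
  }
  // Any priced state goes Stale once the URL hasn't been seen for staleDays.
  if (now.getTime() - listing.lastSeenAt.getTime() > staleDays * DAY_MS) {
    return "Stale";
  }
  // No public price means QuoteOnly; checked before the stock flag.
  if (latestPrice.priceUsd === null) return "QuoteOnly";
  return latestPrice.inStock ? "Active" : "OutOfStock";
}
```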
## Implementation notes
- The `Listing` row's `matchStatus` field captures Discovered/Unmatched/Matched. The Active/QuoteOnly/OutOfStock distinctions are derived from the latest `ListingPrice` rather than stored on the Listing itself; this keeps the price history append-only.
- Stale is a guard, not a stored state. It's a query-time computation: "Active but `lastSeenAt < now() - 7 days`." If the next scraper run sees the URL again, `lastSeenAt` updates and the Listing is no longer stale.
- Removed is a soft delete (shipped 2026-04-28 per #148). The `Listing` row carries `deletedAt: DateTime?`. A scrape run finishes by sweeping `Listing` rows whose `lastSeenAt` is older than the configurable `SOFT_DELETE_WINDOW_HOURS` (48h default) and setting `deletedAt = now()`. Subsequent scrapes that re-discover the URL reset `deletedAt = null` (un-soft-delete). All read paths in `/products`, `/products/[slug]`, `/build/choose/[category]`, `PriceTicker`, and `Footer` filter on `deletedAt: null` by default. `Click` history (#17 affiliate reconciliation) and `ListingPrice` history (#8 price history) survive the soft delete because we never hard-delete the `Listing` row.
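The soft-delete cycle in the last note can be sketched over an in-memory array. Assumptions: `SweepListing`, `sweepSoftDeletes`, `markSeen`, and `liveListings` are hypothetical names; `windowHours` stands in for `SOFT_DELETE_WINDOW_HOURS` (48 by default); and the real sweep presumably runs as a database update at the end of a scrape run rather than as a loop.

```typescript
// In-memory stand-in for the Listing table (hypothetical shape).
interface SweepListing {
  url: string;
  lastSeenAt: Date;
  deletedAt: Date | null;
}

const HOUR_MS = 60 * 60 * 1000;

// End-of-run sweep: flag live rows not seen within the window.
// Returns how many rows were newly soft-deleted.
function sweepSoftDeletes(rows: SweepListing[], now: Date, windowHours: number = 48): number {
  const cutoff = now.getTime() - windowHours * HOUR_MS;
  let swept = 0;
  for (const row of rows) {
    if (row.deletedAt === null && row.lastSeenAt.getTime() < cutoff) {
      row.deletedAt = now; // flag only; the row itself is kept
      swept++;
    }
  }
  return swept;
}

// Re-discovery during a scrape: refresh lastSeenAt and un-soft-delete.
function markSeen(row: SweepListing, now: Date): void {
  row.lastSeenAt = now;
  row.deletedAt = null;
}

// Default read-path filter: hide soft-deleted rows.
function liveListings(rows: SweepListing[]): SweepListing[] {
  return rows.filter((row) => row.deletedAt === null);
}
```

Rows are only ever flagged, never spliced out, which is the property that keeps `Click` and `ListingPrice` history joinable to their `Listing`.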
## Match rate concern
The M1 overall match rate is ~13.5%, with CPU/GPU much higher (39–40%) thanks to category-specific matchers. In the low-rate categories (Cooler 6.4%, RAM 9.8%, Storage 3.3%, PSU 3.9%), unmatched Listings sit in Unmatched indefinitely.
The fix is LLM-assisted spec extraction (#21). Until then, those categories accumulate Unmatched Listings that never transition further.
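Match rates like these are presumably ratios of matched Listings to all Listings, so the overall figure is a listing-weighted average that depends on the category mix, not just on the per-category rates. A small sketch of that computation, assuming a Listing counts as matched when `productId` is non-null; `RateListing`, `MatchRates`, and `matchRates` are hypothetical names.

```typescript
// Hypothetical minimal row shape: only category and productId matter here.
interface RateListing {
  category: string;          // e.g. "CPU", "PSU"
  productId: string | null;  // null while Discovered or Unmatched
}

interface MatchRates {
  overall: number;                     // matched / all Listings
  byCategory: Record<string, number>;  // matched / all, per category
}

function matchRates(listings: RateListing[]): MatchRates {
  const counts: Record<string, { matched: number; total: number }> = {};
  let matchedOverall = 0;
  for (const listing of listings) {
    let c = counts[listing.category];
    if (c === undefined) {
      c = { matched: 0, total: 0 };
      counts[listing.category] = c;
    }
    c.total += 1;
    if (listing.productId !== null) {
      c.matched += 1;
      matchedOverall += 1;
    }
  }
  const byCategory: Record<string, number> = {};
  for (const category of Object.keys(counts)) {
    byCategory[category] = counts[category].matched / counts[category].total;
  }
  return {
    overall: listings.length === 0 ? 0 : matchedOverall / listings.length,
    byCategory,
  };
}
```

Dividing total matched by total Listings, rather than averaging the per-category rates, keeps small high-rate categories from inflating the overall number.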
See also Ingest pipeline for the matcher's role in the scrape flow.