Methodology
This page explains how RedDB turns raw Reddit data into the product rankings, sentiment scores, and feature breakdowns you see on the site.
1. Data collection
We search Reddit for product mentions across relevant subreddits. For a category like headphones, that includes communities like r/headphones, r/audiophile, r/HeadphoneAdvice, and dozens of other forums where people discuss audio gear. We use Reddit's search capabilities to find posts and comments that mention specific products, then verify each mention to ensure it's genuinely about the product in question rather than a passing reference.
Our database currently contains over 300,000 verified mentions across more than 24,000 subreddits. We periodically refresh this data to capture new discussions.
2. Relevance and opinion filtering
Not every mention of a product is useful. A comment saying "I just ordered the Sony WH-1000XM5" tells us nothing about the product's quality. We use large language models to classify each mention on two dimensions: whether it's relevant to the product, and whether it contains an actual opinion. Only mentions that pass both checks make it into the analysis.
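A minimal sketch of the two-check filter described above. The `Mention` class and its `is_relevant` and `has_opinion` fields are hypothetical names. The labels are set by hand here to stand in for the language model's output, and the second comment is a made-up example. This is an illustration, not RedDB's actual code.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    is_relevant: bool  # hypothetical: classifier says the mention is about the product
    has_opinion: bool  # hypothetical: classifier says the mention expresses an opinion


def keep_for_analysis(mentions):
    """Keep only mentions that pass both checks."""
    return [m for m in mentions if m.is_relevant and m.has_opinion]


mentions = [
    # Relevant, but states no opinion about quality, so it is dropped.
    Mention("I just ordered the Sony WH-1000XM5", is_relevant=True, has_opinion=False),
    # Made-up example of a relevant, opinionated mention, so it is kept.
    Mention("The WH-1000XM5 noise cancellation is excellent", is_relevant=True, has_opinion=True),
]

kept = keep_for_analysis(mentions)
```

Requiring both flags means a purchase announcement is filtered out even though it names the right product.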
3. Sentiment analysis
Each relevant, opinionated mention is scored for sentiment. Rather than a simple positive/negative binary, we analyze the nuance of each comment: what specific aspects are praised or criticized, how strongly the opinion is expressed, and what context surrounds it. The sentiment score for each mention feeds into the overall product sentiment breakdown you see on product pages (the positive/neutral/negative bar).
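To illustrate how per-mention results could roll up into the positive/neutral/negative bar, here is a small sketch. It assumes each mention has already been reduced to one of three labels, which simplifies the more nuanced scoring described above. The function name `sentiment_breakdown` is hypothetical.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def sentiment_breakdown(labels):
    """Return the share of each sentiment label as a fraction of all mentions."""
    total = len(labels)
    if total == 0:
        # No opinionated mentions yet: report an empty bar rather than divide by zero.
        return {label: 0.0 for label in LABELS}
    counts = Counter(labels)
    return {label: counts[label] / total for label in LABELS}


breakdown = sentiment_breakdown(["positive", "positive", "neutral", "negative"])
```

With the four labels in the example, half of the bar would be positive and a quarter each neutral and negative.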
4. Feature extraction
Beyond overall sentiment, we extract which specific features people discuss. If a comment about a pair of headphones praises noise cancellation but criticizes comfort, those are recorded as separate feature-level data points. The features you see on product pages (like "sound quality," "comfort," or "battery life") are surfaced from this extraction, ranked by how frequently they're discussed and by whether the sentiment toward them is positive or negative.
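The aggregation of feature-level data points could look roughly like the sketch below. It assumes each data point is a `(feature, sentiment)` pair with sentiment encoded as +1, 0, or -1, and it breaks frequency ties by net sentiment. Both the encoding and the tie-break rule are assumptions made for illustration, not a description of the actual pipeline.

```python
from collections import defaultdict


def summarize_features(data_points):
    """Aggregate (feature, sentiment) pairs, with sentiment in {+1, 0, -1}."""
    stats = defaultdict(lambda: {"mentions": 0, "net": 0})
    for feature, sentiment in data_points:
        stats[feature]["mentions"] += 1
        stats[feature]["net"] += sentiment
    # Most-discussed features first; ties broken by higher net sentiment.
    return sorted(
        stats.items(),
        key=lambda item: (-item[1]["mentions"], -item[1]["net"]),
    )


# One comment can yield several data points, one per feature it discusses.
points = [
    ("noise cancellation", 1),
    ("comfort", -1),
    ("noise cancellation", 1),
]
ranked_features = summarize_features(points)
```

Here "noise cancellation" comes first with two mentions and a net sentiment of +2, followed by "comfort" with one mention and a net sentiment of -1.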
5. Scoring and ranking
Product scores combine multiple signals: the ratio of positive to negative mentions, the total volume of discussion (more mentions means higher confidence), and recency. A product with 50 positive mentions out of 60 total will score higher than one with 5 out of 6, even though the percentages are identical (about 83%), because more data points mean a more reliable signal.
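One common way to make a score reward sample size is the lower bound of the Wilson score interval. The sketch below uses it purely as an illustration of the "more mentions means higher confidence" idea. The exact formula RedDB uses is not stated here, recency is left out, and the function name `wilson_lower_bound` is hypothetical.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a positive-mention rate.

    z=1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    if total == 0:
        return 0.0
    p = positive / total
    z2 = z * z
    center = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (center - margin) / (1 + z2 / total)


# Same positive rate (about 83%), very different sample sizes.
large_sample = wilson_lower_bound(50, 60)
small_sample = wilson_lower_bound(5, 6)
```

Under these assumptions the 50-of-60 product scores roughly 0.72 and the 5-of-6 product roughly 0.44, so the larger sample ranks higher despite the identical percentage.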
Rankings within each category are derived directly from these scores. We do not manually adjust rankings, accept paid placements, or boost products based on affiliate revenue.
Category rankings only include products we currently consider in stock. Products that are temporarily unavailable can still have product pages and appear in comparisons, but they are excluded from ranked lists until they are back in stock.
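The stock rule can be sketched as a filter applied before sorting. The dictionary keys `name`, `score`, and `in_stock` and the sample values are hypothetical and chosen only to show the behavior.

```python
def rank_category(products):
    """Rank in-stock products by score, highest first.

    Out-of-stock products are skipped here but stay in the input list,
    so they remain available for product pages and comparisons.
    """
    eligible = [p for p in products if p["in_stock"]]
    return sorted(eligible, key=lambda p: p["score"], reverse=True)


products = [
    {"name": "A", "score": 0.72, "in_stock": True},
    {"name": "B", "score": 0.80, "in_stock": False},  # highest score, but unavailable
    {"name": "C", "score": 0.65, "in_stock": True},
]
ranking = [p["name"] for p in rank_category(products)]
```

Product B has the highest score in this example but is left out of the ranked list until it is back in stock.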
6. Limitations
Reddit's user base skews toward specific demographics, so discussion in some product categories may be biased in ways that don't reflect the general population. Budget products may receive less discussion than enthusiast-grade gear. We show you the mention count on every product page so you can judge the sample size yourself.
Sentiment analysis, even with modern LLMs, isn't perfect. Sarcasm, context-dependent opinions, and multi-product comparisons can sometimes be misclassified. We surface the original Reddit mentions alongside our analysis so you can always read the source and form your own judgment.
7. Updates
Data is refreshed periodically as new Reddit discussions are collected. Product prices are updated from Amazon. Scores and rankings are recomputed after each data refresh. This page will be updated if we make significant changes to our methodology.