Well, the Hottest 100 happened yesterday, so now it’s time to evaluate how my spoilers list went:
- 147 songs were given a 1% or higher chance of appearing in the Hottest 100 by my bootstrap analysis. 98 of those 147 songs made the Hottest 100.
- 85 of the songs in the Spoiled Top 100 made the Hottest 100.
- The top 72 songs in the spoiled list made the Hottest 100.
- 6 of the 10 songs in the Spoiled Top 10 made the Hottest 100’s Top 10.
- #1 was correctly predicted.
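For anyone wanting to run the same scorecard on their own prediction, here is a minimal sketch of how the figures above could be computed. It assumes both lists are plain Python lists of song titles ordered from #1 downwards; the function name `evaluate` and the toy titles are made up for illustration and are not the actual countdown data.

```python
def evaluate(predicted, actual, top_n=10):
    """Compare a predicted countdown with the real one (both ordered from #1)."""
    actual_set = set(actual)

    # How many predicted songs appear anywhere in the real list.
    hits = sum(1 for song in predicted if song in actual_set)

    # Length of the unbroken run of correct songs from the top of the prediction.
    streak = 0
    for song in predicted:
        if song not in actual_set:
            break
        streak += 1

    # Overlap between the two top-N sets, ignoring order inside the top N.
    top_hits = len(set(predicted[:top_n]) & set(actual[:top_n]))

    return {
        "hits": hits,
        "streak": streak,
        "top_hits": top_hits,
        "number_one": predicted[0] == actual[0],
    }


# Toy example with hypothetical titles.
predicted = ["A", "B", "C", "D", "E"]
actual = ["A", "C", "B", "F", "D"]
print(evaluate(predicted, actual, top_n=2))
```

Here `hits` corresponds to the "85 of the songs" figure, `streak` to the "top 72" figure, and `top_hits` to the Top 10 overlap.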
Why wasn’t it more accurate, or as successful as last year?
- We don’t know how good the OCR model was at transcribing votes. We’ll need to do work fitting our OCR model to known transcriptions if we want to do this again next year.
- The sample size was much lower.
- Daft Punk fans don’t post to Instagram enough (inherent bias)?
- The bootstrap resampling technique did not account for taste, that is, for particular songs often being selected together.
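On that last point, one possible way to keep co-selected songs together is to resample whole ballots rather than individual votes. The sketch below is not the code used for this analysis; it assumes each transcribed ballot is available as a tuple of song titles, and the name `bootstrap_appearance` and its parameters are hypothetical.

```python
import random
from collections import Counter


def bootstrap_appearance(ballots, top_n, rounds=1000, seed=0):
    """Estimate how often each song lands in the top_n across bootstrap resamples."""
    rng = random.Random(seed)
    appearances = Counter()

    for _ in range(rounds):
        # Draw whole ballots with replacement, so songs picked together stay together.
        sample = rng.choices(ballots, k=len(ballots))
        counts = Counter(song for ballot in sample for song in ballot)

        # Rank by vote count, breaking ties alphabetically for reproducibility.
        ranked = sorted(counts, key=lambda song: (-counts[song], song))
        appearances.update(ranked[:top_n])

    # Fraction of resamples in which each song made the top_n.
    return {song: n / rounds for song, n in appearances.items()}


# Toy ballots with hypothetical titles.
ballots = [("A", "B"), ("A", "C"), ("A", "B"), ("C", "D")]
print(bootstrap_appearance(ballots, top_n=2, rounds=200))
```

Filtering the returned dictionary for values of 0.01 or more would give a "1% or higher chance" list in the spirit of the one described at the top of this post.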
I’ve got every confidence that this method will be viable for next year, especially since the results were a much less spectacular spoiler than 2012’s Warmest 100. Let’s see if we can make a better model for next year.