What does true Bayesian estimate mean in connection with the IMDb Top 250 ratings?
It is a weighting mechanism used by the Internet Movie Database (IMDb) to adjust a movie's rating score based on the number of votes it has received. Fewer votes carry less 'weight', so a score based on few votes is pushed more strongly toward a predetermined benchmark figure. As the number of votes increases, the votes begin to assert more of their own influence on the final score.

Bayesian averages differ from other averages (e.g. the arithmetic mean) in that you have to make some assumptions about the system beforehand (here, a benchmark score and a vote threshold). The average score across all movies rated on the IMDb is 6.7 (out of 10). This is the IMDb's predetermined benchmark figure against which other movies are compared in the Bayesian estimate. IMDb also requires a minimum of 1300 votes before a movie makes it to the Top 250. The IMDb didn't necessarily have to use the figures of 6.7 and 1300. They could easily have picked other starting points, like 5.0 and 2000, if they thought they had good reason.

Now let's say 'only' 1350 people vote for a movie but give it a perfect score of 10. Using the IMDb methodology:

Bayesian est. = (1350 × 10 + 1300 × 6.7) / (1350 + 1300) = 8.4

The initial score of 10 is pushed significantly downward toward the benchmark of 6.7 because of the relatively small number of votes.

Now let's say 20,000 people vote for a movie and also give it a perfect score of 10:

Bayesian est. = (20000 × 10 + 1300 × 6.7) / (20000 + 1300) = 9.8

The initial score is still pushed down, but not nearly as much, because of the relatively high number of votes. The Bayesian estimate quite closely reflects the opinion of a large number of voters. That is, under the Bayesian approach the opinion of 20,000 people carries more weight than the opinion of 1350 people.

A similar thing happens when the score is low, except that with a relatively small number of votes the Bayesian estimate is pulled up toward the overall average of 6.7 instead of down.
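The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula described in this answer, not IMDb's actual code: the function name `bayesian_estimate` and its parameter names are made up for the example, and the defaults of 1300 and 6.7 are simply the figures quoted in the text.

```python
def bayesian_estimate(mean_rating, votes, min_votes=1300, benchmark=6.7):
    """Weighted rating as described in the text (hypothetical helper).

    mean_rating -- the movie's plain average score (out of 10)
    votes       -- number of votes the movie has received
    min_votes   -- vote threshold used as the weight of the benchmark
    benchmark   -- average score across all movies
    """
    return (votes * mean_rating + min_votes * benchmark) / (votes + min_votes)


# The two worked examples from the text: a perfect 10 with few and many votes.
print(round(bayesian_estimate(10, 1350), 1))   # 8.4
print(round(bayesian_estimate(10, 20000), 1))  # 9.8

# A made-up low-score case: a 2.0 average on 1350 votes is pulled up toward 6.7.
print(round(bayesian_estimate(2, 1350), 1))    # 4.3
```

Changing `min_votes` or `benchmark` shows how much the choice of starting assumptions matters: a larger `min_votes` pulls every movie harder toward the benchmark.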