This is from the Game Changer podcast: Behind the Stars: Uncovering the Biases in Online Reviews | with Tommaso Bondi. Link to the podcast website.

The podcast features Tommaso Bondi (link to his webpage) and his paper Alone, Together: A Model of Social (Mis)Learning from Consumer Reviews.

The problem is rooted in cultural markets, like movies, music, and books, where consumers discover and experience products without knowing their exact utilities ex ante. To decide which product to consume, we might use reviews to gauge the unknown utility. So, are these reviews reliable? At first sight, it seems that with a sufficiently large number of reviews, the aggregate score should converge to a sound estimate of the real quality.

But adding in consumers' self-selection thickens the plot. Assume tastes are heterogeneous: product match values differ from consumer to consumer, and each product's match value distribution might also be different. For example, consider movies. A niche movie would have a polarized distribution, where some people find it extremely interesting while others find it nonsense; an Oscar-winning classic would have a smoother distribution with a higher mean.

Let's say we have these two movies, and consumers need to choose between them over Christmas based on reviews. What we would expect is that the niche movie will only attract people who already know they are interested in its category (near-perfect matches), and this self-selection will push its reviews up. The Oscar-winning movie, on the other hand, gets really popular and starts attracting consumers for whom it is not a perfect match, so its reviews will naturally go down.

Overall, for Oscar-winning movies or national-award-winning books, we generally observe that reviews go down right after the awards. This is a classic case of an expanded consumer pool: lots of reviews come in from a more general audience, with a broader spread of match values than a niche product's audience, so the average review drops, at least relatively. At the end of the day, the popular product gets more reviews, but the niche product gets better-matched reviewers. The difference between the average reviews is compressed compared to the original difference in the products' qualities.

Tommaso Bondi's paper solidifies this intuition from a theoretical perspective. There is a feedback loop: reviews shape the beliefs consumers form about quality, those beliefs determine who consumes what, and the self-selected consumers then write the next round of reviews. This loop eventually produces a biased review dynamic. The starkest result in the paper is that, if one takes two products with a given quality difference, the difference between their average reviews will be lower than the difference in quality, even if we assume every individual consumer's review is free of noise.
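To make the compression concrete, here is a minimal Monte Carlo sketch in Python. It is not the model from the paper; every distribution and number below is a made-up illustration. A niche film and a hit differ in true quality by two points, the niche film's match values are polarized, each consumer watches whichever film a noisy private signal of their own match favors, and every watcher reports their exact utility.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # consumers

# Hypothetical true qualities on a 10-point scale: quality gap = 2.0
q_niche, q_hit = 6.0, 8.0

# Match values: the niche film polarizes (love it or find it nonsense),
# while the hit has a tight match distribution around zero.
m_niche = rng.choice([1.5, -1.5], size=n, p=[0.4, 0.6]) + rng.normal(0, 0.5, n)
m_hit = rng.normal(0, 0.5, n)

# Self-selection: each consumer watches the film that a noisy private
# signal of their own match values says will give higher utility.
s_niche = m_niche + rng.normal(0, 1.0, n)
s_hit = m_hit + rng.normal(0, 1.0, n)
watch_niche = q_niche + s_niche > q_hit + s_hit

# Noiseless reviews: every watcher reports their exact realized utility.
r_niche = (q_niche + m_niche)[watch_niche]
r_hit = (q_hit + m_hit)[~watch_niche]

print(f"niche: {watch_niche.mean():5.1%} of consumers, avg review {r_niche.mean():.2f}")
print(f"hit:   {(~watch_niche).mean():5.1%} of consumers, avg review {r_hit.mean():.2f}")
print(f"quality gap {q_hit - q_niche:.2f} vs review gap {r_hit.mean() - r_niche.mean():.2f}")
```

Even though every single review is noiseless, the niche film's small, self-selected audience rates it far above its true quality, and the posted review gap ends up well below the two-point quality gap.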

One future direction the author mentioned during the podcast is empirical verification. For now, there is a theoretical result and there are plenty of anecdotes, but economists are still waiting for harder evidence. Indeed, in the age of big data, we do have access to rating datasets that might provide relevant complementary insights.

One more thought that might be interesting: apart from consumer self-selection, what if we also take into account consumers' decisions about whether to leave a review at all? To be more specific, people usually don't comment on every movie they watch; they tend to rate a movie when it is extremely good or extremely bad. At first sight, this might polarize instead of compress the review difference. Maybe a similar theoretical model can justify my hypothesis, and it would be interesting to see how these two conflicting forces interact to shape what the review system finally displays.
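As a back-of-the-envelope check on that hypothesis, here is another made-up sketch, not from the paper. Both films are watched by everyone, so consumption self-selection is switched off, and the only friction is a hypothetical rule (with arbitrary thresholds) that people post a review only when the experience was extreme on the 10-point scale.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Two films of different quality but the same match distribution;
# everyone watches both, isolating the "only extremes get rated" effect.
q_good, q_bad = 8.0, 6.0
u_good = q_good + rng.normal(0, 1.0, n)
u_bad = q_bad + rng.normal(0, 1.0, n)

# Hypothetical reviewing rule: rate only an extreme experience.
def posted(u, lo=5.5, hi=8.5):
    return u[(u < lo) | (u > hi)]

r_good, r_bad = posted(u_good), posted(u_bad)
print(f"quality gap:       {q_good - q_bad:.2f}")
print(f"posted-review gap: {r_good.mean() - r_bad.mean():.2f}")
print(f"review rates: good {len(r_good) / n:.0%}, bad {len(r_bad) / n:.0%}")
```

Under this rule, the good film's posted ratings come mostly from its delighted tail and the bad film's from its disappointed tail, so the posted gap stretches well beyond the quality gap: the opposite direction from the compression above. How the two forces net out is exactly the open question.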

Last note: in reality, consumers are also strategic in interpreting reviews. For example, when looking for decent Guangdong restaurants on Dianping (the Chinese version of Yelp), I intentionally exclude restaurants rated above 4.5 stars: either the ratings are fake or the restaurant is probably too expensive.