The Uselessness of Likert Scales in Rating Books

Back in my usability days, I talked often about measurement error, the idea that something throws a monkey wrench into an otherwise careful attempt at accurate observation. Biases, for example, pop up in all sorts of interesting and confounding ways: I’ve seen users struggle with a site but say they loved their user experience. Even when the Web design itself isn’t the issue, users bring expectations with them as they use Web sites; one man I interviewed for the Social Security Administration consistently used an interface wrong because he didn’t personally identify himself as disabled, even though it took him 10 minutes to cross the room with his walker before taking his seat.

Usability scientists who aim to measure the success and efficiency of online systems have created an arsenal of tools for gathering more accurate information, tools that can stem the effect of whatever measurement error is in play. One of those tools is the Likert Scale. We’re all familiar with them: they are the “rate this from 1 to 5, 1 being least and 5 being most” items that float around opinion surveys and rating systems like Yelp. In truth, a scale can go from 1 to 3, 1 to 4, 1 to 10, or whatever range the designer thinks it should cover.

But Likert Scales are notorious in the world of usability data collection, because very few people design them correctly, and very few respondents react to them appropriately. Problem number 1 with the scales is the assumed distance between options: the scale demands that the distance between 1 and 2 be equal to the distance between 2 and 3. But emotionally, if we are judging our own satisfaction with something, can we parse out our feelings that way? Is happiness always even across a continuum? In terms of satisfaction, there is a lot of evidence that scores tend to drift toward the extremes of the scale, no matter how many markers there are in between. And for items that aren’t controversial, many people will select the middlemost answer. Some scale designers set up even-numbered scales to eliminate the lazy neutral response, but that doesn’t address the problem of pole attraction for respondents.

Another issue with Likert Scales stems from the individual differences among people making selections. When a nurse asks us, “What is your pain level?” and points to the familiar happy or sad face scale, what is a 3 for me may be a 6 for someone else. This particular scale is also problematic because, while most of us know what the “no pain” rank feels like, we have very different experience (or none at all) with the “worst possible pain” point on the scale. Thus that range of 1 to 10 can be markedly different for different people. Perhaps the medical practitioner is only looking to see how someone feels generally, but this particular problem persists in other applications of the scale.

Which brings me to the now-standard 5-point scale for rating books. Amazon, Goodreads, and Barnes & Noble all have a kind of satisfaction rating available so that readers can make their assessment quickly, and this widget is typically separate from a text box where a written review would go. Given how quick these snap judgments are to make, one would think that users would be careful about where on the scale they made their mark. Instead, most responses land in the 4 and 5 slots, a strong skew toward the top of the scale. Here are some examples I pulled from Amazon and Goodreads:

  • The Mill River Recluse, by Darcie Chan (Amazon): 4-star average, 547 responses (292 5-stars, 131 4-stars, 54 3-stars, 47 2-stars, 51 1-stars)
  • One September Morning, by Rosalind Noonan (Amazon): 4.5-star average, 7 responses (5 5-stars, 2 4-stars)
  • The Anubis Gates, by Tim Powers (Goodreads): 4-star average, 2,669 responses (1,022 5-stars, 927 4-stars, 525 3-stars, 140 2-stars, 49 1-stars), so 73% of the scores were in the 4-5 star marks
  • Falling for Me, by Anna David (Goodreads): 4-star average, 41 responses (14 5-stars, 15 4-stars, 11 3-stars, 1 2-star)
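For anyone who wants to check the skew in these numbers, here is a minimal Python sketch that recomputes the mean and the share of 4- and 5-star ratings from the per-star counts listed above. The `StarCounts` class and its property names are my own illustration, not anything Amazon or Goodreads provides, and the totals are recomputed from the counts rather than taken from the stated response numbers.

```python
from dataclasses import dataclass


@dataclass
class StarCounts:
    """Per-star rating counts for one book (hypothetical helper for illustration)."""
    title: str
    site: str
    counts: dict[int, int]  # star value -> number of ratings at that value

    @property
    def total(self) -> int:
        # Total is derived from the counts, not from the site's stated figure.
        return sum(self.counts.values())

    @property
    def mean(self) -> float:
        # Treats stars as equally spaced numbers, which is exactly the
        # assumption the post questions.
        return sum(star * n for star, n in self.counts.items()) / self.total

    @property
    def top_two_share(self) -> float:
        # Fraction of all ratings that are 4 or 5 stars.
        return (self.counts.get(4, 0) + self.counts.get(5, 0)) / self.total


# Counts copied from the list above.
BOOKS = [
    StarCounts("The Mill River Recluse", "Amazon",
               {5: 292, 4: 131, 3: 54, 2: 47, 1: 51}),
    StarCounts("One September Morning", "Amazon", {5: 5, 4: 2}),
    StarCounts("The Anubis Gates", "Goodreads",
               {5: 1022, 4: 927, 3: 525, 2: 140, 1: 49}),
    StarCounts("Falling for Me", "Goodreads", {5: 14, 4: 15, 3: 11, 2: 1}),
]

for book in BOOKS:
    print(f"{book.title} ({book.site}): n={book.total}, "
          f"mean={book.mean:.2f}, 4-5 stars={book.top_two_share:.0%}")
```

One caveat: the per-star counts as listed add up to 575 for The Mill River Recluse and 2,663 for The Anubis Gates, slightly off from the stated response totals, so treat the output as approximate. Either way, roughly seven in ten or more of the ratings for each title fall in the top two slots.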

First, it seems to me that there are different readers out there: readers who like to rate books they’ve read, and casual raters who will do it if the scale is presented to them. Obviously more popular books generate more responses, but it would appear that there is often a bias toward positive ratings over neutral or negative ones. I will wonder out loud if the active book raters, the first group, have a different style of rating; that is, are people who look forward to rating books more inclined to rate in the middle than the casual reader who sometimes rates books?

That said, when a poor book hits the scene, readers will respond with very negative numbers. And for all of Amazon’s history of fake ratings going up for some titles, they can’t mask a very crappy book, as the 1- and 2-star ratings will start to climb despite an author’s or publisher’s attempts to fluff up the average rating.

All of this leads me to think that the scale ratings scheme just doesn’t work well for books. On the one hand, some people try to manipulate the numbers and results, and on the other, measurement error seems to be firmly entrenched in this design of recommending or rating titles. Even just splitting out one distinction, “Did you like this book?” versus “Would you recommend this book?”, may garner different results.

What I would rather see is a couple of yes/no widgets like those above, with a short box for users to say why or why not. I can understand the impetus behind the ratings game, but I’d rather use more traditional sales markers and qualitative-based reviews from readers to gauge a book’s interestingness or quality. And I have high hopes that someday, others will agree with me.

Categories: Pop Culture, Writing


3 Comments on “The Uselessness of Likert Scales in Rating Books”

  1. November 7, 2011 at 8:23 pm #

    The problem inherent with the Amazon or Goodreads reviews is that the vast majority of the reviews are going to come from people who really liked the books and want to suggest that more people read them. I would imagine that most people do not post reviews for every book that they read (I post them all on my blog, but not on Amazon or Goodreads).

    Generally if you’re looking at Amazon for ideas of books to read I would pay more attention to the negative reviews than the positive reviews. If a book has mostly positive reviews that isn’t necessarily something that will push me towards that book if I was on the fence about buying it. (I’ve seen books on Amazon that had a 4 or 4 1/2 overall rating that I hated.) On the other hand, if I was on the fence about buying a book and it had mostly low ratings on Amazon that would probably be enough to keep me away from the book.

  2. evmaroon
    November 7, 2011 at 8:54 pm #

    Well, I suppose books with more reviews may be somewhat more helpful in that there ought to be more negative reviews out there, but books are so subjectively experienced that I’d rather just see whether a reader liked it or not, rather than trying to parse out what a rating vote means. Like you say, books I’ll hate may have high scores, or vice versa. There are helpful ways of structuring data on Web sites, and sadly, unhelpful ways. I put Likert Scales in the latter camp when it comes to reviewing novels.

  3. Justa Notha
    November 26, 2011 at 5:53 am #

    I find it hilarious that I was asked to rate your post at the end! I had the same problem trying to find an app to track my moods. Does dysphoric count as bad? What about crampy? The things that I wanted to measure…all the gray areas between depressed and manic…were lost amid smiley/frowney faces!
