Weighted rating

Posted by Thomas on February 11th 2009

Every website that lets its users rate items on a scale faces a choice in how best to calculate ratings. There are some pretty advanced algorithms for this, ranging from ones based on the Bayesian principle to ones that use social networking as a base, like digg.com.

The easiest choice is to calculate the average rating and leave it at that. This means that an item which has been rated once as 10 will be ranked higher than an item which has been rated over a thousand times and has an average of 9.8. There is an upside to this: on a website with a lot of activity, items voted up like this get an opportunity to be viewed, and if one isn't up to standard, it will be rated down. But this system has its downsides too.
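To make the ranking problem concrete, here is a minimal sketch of sorting by plain average. The item names, the sample numbers and the plainAverage helper are made up for illustration, they are not from any real site.

```php
<?php
// Hypothetical sample data: each item holds the sum and count of its ratings.
$items = array(
    'new item'     => array('sum' => 10,   'count' => 1),    // rated once as 10
    'popular item' => array('sum' => 9800, 'count' => 1000), // average of 9.8
);

// Plain average: sum of ratings divided by the number of ratings.
function plainAverage($sum, $count)
{
    return $count == 0 ? 0 : $sum / $count;
}

$averages = array();
foreach ($items as $name => $item) {
    $averages[$name] = plainAverage($item['sum'], $item['count']);
}

// Sort highest average first, keeping the item names as keys.
arsort($averages);

foreach ($averages as $name => $average) {
    echo $name . ": " . $average . "\n";
}
```

The item with a single rating of 10 ends up ahead of the item with a thousand ratings.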

Let's say a website with user-created content, like youtube.com, had this system running. Every user could rate their own item as a 10 (or get a friend to do it, just to preempt you guys) and land at the top of the top-rated list. The result would be a top-rated list filled with bad-quality videos, which wouldn't be very useful.

To combat this, we have weighted rating. Its main purpose is to give items with a low number of ratings a handicap. How weighted rating is implemented varies a lot, but a good place to start is by using the average number of votes for all items as a base for the weighting.
Here is an example of a simple algorithm that can be used.

0 <= weight <= 3
weight = AVG(votes) / COUNT(ratings)
rating = ( SUM(ratings) / COUNT(ratings) ) - weight

The first line says that weight cannot be smaller than 0 or bigger than 3. If we do not set this kind of limit, we get some unexpected results. Let's say AVG(votes) is 30 and the item has one rating of nine on a scale from one to ten. Without the limit, this would give us:

weight = 30 / 1 = 30
rating = 9 / 1 - 30 = -21
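As a rough sketch, the simple algorithm with its limit could look like this in PHP. The function name simpleWeightedRating, the $maxWeight parameter and the choice to return 0 for unrated items are my assumptions for the example.

```php
<?php
// Sketch of the simple weighted rating with a capped weight.
// $avgVotes: average number of votes per item across the site, AVG(votes)
// $sum, $count: sum and number of ratings for the item being rated
// $maxWeight: upper limit for the weight (3 in the text)
function simpleWeightedRating($avgVotes, $sum, $count, $maxWeight = 3)
{
    if ($count == 0) {
        return 0; // assumption: an item without ratings gets 0
    }

    // The fewer ratings an item has, the bigger its handicap.
    $weight = $avgVotes / $count;

    // Keep the weight between 0 and $maxWeight.
    $weight = max(0, min($maxWeight, $weight));

    return $sum / $count - $weight;
}

// One rating of nine while the site average is 30 votes per item.
echo simpleWeightedRating(30, 9, 1) . "\n";

// A thousand ratings averaging 9.8 barely get a handicap.
echo simpleWeightedRating(30, 9800, 1000) . "\n";
```

With the limit in place, the single rating of nine ends up at 9 - 3 = 6 instead of going negative, while the heavily rated item only loses 0.03.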

There is a problem with that algorithm, though: new items are treated the same as very old items. The way I see it, old items with few ratings should be weighted down a lot, while new items with few ratings should not be weighted down as much, as this would give them an unfairly bad start. So what we can do is add a time-based weight decreaser.

daysOfEffect = 180
rateOfDecrease = daysOfEffect / 2

0 <= timeWeightDecreaser <= 2
timeWeightDecreaser = 2 - ( daysOnSite / rateOfDecrease )

0 <= weight <= 3
weight = (AVG(votes) / COUNT(ratings)) - timeWeightDecreaser

1 <= rating <= 10
rating = (SUM(ratings) / COUNT(ratings)) - weight
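Here is a minimal PHP sketch of those steps, read literally: each limit is applied right after the value is calculated. The names clampValue and timeWeightedRating are made up for this example, and returning 0 for unrated items is an assumption.

```php
<?php
// Keeps $value inside the range $min to $max.
function clampValue($value, $min, $max)
{
    return max($min, min($max, $value));
}

// Sketch of the time-based variant; limits are applied in the order listed.
function timeWeightedRating($avgVotes, $sum, $count, $daysOnSite, $daysOfEffect = 180)
{
    if ($count == 0) {
        return 0; // assumption: an item without ratings gets 0
    }

    // Number of days it takes the decreaser to drop by 1.
    $rateOfDecrease = $daysOfEffect / 2;

    // Starts at 2 for a brand new item and reaches 0 after $daysOfEffect days.
    $timeWeightDecreaser = clampValue(2 - $daysOnSite / $rateOfDecrease, 0, 2);

    // The handicap for having few ratings, softened for new items.
    $weight = clampValue($avgVotes / $count - $timeWeightDecreaser, 0, 3);

    return clampValue($sum / $count - $weight, 1, 10);
}

// Ten ratings averaging 9 on a site where items average 30 votes.
foreach (array(0, 90, 180) as $days) {
    echo $days . " days: " . timeWeightedRating(30, 90, 10, $days) . "\n";
}
```

For this item the rating comes out as 8 on day 0, 7 after 90 days and 6 after 180 days, so the handicap grows as the item gets older without picking up more ratings.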

[Figure: the time-based weight decreasing algorithm]

daysOfEffect denotes how long an item should be considered new. A small period of time before the time-based weight decreaser kicks in can be a good idea. A visual of how this is done can be seen in the picture. I used the Grapher utility you get in OS X to create it. Never thought I'd get use for that :)

I also implemented a static function in PHP so you can see how this algorithm can be implemented in code.

<?php
  /**
  * Calculates a weighted rating.
  * Uses a time and vote amount based algorithm.
  * Static method: it is meant to be placed inside a class.
  *
  * @param $avg average number of votes for all items
  * @param $sum sum of ratings for item to be rated
  * @param $count number of ratings for item to be rated
  * @param $freePeriod number of days until the time based algorithm will start having an effect
  * @param $daysOfEffect total number of days the time based algorithm will have an effect
  * @param $daysOnSite number of days the item has been on the site
  * @param $debug optional, default = false, set to true to see debug message
  * @return float weighted rating, limited to the range 1 to 10 (0 if the item has no ratings)
  **/
  public static function CalculateWeightedRating($avg,
                                                 $sum,
                                                 $count,
                                                 $freePeriod,
                                                 $daysOfEffect,
                                                 $daysOnSite,
                                                 $debug = false )
  {
	  if( $count == 0) return 0;

	  $rating = $sum / $count;

	  if($count > $avg)
	  {
		  $weight = 0;
		  $timeWeightDecreaser = 0;
	  }
	  else
	  {
		  $weight = $avg / $count;

		  if( $weight > 3)
		  {
			    $weight = 3;
		  }

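		  // Number of days it takes the decreaser to drop by 1.
		  // Assumes $daysOfEffect is greater than $freePeriod (equal values would divide by zero).
		  $rateOfDecrease = ($daysOfEffect - $freePeriod) / 2;
		  // Starts at 2.3 and shrinks as the item ages past the free period.
		  $timeWeightDecreaser = 2.3 - (($daysOnSite - $freePeriod) / $rateOfDecrease);

		  // Keep the decreaser between 0 and 2.3.
		  if( $timeWeightDecreaser > 2.3 ) $timeWeightDecreaser = 2.3;
		  if( $timeWeightDecreaser < 0 ) $timeWeightDecreaser = 0;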

	  }

	  $weightedRating = $rating - ($weight - $timeWeightDecreaser);

	  if( $weightedRating < 1) $weightedRating = 1;
	  if( $weightedRating > 10) $weightedRating = 10;

    if( $debug == true)
    {
	    echo "<p>";
	    echo   "Average votes: " . $avg . "<br/>";
	    echo   "Sum rating: " . $sum . "<br/>";
	    echo   "Num votes: " . $count . "<br/>";
	    echo   "Days on site: " . $daysOnSite . "<br/>";
	    echo   "Rating: " . $rating . "<br/>";
	    echo   "Weight: " . $weight . "<br/>";
	    echo   "Time weight decrease: " . $timeWeightDecreaser . "<br/>";
	    echo   "Weighted Rating: " . $weightedRating;
	    echo "</p>";
    }

    return $weightedRating;
  }
