this post was submitted on
9 points (80% like it)
12 up votes 3 down votes

all 30 comments

[–]raldi[A] 5 points6 points ago

sorry, this has been archived and can no longer be voted on

The fix I've been wanting to roll since forever is simply:

min(ups, downs)

It works surprisingly well.
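
A minimal Python sketch of that one-liner, for illustration only. The function name `controversy_min` is made up here and is not taken from the reddit codebase.

```python
def controversy_min(ups: int, downs: int) -> int:
    """Score a post by the size of its smaller voting bloc."""
    return min(ups, downs)


# Two posts with the same smaller side tie, whatever the larger side is.
print(controversy_min(100, 100))
print(controversy_min(1000, 100))
# A post with no downvotes scores zero however many upvotes it has.
print(controversy_min(5000, 0))
```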

[–]sunkid[S] 2 points3 points ago

doesn't that make +100/-100 and +1000/-100 equivalent?

[–]raldi 2 points3 points ago* 

Yes, by design. Doesn't yours rate them as equivalent, too?

In fact, it looks to me that your formula is simply equivalent to mine times two. Can you name a pair of up / down values where that's not the case?

Edit: graphs!

[–]sunkid[S] 2 points3 points ago

Well, shit!

That said, none of the three algorithms works then! 1000/-100 isn't controversial to me (about 91% approval, +900 net votes), while 100/-100 is. Basically, the score should get bigger as the share of upvotes, ups/(ups+downs), approaches 0.5, and it should be bigger for items with more total votes.

I've got to run right now but I'll think more about this!

[–]robosatan 5 points6 points ago* 

How about ( min(x,y) / max(x,y) ) * (x+y).

The min/max part gives a fraction that gets closer to 1 as the numbers of upvotes and downvotes get closer to each other (exactly 1 when they are equal). Multiplying by the total number of votes makes the score scale with how many people weighed in.

1000 up and 100 down (1100 total votes): 100/1000 * 1100 = 110 controversy points

300 up and 800 down (1100 total votes): 300/800 * 1100 = 412.5 controversy points

550 up and 550 down (1100 total votes): 550/550 * 1100 = 1100 controversy points

1000 up and 1000 down (2000 total votes): 1000/1000 * 2000 = 2000 controversy points
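
The four examples above can be reproduced with a short Python sketch. The function name `controversy_ratio` is hypothetical, and returning 0 when either side has no votes is an assumption added here to avoid dividing by zero; the comment itself does not say what should happen in that case.

```python
def controversy_ratio(x: int, y: int) -> float:
    """(min / max) * total votes, with x upvotes and y downvotes."""
    if x <= 0 or y <= 0:
        # Assumption: a one-sided vote is treated as not controversial.
        return 0.0
    return min(x, y) / max(x, y) * (x + y)


# The vote pairs used in the examples above.
for ups, downs in [(1000, 100), (300, 800), (550, 550), (1000, 1000)]:
    print(ups, downs, controversy_ratio(ups, downs))
```

Results may differ from the hand-computed figures by floating-point rounding in the last digits.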

[–][deleted] 4 points5 points ago* 

I like this. However, a super-comment (a comment with a lot of upvotes) can potentially bully out other, more controversial comments (with fewer votes) through the sheer number of its votes.

Example: super comment 1100 = 1000 up 100 down => 110 controversy
small controversial comment 100 votes = 50 up 50 down => 100 controversy

Suggested improvement:
( min(x,y) / max(x,y) )^alpha * (x+y)

alpha (> 1.0) is a parameter to adjust the trade-off between up/down ratio vs number of votes on the controversy score.
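
A rough Python sketch of this variant, assuming alpha is an exponent on the min/max ratio. The function name `controversy_alpha`, the zero-vote guard, and the value alpha = 2.0 are illustrative choices made here, not values proposed in the thread.

```python
def controversy_alpha(x: int, y: int, alpha: float = 2.0) -> float:
    """(min / max) ** alpha * total votes; alpha > 1 penalises lopsided votes."""
    if x <= 0 or y <= 0:
        # Assumption: a one-sided vote is treated as not controversial.
        return 0.0
    return (min(x, y) / max(x, y)) ** alpha * (x + y)


super_comment = (1000, 100)  # many votes, lopsided
small_comment = (50, 50)     # few votes, evenly split

# With alpha = 1 the super-comment still wins; with alpha = 2 it no longer does.
for alpha in (1.0, 2.0):
    print(alpha,
          controversy_alpha(*super_comment, alpha=alpha),
          controversy_alpha(*small_comment, alpha=alpha))
```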

[–]robosatan 0 points1 point ago

I've revised your idea and come up with the following:

( x + y ) alpha - ( min(x, y) / max(x, y) )

Where alpha is the min/max quotient which you deem the borderline of controversy. I've found that an alpha between 0.5 and 0.8 gives the best balance between prioritising controversy (50 up 50 down > 50 up 49 down) and including an influence from the total number of votes cast (50 up 50 down > 40 up 40 down). Though it would be best to try it out with some real data to see what works best for reddit.

[–]sunkid[S] 0 points1 point ago

I think this is the winner! I have edited the original post, have a look.

[–]raldi 0 points1 point ago

Can you paste the source you used to generate the table?

[–]sunkid[S] 0 points1 point ago

I saved it to google docs

[–]sunkid[S] 0 points1 point ago

Thanks for helping to think this through. I have edited the original post and pointed out the problem with this algorithm.

[–][deleted] 1 point2 points ago

Looks good. Since there are exponentials, divisions, etc., it may be worthwhile to implement a quantized lookup table based on up/down votes to help save processor cycles.
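
One way such a table could look, sketched in Python. Everything here is an assumption for illustration: the formula is taken to be ( min / max )^alpha * total from earlier in the thread, and `BUCKETS`, `ALPHA`, `RATIO_TABLE` and `controversy_lookup` are made-up names and values. The sketch trades the exponentiation for one integer division and a list lookup, at the cost of some precision.

```python
BUCKETS = 256  # hypothetical table resolution
ALPHA = 2.0    # hypothetical exponent

# Precompute ratio ** ALPHA once for every quantized ratio i / BUCKETS.
RATIO_TABLE = [(i / BUCKETS) ** ALPHA for i in range(BUCKETS + 1)]


def controversy_lookup(x: int, y: int) -> float:
    """Approximate (min / max) ** ALPHA * (x + y) using the table."""
    smaller, larger = min(x, y), max(x, y)
    if smaller <= 0:
        # Assumption: a one-sided vote is treated as not controversial.
        return 0.0
    # Integer quantization of min / max into 0..BUCKETS (rounds down).
    index = (smaller * BUCKETS) // larger
    return RATIO_TABLE[index] * (x + y)


print(controversy_lookup(50, 50))
print(controversy_lookup(1000, 100))
```

Whether this actually saves time would need measuring; in Python the exponentiation is unlikely to be the bottleneck.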

[–]phire14 1 point2 points ago

I think this works very nicely, and would like to see it used.

I'm not very familiar with the Reddit code but I assume all algorithms for post sorting also have some sort of an attenuator for age of post and maybe also a factor for the rate of votes (faster voting floats the popular topic to the top)?

[–]joelthelion 0 points1 point ago

How about (u * d) / (u + d)?

In your examples, it gives the following results:

  • 91
  • 218
  • 275
  • 500
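
A small Python sketch of this formula, which is half the harmonic mean of u and d. The function name `controversy_product` and the zero-vote guard are assumptions added for illustration.

```python
def controversy_product(u: int, d: int) -> float:
    """(u * d) / (u + d), with u upvotes and d downvotes."""
    total = u + d
    if total == 0:
        # Assumption: an item with no votes scores zero.
        return 0.0
    return u * d / total


# The same vote pairs as in robosatan's examples, rounded to whole points.
for ups, downs in [(1000, 100), (300, 800), (550, 550), (1000, 1000)]:
    print(ups, downs, round(controversy_product(ups, downs)))
```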

[–]MercurialMadnessMan 1 point2 points ago

this one looks pretty good :)

[–]sunkid[S] 0 points1 point ago

I think robosatan proposed the best algorithm to address the problems with the current sort by controversy. I edited my original post with a comparison table. What do you think?

[–]fazon 0 points1 point ago

and sort by whatever has the total number of votes

[–]raldi 1 point2 points ago

That's not a good sort -- something with 5000 ups and no downs would outrank something with 1000 ups and 1000 downs.

[–]fazon 0 points1 point ago

I meant something like whatever has the most combined upvotes and downvotes, as long as the two counts are within a certain range of each other (say, within some percentage).

Ex: 1000/-9990 would be controversial but 10 000/-1000 and 1000/-7500 would not.

[–]raldi 1 point2 points ago

Can you express that in an equation that looks like this?

controversy_score = ...

[–]fazon -1 points0 points ago

I'm not exactly sure/good with this stuff but something like

score = (upvotes - downvotes) sorted by (upvotes + downvotes) limited to stories with a difference within 6%

For example, if something is 500/-450 the total votes are 950 and the difference is 50. 6% of the total (950) is 57. Since the difference is less than 6% of the total (57), it would be shown.

An example of something that wouldn't be shown is if something is 100/-60. The total votes here would be 160 and has a difference of 40. 6% of the total (160) is 9.6. Since the difference is more than 6% of the total (9.6), it wouldn't be shown.

I would probably tweak the 6% though to get the best results.
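
Read as a filter followed by a sort, the rule above might look like the Python sketch below. The function names, the `(ups, downs)` tuple layout and the choice to sort the surviving items by total votes in descending order are assumptions made here; only the 6% threshold and the two worked examples come from the comment.

```python
def within_threshold(ups: int, downs: int, threshold: float = 0.06) -> bool:
    """True if the up/down difference is less than threshold * total votes."""
    total = ups + downs
    return total > 0 and abs(ups - downs) < threshold * total


def controversial_sort(posts, threshold: float = 0.06):
    """Keep only (ups, downs) pairs inside the threshold, most votes first."""
    shown = [p for p in posts if within_threshold(p[0], p[1], threshold)]
    return sorted(shown, key=lambda p: p[0] + p[1], reverse=True)


print(within_threshold(500, 450))  # difference 50 vs 6% of 950
print(within_threshold(100, 60))   # difference 40 vs 6% of 160
print(controversial_sort([(100, 60), (500, 450), (1000, 990)]))
```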

[–]fazon -4 points-3 points ago

you want an sql statement?

[–]DEADB33F 0 points1 point ago

With regard to your OP update, would it be possible to display the controversy scores as a percentage, or otherwise normalize them, so it's easier to compare the various methods?

[–]sunkid[S] 0 points1 point ago

I posted the spreadsheet to google docs. Feel free to edit! The best comparison would really be a check with the intended sort order. I'll see if I can work this up.

[–]mariod505 -5 points-4 points ago

The entire sorting scheme needs to be thrown out and redone. Top and Old should be removed, while Best should mix in some New so that fresh ideas keep being introduced. If you want controversial, just scroll down.

Today, the whole thing falls apart after the first 100 comment votes as only the original popular comments stand any chance of being read.