Poll: Which global skill rating system is best ?

**WirryWoo** · 05-22-2021, 09:11 AM

Re: Poll: Which global skill rating system is best ?

To start some conversations. I can highlight some thoughts.

Primary reasons on why I prefer weighted averages:

� If Top X songs are set to define skill rating (after many discussions on what X should be) and if skill rating is consistently used as a comparative tool to measure performance between two files demanding different skills and requirements, then the X-th and (X+1)th songs should hold very similar weights and (X+1)th song is weighted at 0 by definition of Top X.

� Although weighted mechanism rewards you for "biases" encoded in the performance of files in your Top N more than the unweighted mechanism, a regulated (key word here) solution should still capture many benefits that the unweighted solution provides: specifically and most importantly, a stronger representation of lower ranked files in your Top X is needed to determine a user's skill rating.

� If we want to reward users for activity, why shouldn't the season's ratings be used there? Whether skill rating vs. seasons rating is assigned as the more "official" metric can be reserved as another conversation. Point is, there is a solution aimed to reward players who consistently play the game.

I get it. Our current weighted system does not do it well, but this doesn't necessarily translate to "any weighted solution cannot do that". It's a tradeoff between "improving representation of lower ranked files" and "rewarding performance for songs subjectively more challenging than what your current skill rating suggests", and in my opinion, that should be respected.

I've written a first iteration of what a weighted setting would look like. Attached is a Colab notebook for reference. There's a pandas dataframe containing the new projected rankings, the username, their projected weighted skill rating, and their current rank in game.

For fun: If you want to determine your projected skill rating under a weighted mechanism I designed, scroll up to the "Determine your skill rating" section, replace my username with yours, then scroll all the way to the top, and click on the play buttons for the first seven cells.

-

**Zageron** · 05-22-2021, 12:22 PM

Re: Poll: Which global skill rating system is best ?

As a victim of weighted averages, I stand in solidarity with Simple average of top X equivs.

**Matthia** · 05-22-2021, 12:23 PM

Re: Poll: Which global skill rating system is best ?

I prefer any system that will boost me to #1

**Gradiant** · 05-22-2021, 12:56 PM

Re: Poll: Which global skill rating system is best ?

Simple average better deals with issue of fluke scores on poorly rated files

**xXOpkillerXx** · 05-23-2021, 07:31 AM

Re: Poll: Which global skill rating system is best ?

More arguments from the weighted side please

**gold stinger** · 05-23-2021, 01:56 PM

Re: Poll: Which global skill rating system is best ?

huh.

**FlynnMac** · 05-23-2021, 03:23 PM

Re: Poll: Which global skill rating system is best ?

I guess I'll give my take on this

So I choose weighted mainly from my experience with other rhythm games where they have a more successful weighted system. It doesn't highlight your top play a large amount ahead of the rest not allowing entirely for outliers and it also uses a large size of files in order to give the most accurate rating possible. While simple average gives the average of all your top x ratings, weighted can still have your best plays give more of an impact than plays you aren't as happy with. Wirry's system had felt accurate to me because of the fact that the weights input on it were better than FFR's current weights. There are a lot of high level players who had lower ranks than they should have that got bumped up, and a lot of lower level players that got their ranks bumped down (me included). The ratings really felt like they defined who had better ranks over having an average rating system that could still have it's outliers. Outliers will not be fixed either way, but with the right weights, it could be fixed better than a simple average could do it.

**Zlyice** · 05-23-2021, 03:48 PM

Re: Poll: Which global skill rating system is best ?

There are two main reasons I'm in favor of a weighted average. One, as Flynn mentioned, is that a weighted average does a better job of giving resolution to a player's top level of play. The current system does give a pretty strong weight to a player's top score, but I see this more of an issue with the current weights as opposed to a weighted average in general. WirryWoo's calculation earlier in the thread seems pretty reasonable to me personally.

Secondly, an unweighted skill rating is only going to be as representative as the full scope of scores going into the calculation. If we're considering, for example, an unweighted average of 100 songs, this would require a player to play enough things for these 100 scores to be reasonably representative of their level of skill, which could take a considerable amount of time. There's a lot of potential for an unweighted average to disproportionately rank more active players ahead of players who play a bit less but are ultimately a little more skilled.

**xXOpkillerXx** · 05-23-2021, 05:13 PM

Re: Poll: Which global skill rating system is best ?

Alright, I'll try and make a structured statement.

First of all, there is a concern that many people pointed out, which is that any system would have outliers. While that is true, not all outliers are the same, and that should very much be considered. In all cases should we try to minimize the amount of outliers there are, but it can be very difficult to compare counts for different types of outliers. At that point, a bit of subjectivity is invovled and necessary.

Lets look at what types of outliers the two kinds of system generate:

1. Weighted avg outliers:
These are essentially any and all outliers that come from the fact that our difficulty judgement is inherently flawed, mixed with inevitable imbalance in players' skillsets. The two points in this can be further explained:

1.1. Difficulty
We (FFR) use a single number to represent chart difficulty. Obviously, this has a relatively high and non-negligible degree of subjectivity. Other games like Etterna have attempted to fix this flaw by splitting the difficulty in distinct skills, kind of forcing axioms for what defines difficulty at its core. This method can generally help distinguish between files that are well balanced vs the ones that focus on 1 or 2 specific skillsets throughout. However, we simply dont do that, either because it has its own flaws, or for various reasons unrelated to this topic. So, we have one single number representing the difficulty of each file, be it balanced or not.

1.2. Players skillsets
It's no surprise that each player has their own best and worst skills. Just like the files, some players' skillset are well balanced, while others' are more specific. Comparison of skill between two players can be argued, but my stance is that this statement should hold:

Player A's skillset: 3/3 for jacks, 2/3 for jumpstream, 1/3 for one-handed trills
Player B's skillset: 2/3 for jacks, 2/3 for jumpstream, 2/3 for one-handed trills
(The skills are just an example, but the numbers are important)

Player A and B are equal.

This has subjectivity in it, and I invite anyone to explain why they think player A should be considered the better player in this case. I personally believe that we shouldn't favor specific skill proficiency over general proficiency. Any person that agree with this statement should make sure their preferred system respects it.

1.3. The outliers
Well, in a weighted system, where a non-random sample X of files is used to output a single number representing global skill rating, the above statement can never hold. For any score x1 in X, there will always be a score x2 that is either favored or vice versa. This means that any weighted system (with X of set size !), by definition, will generate unfairness by favoring players with specific skillsets at any given level. When X is of variable size, it becomes -Incredibly- difficult to properly formalize the model, and therefore a lot of guessing is introduced. That is what WirryWoo's model's hyperparameters are. By tweaking these, we adjust X's shape depending on a player's scores, but we can no longer tell what is favored (skillset specificity vs varied skillset) nor to what degree it is. In my opinion, this is sub-optimal.

Again, this mostly revolves around the player comparison statement.

2. Simple avg outliers:
A simple average system also generates outliers. These are much more straightforward. In fact, such a system implies an important statement about skill rating:

Any player that has a rating representative of their actual skill level has optimally filled their top X scores.

This means that if X is of size 50, then a player should have 50 scores of their caliber to be properly ranked. Any player whose top 50 is not that will have their rating be lower than their true skill level.

The main downside to this is pretty simple too:
If, over time, too many players dont optimally fill their top X, then the rankings will be flawed. These are essentially the outliers of this type of system.

My primary argument (subjective) to support this downside is that I absolutely cannot understand why we should think that it's too much to ask from players who want to be ranked. Playing 50, or even 100 songs in your difficulty range should Not be troublesome; if you want to be properly ranked but cannot be bothered fulfill this pretty simple requirement, do you even really care to begin with ? Saying that an unweighted system "favors active players" is quite the overstatement in my opinion. You don't need to be that active of a player to fulfill the requirement.

3. Comparison of outliers
So we have defined the kind of outliers that each system will inevitably have. The main concern I have with saying that "outliers are outliers" is that they're actually drastically different conceptually.

The weighted models' outliers are unfair. Some players will always be favored no matter how weights are arranged. In a variable X size setting, the outliers may be reduced, but only by an undefined amount, and they become hard to model.
The unweighted model's outliers are fair. Any player can easily stop being an outlier by getting some more scores in their difficulty range.

Now obviously the amount of outliers in both cases will differ. Naturally, at the very beginning of a transition to an unweighted system, there would be many more of them. This means that a stabilization period would follow, during which the players will get more scores at their own pace to more optimally fill their top X. There will always be players who will not do it, and retired players may definitely not come back to adjust their scores for this. However, any change to the skill rating computation will require Some adjustment from the players to get a more optimal result, so keeping retired players' rankings as is is just not a possibility (although some systems may yield closer results, the point remains).

3.1. My take on the outliers
At the end of the day, I personally favor fairness over count when it comes to these outliers. That being said, I would totally be ok with moving back to a weighted system if, after an arbitrarily long stabilization period with an unweighted system, there is still not enough effort from the players to make their top X reflect their actual skill level. That would be quite sad, but FFR does have its periods of low activity, and too little of it would indeed mean a weighted system is required. I don't think we have too little currently, but that's mostly subjective and debatable.

4. Common arguments
Here are some arguments people usually make which I'd like to address:

4.1 Rewarding outstanding scores
There is this thought that a weighted system better rewards rare great scores players get every once in a while. While that is definitely true, it doesn't mean that unweighted doesn't reward it; it just does so to a lesser degree to respect the important statement made in 1.2! A great score is still rewarded as the top 1 score in the top X. A player with the same average skill as you will be ranked lower due to that new score you got. If they're not ranked lower despite that sick score you got, that means they're better than you on average, that is all.

4.2 What about the top players who wont have an optimal top X ?
Yes, if Myuka doesn't play more and a top 50 unweighted is implemented, they will have a skill rating far from representative. To be honest, I couldn't care less. There are countless players from all other rhythm games who we know could be in top spots on FFR. Granted they haven't played a single game, the fact that we Know they'd place around a certain spot is also applicable to our current top players who might never "fix" or "fill" their ranked scores. Yes, it looks funny to see Myuka be ranked 100th or whatever, but really that's a small argument to back unfairness in system outliers. Does this mean we reward activity ? No, not really. That means we enforce a (relatively small) minimum of activity over a player's whole "FFR career" in order to have a representative skill rating. Rewarding activity would be done with seasons, where the same concepts are applied to definite, repeating timeframes where stats are reset each iteration.

5. Conclusion
I hope this post clarifies why I believe an unweighted top X (of size 50 or 100) is preferable in our case. I am very aware of the flaws of such a system, but I definitely think they are significantly "better" flaws than a weighted system's flaws.

**trumaestro** · 05-23-2021, 07:43 PM

Re: Poll: Which global skill rating system is best ?

Spitballing: how about a bit of both sides?

Equal weights for top X scores. Decreasing weights to next Y scores.

I'm not math-y enough to work out whether that addresses any of the issues here, but it seems to me that combining sides here could help mitigate the downsides of each.

**xXOpkillerXx** · 05-23-2021, 07:56 PM

Re: Poll: Which global skill rating system is best ?

Originally posted by trumaestro

Spitballing: how about a bit of both sides?

Equal weights for top X scores. Decreasing weights to next Y scores.

I'm not math-y enough to work out whether that addresses any of the issues here, but it seems to me that combining sides here could help mitigate the downsides of each.

Not a bad take tbh. I'm not entirely sure yet what to think of it, but here's my quick thoughts.

A top X (in any system) should require significantly more scores than the current system in order to minimize the chance of skillset bias. In other words, the more files are taken into account (at equal weights, and to some extent obviously), the lower the probability of having one or two skills being overly representative of one's skill level. In extensive discussion on discord, the size of X has been mostly agreed to be between 30 and 100. I personally would be fine with 50 (as I suggest with seasons ratings too), but I'm also ok with 100 given the fact that it's not limited in time.

That being said, if you agree with my statements in the previous post, there should be at the very least a top 30 files with equal weights, after which it would start decaying until either 50 or 100 probably.

Again, I'm not sure what I think of it, but I'm probably more ok with it than not. Would be nice to hear from the people who were against unweighted.

**WirryWoo** · 05-23-2021, 09:58 PM

Re: Poll: Which global skill rating system is best ?