Scoring Data in Brutal Difficulty Range

  • Matthia
    🍍Pineapple Man🍍
    FFR Simfile Author
    • Nov 2017
    • 511

    #16
    Re: Scoring Data in Brutal Difficulty Range

    Originally posted by PrawnSkunk
    Looked back at the skill rating thread (the one that contains the AAA Equivalency formula: http://www.flashflashrevolution.com/...d.php?t=140927),
    we can clearly see how the AAA equivalency decay for a single good diminishes as the difficulty of a file increases. (Disregard the fact that the graphs are a little inaccurate today due to occasional adjustments over the years.)

    I spent time iso-ing the last ten notes of 100+ files and purposely hitting blackflags to gather more data on how that one good affected the equivalency score across the varying difficulties of the files:
    Code:
    100 - First good costs 0.42 equivalency
    101 - 0.42
    102 - 0.41
    103 - 0.41
    104 - 0.40
    105 - 0.39
    106 - 0.39
    107 - 0.38
    108 - 0.38
    109 - 0.37
    110 - 0.37
    *I have already tested whether dropping more goods changes the per-good equivalency loss, and it is quite linear*
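The measurements above can be turned into a rough estimator. This is only a minimal sketch: it assumes the loss stays linear in goods, as noted, and the names `PER_GOOD_COST` and `estimate_equivalency` are made up for illustration. It is not the site's actual formula.

```python
# Per-good equivalency cost by chart difficulty, copied from the list above.
PER_GOOD_COST = {
    100: 0.42, 101: 0.42, 102: 0.41, 103: 0.41, 104: 0.40, 105: 0.39,
    106: 0.39, 107: 0.38, 108: 0.38, 109: 0.37, 110: 0.37,
}


def estimate_equivalency(difficulty: int, goods: float) -> float:
    """Approximate AAA equivalency, assuming every good costs the same amount."""
    if difficulty not in PER_GOOD_COST:
        raise KeyError(f"no measurement for difficulty {difficulty}")
    if goods < 0:
        raise ValueError("goods must be non-negative")
    return difficulty - PER_GOOD_COST[difficulty] * goods
```

For example, `estimate_equivalency(104, 1)` comes out at roughly 103.60. Treat results at higher good counts as approximations only.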

    The idea of punishing less as difficulty increases is fine; that is not the concern here. Maybe (if possible) rework the AAA equivalency formula (or, perhaps even better, add a second step that is applied to the output of the current formula) so that it takes more off your equivalency score for the first few goods of a 100+ file, and then settles back to the normal rate of -0.xx per raw good you take. AAA'ing something should be more rewarding honestly:


    You Universe

    Difficulty= 104

    {PA} - {Old Equivalency Score} {Change} --> {New Equivalency Score} {Change}
    AAA  - 104.00   N/A  --> 104.00   N/A
    1g   - 103.60  -0.40 --> 103.40  -0.60
    2g   - 103.20  -0.40 --> 102.85  -0.55
    3g   - 102.80  -0.40 --> 102.35  -0.50
    4g   - 102.41  -0.39 --> 101.90  -0.45
    5g   - 102.01  -0.40 --> 101.50  -0.40
    6g   - 101.62  -0.39 --> 101.11  -0.39
    7g   - 101.23  -0.39 --> 100.72  -0.39
    8g   - 100.84  -0.39 --> 100.33  -0.39
    9g   - 100.46  -0.38 -->  99.95  -0.38
    10g  - 100.07  -0.39 -->  99.56  -0.39

    Notice the change of new equivalency


    The example in this spoiler is there to give you an idea on how my solution might work out, the values I gave for the "new equivalencies" are somewhat arbitrary and shouldn't be taken seriously plz. This will surely take much time to put together, making sure everything is balanced well down to each single difficulty level.
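One way to read the proposal is as a post-processing step on the current formula's output. The sketch below is an illustration only: the surcharge values in `EXTRA_PENALTY` are an assumption, picked so the first rows match the "new equivalency" column above, and the function name is hypothetical.

```python
# Hypothetical extra penalty for each of the first four goods (assumption:
# 0.20, 0.15, 0.10, 0.05, after which only the normal decay applies).
EXTRA_PENALTY = (0.20, 0.15, 0.10, 0.05)


def front_loaded_equivalency(old_equivalency: float, goods: int) -> float:
    """Subtract a shrinking surcharge for the first few goods from the current score."""
    if goods < 0:
        raise ValueError("goods must be non-negative")
    # Slicing caps the surcharge once all entries have been used up.
    surcharge = sum(EXTRA_PENALTY[:goods])
    return round(old_equivalency - surcharge, 2)
```

With the old scores from the table, 1g to 3g reproduce the new column (103.40, 102.85, 102.35). Later rows differ by 0.01, which is expected since the table values were picked by hand.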

    Originally posted by TheSaxRunner05
    The main trouble I see in increasing the AAA equivalency decay is it really advantages those that don't typically play for AAAs, for tournaments. You could end up with people with 30g on Tageri not even count as D5, but they clearly have the speed. I know that's difficult for any system to handle, but it may make it worse.
    This could be handled just by letting the system/method only affect D8 territory.
    Last edited by Matthia; 08-25-2019, 04:56 PM.




    • One Winged Angel
      Anime Avatars ( ◜◡^)っ✂╰⋃╯
      FFR Simfile Author
      • Mar 2007
      • 10837

      #17
      Re: Scoring Data in Brutal Difficulty Range

      Did a quick midnight skim; lots of good points being made and I appreciate the discussion. Posting to remind myself to respond in greater detail tomorrow, because there are several sentiments I agree with, while others would compromise some direct comparisons between charts under the current skill rating system.

      Honestly this probably mostly boils down to adjusting the scaling for certain edge cases.


      Originally posted by ilikexd
      i want to be cucked by cirno


      • Untimely Friction
        D6 Challeneged
        • Aug 2012
        • 1267

        #18
        Re: Scoring Data in Brutal Difficulty Range

        nerf milk tei its too easy to get 20g on

        Edit: Reflecting on this and Halogen's mention of it being about the entry-level difficulty for D8, I wonder if D8s can AAA this. That's prolly important since the whole point was to gauge AAA equivalency, and therefore moooost D8s should be able to feasibly AAA Milk tei, and if they don't, something's weird, and considering AAAs past 100 just isn't reasonable? Or something...
        Last edited by Untimely Friction; 08-26-2019, 01:27 AM.


        • Dynam0
          The Dominator
          • Sep 2005
          • 8987

          #19
          Re: Scoring Data in Brutal Difficulty Range

          I propose difficulty rating be composed of the sum of two factors:

          A = General Difficulty Factor
          B = Upper Tier Scoring Factor

          Total AAA Equivalency = A + B

          Yes, yes I know this means we need two separate difficulty values for each chart now, how annoying! But the idea is that for charts with a consistent difficulty, B would tend to 0 and for charts like Husigi it would be on the higher end.

          Now for calculating the skill points, I have messed with this in Excel a bit and this is where I am at now:

          Total Skill points = Ax + B^y

          where x = total raw score / max raw score, and;
          where y = 1/(effective goods + 1)


          Did some fiddling in Excel with those and the effect in principle seems promising, but it needs more fine-tuning.

          For Husigi:

          Using A = 85 and B = 16

          Example 1 - Raw Score (0/0/0/0) - Total Skill Points: 101
          Example 2 - Raw Score (0/0/0/1) - Total Skill Points: 95.07
          Example 3 - Raw Score (1/0/0/1) - Total Skill Points: 88.50
          Example 4 - Raw Score (5/1/0/2) - Total Skill Points: 86.23
          Example 5 - Raw Score (45/12/4/16) - Total Skill Points: 84.15

          Someone wanna play around with that idea some more? For instance I find the decrease in the "B" component is a bit too punishing in the above example while the decreases in the "A" component are not harsh enough.
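For anyone who wants to play with the idea outside of a spreadsheet, here is a minimal sketch of the formula as written above. The function name is made up, and `x` and `effective_goods` are passed in directly because the post does not spell out how they are derived from a raw score.

```python
def total_skill_points(a: float, b: float, x: float, effective_goods: float) -> float:
    """Total Skill Points = A*x + B**y, where y = 1 / (effective_goods + 1)."""
    if effective_goods < 0:
        raise ValueError("effective_goods must be non-negative")
    y = 1.0 / (effective_goods + 1.0)
    return a * x + b ** y
```

With A = 85, B = 16, x = 1.0 and no goods this returns 101, matching Example 1. Taking x as 1.0 and 0.2 effective goods (assuming a boo counts as 0.2 of a good, which the post does not state) gives about 95.08, close to Example 2. One property worth noting when tuning: for any B > 0 the second term tends to 1 rather than 0 as the good count grows, because y tends to 0.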
          Last edited by Dynam0; 08-26-2019, 07:49 AM.


          • One Winged Angel
            Anime Avatars ( ◜◡^)っ✂╰⋃╯
            FFR Simfile Author
            • Mar 2007
            • 10837

            #20
            Re: Scoring Data in Brutal Difficulty Range

            Ok, a few thoughts:

            FFR historically has always been a game favoring accuracy. There aren't other letter grades players are aiming for, like an AA that requires meeting a specific percentage threshold, or other mechanics similar to that, and as such players will gravitate towards playing charts they can score relatively well on. Even with incredibly difficult charts being added now, outside of vrofl and the very newly released HQR, every chart has at least been SDG'd (and HQR will get there by round's end). By tourney's end I'm projecting around 5 charts at maximum that won't have SDGs as top scores, so the difficulty ceiling will remain quite low compared to what's found in Etterna and played by top tiers.

            I don't have a major issue with AAA difficulty being largely represented in a chart's difficulty because I feel nerfing some charts that are extremely hard to AAA but understandable at a much lower skill level penalizes what a AAA is worth on those charts.

            Take Husigi vs. Jamais Vu. JV at time of release was viewed as an entry level 13 and it likely remains as such to this day. Most players are going to experience a slingshot effect of sorts when scoring on both charts. Players approaching the D6 level might be able to score in the 20-30g range on Husigi without even being able to read Jamais Vu. Eventually they'll hit a higher reading/speed/stamina threshold that allows them to read JV, and now they're reaching that same scoring range there. Meanwhile they've only made marginal gains on Husigi because the relevant parts of the chart represented in its difficulty require technical consistency and control not demonstrated as effectively as raw speed at higher levels. Players will continue making greater gains in speed and stamina and get to a point where they're consistently near AAA'ing JV on any playthrough. It's at this point in skill where I'm willing to bet a majority of players will say a Husigi AAA is much more impressive than on a chart like JV, and certainly more impressive than anything under 100.

            The entry requirement for a top 20 rank on Husigi, a chart that's been out for 8 years, is 6g. Rave 7, also viewed as an entry level 13 released in the same tournament as Husigi, requires 1-0-0-1. Miku at 102 even has a stricter top 20 entry requirement at 4g. Dropping Husigi's difficulty to the FGO range penalizes the rating value that should rightfully be gained by any player capable of perfecting a chart like that, as well as any other charts that exhibit similar scoring trends. I think it's best to identify these files and create separate equivalency decay formulas for them instead of outright nerfing the difficulties. I feel only a small minority of top players would value a Husigi AAA consistently less than the charts sitting in the 100s bracket.

            The extreme high end (105+) can be adjusted to better account for physical capabilities required for playing these charts. Again, these are charts that even the top players (less two or three atm) will not be approaching AAAs on, and as such should have separate equivalency decays. Up until this point there have only been a small handful of charts that any D7+ player might not be able to keep up with, and they're all 100+ in rating already anyways (other offenders in the 90s range might be charts like Serious Shit or thinking of you but....that's basically it?). RATO was placed where it was in difficulty without having anything to compare with for a decade. I have no qualms with nerfing its difficulty on the basis that it's much easier to make it through with a semi-respectable score compared to Wanderflux or HQR (and I'm sure more charts to come throughout the tourney). Establishing a separate decay formula for charts of this difficulty needs to happen. It's fair to say 20g on Wanderflux is deserving of at least 100+ equivalency, but you'd need to shoot its rating up a few points further to hit that point with current scaling.

            More thoughts later; Maggles is annoyed I'm ignoring her.
            Last edited by One Winged Angel; 08-26-2019, 12:35 PM.



            • xXOpkillerXx
              Forever OP
              FFR Simfile Author
              • Dec 2008
              • 4207

              #21
              Re: Scoring Data in Brutal Difficulty Range

              Your idea of variable decay is basically something I brought up in chitchat discord but less extreme; the ideal formula would be one that varies for each chart.

              Let f(c, s) = a
              Where c is a chart, s is the raw goods score and a is the resulting AAA equivalency.
              This gives much flexibility over various chart structures and can be implemented without too much trouble. Of course, that requires some computed difficulty, hence my work on that some time ago. I'll see if I can extract some useful stats for this thread this week.
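As a rough illustration of what a per-chart f(c, s) could look like in code, here is a small sketch. Everything in it is hypothetical: the `Chart` type, the two example charts and their decay curves are invented for the example and are not derived from any real difficulty calculation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chart:
    name: str
    aaa_difficulty: float
    # Maps a raw goods count to the equivalency lost on this particular chart.
    decay: Callable[[float], float]


def f(c: Chart, s: float) -> float:
    """Return the AAA equivalency a for chart c at raw goods score s."""
    return c.aaa_difficulty - c.decay(s)


# Two made-up charts with the same AAA difficulty but different decay shapes.
steady = Chart("hypothetical steady chart", 104.0, lambda s: 0.40 * s)
spiky = Chart("hypothetical spiky chart", 104.0,
              lambda s: 0.40 * s + 2.0 * (1.0 - 0.5 ** s))
```

Both charts are worth 104 at AAA, but the second one loses equivalency faster over the first few goods, which is the kind of per-chart variation a single shared decay formula cannot express.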

              I don't think the current system is "fixable" without big changes; it simply does not account for chart structure, which it should.


              • RenegadeLucien
                FFR Veteran
                • Jan 2016
                • 283

                #22
                Re: Scoring Data in Brutal Difficulty Range

                Originally posted by xXOpkillerXx
                I don't think the current system is "fixable" without big changes; it simply does not account for chart structure, which it should.
                Well, the current system relies entirely on the manually-assigned difficulty rating to sum up all the factors like chart structure. To do anything different, we'd basically need an automatic difficulty calculator like Etterna's.



                • xXOpkillerXx
                  Forever OP
                  FFR Simfile Author
                  • Dec 2008
                  • 4207

                  #23
                  Re: Scoring Data in Brutal Difficulty Range

                  Originally posted by RenegadeLucien
                  Well, the current system relies entirely on the manually-assigned difficulty rating to sum up all the factors like chart structure. To do anything different, we'd basically need an automatic difficulty calculator like Etterna's.
                  Well, yes.

                  Although Rob's idea would be a patch for the apparent edge cases, at some point if we want accurate difficulty we need better maths, that's just how it is. I got so much shit for trying to make a calc, yet it's clearly the way to go, especially for a game like this with so many charts.

                  Edit:
                  The manual difficulty can only be the AAA difficulty. The AAA equivalency formula computes a score for any raw goods count. It is evident that the mapping makes 0 sense.
                  Last edited by xXOpkillerXx; 08-26-2019, 09:42 PM.


                  • One Winged Angel
                    Anime Avatars ( ◜◡^)っ✂╰⋃╯
                    FFR Simfile Author
                    • Mar 2007
                    • 10837

                    #24
                    Re: Scoring Data in Brutal Difficulty Range

                    Originally posted by RenegadeLucien
                    Well, the current system relies entirely on the manually-assigned difficulty rating to sum up all the factors like chart structure. To do anything different, we'd basically need an automatic difficulty calculator like Etterna's.
                    The current system needed more information to work anywhere near accurately to begin with. Charts have a single number attached to them to approximate the value of a near-perfect or perfect score, but the growth rates approaching that point can be widely different depending on chart structure and skills tested, and this had been publicly acknowledged. The current system erroneously slapped identical decay formulas onto every chart within a given subtier, using a few inputs and comparisons from surveying the event team staff and making modifications as necessary until it looked 'nice'. Much more work needed to be done to capture accurate equivalencies, and I voiced that prior to the system's release, but no one seemed to care until the problem became much more evident several years later.

                    Scores on RATO/DP from D6/D7 players were spitting out equivalencies in the FMO or lower range for years but no one paid any mind because it was just a couple charts and sure whatever that's fine I guess. But now that there's gonna be more and an entire division is going to be reliant on most of that range for what comprises their skill rating, that definitely needs to change.
                    Last edited by One Winged Angel; 08-26-2019, 09:43 PM.



                    • RenegadeLucien
                      FFR Veteran
                      • Jan 2016
                      • 283

                      #25
                      Re: Scoring Data in Brutal Difficulty Range

                      Originally posted by xXOpkillerXx
                      I got so much shit for trying to make a calc.
                      Who gave you shit aside from Mina? I don't think anyone here would actively oppose you or anyone else trying to make a difficulty calc.

                      Originally posted by One Winged Angel
                      Scores on RATO/DP from D6/D7 players were spitting out equivalencies in the FMO or lower range for years but no one paid any mind because it was just a couple charts and sure whatever that's fine I guess.
                      It's been stated over and over and over again by so many people that the current system's accuracy drops very fast as you go higher up in good count; I don't think this was ever an issue unique to DP/RATO or even high difficulty songs in general.



                      • One Winged Angel
                        Anime Avatars ( ◜◡^)っ✂╰⋃╯
                        FFR Simfile Author
                        • Mar 2007
                        • 10837

                        #26
                        Re: Scoring Data in Brutal Difficulty Range

                        Originally posted by RenegadeLucien
                        It's been stated over and over and over again by so many people that the current system's accuracy drops very fast as you go higher up in good count; I don't think this was ever an issue unique to DP/RATO or even high difficulty songs in general.
                        And yet the system remains the same. I'm aware it's evident elsewhere, it was just most glaring on those charts.

                        I don't see an issue with trying to hammer out any and all issues in an effort to create a more accurate system. I feel like you take these comments as personal attacks and are quick to shift blame elsewhere, such as onto the difficulty ratings for not accounting for this, when the system itself assumed that numerous chart qualities had been meticulously considered and folded into a single number, so that every chart could be treated identically when extrapolating a score's worth.



                        • RenegadeLucien
                          FFR Veteran
                          • Jan 2016
                          • 283

                          #27
                          Re: Scoring Data in Brutal Difficulty Range

                          I'm not sure where I gave off the impression that I'm taking any of this personally, but if that's what you're getting, it wasn't my intention and I'm sorry. The current system isn't even mine.

                          I'm pretty sure we're on the same side here. The system needs to change. My point is that the only way we're really going to get something that's truly accurate is with a calculator. Yeah, we could have some sort of variable decay rating for each song that tunes the base skill rating formula for it, but to do that we'd need a calculator anyway, unless someone wants to manually go through all 2000+ songs and give all of them another number.



                          • xXOpkillerXx
                            Forever OP
                            FFR Simfile Author
                            • Dec 2008
                            • 4207

                            #28
                            Re: Scoring Data in Brutal Difficulty Range

                            Originally posted by RenegadeLucien
                            Who gave you shit aside from Mina? I don't think anyone here would actively oppose you or anyone else trying to make a difficulty calc.
                            Mina was the most direct about it, but many in discord would keep saying that I'm wasting my time and be pretty passive-aggressive. Mostly people just bandwagoning with Mina. I admit that I have a problem with people who say something is not possible without being able to provide a thorough proof of their claim. Many just spew some pseudo-maths arguments but can't get into details.

                            Anyway, if interest for a calc goes up for real now, I might have some motivation to help.
                            Last edited by xXOpkillerXx; 08-27-2019, 06:24 AM.


                            • xXOpkillerXx
                              Forever OP
                              FFR Simfile Author
                              • Dec 2008
                              • 4207

                              #29
                              Re: Scoring Data in Brutal Difficulty Range

                              Here's my take on the difficulty factors of individual notes:

                              -Local one-hand complexity: how difficult it is to hit that note given the past and future X notes on the same hand (future notes are necessary to account for readability). That is something I haven't finalized, but generally the difficulty goes up first with the spacing of notes (the fewer frames between the notes, the harder it is, in a non-linear fashion, so that 1-framers are much harder than 2-framers but 30 and 31 frames are pretty similar), then by transition (at the same speed, a jump to a single note is always harder than a minijack or a jumpjack, and single-to-single like 12 or 34 or 21 or 43 have a special weight for being easily hit as a jump or not). Some time-based gaussian window over each note gave decent results with a window of about 1 second (30 frames) or less on each side and a low std dev (< 1.0).

                              -Global 2-hands complexity: a distribution of the two 1-hand complexities at each timestep. For example, a very hard section on one hand with a very simple one on the other hand could be easier than medium difficulty on both hands at same time. This would need refinement for polys and a more well-defined explanation (with general and edge cases).

                              -Note time: just a factor of where the note is in time. This has to be picked/formulated so that a note after 5 minutes with low complexity cannot be harder than a note 1 minute in with high complexity. It is easily defined once the complexities are defined. Accounts for focus loss and partly for stamina.

                              -Note stamina: a large time-based past-only window over the aggregated factors above. Essentially accounts for breaks in a song; it's easier to hit a hard section after a break than in the middle of some stream or whatever.


                              This gives a difficulty number to each note of a file. It then becomes possible to compute a different AAA equiv formula for each file. The overall difficulty of a file would then not be a single number, but rather a distribution over raw goods count. Nothing stops us from computing and showing difficulty for a specific count (AAA difficulty, 10g difficulty, 20g difficulty, etc).
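To make the local one-hand complexity factor a bit more concrete, here is a very rough sketch of the spacing part only (no transitions, no second hand, no stamina). It is not the setup described above: the `1 / gap` cost, the 0.5 second standard deviation, the 30 fps frame rate and all function names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

FPS = 30  # assumption: frames per second, from "1 second (30 frames)"


def spacing_difficulty(gap_frames: int) -> float:
    """Hypothetical non-linear spacing cost: 1-framers far above 2-framers, 30 close to 31."""
    return 1.0 / max(gap_frames, 1)


def local_complexity(note_frames: Sequence[int],
                     window_frames: int = 30,
                     sigma_s: float = 0.5) -> List[float]:
    """Gaussian-weighted sum of spacing costs around each note on one hand."""
    # Raw cost per note, from the gap to the previous note on the same hand.
    raw = [0.0] + [spacing_difficulty(b - a)
                   for a, b in zip(note_frames, note_frames[1:])]
    result = []
    for t in note_frames:
        total = 0.0
        for u, cost in zip(note_frames, raw):
            # Two-sided window: past and future notes both contribute.
            if abs(u - t) <= window_frames:
                dt = (u - t) / FPS
                total += cost * math.exp(-0.5 * (dt / sigma_s) ** 2)
        result.append(total)
    return result
```

A dense run such as `local_complexity([0, 2, 4, 6, 8])` scores far higher per note than a sparse one such as `local_complexity([0, 30, 60, 90, 120])`, which is the qualitative behaviour the description asks for. The actual cost function and window parameters are exactly the parts still open for discussion.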

                              I have a pretty good setup to compute these already, so I'm posting it here to gather more opinions on the aggregation part and the various factors (Gaussian parameters, complexity, etc).

                              @rob if you prefer this to be in another thread let me know
                              Last edited by xXOpkillerXx; 08-27-2019, 07:10 AM.


                              • Dynam0
                                The Dominator
                                • Sep 2005
                                • 8987

                                #30
                                Re: Scoring Data in Brutal Difficulty Range

                                My main concern with that approach is pattern manipulation and stamina. Getting those right would be a tough ask imo. I still think it is far less tedious to have subjective difficulty assignments and as Rob said we just need to get the decay part correct. It's not incredibly far off at this point.
