Judging a StepMania Score

  • Dynam0
    The Dominator
    • Sep 2005
    • 8987

    #1

    Judging a StepMania Score

    As a bunch of you know already, I've taken it upon myself to manage and organize a list of world record scores for StepMania, predominantly for 4-key spread scores. Of course, the current system has a number of flaws, but I'd like to take one flaw and spin a train of thought around it:

    The world record tables only list the "best" score obtained on a particular file and difficulty setting.

    This doesn't seem like much of a problem when you consider that the goal of the tables is to list "World Records", but this is a huge flaw in that there may be some scores worth mentioning or even subjectively* better than the current world record but simply are not calculated as better. I believe that some type of leader board system much like FFR's high scores lists is a great idea to get rid of this. How can we make a comprehensive high scores list for StepMania in the best possible way?

    The ideal situation would be to have a leader board system much like StepmaniaOnline; the server will keep track of when arrows were hit and automatically recalculate scores to fit Judge 4 timing windows, regardless of what judge the player is using. These recalculated scores are recorded in the database and are organized into a list much like FFR's (I'm not gonna lie, this is probably the biggest plus FFR has and it is a damn big plus if you ask me). Not only is this important for eliminating cheaters, but it allows scores to be tracked in real time.

    This seems great if not for the fact that there are scoring discrepancies on StepmaniaOnline. There is some amount of server lag associated with playing online which can vary depending on the player. The assumption is that a player will be penalized with false perfects due to lag, so the server attempts to correct this problem by having a marvelous window of 0.22550 seconds instead of the normal 0.22500 seconds. This translates roughly to having a Judge offset of 3.999. Although this seems incredibly minor and we can be comfortable in saying that we are still comparing apples to apples (since this offset is applied to all players who play using SMO), it still doesn't account for the fact that server lag can affect players in different ways. Some players can get negative perfects while others may get negative marvelouses due to this issue, so in essence we are not comparing apples to apples. Do you see where I'm going with this??



    I was going to make a list of other flaws but they all seemed easily correctable and are little more than nuisances that require time and effort to work around. The problem that I wasn't able to solve was always to do with scoring in general:

    What makes score 'X' better than score 'Y'?

    Classically we associated a higher DP (Dance Point) value with a better score and this still holds true, however we've modified the weighting of judgements to fit what we subjectively (key word here) believe is accurate. The method for comparing StepMania scores eligible for world records uses the DP-Marv scale. This gives dance points according to this scheme:

    Marvelous: 10
    Perfect: 9
    Great: 5
    Good: 0
    Boo: -20
    Miss: -40
    O.K.: 30
    N.G.: 0

    Does everyone agree with how things are weighted? The majority might, but to say that someone who gets a AAA with 29 perfects is better than someone who got marvelous on the entire song but dropped a single hold is a little strange to me. So let's say we all agree on an excellent new weighting scale for DP values given to judgements. Can we accurately tell which score is better? Well, let's consider something else first...
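
To make the 29-perfects comparison concrete, here is a minimal sketch of DP-Marv scoring using the weights listed above. The chart size (1000 taps, 20 holds) and the names `DP_MARV` and `dance_points` are made up for illustration; only the weights come from the post.

```python
# DP-Marv weights exactly as listed in the post above.
DP_MARV = {
    "marvelous": 10,
    "perfect": 9,
    "great": 5,
    "good": 0,
    "boo": -20,
    "miss": -40,
    "ok": 30,  # hold completed
    "ng": 0,   # hold dropped
}


def dance_points(counts, weights=DP_MARV):
    """Sum weight * count over every judgement present in `counts`."""
    return sum(weights[name] * n for name, n in counts.items())


# Hypothetical chart: 1000 taps and 20 holds (sizes are assumptions).
aaa_29_perfects = {"marvelous": 971, "perfect": 29, "ok": 20}
all_marv_one_drop = {"marvelous": 1000, "ok": 19, "ng": 1}

print(dance_points(aaa_29_perfects))    # 10571
print(dance_points(all_marv_one_drop))  # 10570
```

A dropped hold costs 30 DP while 29 perfects cost only 29, so on this scale the 29-perfect play edges out the all-marvelous play by a single point.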


    Players A and B each achieve a AAAA on Ageha using Judge 4.

    Since we judge a better score as one which has higher dance points, Players A and B both have identical scores on Ageha. I'm not so sure this is really the case, and we can test it by introducing stricter timing windows. This situation now becomes:

    Player A achieves a AAA on Ageha with 40 perfects using Judge 7.
    Player B achieves a AAA on Ageha with 37 perfects using Judge 7.

    Well, obviously Player B has a better score than Player A since his accuracy was better. But what if we tighten up the timing window even further? The situation now becomes:

    Player A achieves a AA on Ageha with 278 perfects and 40 greats using Judge 'X'.
    Player B achieves a AA on Ageha with 312 perfects and 37 greats using Judge 'X'.

    etc.
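
The thought experiment above amounts to re-judging the same hit timings under narrower windows. A rough sketch, assuming the base windows below (the commonly cited StepMania Judge 4 defaults, not values taken from this thread) and made-up timing offsets for the two players; `judge`, `rejudge` and the `scale` parameter are illustrative names only:

```python
# Assumed base (Judge 4) windows in seconds, tightest first. These are the
# commonly cited StepMania defaults, not values confirmed in this thread.
BASE_WINDOWS = [
    ("marvelous", 0.0225),
    ("perfect", 0.045),
    ("great", 0.090),
    ("good", 0.135),
    ("boo", 0.180),
]


def judge(offset, scale=1.0):
    """Classify one tap by its absolute timing error under scaled windows."""
    error = abs(offset)
    for name, window in BASE_WINDOWS:
        if error <= window * scale:
            return name
    return "miss"


def rejudge(offsets, scale=1.0):
    """Count judgements for a whole play at the given window scale."""
    counts = {}
    for offset in offsets:
        name = judge(offset, scale)
        counts[name] = counts.get(name, 0) + 1
    return counts


# Made-up offsets (seconds) for two plays that tie at scale 1.0.
player_a = [0.004, -0.010, 0.015, -0.020, 0.021]
player_b = [0.002, -0.006, 0.009, -0.011, 0.020]

print(rejudge(player_a, 1.0))  # {'marvelous': 5}
print(rejudge(player_b, 1.0))  # {'marvelous': 5}
print(rejudge(player_a, 0.5))  # {'marvelous': 2, 'perfect': 3}
print(rejudge(player_b, 0.5))  # {'marvelous': 4, 'perfect': 1}
```

Both plays are identical at the base windows and only separate once the windows are halved, which is the whole question: how far do you keep shrinking `scale` before the differences stop meaning anything?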



    tl;dr

    My question to you all is, at what point do timing windows become redundant and is there a way to actually ascertain what is the "best" StepMania score for a given note chart?
    Last edited by Dynam0; 10-7-2013, 05:55 PM.
  • redsea
    seek ye first the kingdom
    • Mar 2013
    • 181

    #2
    Re: Judging a StepMania Score

    Are some scores really so debatable that a stricter time window is needed to determine whether or not one is better than the other? All we really care about is the harder files anyways


    • ReikonKeiri
      i wanna be ur pop star
      • Jun 2006
      • 2388

      #3
      Re: Judging a StepMania Score

      Dynam0 you can be my stepmania professor


      FMO AAAs (3): Heavenly Spores (68), Fast Asleep (67)!, 0 (piano version) (66)! VC AAAs: 76
      Best VCs: Finders Keepers (64), Purple (64), Travel Demon (63), Final Step (63), A World of Piano (63), Balloon Fever (63), The Fusion (63)

      It's Only Natural BF
      Southern Cross BF
      Minute Waltz v2 BF
      Novo Mundo BF
      Stark Raving Mad BF
      Midnight Dragon 1-0-0-1
      Choprite 2 clean
      Rottel 2 clean
      J&C 2 clean
      Chronograph 2-1-0-1
      BB Evo 3 clean
      Staring at my Spaceship 3 clean
      Epilogue 3-0-0-1
      Banned Forever 3-0-0-1
      World Tour 2004 3-0-1-4
      Demon Beast Appearance 4 clean
      Gacha Gacha Figu Atto Radio 4 clean
      Gravity Blast 4 clean
      Just Why 4 clean
      Eternal Drain [Heavy] 4 clean
      300 4-0-0-1
      Pure Ruby 5 clean
      Destination of the Heart 5 clean
      Plasmatextor 5-0-0-4
      Oni 6 clean
      Yorukumoryuu Yamikaze 6 clean
      Ambient Angels 6 clean
      Colorful Course 6 clean
      Hajnal 6 clean
      Arsonist 7-0-0-1
      Setsujou! Hyakka Ryouran 7-1-0-1
      Face in the Gutter 8 clean
      Kanon Medley 8 clean
      Colibri 8 clean
      Summer Time Perfume 8 clean
      Blindfolds Aside 8-0-0-1
      Bubble Bath Aftermath 8-0-0-1
      Bloody Tears 8-0-0-2
      Ochitsukeruwakenaiwayo! [Heavy] 9 clean


      Originally posted by Moogle-master
      To be fair, having all the BlazBlue's isn't good taste more then it is common sense.


      • bender5
        The 40% Iron Chef
        • Jan 2005
        • 4894

        #4
        Re: Judging a StepMania Score

        I like the idea of trying to establish a leader board. Maybe one day we can have an accurate implementation of your idea, but for now maybe we should have two separate lists.


        One list for the absolute best score on each song. (Possibly even with a stricter timing window specifically for this list, e.g. Ridiculous Timing.)

        The other list should be an interactive list that redirects us to a top 5, 10 or possibly even more on each file, depending on how in depth you want to get. It would allow anyone to submit a score they feel deserves to be noted, even if it's not of the same caliber as God Tier scores. It could be a way for everyone to share their personal bests and give various goals to achieve at all levels of physical ability.


        For now a general trust system is going to have to be in place, as it has been for a long time. I know there isn't anywhere near as much controversy over who is and isn't legitimate as there used to be that's for damn sure.


        • Dynam0
          The Dominator
          • Sep 2005
          • 8987

          #5
          Re: Judging a StepMania Score

          Originally posted by redsea
          Are some scores really so debatable that a stricter time window is needed to determine whether or not one is better than the other? All we really care about is the harder files anyways
          I agree that it seems silly to try and differentiate between two plays where the accuracy is nearly indistinguishable... I would be okay with saying that there is a maximum obtainable score in the game (there are a lot of games that have this property I suppose).


          At what Judge offset do we consider that 1 extra perfect negligible though? If we are safe in saying that a AAAA on something is better than 1 perfect, couldn't we argue that a AAAA on a slightly higher judge is a "better" score than a AAAA on a lower judge? Wouldn't IzzySM's AAAA on Ageha on Judge 7 change the fact that 50 other people are tied with him for the top score on the file?
          Last edited by Dynam0; 10-7-2013, 07:15 PM.


          • Wafles
            FFR Player
            • Feb 2013
            • 1988

            #6
            Re: Judging a StepMania Score

            Honestly, debating about who has the best score on a Stepmania song always sounded like a strange concept to me. The whole idea of having a leaderboard by nature of the game is going to have some arbitrary standards which not everyone will ever agree with. There are about 5 different ways to calculate dance points alone, not to mention the whole higher judge settings being a thing.

            IMO Stepmania should always be about self improvement and not really competitive beyond a sense that it's cool to have a rival or two to compare scores to.

            My 2 cents.

            http://smleaderboards.net/profile/view/Wafles


            • Mollocephalus
              Custom User Title
              • Jul 2009
              • 2608

              #7
              Re: Judging a StepMania Score

              Originally posted by Wafles
              Honestly, debating about who has the best score on a Stepmania song always sounded like a strange concept to me. The whole idea of having a leaderboard by nature of the game is going to have some arbitrary standards which not everyone will ever agree with. There are about 5 different ways to calculate dance points alone, not to mention the whole higher judge settings being a thing.

              IMO Stepmania should always be about self improvement and not really competitive beyond a sense that it's cool to have a rival or two to compare scores to.

              My 2 cents.
              I wholeheartedly agree with this. As long as it isn't combo scoring (lol ffr) I'm good with it. ITG percentage scoring is pretty fine to me.


              • EzExZeRo7497
                • Dec 2010
                • 6858

                #8
                Re: Judging a StepMania Score

                lmfao I have so many things to input on this, especially since I've put in a couple of scores that aren't necessarily WRs through DP-marv, such as Alex's Shind Bad Heavy scores and Puppet's 0x1311.

                For speed scores (or files that aren't SDP/quadbait in general), the root of the problem is easily the DP-marv system:
                Notice that the DP-marv system is extremely biased towards scores that are very low in boo/miss count or full combos in general. There's not enough emphasis in the MA area, I don't think it should even be labeled a DP-marv system since there's so little difference in weightage when it comes to MA.

                Take WookE's The Final Conflict Edit score for example: 78 perfects and 8 greats, full combo. The Final Conflict isn't necessarily easy to FC, so you'll probably get a couple of misses. Let's say you have a score with 1 miss. You need 50 fewer perfects just to TIE his world record, let alone beat it. A 28 perfect 8 great score, regardless of miss count, is nothing short of insane and is probably unreachable for most players. The fact that you need such a large reduction in perfect count/great count in order to offset your CBs is ridiculous, and I'm pretty certain a lot of people would find the 28p 8g 1 miss score more impressive than a 78p 8g FC.
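
For anyone who wants to check the 50-perfect figure, here is a quick sketch of the arithmetic under the current DP-Marv weights. It assumes every other note is a marvelous and that holds are identical in both plays; `LOSS` and `dp_lost` are illustrative names, not anything from the actual record tables.

```python
# DP lost per judgement relative to a marvelous (10) under DP-Marv.
LOSS = {"perfect": 10 - 9, "great": 10 - 5, "miss": 10 - (-40)}


def dp_lost(perfects=0, greats=0, misses=0):
    """Total DP below the all-marvelous maximum for a play."""
    return (perfects * LOSS["perfect"]
            + greats * LOSS["great"]
            + misses * LOSS["miss"])


full_combo = dp_lost(perfects=78, greats=8)          # the FC score as described
one_miss = dp_lost(perfects=28, greats=8, misses=1)  # the hypothetical 1-miss play

print(full_combo, one_miss)  # 118 118
```

Both plays sit 118 DP below the maximum, so the single miss is worth exactly 50 perfects on this scale.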

                Personally I would overhaul the DP-marv system and put in less emphasis on misses/boos and put more emphasis on perfects. Something like say:

                Marvelous: 10
                Perfect: 8
                Great: 5
                Good: 0
                Boo: -10
                Miss: -20

                O.K.: 10
                N.G.: 0
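
A small sketch of what this change does to the miss-versus-perfect trade-off, comparing the current weights with the ones proposed above. Only the marvelous, perfect and miss weights are needed; `perfects_per_miss` is a hypothetical helper name.

```python
# Marvelous / perfect / miss weights from the current and proposed scales.
CURRENT = {"marvelous": 10, "perfect": 9, "miss": -40}
PROPOSED = {"marvelous": 10, "perfect": 8, "miss": -20}


def perfects_per_miss(weights):
    """How many marvelous-to-perfect downgrades cost as much as one miss."""
    miss_cost = weights["marvelous"] - weights["miss"]
    perfect_cost = weights["marvelous"] - weights["perfect"]
    return miss_cost / perfect_cost


print(perfects_per_miss(CURRENT))   # 50.0
print(perfects_per_miss(PROPOSED))  # 15.0
```

So the proposed scale would drop the exchange rate from 50 perfects per miss to 15.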


                It would be a lot more balanced than the current system: a greater (although not excessive) emphasis on MA, but also a smaller emphasis on miss count. The only issue is that scores with good MA but bad CBs would override scores with average MA but with good CBs. That's where you can weigh in subjectivity and put in scores that are better. You can put both scores in, but not have one of them (or both or none, whichever you prefer) count towards the world record count.

                Realistically speaking though, it'll probably be too late to change the DP-marv formula and none of us (especially you, Dynamo) would want to go through 3 entire threads of SM scores filled with 5,000-35,000 scores per thread. So the second best alternative (when it comes to adding scores with really good MA but with bad CBs) would be to compare the person's MA on that file to pretty much everyone else. Take your (Dynamo's) Gaussian Mist for example: around 37 perfects with a miss or two. It's one DP away from the current WR, but no one I know has gotten close to 50s, let alone 37. The miss count is respectable too, especially for its accuracy. You can add that score into the list and you can decide if you count Staiain's, Dynamo's, both or neither scores towards their WR counts. The main problem is that how much of a difference in accuracy is needed to be considered "world record" is very subjective, but I'd say if it's like 20%-30% fewer perfects than anyone has ever gotten, it's probably worthy of one, given that its CBs are good for its accuracy as well.


                For files that are mainly MA bait, I don't think any additional/stricter timings are necessary. A quad is a quad, and there are a couple of reasons why I disagree with having stricter timings to determine which score has the best of the best accuracy:

                * Ridiculous Timing is exclusive to 3.9. Being limited to one SM version that you might not be used to is pretty unfair, and people who are oriented to SM 3.9/3.95 would have a slight advantage over people who use SM5. People who have extremely good timing on SM5 and bad timing on SM3.9 would have a huge handicap, which would give an "inaccurate" world record (since the person who's using SM5 would do significantly worse), so to speak. That's unless you allow the person who's using SM5 to use J7, which leads me to another point:

                * Judge 7 (or any higher judge than 4 in general) has a major advantage over Judge 4. Sure, Judge 7/J4 Ridiculous Timing would have the same timing as such, but Judge 7 has one main advantage. On J7, it's impossible to get an accidental good/boo (on J4) since J7's miss timing is equivalent to a good on J4. Players on Judge 7 would be able to get away with accidental misreads/doubletaps since the good window/boo window on J7 is so small compared to J4. Basically, files on J7 would be easier than they would be on J4.
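
A rough illustration of the second point. Both the base windows and the 0.5 scale for Judge 7 are assumptions here (commonly cited StepMania defaults, not numbers stated in this thread), and `scaled` and `stray_tap` are made-up names.

```python
# Assumed Judge 4 windows in seconds (commonly cited defaults).
J4 = {"marvelous": 0.0225, "perfect": 0.045, "great": 0.090,
      "good": 0.135, "boo": 0.180}

# Assumption: Judge 7 scales every window by 0.5.
J7_SCALE = 0.5


def scaled(windows, scale):
    """Return a copy of the windows with every width multiplied by `scale`."""
    return {name: width * scale for name, width in windows.items()}


J7 = scaled(J4, J7_SCALE)

# A hypothetical accidental tap landing 100 ms away from the nearest note.
stray_tap = 0.100

registers_on_j4 = stray_tap <= J4["boo"]  # inside J4's outermost window
registers_on_j7 = stray_tap <= J7["boo"]  # inside J7's outermost window

print(registers_on_j4, registers_on_j7)  # True False
```

Under these assumptions J7's outermost window ends where J4's great window ends, so a tap that would be judged a good on J4 never registers against a note on J7.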


                There are a couple of other reasons, but those are the two that I could think of off the top of my head. I could elaborate on them further if you want me to.


                To answer your question though, I'd much rather keep quads as quads and scores with better MA (albeit worse RA) as the "best" score compared to scores with worse MA but with better RA. If RA would be used for world records, it should be used for tiebreakers more than anything. If you have 1 more perfect than the current WR holder, you shouldn't have the WR just because your RA is better.
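
If RA were used purely as a tiebreaker, the ordering could look something like this sketch. The `Play` fields, the `loose_marvs` RA measure and the numbers are all hypothetical; the only idea taken from the post is that DP decides first and RA only separates equal DP.

```python
from typing import NamedTuple


class Play(NamedTuple):
    player: str
    dp: int           # dance points under whatever scale is agreed on
    loose_marvs: int  # hypothetical RA measure: marvelouses outside the strictest window


def rank(plays):
    """Order by DP first; the RA measure only separates plays with equal DP."""
    return sorted(plays, key=lambda p: (-p.dp, p.loose_marvs))


plays = [
    Play("A", dp=9999, loose_marvs=3),    # one perfect, excellent RA
    Play("B", dp=10000, loose_marvs=40),  # quad, weaker RA
    Play("C", dp=10000, loose_marvs=12),  # quad, better RA
]

print([p.player for p in rank(plays)])  # ['C', 'B', 'A']
```

Player A never passes the quads despite the best RA, which matches the point that one extra perfect shouldn't be forgiven just because the RA is better.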

                I'll have to agree with Wafles that SM isn't necessarily a competitive game, at least not as competitive as games such as FFR or osu!; it's definitely a game that's more about self-improvement than anything. But hey, even statistics nerds want to see a couple of stats regarding this game, including myself.

                Also sorry for not editing the SM Wiki often enough Dynamer, school and life are taking over and FUCK captcha, I can't navigate through the wiki without being asked to type a captcha repeatedly for 3 pages.
                Last edited by EzExZeRo7497; 10-8-2013, 08:49 AM. Reason: Grammar fixes


                • Jousway
                  FFR Player
                  • Jun 2009
                  • 865

                  #9
                  Re: Judging a StepMania Score

                  and in this thread we see people care about their tiny e-penis size
                  Its not a bug its a FEATURE!




                  • EzExZeRo7497
                    • Dec 2010
                    • 6858

                    #10
                    Re: Judging a StepMania Score


                    • Dynam0
                      The Dominator
                      • Sep 2005
                      • 8987

                      #11
                      Re: Judging a StepMania Score

                      I totally agree that StepMania should be fun and about personal accomplishment, but understand that the only way to gauge your accomplishment is through some judgement system that attaches a value to your performance. Obviously this system works great for casual players of the game, but for players who are competitive, it starts to lose its efficacy.

                      Originally posted by Jousway
                      and in this thread we see people care about their tiny e-penis size
                      Call it what you want, but the focus of this thread is to determine if we can accurately judge what a good score is. If you would like to judge the people who are interested in this concept, you can take your comments somewhere else.



                      The original idea for this thread was mainly from me running into problems keeping track of world records (like Eze I'm a fiend for statistics). Thinking about the whole concept of "which score is better" left me with the conclusion that scoring is a fairly subjective thing and can't accurately be determined using the same measuring stick for every note chart.

                      @Eze, regarding WookE's The Final Conflict Edit score, you said it yourself that FCing this file isn't easy and I think that gives justification for misses having such a high weight. I know a miss translating to 50 extra perfects sounds like quite a lot, but it's as you said too...trying to go back and change scores from ~2009 on is painfully...redundant when you consider what the impact would be lol.

                      I actually think the current system of real-time scoring on StepmaniaOnline is a really encouraging thought so long as the lag correction issue is dealt with in some way. I'm not sure how scores were recorded on the old servers, but I never had this score adjusting issue before the new SMO was integrated. I think having some kind of comprehensive bank of scores that are measured on the same scale is the right way to go for a game like this.


                      • icontrolyourworld
                        Enjoy life!
                        FFR Simfile Author
                        • Oct 2007
                        • 4192

                        #12
                        Re: Judging a StepMania Score

                        in my opinion if 2 people quad a file they both should have the world record, I don't think there's a very competitive scene for ridiculous timing and higher currently (there would be like less than 10 people trying to compete with RA, but hey there could always be a separate leader board for it ya know)

                        and then if someone holds the only AAA on a file, where someone else has a AA with better timing I'd say note both of them in the world records, because both scores would be interesting and just note that the AA score is objectively better or something like that
                        Last edited by icontrolyourworld; 10-8-2013, 07:14 PM.


                        • EzExZeRo7497
                          • Dec 2010
                          • 6858

                          #13
                          Re: Judging a StepMania Score

                          Originally posted by Dynam0
                          @Eze, regarding WookE's The Final Conflict Edit score, you said it yourself that FCing this file isn't easy and I think that gives justification for misses having such a high weight. I know a miss translating to 50 extra perfects sounds like quite a lot, but it's as you said too...trying to go back and change scores from ~2009 on is painfully...redundant when you consider what the impact would be lol.

                          I actually think the current system of real-time scoring on StepmaniaOnline is a really encouraging thought so long as the lag correction issue is dealt with in some way. I'm not sure how scores were recorded on the old servers, but I never had this score adjusting issue before the new SMO was integrated. I think having some kind of comprehensive bank of scores that are measured on the same scale is the right way to go for a game like this.
                          To address your first paragraph, maybe my example isn't the best to use (although yes, TFC Edit is hard to FC, I really don't think that such a large penalty for just one miss is really necessary... a 30 perfect difference would be good but 50 is probably too much lol). I also forgot another example: I wanted to compare my two Last Winged Unicorn AA scores, where the messier score with worse PA has a slightly higher DP than the cleaner score with fewer overall CBs and better PA, to show that the miss/boo weightage is a little too harsh. But then I realised that that's not really on topic at this point haha.

                          The SMO server does a great job of keeping the scores, but due to its different way of recording scores (it judges your score through your TIMING, not on the actual score you get in the results screen, unlike previous SMO servers), score offsets do happen. Nothing can be done about it really, and I don't think the problem's going to be fixed any time soon. An asterisk next to SMO scores should be good, but you should also add the WR that wasn't on SMO (as long as it isn't a significant difference like 3-4 perfects) just for verification I guess, unless you're absolutely certain that the score on SMO has no perfect offset whatsoever. As for the WR count, I'm not sure which to pick. I'd personally pick the one that's NOT on SMO, for skepticism's sake I guess.

                          For speed MA WRs, I think this is a decent solution to the issue:
                          Realistically speaking though, it'll probably be too late to change the DP-marv formula and none of us (especially you, Dynamo) would want to go through 3 entire threads of SM scores filled with 5,000-35,000 scores per thread. So the second best alternative (when it comes to adding scores with really good MA but with bad CBs) would be to compare the person's MA on that file to pretty much everyone else. Take your (Dynamo's) Gaussian Mist for example: around 37 perfects with a miss or two. It's one DP away from the current WR, but no one I know has gotten close to 50s, let alone 37. The miss count is respectable too, especially for its accuracy. You can add that score into the list and you can decide if you count Staiain's, Dynamo's, both or neither scores towards their WR counts. The main problem is that how much of a difference in accuracy is needed to be considered "world record" is very subjective, but I'd say if it's like 20%-30% fewer perfects than anyone has ever gotten, it's probably worthy of one, given that its CBs are good for its accuracy as well.
                          Timing windows are always important really; I think the bigger question is at what point DP-marv becomes redundant and we can base WRs on subjectivity haha. Although DP-marv does give this "objective" way of determining WRs, it comes with its flaws and I wouldn't necessarily follow the system all the time.
