Friday, June 8, 2012

GT Appearance Scoring Pt 3

What follows will be a ridiculous over-analysis and serious effort to turn something mundane into a ridiculously complex discussion all in an effort to prove the following point:

- To be as correct as possible when calculating Best Overall, your Appearance Scoring system should contribute its points in exactly the same way as your Generalship Scoring system.  (Likewise for Sports if you count it)

For those of you interested in all the intricacies and what amounts to a simple concept being blown out of proportion to make me sound smarter than I really am, read on!  Otherwise, skip to the next article where I will discuss my proposition for what our Appearance Scoring system will actually be for consideration and input (when that article is available of course).  The goal of this article is to lay the groundwork behind the thought process that will be used to upgrade our Appearance Scoring solution (and eventually Sports Scoring solution) to ensure that it is useful, as accurate as possible, and just as important, contributes fairly to Best Overall.

Oh yeah, this is going to be really long, too!

An Unbalanced Best Overall

Recall that Best Overall is truly intended to find the most talented participant in the room.  It rewards the person who brings it the most across Generalship, Appearance, and for us, Sports.  If you let one of those categories contribute to the Best Overall score in such a way as to "crowd out" or "boost" the value of another category, you are now, most likely inadvertently, favoring one category over another in some small but potentially tournament-changing way.

As a disclaimer: Having an unbalanced Best Overall is not necessarily a bad thing nor am I suggesting it is.  Plenty of people think that, for example, Generalship should count for the majority of Best Overall points while soft scores contribute less.  For us, however, our goal is to treat them all equally.  That being said, it also just so happens that if you aren't ensuring that your contribution is happening like you think it is, even a skewed Best Overall may still not be delivering the results it is intended to.

It's a lot easier than you might think to have your score contributions be unbalanced.  The reason is that it is generally very simple to come up with a very good scoring mechanic for a given category, even a very smart one, BUT the magic (and pitfalls) happens when you try to combine them.

I can get overly philosophical, which I like to do, but I'll cut it short with a quick example.  Let's say you want to compare Apples, Oranges, and Pears (which, funnily, is exactly what we are trying to do here!).  To do so, you have to define what it is about them that you want to compare.  Straight up, you can't compare them, but you can compare common things ABOUT them, like their height, width, flavor, color, and so on.  That's all good, but a problem will pop up if you try to combine what you are measuring about these things to define a Watermelon (makes perfect sense right?!).  For example, let's say our Watermelon is the combination of the HEIGHT of all the Apples, Oranges, and Pears I have.  That's easy, then.  Just add up all their heights.  Done!  However, if I told you that a Watermelon MUST not be made up of more than 33% each of Apples, Oranges, and Pears, can I be certain now that measuring each of my fruits is a "fair" contribution to the height of my Watermelon?  Pears seem to be "taller" on average than Apples (just made that up).  So, based on that, the more Pears that go into making your Watermelon, the taller your Watermelon will also probably be.  By this simple analogy, Pears contribute more to the Watermelon than do the other fruits.  Thus, your TALLEST Watermelon will GENERALLY (not always) end up being made up of more Pears than Apples or Oranges if the total number of Apples and Oranges and Pears can never exceed a certain value.

Yeah I know... so anyways...

Here's an example of just how easy it is to design a Best Overall system that is unbalanced.

Let's say you score Generalship on Win/Loss (which we do!).  Let's say that Generalship's contribution to the Best Overall category then is the Number of Wins / Number of Games.  So, the guy who wins all of his games contributes a full 100% of Generalship.  The guy who loses all games contributes 0%, and everyone else falls somewhere in between.  Great!

Let's say that your Best Appearance award is on a scale of 0-400 like ours was last year.  Everyone gets scored based on a rubric.

To calculate Best Overall just using those two categories, you contribute each equally.  You get the % of total Generalship and add that to the % of total Appearance.  So, if I went 5/1 last year, my contribution to Generalship would be 83%.  If I scored a 137 on the Appearance Rubric, my Appearance contribution would be 34.25%.  Now, my Best Overall should be 1/2 and 1/2, so (0.5) * 83% + (0.5) * 34.25% = 58.625% of the total Best Overall points available.
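The arithmetic above can be sketched in a few lines of Python.  This is only an illustration of the two-category example, not our actual scoring tool: the function name `best_overall` and its parameters are made up for this sketch, and the 400-point maximum and 50/50 weighting come straight from the example.  Note that the code uses the unrounded 5/6 rather than 83%, so it lands on about 58.79% instead of 58.625%.

```python
def best_overall(wins, games, appearance, appearance_max=400.0,
                 weight_generalship=0.5):
    """Combine two category percentages into one Best Overall fraction.

    Hypothetical helper for illustration: Generalship contributes
    wins / games, Appearance contributes score / maximum score.
    """
    generalship_pct = wins / games
    appearance_pct = appearance / appearance_max
    return (weight_generalship * generalship_pct
            + (1.0 - weight_generalship) * appearance_pct)


# The example from the text: 5 wins in 6 games, 137 out of 400 on the rubric.
score = best_overall(wins=5, games=6, appearance=137)
print(f"Best Overall contribution: {score:.2%}")
```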

Now, here comes the monkey wrench!  There are so many ways this system can fail to fairly produce that Best Overall score, and I will try and list a bunch in a bit.  For now, let me just insert one to prove the point.

Generalship's contribution to Best Overall in this example is very RIGID.  Think about it this way, for our 64 man GT this year, contributions based on the Wins/Games system will ALWAYS be like this at the end of the day:

1 - 100% (6/0)
6 - 83% (5/1)
15 - 66% (4/2)
20 - 50% (3/3)
15 - 33% (2/4)
6 - 17% (1/5)
1 - 0% (0/6)
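For anyone who wants to check that table, here is a minimal sketch.  It assumes there are no draws, that every round pairs players with identical records, and that the field is exactly 2 to the power of the number of rounds (64 players for 6 rounds).  Under those assumptions the number of players finishing on each record is a binomial coefficient, and the counts add up to the size of the field.  The function name `record_distribution` is made up for this sketch.

```python
from math import comb


def record_distribution(rounds):
    """Return (wins, player_count, contribution) tuples, best record first.

    Assumes a field of 2**rounds players, no draws, and pairings that
    always match players with the same record.
    """
    return [(wins, comb(rounds, wins), wins / rounds)
            for wins in range(rounds, -1, -1)]


dist = record_distribution(6)
for wins, players, contribution in dist:
    losses = 6 - wins
    print(f"{players:2d} players at {wins}/{losses} contribute {contribution:.4f}")

print("Total players:", sum(players for _, players, _ in dist))
```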

The thing about Generalship is that this will ALWAYS be the case.  It doesn't change no matter the other factors in the room such as the AVERAGE PLAYER SKILL, the DIFFERENCE IN SKILL BETWEEN ONE GUY AND ANOTHER, and also important, it's IMPOSSIBLE for 2 guys to contribute 100%.

The same cannot be said of our Appearance Scoring method.

Just for an example, I checked our Appearance Scores from last year, and the average score in the room (after discarding 0's) was 241/400.  Hrm.  This could have an effect on Best Overall, couldn't it?  Indeed!  Because the average person in the room is now contributing 60.25% to the Best Overall category while the average General is contributing only 50%.

Thus, at the end of the day, Generalship scores MATTERED more when it came to our Best Overall than did Appearance Scores.  The reason is that the upper level Appearance Scores were less meaningful than the upper level Generalship scores, because the average participant scored higher than middle of the road.  So, whereas Generalship has a spread of 50% between middle and top, Appearance had one of only about 40% (100% - 60.25%).

Did it matter?  How would I know?  Well, I decided to check.  To do so, I converted my Generalship Scores and my Appearance Scores into Z scores, comparing the values to their averages rather than our rubric.  My Best Overall didn't change at all.  He DOMINATED my GT anyways, scoring extremely high in all 3 categories.  However, my Best Overall 2nd-5th places DID change.  They were swapped around slightly, and you could see that the guys with higher Appearance Scores suddenly started to bubble up and just BARELY edge out people above them who had a higher Generalship score.
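Here is a rough sketch of that kind of Z score check.  The player data below is entirely made up for illustration (it is NOT last year's results), the helper name `z_scores` is hypothetical, and I am assuming the population standard deviation and an equal 50/50 weighting of the two categories.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw scores to Z scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # Everyone scored the same, so nobody is above or below average.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical players: (name, wins out of 6, appearance out of 400).
players = [("A", 6, 310), ("B", 5, 250), ("C", 5, 330),
           ("D", 4, 280), ("E", 3, 200), ("F", 2, 180)]

wins_z = z_scores([p[1] for p in players])
appearance_z = z_scores([p[2] for p in players])

# Equal weighting of the two standardized categories.
overall = sorted(
    ((0.5 * g + 0.5 * a, name)
     for (name, _, _), g, a in zip(players, wins_z, appearance_z)),
    reverse=True,
)
for combined, name in overall:
    print(f"{name}: {combined:+.3f}")
```

Because each category is measured against its own average and spread, a rubric that hands out a high average score no longer quietly shrinks the gap between a middling army and a great one.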

See, all of this doesn't much matter so long as the spread between your participants is LARGE.  However, when it gets tight, those little decimals can mean differences in placement.

I would bet, if I had more than 50 people, my average Appearance would have approached 200.  I say to myself, oh well, it's a wash right?  Well, turns out, just "reaching average" isn't all that is required.  There is a seemingly endless number of potential ways in which Appearance Scoring can create problems, almost all of which stem from the fact that a Generalship score is so RIGID while an Appearance Score is potentially (not necessarily!) FLUID.

Here's a short list just off the cuff of things that will cause a potential imbalance:

  • Does an Appearance Rubric always produce an average result for the average person?  What happens if everyone shows up at your tournament with an army they paid to have painted by Blue Table Painting?
  • Does a rubric produce enough granularity even if it does have a consistent average?  What happens when 49 people show up with *yawn* and GREGOR TEH AWESOME!!!1! shows up with an army so cool it blinds you just to look at?
  • Does a rubric allow for ties for the top score?  Our Generalship rubric doesn't.
  • Using my above example, will my Appearance Scoring system have a way to contribute values other than 0%, 17%, 33%, 50%, 66%, 83%, and 100%?  If so, it can unbalance my Generalship scores.
  • Does a rubric allow for ties for the bottom score?  Our Generalship rubric also doesn't.
  • Is a rubric skewed to allow people to earn easy points while only granting small point values to the hard to get stuff?  (This is done to make Appearance Scores attainable in a lot of ways).  This will skew your Appearance contribution causing Generalship to matter more.
  • Does the difference between the BEST army in the room and the 2nd best produce a gap between their contributions that is less than or greater than 17%?  Generalship does not; its gap is always 17%.  (E.g.:  99/100 = 99%.  98/100 = 98%.  For Generalship, top place is 100%, 2nd is 83%.)
  • Does an Appearance rubric always produce a 0%?  Generalship does.
  • And lots and lots more.

One pitfall I kept running into all week long as the volume of problems kept seeming to increase no matter how much I tried to be creative with upgrading our Appearance scoring was that I kept wanting to dismiss them for practical or logical reasons.  This would amount to thoughts like:

  • "yeah, well, it's not realistic that people will show up with armies like that..."
  • "it's not realistic that I'll get that many ties"
  • "but we're talking such a small number here, it won't really matter"
  • "I can design a methodology which will fairly overcome that"
The real back breaker thought is this one:
  • "well, it's okay to have 2 people share the top spot if they both have awesome armies!"  (this one is insidious because what it is tempting you to do is not treat your Appearance scores as being as worthy of competition as your Generalship scores... which means you don't really believe they are equal)

Yesterday, I had it all worked out where my new Appearance scoring methodology was going to be to examine Z scores compared to an average of the room.  I had written all my arguments why this was the best method using all the thoughts from above to show why, realistically, that was as good as it gets.  I was prepared to finish up my rubrics, design my post rubric judging methodology (read tie breakers), and press on.

This morning, on the way to work, though, I think I finally solved the problem in a much better way.

I've hinted at it repeatedly here as I wrote this article.  The best solution I have come up with, which I will elaborate on in the next article, is to make the final result of your Appearance scores produce a contribution which exactly mirrors your Generalship's ability to contribute.  That is, whatever you do to get your Appearance rankings, when you go to translate them to Best Overall, the way in which they contribute should come as close as possible to exactly matching the way in which your Generalship translates and contributes.

In other words, rather than adding up your Apples, Oranges, and Pears to get Watermelons, instead, convert your Apples, Oranges, and Pears into Grapes first, then all you need to do is write a formula to convert Grapes to Watermelons.  

The key is to model it after your Generalship because your Generalship is as close to a rigid, fundamental truth as it gets.  This isn't some kind of genius discovery or anything.  It's just the logical conclusion that should have been (and probably is to most people) obvious to begin with. 

The plan, then, will effectively be the same thing as if I started pairing players together at the table, had a judge walk up to score both armies on Appearance and declare a winner, and then matched them up with their next opponent.... just like Generalship.  Now, it's not going to be exactly that because there are things we can do that are much easier and also eliminate some flaws in how Generalship is calculated, but that's the over-arching theme for how it's going to work.  

And when it's all over, the contributions will be just about identical and Best Appearance will still be the best army in the room.
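To make the idea a little more concrete, here is one possible reading of it in Python.  This is NOT the final method (the specifics are still being worked on), just a sketch under some loud assumptions: every player already has a unique Appearance rank with no ties, the field is exactly 2 to the power of the number of rounds (64 players for 6 rounds), and ranks are poured into the same buckets that a no-draw Win/Loss system produces (1, 6, 15, 20, 15, 6, 1).  The function name `rank_to_contribution` is made up for this sketch.

```python
from math import comb


def rank_to_contribution(rank, rounds=6):
    """Map a 1-based, tie-free rank to a Generalship-style contribution.

    Hypothetical mapping: the best rank gets the 6/0 bucket (100%), the
    next 6 ranks get the 5/1 bucket (5/6), and so on down to 0%.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    upper = 0  # highest rank covered by the buckets processed so far
    for wins in range(rounds, -1, -1):
        upper += comb(rounds, wins)
        if rank <= upper:
            return wins / rounds
    raise ValueError("rank exceeds the 2**rounds field size")


# Contributions for a full 64 player field, best army first.
contributions = [rank_to_contribution(r) for r in range(1, 65)]
print("Best army:", contributions[0])
print("Worst army:", contributions[-1])
print("Average contribution:", sum(contributions) / len(contributions))
```

With a mapping like this, the average Appearance contribution is 50%, exactly one army contributes 100%, and exactly one contributes 0%, which is the same shape Generalship already has, no matter how the rubric itself is skewed.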

This will also have all sorts of side benefits, which include things like being able to set your appearance rubric however you like, skewed, straight, average, or more importantly for me, able to incorporate a level beyond a rubric, all the while not unbalancing your contribution to Best Overall.  The only real requirements then will end up being, just like our Generalship has already, no ties and everyone gets ranked.

(What the specifics are going to be is still being worked on :P)

1 comment:

  1. Here is how you do a prize to keep the average players invested in playing, painting, and sports...

    Use flights like golf tournaments.
    simply score the overall like you are planning and then at half have a Best Overall Second Flight.

    So it's like this....
    1 gets Best overall
    26th gets Best Overall Second Flight and a prize.

    It makes golf tournaments interesting for us average players and keeps us trying to do well.

    Just a thought.

    Josh Dunn


