taw's blog: June 2012

Friday, June 29, 2012

Amethyst migrated to github

Feline Royalty by Photography By Shaeree from flickr (CC-NC-ND)

Amethyst was my totally awesome idea for coding Perl with Ruby syntax.

I just moved it to github, fixed a bunch of bugs, and added some examples and a better command line interface.

It doesn't do anything much more complicated than this:

hello = ["Hello", "world!"]

puts(hello.join(", "))

ary = [2,10,30]
ary2 = ary.map{|x| x*3}

ary.each_with_index{|element, i|
  print("* ", i, " => ", element, "\n")
}

but it's meant to be fun, not useful.

There's some interesting magic inside, and if you want to learn to write parsers with Parse::RecDescent, it's not the worst place to start.

If someone wants to turn it into a real language (CoffeeScript kind of real), go ahead: code any patches you want and send me pull requests on github.

As I promised, I intend to migrate the rest of my old software to github and clean it up a bit in the process.

Wednesday, June 27, 2012

New jrpg quests

Once upon a time I released jrpg with a promise that if anybody ever finishes the game, they can send the save file to me, and I'll definitely add a few new areas.

Well, over the years not one but three people sent me their completed save files, so I finally decided to make good on my promise and expanded the map with a few new areas and a slightly longer quest.

The new quest and areas are actually at a fairly low level (kana words, very simple kanji) rather than after you win the entire game, since very few people would ever get that far.

Unfortunately the new quest line is not savegame compatible, so people who got far in the game probably don't want to upgrade: the game will forget what they've learned and they'll start from 0 xp. Even worse, the game won't know which kanji you know, so it will be pretty dumb at presenting them to you until it figures that out.

You can download the new version from the jrpg website.

Sunday, June 24, 2012

Game Theory proves that UW Delver deserves a banhammer

Ragnificent Ragdolls - Mushu as a kitten by DirtBikeDBA (Mike) from flickr (CC-NC-ND)

On June 20th, none of the UW Delver cards were banned. This was a horrible decision, but I'm not going to say one word about the specifics of UW Delver in this post, since that has been discussed ad nauseam elsewhere. This post is entirely about the DCI's reasoning for not banning anything, and how it shows a total lack of understanding of basic statistics and game theory. Let me quote the decision:

The DCI looked at the results of competitive Standard events. We found that while a high percentage of the participants played White-Blue Delver decks, that the win rate of those decks was very close to par. For instance, in a recent MTGO PTQ, the win rate of White-Blue Delver decks against non-Delver decks was a bit under 51%. In general there are decks that the Delver deck is strong against, and decks that it is weak against, but on average the deck tends to get results close to average.

Additionally the number of people playing high level Standard events is the highest ever. Looking at the Magic 2013 card set, it appears that there may be more tools for other decks than for the White-Blue Delver deck, though time will tell if this bears out. The DCI will continue to observe how this plays out, but is taking no action.

I know nothing of tournament attendance, and as for new M13 cards I'll just quote this as a cautionary tale:

We tried to enable a few specific anti-Jace weapons in Mirrodin Besieged with Phyrexian Revoker, Hero of Oxid Ridge, and Thrun, the Last Troll, but the metagame solved those cards pretty quickly with Squadron Hawks and Swords.

New Phyrexia brought with it Despise and Hex Parasite, but those cards just aren't powerful or versatile enough.

Anyway, the biggest argument is "on average the deck tends to get results close to average". I'll show you in simple mathematical terms why this is not only wrong, but completely backwards.

Simplifying assumptions

None of these assumptions are essential to the argument; they just make the math simpler. If you have time, you can redo the proof in a more complicated variant.

Let's assume that each player plays to win (more on that later), knows the meta, and has no budget constraints (a reasonable assumption in Standard, where competitive deck prices are similar, less so in Vintage). Let's also assume that players of each archetype are equally good on average (this doesn't preclude some players being better, as long as there are few such superstars, or they're not particularly attached to any archetype, or both).

Let's assume there are N possibly tournament-viable archetypes in a format, and disregard issues like minor variants within each archetype, sideboarding strategies etc.

For each pair of such archetypes I and J, let p(I,J) be the probability that the first player (playing I) wins the match. We don't need separate pre-sideboard and post-sideboard win probabilities, since if α and β are the pre-sideboard and post-sideboard game win probabilities respectively, then the best-of-3 match win probability is:

p = αβ + α(1 - β)β + (1 - α)ββ
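A quick way to check the formula above is to compare it with a brute-force walk over every possible best-of-3 match. This is a minimal Ruby sketch; the method names are made up for illustration, and it assumes, as the formula does, that game 1 is played pre-sideboard (game win chance alpha) and games 2 and 3 post-sideboard (chance beta).

```ruby
# Closed form from the formula above (hypothetical helper name).
def match_win_prob(alpha, beta)
  alpha * beta +                # win, win
    alpha * (1 - beta) * beta + # win, lose, win
    (1 - alpha) * beta * beta   # lose, win, win
end

# Brute force: recursively play games until someone has 2 wins.
# Game 0 is pre-sideboard, later games are post-sideboard.
def match_win_prob_brute(alpha, beta, game = 0, wins = 0, losses = 0)
  return 1.0 if wins == 2
  return 0.0 if losses == 2
  chance = game.zero? ? alpha : beta
  chance * match_win_prob_brute(alpha, beta, game + 1, wins + 1, losses) +
    (1 - chance) * match_win_prob_brute(alpha, beta, game + 1, wins, losses + 1)
end

puts match_win_prob(0.6, 0.4).round(3)       # prints 0.448
puts match_win_prob_brute(0.6, 0.4).round(3) # prints 0.448
```

Both agree for any alpha and beta, and a mirror (0.5, 0.5) comes out at exactly 0.5, as it should.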

By similar reasoning, all kinds of on-the-play vs on-the-draw chances can be folded into a single number. There will be slight inaccuracies if some matches are best-of-3 and others best-of-5, if some matchups have an unusually high chance of an unintentional draw due to timeouts (like control mirrors vs aggro mirrors), and if people decide whether to take an intentional draw depending on what their opponent is playing. We'll just fold as much of this as we can into a single "match win %" number, and won't be too concerned with what cannot be simulated this way.

How players decide what to play

There are many ways to simulate this, and they all lead to very similar outcomes. Since we assumed N archetypes, let's start with each of them having 1/N of the field. To make all these long lists of numbers look less dreadful I'll call the decks by randomly assigned 2 and 3 color combinations (but they're actually just labels).

Then each player is paired randomly with another player, and each match loss makes the loser change their deck, with some low probability, to a deck randomly chosen from the current meta (so if 20% of people are playing archetype X, a player who wants to change their deck will pick X with 20% probability; for simplicity, if they already play X, that counts as picking another decklist within the same archetype).

Now this procedure isn't particularly realistic (someone who lost 5/5 games is more than 5x more likely to change their deck than someone who lost just 1/5 to mana screw), but almost any such procedure for evolving the meta leads to very similar outcomes. Got it so far? Now let's create a fresh format with 20 archetypes, generate a random matrix of match win probabilities within the 25%..75% range (except that a mirror match is always 50% by definition), set the chance of changing decks after a loss to 1%, and simulate some rounds. What will the archetypes' meta shares and average win percentages be? Let's see one such random meta!
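Generating the random format takes only a few lines, since each pair needs just one draw: if I beats J with probability p, then J beats I with probability 1 - p, and mirrors are fixed at 50%. Here is a minimal Ruby sketch of that setup plus the round-0 report, in the same "share (win % against the field)" format as the tables below. It is an illustration under those assumptions, not the original simulation script; the method names and the seed are made up, and the numbers it prints won't match the post's tables.

```ruby
LABELS = %w[UW Bant WR GB WB UR RUG Grixis BUG Kaalia
            Jund Junk Naya Esper GU GR BR UB WUR WG]

# Draw p(i, j) uniformly from lo..hi for each pair and set p(j, i) = 1 - p(i, j).
def random_matchups(n, lo, hi, rng)
  m = Array.new(n) { Array.new(n, 0.5) } # mirror matches stay at 50%
  (0...n).each do |i|
    (i + 1...n).each do |j|
      m[i][j] = rng.rand(lo..hi)
      m[j][i] = 1 - m[i][j]
    end
  end
  m
end

# Win rate of each deck against the field, mirror matches included.
def field_win_rates(shares, m)
  m.map { |row| row.zip(shares).sum { |pij, s| pij * s } }
end

rng = Random.new(2012)
m = random_matchups(LABELS.size, 0.25, 0.75, rng)
shares = Array.new(LABELS.size, 1.0 / LABELS.size)
rates = field_win_rates(shares, m)
LABELS.zip(shares, rates).sort_by { |_, _, w| -w }.each do |name, s, w|
  puts format("%s - %.1f (%.1f)", name, 100 * s, 100 * w)
end
```

With equal shares every deck starts at 5.0%, and the field's win rates average out to exactly 50%.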

Random Fair Meta

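It's time to run some simulations. Each line shows an archetype's meta share in %, followed in parentheses by its win percentage against the whole field, mirrors included. At the beginning the meta looks really balanced: no deck has a particularly high or particularly low chance against the field (it is honestly just a coincidence that UW got on top):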

UW - 5.0 (55.4)
Bant - 5.0 (52.7)
WR - 5.0 (52.5)
GB - 5.0 (52.5)
WB - 5.0 (52.3)
UR - 5.0 (51.9)
RUG - 5.0 (51.1)
Grixis - 5.0 (50.8)
BUG - 5.0 (50.6)
Kaalia - 5.0 (50.4)
Jund - 5.0 (49.5)
Junk - 5.0 (48.9)
Naya - 5.0 (48.9)
Esper - 5.0 (48.6)
GU - 5.0 (48.2)
GR - 5.0 (48.1)
BR - 5.0 (47.4)
UB - 5.0 (47.2)
WUR - 5.0 (47.1)
WG - 5.0 (46.0)

After 2000 rounds we start to see a meta forming, but no deck is particularly bad:

UW - 13.7 (54.1)
Bant - 8.6 (52.5)
WB - 7.2 (50.6)
Grixis - 6.9 (52.7)
GB - 6.6 (50.4)
WR - 6.3 (49.5)
UR - 5.5 (48.8)
RUG - 4.9 (48.4)
Esper - 4.5 (50.4)
Kaalia - 4.3 (48.5)
Jund - 4.1 (49.0)
GU - 3.7 (49.3)
BUG - 3.4 (45.5)
Junk - 3.3 (47.2)
Naya - 3.3 (46.9)
UB - 3.1 (48.5)
GR - 3.1 (47.0)
BR - 3.0 (47.6)
WUR - 2.8 (47.4)
WG - 1.8 (43.7)

After 4000 rounds, UW is the top deck, but Grixis has the highest win percentage, so it goes up:

UW - 21.1 (50.0)
Grixis - 14.4 (54.0)
Bant - 11.5 (50.0)
GB - 7.0 (50.7)
WB - 6.1 (48.5)
Esper - 5.0 (49.9)
WR - 4.4 (47.8)
GU - 4.0 (51.4)
Jund - 3.9 (50.7)
Kaalia - 3.6 (50.5)
UR - 3.2 (46.1)
UB - 3.1 (51.7)
RUG - 2.7 (45.7)
Junk - 1.9 (47.8)
BR - 1.9 (47.7)
WUR - 1.7 (46.8)
GR - 1.6 (46.9)
Naya - 1.6 (46.5)
BUG - 1.0 (42.7)
WG - 0.4 (41.4)

After 10000 rounds, a lot of decks are seeing no play. Grixis was even briefly the top deck, but then fell in popularity, and there's a lot of rearrangement in the top 5:

UW - 20.1 (52.2)
WB - 14.5 (48.7)
Bant - 12.0 (53.2)
GB - 11.7 (48.4)
Kaalia - 10.1 (47.9)
Grixis - 7.5 (48.8)
GU - 5.6 (50.1)
Jund - 4.9 (49.8)
UB - 3.3 (48.7)
WR - 2.9 (46.8)
Esper - 2.3 (52.0)
Junk - 2.1 (49.6)
BR - 1.4 (50.6)
Naya - 0.5 (45.8)
RUG - 0.5 (49.0)
UR - 0.4 (46.6)
GR - 0.1 (42.5)
WUR - 0.1 (45.8)
BUG - 0.0 (43.6)
WG - 0.0 (42.1)

After 20000 rounds:

Bant - 25.7 (50.7)
UW - 17.4 (50.0)
Kaalia - 10.7 (47.7)
Jund - 9.8 (51.1)
GB - 7.9 (48.5)
WB - 7.7 (46.7)
Grixis - 6.4 (52.6)
UB - 5.0 (53.2)
Esper - 2.4 (52.4)
Junk - 2.1 (49.6)
GU - 1.7 (48.6)
BR - 1.7 (50.6)
RUG - 0.9 (50.7)
WR - 0.5 (47.1)
UR - 0.1 (49.1)
Naya - 0.0 (43.7)
BUG - 0.0 (45.6)
WUR - 0.0 (43.9)
GR - 0.0 (41.9)
WG - 0.0 (40.6)

50000 rounds:

UW - 27.3 (51.2)
Bant - 15.8 (52.2)
Kaalia - 12.9 (47.1)
Junk - 11.1 (48.4)
GB - 9.2 (49.7)
Grixis - 8.5 (51.9)
WB - 6.3 (46.8)
UB - 4.5 (48.9)
RUG - 2.1 (50.0)
Jund - 0.7 (49.3)
BR - 0.6 (49.5)
GU - 0.3 (49.3)
UR - 0.3 (48.3)
Esper - 0.2 (53.1)
WR - 0.2 (46.2)
Naya - 0.0 (45.2)
WUR - 0.0 (47.8)
BUG - 0.0 (43.2)
GR - 0.0 (40.7)
WG - 0.0 (38.3)

75000 rounds:

Bant - 24.4 (49.9)
UW - 18.5 (49.2)
Grixis - 13.5 (52.5)
Kaalia - 12.0 (47.7)
WB - 10.9 (47.1)
UB - 6.5 (54.7)
GB - 4.5 (48.8)
Esper - 3.6 (53.3)
Junk - 2.7 (51.1)
Jund - 1.3 (51.0)
BR - 1.1 (52.3)
RUG - 0.9 (48.0)
GU - 0.1 (51.3)
WR - 0.0 (46.2)
UR - 0.0 (46.1)
Naya - 0.0 (46.3)
WUR - 0.0 (43.2)
BUG - 0.0 (41.9)
GR - 0.0 (41.9)
WG - 0.0 (39.8)

100000 rounds:

UW - 27.1 (48.0)
Grixis - 21.4 (51.1)
GB - 13.1 (51.4)
Kaalia - 11.2 (52.2)
UB - 7.8 (52.1)
Bant - 6.9 (48.1)
RUG - 3.9 (45.7)
WB - 3.3 (51.8)
Junk - 2.7 (49.4)
BR - 1.3 (47.6)
Esper - 0.7 (47.8)
Jund - 0.7 (49.8)
GU - 0.0 (51.1)
WR - 0.0 (49.4)
UR - 0.0 (45.1)
Naya - 0.0 (47.9)
WUR - 0.0 (47.6)
BUG - 0.0 (41.6)
GR - 0.0 (44.3)
WG - 0.0 (37.6)

OK, let's stop there. This randomly generated meta was actually fairly healthy, with multiple decks being playable, and frequent rearrangement.

But did you notice something interesting about the win chances (which are against the whole field, including mirrors)? Once the meta started forming, no deck's chance ever went much above 50% (the best stayed around 52-55%), while the worst decks sank to around 40% or below. Shouldn't the average of the win rates be exactly 50%?

Actually, no! The average win rate against a random field (round 0) is 50%, but once the meta establishes itself, good decks' win chances are calculated against other good decks (so they're close to 50%), while bad decks' chances are also calculated against good decks (so they're much worse than 50%). It doesn't matter how hard any deck crushes the bad decks, since those aren't in the meta in any appreciable amount, so those matchups barely count.

Let me restate this:

Average win rates of decks in any established meta are lower than 50%, since better decks are played more than bad decks. It is difficult for any deck to have a win rate much higher than 50%.
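The arithmetic behind this is easy to check. Weighted by meta share, the field's average win rate is always exactly 50%, since every match has one winner and one loser; the simple average over archetypes gives a deck nobody plays the same weight as the top deck. A small Ruby sketch with a made-up three-deck matchup table (the numbers are hypothetical, not taken from the simulations above):

```ruby
names = %w[A B C]
# matchups[i][j]: chance that deck i beats deck j (hypothetical numbers)
matchups = [[0.5, 0.6, 0.8],
            [0.4, 0.5, 0.7],
            [0.2, 0.3, 0.5]]
shares = [0.6, 0.4, 0.0] # nobody plays C any more

# Each deck's win rate against the current field, mirrors included
rates = matchups.map { |row| row.zip(shares).sum { |pij, s| pij * s } }
weighted   = rates.zip(shares).sum { |w, s| w * s }
unweighted = rates.sum / rates.size

names.zip(rates).each { |name, w| puts format("%s: %.1f%%", name, 100 * w) }
puts format("weighted by share: %.1f%%", 100 * weighted)
puts format("simple average:    %.1f%%", 100 * unweighted)
```

This prints 54.0%, 44.0%, and 24.0% for the three decks, 50.0% for the share-weighted average, and 40.7% for the simple average: C's terrible 24% drags the simple average down even though nobody plays it.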

Abby - cat loaf 1 by DirtBikeDBA (Mike) from flickr (CC-NC-ND)

Random Unfair Metas

Now that was a reasonably healthy meta. What about metas which are totally unbalanced and unfair? Let's simulate one.

Now let's pick one deck, let's say UW (for no particular reason, it just happened to be the first in my list of labels, honest...), whose win probabilities are drawn from the 0.45..0.75 range, while all other matchups use the 0.25..0.75 range. That's right: this UW doesn't have particularly high chances against any deck, it simply never has particularly low chances against any deck.
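Only the favored deck's matchups change: they are drawn from 0.45..0.75 from its own point of view, so every opponent's chance against it lands in 0.25..0.55, while all other pairs still use 0.25..0.75. A hypothetical Ruby sketch of such a generator (the method name, parameter order, and seed are illustrative assumptions, not the original script):

```ruby
# Like a fair random matchup matrix, except that every matchup involving the
# favored deck is drawn from fav_lo..hi from the favored deck's point of view.
def unfair_matchups(n, favored, lo, fav_lo, hi, rng)
  m = Array.new(n) { Array.new(n, 0.5) } # mirror matches stay at 50%
  (0...n).to_a.combination(2).each do |i, j|
    i, j = j, i if j == favored # always draw from the favored deck's side
    range = i == favored ? (fav_lo..hi) : (lo..hi)
    m[i][j] = rng.rand(range)
    m[j][i] = 1 - m[i][j]
  end
  m
end

rng = Random.new(7)
m = unfair_matchups(20, 0, 0.25, 0.45, 0.75, rng) # deck 0 plays the role of UW
puts format("UW's worst matchup: %.1f%%", 100 * m[0].drop(1).min)
```

Note that nothing here makes UW's best matchups any better than other decks' best matchups; it only removes its bad ones.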

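Let's give it a try. Each line now shows the meta share, then the win percentage against the whole field (all), against the field without the mirror, against UW, and against the field excluding both UW and the mirror. After 0 rounds: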

UW - 5.0 (all 61.7 / no mirror 62.3 / against UW 50.0 / except UW and mirror 62.3)
Junk - 5.0 (all 55.9 / no mirror 56.2 / against UW 26.5 / except UW and mirror 57.9)
BUG - 5.0 (all 53.7 / no mirror 53.9 / against UW 37.4 / except UW and mirror 54.8)
Grixis - 5.0 (all 52.2 / no mirror 52.3 / against UW 47.3 / except UW and mirror 52.5)
Naya - 5.0 (all 50.9 / no mirror 51.0 / against UW 30.9 / except UW and mirror 52.1)
Kaalia - 5.0 (all 50.5 / no mirror 50.5 / against UW 41.0 / except UW and mirror 51.1)
RUG - 5.0 (all 50.3 / no mirror 50.4 / against UW 46.5 / except UW and mirror 50.6)
GB - 5.0 (all 50.2 / no mirror 50.2 / against UW 27.3 / except UW and mirror 51.5)
GU - 5.0 (all 50.1 / no mirror 50.1 / against UW 50.8 / except UW and mirror 50.0)
WUR - 5.0 (all 49.8 / no mirror 49.8 / against UW 38.4 / except UW and mirror 50.5)
UB - 5.0 (all 49.7 / no mirror 49.7 / against UW 39.7 / except UW and mirror 50.3)
BR - 5.0 (all 49.2 / no mirror 49.2 / against UW 31.6 / except UW and mirror 50.2)
Jund - 5.0 (all 48.8 / no mirror 48.8 / against UW 34.7 / except UW and mirror 49.6)
Bant - 5.0 (all 48.6 / no mirror 48.5 / against UW 35.5 / except UW and mirror 49.2)
Esper - 5.0 (all 47.3 / no mirror 47.1 / against UW 51.3 / except UW and mirror 46.9)
WB - 5.0 (all 47.1 / no mirror 47.0 / against UW 39.6 / except UW and mirror 47.4)
WG - 5.0 (all 47.1 / no mirror 46.9 / against UW 25.2 / except UW and mirror 48.1)
WR - 5.0 (all 47.0 / no mirror 46.8 / against UW 37.1 / except UW and mirror 47.4)
UR - 5.0 (all 46.0 / no mirror 45.8 / against UW 29.1 / except UW and mirror 46.8)
GR - 5.0 (all 43.9 / no mirror 43.6 / against UW 46.6 / except UW and mirror 43.4)

After 100000 rounds:

UW - 68.7 (all 49.6 / no mirror 48.7 / against UW 50.0 / except UW and mirror 48.7)
Esper - 30.8 (all 50.9 / no mirror 51.3 / against UW 51.3 / except UW and mirror 51.1)
GU - 0.5 (all 50.2 / no mirror 50.2 / against UW 50.8 / except UW and mirror 48.9)
Grixis - 0.0 (all 49.6 / no mirror 49.6 / against UW 47.3 / except UW and mirror 54.6)
WUR - 0.0 (all 49.5 / no mirror 49.5 / against UW 38.4 / except UW and mirror 73.7)
BUG - 0.0 (all 48.2 / no mirror 48.2 / against UW 37.4 / except UW and mirror 71.9)
WR - 0.0 (all 46.5 / no mirror 46.5 / against UW 37.1 / except UW and mirror 67.3)
Bant - 0.0 (all 43.9 / no mirror 43.9 / against UW 35.5 / except UW and mirror 62.3)
RUG - 0.0 (all 43.0 / no mirror 43.0 / against UW 46.5 / except UW and mirror 35.2)
BR - 0.0 (all 43.9 / no mirror 43.9 / against UW 31.6 / except UW and mirror 70.9)
Naya - 0.0 (all 42.8 / no mirror 42.8 / against UW 30.9 / except UW and mirror 69.0)
GR - 0.0 (all 42.5 / no mirror 42.5 / against UW 46.6 / except UW and mirror 33.4)
Kaalia - 0.0 (all 40.5 / no mirror 40.5 / against UW 41.0 / except UW and mirror 39.5)
Junk - 0.0 (all 40.3 / no mirror 40.3 / against UW 26.5 / except UW and mirror 70.5)
WB - 0.0 (all 37.8 / no mirror 37.8 / against UW 39.6 / except UW and mirror 34.0)
UB - 0.0 (all 38.4 / no mirror 38.4 / against UW 39.7 / except UW and mirror 35.5)
Jund - 0.0 (all 35.3 / no mirror 35.3 / against UW 34.7 / except UW and mirror 36.6)
GB - 0.0 (all 36.5 / no mirror 36.5 / against UW 27.3 / except UW and mirror 56.7)
UR - 0.0 (all 33.1 / no mirror 33.1 / against UW 29.1 / except UW and mirror 41.8)
WG - 0.0 (all 33.9 / no mirror 33.9 / against UW 25.2 / except UW and mirror 52.8)

This meta is about as degenerate as they ever get: 68.7% top deck, 30.8% a deck designed to beat the top deck (and it barely does so), and 0.5% a third deck. And do you see? The top deck's win percentage is below 50%!

The degeneracy showed up only in the high meta share and the total absence of almost all possible decks (in the fair simulation 12 of 20 archetypes saw some play; now it's only 3 of 20). What typically happens in these simulations is that first all decks that lose to the top deck drop to 0%, then the top deck establishes a ridiculous meta share (often >90%), and then sometimes decks which are good against it gain a significant share.

Here's a second simulation. After 0 rounds:

UW - 5.0 (all 58.9 / no mirror 59.4 / against UW 50.0 / except UW and mirror 59.4)
GU - 5.0 (all 55.7 / no mirror 56.0 / against UW 36.8 / except UW and mirror 57.1)
RUG - 5.0 (all 53.2 / no mirror 53.3 / against UW 43.3 / except UW and mirror 53.9)
WB - 5.0 (all 53.0 / no mirror 53.2 / against UW 40.5 / except UW and mirror 53.9)
GR - 5.0 (all 52.6 / no mirror 52.8 / against UW 27.7 / except UW and mirror 54.2)
Bant - 5.0 (all 52.5 / no mirror 52.7 / against UW 49.0 / except UW and mirror 52.9)
WG - 5.0 (all 52.5 / no mirror 52.6 / against UW 45.9 / except UW and mirror 53.0)
Esper - 5.0 (all 51.1 / no mirror 51.1 / against UW 47.5 / except UW and mirror 51.3)
WUR - 5.0 (all 50.3 / no mirror 50.3 / against UW 43.2 / except UW and mirror 50.7)
UR - 5.0 (all 49.7 / no mirror 49.7 / against UW 40.7 / except UW and mirror 50.2)
Grixis - 5.0 (all 49.2 / no mirror 49.2 / against UW 46.1 / except UW and mirror 49.4)
Junk - 5.0 (all 48.8 / no mirror 48.7 / against UW 51.8 / except UW and mirror 48.6)
UB - 5.0 (all 48.4 / no mirror 48.3 / against UW 40.3 / except UW and mirror 48.7)
Naya - 5.0 (all 48.2 / no mirror 48.1 / against UW 28.8 / except UW and mirror 49.2)
BUG - 5.0 (all 47.5 / no mirror 47.4 / against UW 33.0 / except UW and mirror 48.2)
GB - 5.0 (all 47.1 / no mirror 47.0 / against UW 52.3 / except UW and mirror 46.7)
Jund - 5.0 (all 47.1 / no mirror 46.9 / against UW 41.9 / except UW and mirror 47.2)
WR - 5.0 (all 46.2 / no mirror 46.0 / against UW 33.7 / except UW and mirror 46.6)
Kaalia - 5.0 (all 45.1 / no mirror 44.8 / against UW 41.1 / except UW and mirror 45.1)
BR - 5.0 (all 42.8 / no mirror 42.4 / against UW 28.0 / except UW and mirror 43.2)

After 100000 rounds:

UW - 91.8 (all 50.1 / no mirror 50.7 / against UW 50.0 / except UW and mirror 50.7)
Esper - 5.1 (all 48.3 / no mirror 48.2 / against UW 47.5 / except UW and mirror 70.0)
GB - 3.1 (all 51.1 / no mirror 51.1 / against UW 52.3 / except UW and mirror 29.8)
Grixis - 0.0 (all 47.8 / no mirror 47.8 / against UW 46.1 / except UW and mirror 67.3)
Junk - 0.0 (all 50.8 / no mirror 50.8 / against UW 51.8 / except UW and mirror 39.6)
Bant - 0.0 (all 48.7 / no mirror 48.7 / against UW 49.0 / except UW and mirror 45.9)
RUG - 0.0 (all 45.2 / no mirror 45.2 / against UW 43.3 / except UW and mirror 66.7)
WG - 0.0 (all 45.7 / no mirror 45.7 / against UW 45.9 / except UW and mirror 43.6)
WUR - 0.0 (all 44.4 / no mirror 44.4 / against UW 43.2 / except UW and mirror 57.2)
UR - 0.0 (all 42.9 / no mirror 42.9 / against UW 40.7 / except UW and mirror 67.8)
Kaalia - 0.0 (all 42.8 / no mirror 42.8 / against UW 41.1 / except UW and mirror 62.2)
UB - 0.0 (all 40.9 / no mirror 40.9 / against UW 40.3 / except UW and mirror 47.7)
WB - 0.0 (all 40.6 / no mirror 40.6 / against UW 40.5 / except UW and mirror 42.4)
Jund - 0.0 (all 42.0 / no mirror 42.0 / against UW 41.9 / except UW and mirror 42.3)
GU - 0.0 (all 39.0 / no mirror 39.0 / against UW 36.8 / except UW and mirror 63.2)
WR - 0.0 (all 34.9 / no mirror 34.9 / against UW 33.7 / except UW and mirror 48.1)
BUG - 0.0 (all 33.5 / no mirror 33.5 / against UW 33.0 / except UW and mirror 39.3)
Naya - 0.0 (all 30.0 / no mirror 30.0 / against UW 28.8 / except UW and mirror 43.2)
GR - 0.0 (all 29.2 / no mirror 29.2 / against UW 27.7 / except UW and mirror 45.5)
BR - 0.0 (all 29.0 / no mirror 29.0 / against UW 28.0 / except UW and mirror 40.5)

This one is even worse. GB has a decent matchup against UW, but Esper is really brutal against GB (70% win rate), which keeps GB marginal (3.1%) and leaves UW with a 91.8% meta share. So we have a top deck (UW), a meta deck (GB), and a meta-meta deck (Esper).

A third unfair meta. After 0 rounds:

BR - 5.0 (all 58.4 / no mirror 58.9 / against UW 50.7 / except UW and mirror 59.3)
UW - 5.0 (all 56.8 / no mirror 57.2 / against UW 50.0 / except UW and mirror 57.2)
Esper - 5.0 (all 53.3 / no mirror 53.4 / against UW 39.3 / except UW and mirror 54.2)
WR - 5.0 (all 52.7 / no mirror 52.9 / against UW 54.5 / except UW and mirror 52.8)
RUG - 5.0 (all 52.3 / no mirror 52.4 / against UW 53.3 / except UW and mirror 52.4)
WB - 5.0 (all 52.2 / no mirror 52.3 / against UW 37.9 / except UW and mirror 53.1)
GR - 5.0 (all 51.8 / no mirror 51.9 / against UW 51.4 / except UW and mirror 51.9)
UB - 5.0 (all 50.6 / no mirror 50.7 / against UW 48.8 / except UW and mirror 50.8)
Bant - 5.0 (all 50.5 / no mirror 50.5 / against UW 33.0 / except UW and mirror 51.5)
Kaalia - 5.0 (all 50.0 / no mirror 50.0 / against UW 41.0 / except UW and mirror 50.5)
GU - 5.0 (all 48.9 / no mirror 48.9 / against UW 46.2 / except UW and mirror 49.0)
Junk - 5.0 (all 48.6 / no mirror 48.6 / against UW 29.7 / except UW and mirror 49.6)
Jund - 5.0 (all 48.4 / no mirror 48.4 / against UW 29.9 / except UW and mirror 49.4)
BUG - 5.0 (all 48.1 / no mirror 48.0 / against UW 46.7 / except UW and mirror 48.1)
GB - 5.0 (all 47.6 / no mirror 47.5 / against UW 43.3 / except UW and mirror 47.7)
Grixis - 5.0 (all 46.7 / no mirror 46.5 / against UW 41.8 / except UW and mirror 46.8)
UR - 5.0 (all 46.2 / no mirror 46.0 / against UW 26.9 / except UW and mirror 47.1)
WG - 5.0 (all 46.0 / no mirror 45.8 / against UW 53.6 / except UW and mirror 45.3)
WUR - 5.0 (all 45.7 / no mirror 45.5 / against UW 40.5 / except UW and mirror 45.7)
Naya - 5.0 (all 44.9 / no mirror 44.6 / against UW 44.8 / except UW and mirror 44.6)

Now BR is actually better than UW in a vacuum! But since UW has no particularly awful matchups, and BR presumably does, after 100000 rounds the meta degenerates to:

UW - 42.9 (all 50.6 / no mirror 51.0 / against UW 50.0 / except UW and mirror 51.0)
BR - 26.7 (all 50.6 / no mirror 50.9 / against UW 50.7 / except UW and mirror 51.1)
RUG - 14.4 (all 51.3 / no mirror 51.6 / against UW 53.3 / except UW and mirror 49.8)
Kaalia - 10.6 (all 46.0 / no mirror 45.5 / against UW 41.0 / except UW and mirror 49.7)
GU - 3.2 (all 46.4 / no mirror 46.3 / against UW 46.2 / except UW and mirror 46.4)
GB - 2.2 (all 47.0 / no mirror 47.0 / against UW 43.3 / except UW and mirror 49.8)
Naya - 0.0 (all 48.2 / no mirror 48.2 / against UW 44.8 / except UW and mirror 50.7)
UB - 0.0 (all 51.2 / no mirror 51.2 / against UW 48.8 / except UW and mirror 52.9)
WR - 0.0 (all 49.8 / no mirror 49.8 / against UW 54.5 / except UW and mirror 46.2)
Esper - 0.0 (all 44.3 / no mirror 44.3 / against UW 39.3 / except UW and mirror 48.1)
WB - 0.0 (all 42.1 / no mirror 42.1 / against UW 37.9 / except UW and mirror 45.3)
GR - 0.0 (all 44.8 / no mirror 44.8 / against UW 51.4 / except UW and mirror 39.9)
WUR - 0.0 (all 43.3 / no mirror 43.3 / against UW 40.5 / except UW and mirror 45.4)
Junk - 0.0 (all 40.5 / no mirror 40.5 / against UW 29.7 / except UW and mirror 48.6)
Bant - 0.0 (all 36.5 / no mirror 36.5 / against UW 33.0 / except UW and mirror 39.2)
WG - 0.0 (all 44.7 / no mirror 44.7 / against UW 53.6 / except UW and mirror 38.1)
BUG - 0.0 (all 41.2 / no mirror 41.2 / against UW 46.7 / except UW and mirror 37.0)
Grixis - 0.0 (all 39.6 / no mirror 39.6 / against UW 41.8 / except UW and mirror 38.0)
UR - 0.0 (all 37.5 / no mirror 37.5 / against UW 26.9 / except UW and mirror 45.4)
Jund - 0.0 (all 35.8 / no mirror 35.8 / against UW 29.9 / except UW and mirror 40.3)

billy and zena. by ☼Ourania2005 from flickr (CC-NC-SA)

Conclusions

I could keep rerunning these simulations, and there are some interesting and nonobvious points there, like having fewer bad matchups being more important than having any amazing matchups. The big thing, which should be obvious in retrospect, is:

In any stable meta, the top deck's win percentage will always be close to 50%. If it were much different, people would switch to the top deck or away from it. Degeneracy can only be seen in meta share %. It never shows in win percentages.
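Under the mean-field reading of the switching rule (an assumption: shares are treated as exact expected values rather than simulated players, losers switch with probability r, and switchers are spread in proportion to current shares), this can be derived directly. Write s_i for deck i's meta share and w_i for its win rate against the field:

```latex
\begin{aligned}
w_i &= \sum_j s_j\, p(i,j), \qquad p(i,j) + p(j,i) = 1,\\
\bar{w} &= \sum_i s_i w_i
         = \tfrac{1}{2} \sum_{i,j} s_i s_j \bigl(p(i,j) + p(j,i)\bigr)
         = \tfrac{1}{2} \Bigl(\sum_i s_i\Bigr)^2 = \tfrac{1}{2},\\
s_i' &= s_i - r\, s_i (1 - w_i) + s_i \sum_k r\, s_k (1 - w_k)
      = s_i \bigl(1 + r\,(w_i - \bar{w})\bigr)
      = s_i \bigl(1 + r\,(w_i - \tfrac{1}{2})\bigr).
\end{aligned}
```

So a deck's share grows exactly when it beats the current field more than half the time and shrinks when it doesn't, and the meta can only stop moving once every deck still being played sits at exactly 50%. Nothing in this constrains how large those shares are.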

So if win percentage doesn't show which deck is the best, what does it show?

A high average win percentage for certain decks means the meta hasn't adapted to them yet. Flat win percentages near 50% for the top decks mean the meta is stable, not that it's balanced.

That's right: the statistics show that the current Standard meta is both degenerate (Delver's very high % of the field) and stale (top decks very close to a 50% win rate), without much chance of evolving. I thought this would be obvious to anyone with a clue about statistics and game theory; it's a basic result from the first chapters of any game theory textbook that, in equilibrium, every strategy that actually gets played earns the same average payoff. But judging from both the DCI announcement and the ensuing discussion, very few people understand this, and the awful argument gets repeated over and over again.

Now, this post is concerned only with statistics. Maybe M13 will fix it. Maybe someone will come up with a brilliant anti-Delver deck out of nowhere. Maybe people enjoy playing Delver mirrors so much that they'll keep coming to tournaments regardless of how many Delver decks are played. It's all possible, and I don't really have any evidence either way. But arguments from win percentages are wrong, and the mathematics is very clear about that. It's too late to fix this decision, but let's hope that the next time the DCI faces a similarly degenerate meta, they'll look at the numbers that matter (the top deck's meta share), not numbers that show something completely different (the top deck's win rate).

If you want the simulation script, email me (it's fairly straightforward Ruby code).

Tuesday, June 05, 2012

Skyrim review

I finished playing Skyrim just a couple of days ago, and it deserves a proper review, not just a quickie of the type I usually write on my Google+. By the way, if you use Google+, feel free to follow me there to get all my brilliant ideas which were too long to fit in 140 characters. You can also find me on just about any other non-Facebook social network.

Anyway, Skyrim. I played Oblivion a few years back, and for all its flaws I really loved it, at least up to the point where I made myself a set of 100% chameleon gear. Then the game got a bit boring, but I was 90% done by then anyway, so it wasn't a big deal.

And I loved Skyrim even more than Oblivion, but before we get too enthusiastic about it...

Dragons

The main theme of Skyrim is dragon stuff, and it's one massive spectacular miss.

Dragons are spectacularly weak if you have any kind of ranged weapon, either a bow or a lightning bolt. They just fly around pointlessly, taking damage, and every time they actually hurt you they fly away and let your health regenerate, without you even bothering to take cover or use health potions.

By the end of the game I was able to kill weaker dragons with a single sneak shot from my Daedric Bow, and the tougher dragons just needed a few lightning bolts on top of that to convince them to die and let me eat their souls.

Even the final boss - who's some sort of a god and used to rule the entire world some time ago - is less threatening than an average bear. Stephen Colbert was right.

Not only are dragons awfully weak; dragon shouts, your special spell-like ability which supposedly separates you from the common masses, are even weaker. Even the highest level shouts are worthless compared with low level weapons or spells, and there's no way to upgrade them. The best you can do is a somewhat shorter cooldown between shouts, but even with the Amulet of Talos and the Blessing of Talos that's still way too slow. Meanwhile, 1-shot sneak-kill-nearly-anything bows and 0-mana top level fireballs are all within your reach at mid to higher levels.

Maybe some mods fix it. But I doubt it - dragon shouts look entirely unfixable because every player character has full access to them for quest reasons - so if they were made very strong you wouldn't have archers, and warriors, and mages, and thieves, and everything else you could be - you'd have just one dragon shouter character class instead. But they could be made at least somewhat less useless.

The entire dragon-related main quest is pretty mediocre compared with many far more awesome side quests. I think I cared far more about the Civil War, and just about any guild quests than about the dragon stuff.

Character Customization

This is the modern equivalent of grown adults playing with dolls. You can even choose makeup for your character! Oh sorry, it's called "war paint", since we're playing a serious game, not a Japanese RPG.

So I just have two questions:

Why can't I choose blue or pink or another cute hair color?
Why even bother, if just about every single armor covers your character completely and you'll only see the eyes, or not even that? I even did some questing without any helmet so I could see my character, but since my helmet and cowl both had archery bonuses, it was prudent to put them back on once fighting started.

Both issues sound like something mods should be able to fix with ease.

Game Balance

Game balance in Elder Scrolls? Of course there is none. You can make a totally broken character with infinity+1 swords etc. in many different ways. That's part of the charm of the series, I suppose. If you want a more balanced game, look for mods.

The best build seems to be to max out enchanting, smithing, and some alchemy, since that's how you'll make infinity+1 gear for everything, but it's sort of balanced out by this kind of play being boring as hell.

Other than for the first 10 or so levels of the game, there's never any reason to care about money, which is great, since the looting and trading systems are just awful.

Bows are brutally murderous. You can sneak into a room and 1-shot sneak-murder 2-3 people before anybody even realizes that you're there and you lose your 3x sneak bonus. Since 2-3 people is about as many opponents as a typical room has, with a bit of caution you can clear out entire dungeons without anybody getting into your melee range. Not that getting into melee range would hurt you: you can literally slow down time with your bow, outrun everyone even in your heaviest armor (for some reason everyone else is really slow, even without any perks; Skyrim has no Athletics skill, so it's puzzling), and only huge mobs of high level monsters can seriously hurt you.

Combat magic is fairly weak. Fire magic is decent, but even if you get the casting cost down to 0 mana with some gear and get all the perks and Destruction up to 100, it's hard to match the damage per second of your bow. Lightning magic is OK against dragons, but dragons are so awfully weak it doesn't really matter. Frost is just worthless, since 2/3 of all enemies (including all Nords and all undead) are 50% or more frost resistant.

Melee builds are OK, but sadly even the best armor can only reduce the damage you take by 80%, so that's one thing which cannot be broken even by infinity+1 smithing and enchanting.

Sneaking and backstabbing (as opposed to sneaking and bow kills) is not a viable combat strategy, but it's fun as hell for assassinations.

There are also other unbalanced areas: lockpicking, which is a minigame, is super-simple from level one, but pickpocketing, which is decided by chance, is extremely difficult until very late.

Anyway, the game is not particularly difficult, so don't bother optimizing your build on the default difficulty level.


Leveling

Probably the worst aspect of Oblivion was its leveling system. There was a very complicated system for determining your level, so you were better off consulting Internet guides before even creating your character to make sure the game wouldn't be too unfair. Then all enemies automatically matched your level, making any kind of progress pointless, and progress in noncombat skills literally made the game more difficult. Oh, and if your character had a high level, all (level-independent) quest items were total crap compared with what you could find on just about any random bandit. Oblivion was pretty much unplayable without a mod to fix the leveling system.

Skyrim largely fixes that, but not completely. You level in an easy-to-understand way based on your skill progression. Enemies become somewhat better as you level, but not as ridiculously as in Oblivion, and their gear in particular doesn't improve that quickly.

With just a few exceptions (Ancient Shrouded Armor, Oghma Infinium, either version of Azura's Star), unique quest items are still worthless junk. A mod that simply made them about twice as good would improve the game greatly without actually affecting balance all that much (yes, they're that bad).

Quests and Followers

Many of the questlines are totally awesome, and the radiant quest system ensures you can fill the game with as much or as little random dungeon exploration and bandit killing as you want.

The main quest is relatively weak, but I'm sure it would improve greatly if a mod fixed the power level of dragons and dragon shouts.

One thing I particularly loved about Skyrim is the moral choice system, namely that you have a ton of moral choices, but nobody keeps any artificial score:

I started as a proper adventurer who wouldn't even steal anything.
Then I joined the Thieves Guild, and my respect for other people's property waned somewhat.
Then I joined the Dark Brotherhood, and unlike in Oblivion you cannot do it by accident: it's very much a conscious choice to murder a more or less innocent person in cold blood. Then you get more murder contracts.
I thought that was about as low as I could get, but then after some events in Markarth (I'll spare you the spoiler) I moved on to freelance mass assassination sprees, mostly of town guards and Thalmor agents.
There was also a fun ending to the Dark Brotherhood questline, which involved more and more morally dubious murders.
And then I got invited to join a cannibalistic ritual sacrifice to the Daedric Prince Namira. How did I even get here?
In the end I sold my soul to just about every single Daedric Prince. I'm sure they'll figure out some way to divide it once I die. (By the way, the hero of Oblivion became a Daedric Prince at the end of the expansion pack.)
The only thing you cannot do is kill children, but there's a mod for that.

All of this is purely optional. You can play a morally upright character, defender of all that is holy, destroyer of the Dark Brotherhood and of all Daedra worshipers, etc. It's just much more fun to be bad.

The second most important questline after the main quest is the Civil War, where you can join the Empire or the Rebellion (there's also a second, minor rebellion, with arguments shockingly similar to those of Ulfric's rebellion), or even get them to agree to a truce. Of course the Rebels are a bunch of racist morons, just about every single one of them, and unless you're roleplaying a Nord fanatic I cannot think why anybody would not want to crush them for the greater glory of the Empire. Ulfric is also obviously a Thalmor agent paid to destroy the Empire from within, and don't let the lack of clear evidence in-game convince you otherwise.

Followers

In Oblivion you could have followers in some quests. They were mostly a pain in the ass, since they had a constant level while enemies leveled with you, and they could die very easily.

Skyrim has quest followers too, but in addition, after you finish some quests you can ask an NPC to join you in your questing. They are all either unkillable (the quest ones) or harder to kill (your regular non-quest followers): when their health drops below some threshold, enemies leave them alone to regenerate and go after you instead. They can still die, especially if you throw fireballs all over the place.

As far as I can tell, followers stay at whatever level they had when you first met them, so at first they're pretty decent, but as you level up (and the enemies with you) they get progressively worse. Their gear is pretty consistently awful, though. Another thing for modders to fix.

You can also buy a horse, but between the fast travel system, the inability to fight on horseback, and the fact that your horse is not any faster than you are on foot, horses are pretty useless. The only horse worth getting is the one you get in the Dark Brotherhood questline. That's one murderous beast; it even killed a somewhat wounded dragon once.

Bugs

You might be wondering why I'm reviewing Skyrim now, since it was released so long ago, but I follow a consistent rule of never playing new games.

A lot of games (including the entire Elder Scrolls series, the Total War series, the Witcher series, etc.) consistently have gameplay-killing bugs in their first releases, and only a few months later are they patched to the point where you can comfortably play without thinking too much about the bugs.

Unfortunately, even so long after the release, Skyrim is about as buggy as Oblivion, and I really should have installed some kind of unofficial patch mod. Many of the bugs are easily fixable with the console and a quick Internet search, but a few minor issues weren't, or at least I couldn't figure out how. Quick-save often, since crashes will happen.

It's not technically a bug, but everything related to looting, trading, encumbrance, and managing equipment is one huge pile of fail. It's a case of them using an interface meant for console controllers (where it simply has to suck by the basic laws of physics) on a PC, which has a keyboard and mouse and could easily have an interface that doesn't suck. Once again, consoles are the cancer killing gaming.

Do yourself a favour: open the console (~), then type player.modav CarryWeight 9000, so at least you won't have to deal with this mess all the time while exploring dungeons, only once in a while when you're trying to sell the stuff you looted.

Skyrim vs Oblivion

Other than the things I mentioned, how does it compare with Oblivion?

There are still fewer mods than for Oblivion (and these games really need to be played highly modded), but zero-mod Skyrim is much better than zero-mod Oblivion, which was pretty much unbearable due to its leveling system.

Since I played Oblivion, wikis have gotten really awesome. These days, every time you have a problem or the game has a bug, it takes just a moment to alt-tab to a browser and find a solution.

They finally hired enough voice actors to make it not feel totally ridiculous. Dialogue makes somewhat more sense, usually. Of course you still get zero respect from guards and other NPCs even if you're the hero of the Civil War, slayer of countless dragons, Archmage, Thane of all cities, wearing full Daedric Armor, and running all of Skyrim from the shadows. I shouldn't complain too much, because it just reminds me how awful it was in Oblivion at times.

One thing that got much worse is that, other than a few special locations which look really lovely, Skyrim is all bleak, gray/white, and snowy. It's worse than even modern shooters with their gray and brown color palettes. IIRC Oblivion had much more graphically diverse locations. Yes, special locations in Skyrim look different, but all the snow gets really tedious after a while.

Another thing that got a bit worse is moving around the map. There are far too many impassable mountains in many places, the map doesn't indicate them in any way, and the Clairvoyance spell broke in the middle of the game (damn bugs...), so it was extremely frustrating to get to some places the first time, even when they were marked on your map. The second time you'll be fast traveling, of course.

I cannot compare it with Morrowind, since I still haven't figured out which mods I need to install for it, and every website claims something different.

Anyway, even with all its problems it's an easy 10/10, and I'm very rarely so enthusiastic about a game, or anything else for that matter.

Ask away if you have any questions.