The best kittens, technology, and video games blog in the world.

Tuesday, September 08, 2015

Tree of PRNGs design pattern

Kitten kaleidoscope by hehaden from flickr (CC-NC)
Here's a problem I had while coding Modern Times mod for Crusader Kings 2 - I needed to generate a ton of data.

Crusader Kings 2 is a game about characters, and I put a few historical villains like Hitler, Putin, and al-Baghdadi - somehow the bad guys is are who people request the most - but that's not even close to enough. There's nearly a 100 independent countries on the map, over timespan of 1900-2015, and in addition to rulers I still need to setup regional vassals, some kind of families so you can engage in dynastic politics and so on. About 20k for now and it will likely end up with a lot more.

What could be simpler? Every language has some kind of rand(min, max) function, and that should be enough, right?

Well, that would have a horrible side effect of the mod being different every time I rebuild it, and that's seriously unacceptable. Even worse, I'd need to write real unit tests instead just going for my favourite diff output reference_output as refactoring safety net, and who's got time for that? Especially considering how annoying it is to write tests for non-deterministic processes.

So obviously I'd jut seed the PRNG with something and it solves the problem, right? Oh by the way, here's how you can create deterministically seeded PRNG object is Ruby in a non-stupid way, as stdlib doesn't have any such way (unlike in Java and early versions of Ruby, Object#hash is not consistent between program runs for security reasons):

require "digest"
class Random
  def self.keyed(key)
    new(Digest::MD5.hexdigest(key).to_i(16))
  end
end


Anyway, global PRNG is just not good enough. It does the bare minimum that rebuilding the mod returns same output and I can use diff against reference as smoke test for pure refactoring, but any change I make anywhere will completely change all the data that comes afterwards.

PRNG Counter Mode

Why is that a problem? When refactoring the green condition is simple - output shouldn't change at all, dumb diff can tell me if it worked.

When changing or extending functionality it's more complicated - output related to the program changes should change (in correct way), while everything else should remain the same, as a statistical distribution.

For example every ruler has some chance to be male or female. I turned this off for popes as it was a bit silly, and I'm OK with popes' names and such details changing (for one - they should no longer be called Joan), but since now every random number generated after first pope has different history, it affected every character on the map. How can I be sure that's just seed shift and not another bug I accidentally introduced?

If we had quantum testing tools - or infinite testing time - "output should remain the same as a statistical distribution" would be usable condition. A we don't, we need to seed parts of PRNG to get repeatable pseudorandom data.

So how about instead of global PRNG we seed it with key containing character ID for data related to every character?

Every character already needs an unique ID anyway. The most straightforward way I used initially would be to just allocate them sequentially starting from some large number that won't collide with vanilla characters (like 100,000,000 or so), and then PRNG seeded with Random.keyed("character #{id}") would be used to generate all data related to that character.

This worked reasonably for a while - as long as number and order of characters became constant, changes in character didn't affect other character's pseudorandom data.

Of course that didn't last long - number of characters had no desire to remain the same, and especially when characters would get a (random) number of random relatives that would make diffs completely unpredictable.

My first quickfix was to bucket IDs by some timestamp like character's birthday or crowning like some kind of mongodb ObjectIds, but this data was not even close to random - like when a country becomes independent (or mod's timespan starts) all its regions get a new ruler on same day, and that just bubbles every year as all generated reigns, deaths, and childbirth ages were at yearly intervals for simplicity.

I could fuzz the dates away from full years to reduce conflict, but why the hell am I even using pseudosequential IDs if their only condition is uniqueness and stability between runs?

PRNG Tree

This actually has a very nice solution - counterintuitively, to make data more stable just pick a pseudorandom ID for everyone, instead of pseudosequential one.

What?

Well, for every top level ruler we can come up with some reasonable unique string - like "ruler-#{title}-#{crowning_date}". We don't generally put multiple rulers on same date, so number of conflicts should be low - some due to silly special case, most due to birthday paradox:

def allocate_id(key)
  space_size = 10_000_000
  base_shift = Digest::MD5.hexdigest(key).to_i(16)
  id = @namespace + (base_shift % space_size)
  while @characters.has_key?(id)
    base_shift += 1
    id = @namespace + (base_shift % space_size)
  end
  @characters[id] = :placeholder
  rng = Random.keyed("#{key}-#{id}")
  [id, rng]
end

And so everybody gets ID and PRNG seeded with that ID.

Well, what about people who don't have unique top level descriptors? We just use #{parent_id}-#{relation} as unique key, where parent_id is ID of any previously generated character (root or not), and relation is a (hopefully) unique relation between that parent and child.

In this case "parent" and "child" are literally parent and child (as vassals are just keyed with whichever land they rule over like normal rulers), and due to lack of twins (this will totally need to change at some point to properly deal with Polish history) I just use ""#{parent_id}-#{child}-#{birthdate}" as the key.

And it all works. Really well.

The last remaining detail is why I seed PRNGs with 
Random.keyed("#{key}-#{id}")
 instead of 
Random.keyed("#{id}")
.  This is not necessary for isolation, but it works around a different artifact - if due to birthday paradox new character A steals old character B's ID, keying PRNG by just ID would mean that A might steals B's whole identity including name, surname, and names of all their children. Of course B would get completely random and independent new set of data - but that could be a serious debugging mess.

This sort of implies that I should use whole original key (or at least whole hash, not just allocated ID which throws away most bits) to create sub-PRNGs, but as this protects from a rather unusual scenario I didn't bother doing so yet.

Links

Code for Modern Times mod and my other CK2/EU4 mods is on github.

Modern Times mod is available on Steam Workshop, just are many other mods by me.

1 comment:

Satyendra Paul said...

really cute cats :)