taw's blog

The best kittens, technology, and video games blog in the world.

Sunday, November 28, 2021

100 Languages Speedrun

Luna has landed! by hehaden from flickr (CC-NC)

I didn't take a long break after finishing the 100-episode daily Electron Adventures series. I've already started another one: a 100-episode daily "100 Languages Speedrun" series, where I'm trying out a new programming language every day.

It's been going for about a week now, and it's available:

I already explained the goals of the series in the first episode, so I'll just repost it below. Enjoy the series!

Time to start a 100 programming languages speedrun. Every day or so, I'll be posting about a different programming language. Not just doing 100 fizzbuzzes, but trying out something that's interesting about each language.

But that's not all: some of the programming languages I will create for the purpose of this series. So if you follow along, you'll see not just a lot of different programming languages, but you might also learn a thing or two about how to create your own.

I won't be shy about my opinions, and I might even be exaggerating a bit. Feel free to share your thoughts in the comments.

Episodes will all be independent. The target audience is people who know programming but don't know 100 different languages, so I'll often use less idiomatic ways of doing things if I think they're clearer for such readers, or if they let me showcase a specific language feature better. For languages where it's not enforced, I'll mostly stick to best-practice cross-language code formatting (2-space indentation, double-quoted strings, no semicolons, etc.), even if that language generally uses something else.

Wednesday, November 17, 2021

Electron Adventures 100-post series is finished

Rena, with her black and white whiskers by Bennilover from flickr (CC-ND)

The Electron Adventures series I've been writing is now over.

The series is available on two platforms, with the same content:

If you want to read the conclusions, episode 99 (dev.to, hashnode) summarizes the technical aspects of the series, and episode 100 (dev.to, hashnode) talks about what it was like to do daily blogging. As it's already all there, I won't be repeating it here.

Or start from the first episode, or just check whichever episodes look interesting. Some subjects continue over a bunch of episodes, but there are plenty of fresh starts along the way, so you definitely don't need to read it from 1 to 100.

I plan to do more similar series in the future, and they'll definitely be announced here. For now I plan to double-post any such content to dev.to and hashnode, so feel free to follow me on whichever one's more convenient.

It's also possible that I might do a few bonus episodes beyond the 100 someday, as there are a few subjects I couldn't cover for various reasons.

Sunday, September 12, 2021

Electron Adventures 50 episodes so far

She's either helping me code or waiting to steal my pencil. #theLatter by AMsloan from flickr (CC-NC-ND)

Two months ago I got the idea of starting a small Electron coding project. And lacking any kind of moderation, I decided to just post daily coding episodes for about 100 days. I'm halfway there.

There were definitely some technical issues early on, but I got them out of the way.

Since then I blogged at a rate of about 1 post a day - either creating or updating a small Electron program, and then doing a short writeup about it, with code samples and discussion of issues encountered along the way.

When I started, I had a vague idea of where I wanted to head:

  • I wanted to try out new code blogging platforms.
  • I wanted to collaborate with people. It's something I used to do a lot before the pandemic, but had too few opportunities since.
  • I wanted to figure out how I can code Electron in something that's not Javascript - Ruby, Python, basically anything whatsoever. Either purely non-JS, or in some kind of hybrid mode (JS frontend, non-JS backend). All other languages desperately need a good UI system, and I thought this might be worth investigating.
  • I wanted to be able to create Windows UIs for my Total War modding tools (and potentially Paradox modding tools too). I did them before with JRuby + Java-based toolkits, but none of that works very well.
  • mc (Midnight Commander) more or less broke after a software upgrade, and it freezes if I do anything funny, so I wanted to investigate how I could make my own Orthodox File Manager in Electron
  • and I might get some Electron and Svelte practice, as I don't use them too often

So far the adventure has led me mostly somewhere else:

  • I did a few coding sessions with Amanda Cavallaro, but it's actually quite difficult to get someone to join such a big ongoing project for just an episode or two.
  • I started coding a file manager. Blogging about every tiny commit is fun, but it takes much longer to blog than to code, so I doubt I'll even get to an MVP this way.
  • I did some coding with Javascript frameworks I don't normally use like Vue or Marko; but zero with non-JS languages so far
  • I didn't even try to connect anything with my Total War modding tools
  • I definitely got that Electron and Svelte coding practice.

This is fine.

The series is available on two platforms, with the same content:

Each post has about 71 views total (60 views on dev.to and 11 views on hashnode).

Those numbers feel very low to me, as back in the good days a typical post on my blog would get thousands of views, occasionally tens of thousands. My most read post had 170k views.

I also don't know how many views new vs old posts get, so I can't tell whether people will keep reading them for years and the views will add up, or whether they'll just fade away, barely searchable.

I'm also not sure how much value this series even has to the readers. A 100-post series is not something people really do, ever. Am I expecting people to read from start to finish? To pick it up halfway? To just read a random post or two?

A number of the posts are reasonably self-contained, but especially the ones about building the file manager sort of assume some familiarity with existing code. I also didn't really make it obvious which episodes are self-contained and which aren't.

The codebase with each episode's code also has very little interest so far, at 2 stars and 3 forks with zero activity.

I'd probably still be coding something like this anyway, but the main point of writing is having some readers, so if people aren't very interested in this kind of content, I guess I could stop, or do something different.

I'll continue for the rest of the 100 episodes and then write another post about the whole experience.

Oh, and I was contacted by multiple people who wanted me to write various educational content. Unfortunately I don't really have time for that.

Monday, August 23, 2021

Don't use codepen

Let's talk about websites you shouldn't use - specifically codepen.

Back in the pre-pandemic days when I was helping people learn frontend programming, I used it a lot for showing various things. So I have 177 such "pens" there, where each project is basically three files (HTML+CSS+JS).

Anyway, on to why you shouldn't use it. Codepen deliberately decided to block any way to export your data. Even getting a list of your pens is not really possible without logging in through Chrome and running some console script loop. All automated access is deliberately blocked.

The only built-in way to export is to go to each of the 177 pens and click a lot of buttons to get each individual zip file, and they did their best to block any way to automate this process.

The only other way to get your data - which they don't advertise in any way - is to send them a GDPR data export request.

It comes in a fairly annoying format of one big csv file. I wrote a script to convert that data into a more usable form.
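The conversion script isn't reproduced here. As a rough illustration of the idea, here's a minimal Ruby sketch that splits such an export into one directory per pen. It assumes the CSV has a header row with columns named `slug`, `html`, `css`, and `js`; those names are guesses, not codepen's documented export format, so check the header row of your own export and adjust `COLUMNS` and the `slug` lookup accordingly.

```ruby
require "csv"
require "fileutils"

# Hypothetical column names; the real GDPR export may name them differently.
COLUMNS = { "html" => "index.html", "css" => "style.css", "js" => "script.js" }.freeze

# Splits one big CSV (assumed to have one row per pen) into a directory
# per pen, each holding the three source files. Returns the slugs written.
def split_pens(csv_text, out_dir)
  written = []
  CSV.parse(csv_text, headers: true).each do |row|
    slug = row["slug"].to_s.gsub(/[^\w-]/, "_") # keep directory names filesystem-safe
    next if slug.empty?

    dir = File.join(out_dir, slug)
    FileUtils.mkdir_p(dir)
    COLUMNS.each do |column, filename|
      File.write(File.join(dir, filename), row[column].to_s)
    end
    written << slug
  end
  written
end
```

Something like `split_pens(File.read("export.csv"), "pens")` would then leave one `pens/<slug>/` directory per pen; the file and directory names are arbitrary choices.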

Being able to export your data is a basic human right on the Internet, and deliberately making it difficult is a reason why you should not use websites like codepen.

Wednesday, July 28, 2021

So what happens to government debts after the pandemic?

meow money (or meowney) by Travis Nicholson! from flickr (CC-NC)

In most of the developed world, governments' economic strategy for the pandemic was to lock down the economy, take on massive additional debts, and give all that money away to everyone who might be affected by the lockdowns to keep the people compliant.

This would be fine if the lockdowns were "two weeks" as originally promised, but it's already been over a year, and depending on how new variants work against vaccines, it might take a few more years for the pandemic to end, and then another few for the economy to go back to normal.

It's common to look at GDP numbers and claim it's not that bad, but GDP statistics are easily falsified by an overwhelming short-term cash injection.

Crime wave

A small side note, not terribly relevant to the main argument.

Many countries went through waves of riots (also known as "mostly peaceful protests" in propaganda media) and increased serious violent crime. 2020 in the US had the highest homicide rate since the late '90s, and 2021 data looks even worse than 2020 data. It's not as bad as the early '90s, but it definitely is bad.

Some people make excuses that many other categories of crime didn't increase - but it's harder to commit burglaries if people stay at home.

Especially when adjusted to the prime crime-committing population (young men, about 15-30), serious crime rates might already be as bad as in the '90s; it's just masked by including many more elderly people in the denominator, who cannot commit any serious crimes even if they wanted to.

It's unclear if this is mostly a US-specific issue, or if other countries are just as affected.

It's quite likely that this crime wave will take decades to go down to pre-pandemic levels, and until then, it will interfere with economic recovery.

Debt levels

Western countries already had record high debt levels before, as they never properly recovered from the 2008 recession.

Here are some visualizations of how bad debt levels already are. And they will keep increasing for as long as the pandemic is ongoing.

The quite recent orthodoxy that government debt should never exceed about 60% of GDP, and shouldn't even get close to that except in the most unusual circumstances, looks hilarious when the new normal is 100%+ debt for everyone, and 200%+ is nothing unusual.

Why is nobody even talking about it anymore?

So what happens next?

There really aren't that many things that can be done with such levels of debt. It would need to be some combination of the following.

Not paying the debt:

  • government could repudiate all or part of the debt, just say it won't repay it, and get over it - the chance of this happening is basically zero for as long as politicians can kick the can down the road
  • government could negotiate with creditors to get the debt down to more sustainable levels - Third World countries do that occasionally, are we going to see more of it?
  • government could print more money and erode debt through sustained high inflation - if expected inflation is 2%, then 20 years of 5% inflation instead is roughly equivalent to reducing the real debt burden by about 40%. This is the cleanest solution, but we live in a heavily inflation-phobic era, and even hitting 2% consistently seems to be a problem for central banks. Even worse, many developed countries are not monetarily independent, but enslaved to the ECB.
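The inflation figure is easy to sanity-check. A minimal Ruby sketch, assuming the debt is a fixed nominal amount that isn't refinanced at higher interest rates along the way (in practice, rolling debt over at higher rates would eat into part of the gain):

```ruby
# Real burden of a fixed nominal debt after `years` of `actual` inflation,
# relative to what it would have been under `expected` inflation.
def relative_debt_burden(expected:, actual:, years:)
  ((1 + expected) / (1 + actual))**years
end

burden = relative_debt_burden(expected: 0.02, actual: 0.05, years: 20)
puts "remaining burden: #{(burden * 100).round}%"  # remaining burden: 56%
puts "reduction: #{((1 - burden) * 100).round}%"   # reduction: 44%
```

A reduction of roughly 44% is in line with the "about 40%" estimate.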

Repaying the debt:

  • fast economic growth could reduce the debt-to-GDP ratio by rapidly increasing GDP. Except the now dominant green anti-growth ideology is virulently opposed to the very idea of growth. They're virulently opposed to even building much needed infrastructure (like the Heathrow Third Runway), and have been very effective at either preventing new infrastructure, or delaying it and increasing its cost so much that it amounts to basically the same thing. Even without the destructive green ideology, NIMBYs, and all sorts of special interests effectively using the political process to block any potential competitors, it would be really difficult to achieve fast growth. Demographics of Western countries are all terrible - none except one has healthy demographic growth, and the one exception, Israel, only manages it through its Haredi population, which largely refuses to integrate with the modern economy and is a huge burden on the rest of society.
  • government could increase taxes to repay debt - taxes in most of the developed world are already very high, and the harder you squeeze, the more harmful it is to the economy; increasing taxes by much is also politically very difficult, so politicians are trying their best to pass sneaky anti-democratic pseudo-taxes - for example "taxing Amazon" (which obviously will be 100% paid by people who buy from Amazon). We'll definitely see more of that, but this doesn't come even close to addressing the enormity of the debt.
  • government could cut spending to repay debt - this works in theory, but it's political suicide, and even if one party does it for the good of the country, the opposition very often reverses the cuts as soon as it gets into power
  • selling or long-term-renting government property - this could be done to reduce debt a bit - for example, the US federal government owns an obscene amount of land it doesn't use and doesn't plan to use, which could be sold (or at least transferred to the individual states, which could then use or sell it).

Living with the debt:

  • paying interest, at rates comparable to historical rates - back in the '90s, US government interest rates were about 6%. At a 200% debt-to-GDP ratio, this means 12% of GDP - or about 1/3 of government revenue - goes to paying just interest, with no reduction in principal. This is ridiculously politically unsustainable.
  • paying interest, at very low rates - if, on the other hand, interest rates were more like 2%, then such payments could be sustained, but how can the government ensure that rates stay this low indefinitely? As Japan shows, it is possible, but only by monetary policy so brutal that it makes economic growth impossible. The government pretty much has to either make it impossible for any private investment to offer better rates, or force banks to take government debt and starve the private sector of credit.
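The interest numbers follow from simple multiplication: annual interest is roughly the interest rate times the debt-to-GDP ratio. The sketch below assumes government revenue of about 36% of GDP, a round figure picked so that 12% of GDP comes out to the 1/3 of revenue mentioned above; actual revenue shares vary by country.

```ruby
# Annual interest cost as a share of GDP, and as a share of government revenue.
# revenue_to_gdp defaults to an assumed round figure, not data for any country.
def interest_burden(rate:, debt_to_gdp:, revenue_to_gdp: 0.36)
  share_of_gdp = rate * debt_to_gdp
  [share_of_gdp, share_of_gdp / revenue_to_gdp]
end

[0.06, 0.02].each do |rate|
  gdp, revenue = interest_burden(rate: rate, debt_to_gdp: 2.0)
  puts format("%d%% rate: %.0f%% of GDP, %.0f%% of revenue",
              (rate * 100).round, gdp * 100, revenue * 100)
end
# 6% rate: 12% of GDP, 33% of revenue
# 2% rate: 4% of GDP, 11% of revenue
```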

And that's the full list. There's no combination of these that looks good.

MMT is nonsense

Yes, MMT is nonsense, no point wasting time on it. It's basically a stupid motte-and-bailey, with the motte of "if we accept very high inflation, we can money print ourselves out of any debt" and the bailey of "inflation will magically not happen".

My adventures with dev.to and hashnode so far

As I said in my previous post, I was looking for a new blogging platform.

I created accounts on dev.to and on hashnode, and started my new Electron Adventures series.

Here's the hashnode version:

And here's the dev.to version:

What went wrong with dev.to

dev.to was a huge disappointment. I tried to post episode 4, and it just outright refused to accept it, or even preview it, with an "invalid markdown detected" error.

It wouldn't even give me a hint where that "invalid markdown" was, so I had to delete parts of the post until I got it down to a single line, then simplified it a bit:


Yeah, any code block with date: in it just crashes their engine. I tried to throw some backslashes in various places, but that didn't fix it.

I could maybe change code to say "da" + "te:", but that's really stupid for a code blogging platform.

Unless this gets fixed, I really don't see myself using it, especially as I like doing much weirder things with code than printing some URLs.

I'm not sure if there's any place where I can report bugs where they'd actually read them and maybe fix them (for what it's worth, I tweeted at them).

What went wrong with hashnode

Hashnode let me post everything all right, but there's another problem. There doesn't seem to be any place where people can see all my blog posts. If I'm not logged in, my profile page looks like this:


Like, where the hell is my blog?

I suspect that maybe I need to go through the whole "setup your blog" process to get such page, but this really wasn't clear when I started. I thought by posting blogposts I'd already have some kind of blog created, but maybe that's not how it works?

What's next

I'll try to see if hashnode is fixable, and if dev.to actually responds to this bug.

I still have the option of using an external Markdown converter and posting here, even though it's a messy process. I'm not sure which other blogging platforms I can try.

I want to continue Electron Adventures, but I might take a short break to resolve these technical problems first.

Sunday, July 25, 2021

Looking for a new blog

service interruption by travel oriented from flickr (CC-SA)

I've had this blog here since 2006. I don't plan to delete it or anything, but it's not really a platform I'd recommend to anyone.

It's especially bad for talking about any tech issues - there's no support for code at all.

So I've been writing posts offline in Markdown, converting them with a Markdown to HTML converter, and just dumping the result here. The only extra step was manually finding a cat picture for the post. If I then notice any corrections I'd like to make, they're quite awkward to do.

I haven't been very happy about it, but in the past this blog used to have a lot of readers, and moving to another platform would have lost most of that engagement.

Well, that's gone now - blogging is maybe 10% as popular as it was in its Golden Age, and RSS is nearly dead - the few people who read blogs now either get there from Google, or from someone linking to a specific post on social media. Either way, the cost of switching is much lower.

And timing is really good now. I have an idea for a new post series, with tons of code.

After checking a bunch of platforms, I created accounts on dev.to and hashnode.

My plan right now is to post the same content on both, at least for a while, to see which one I like better.

I'll probably keep posting non-coding stuff, and various announcement posts (usually of the "look at this cool software I wrote" type), here.

Some other content creation platforms I used, or still use:
  • gaming blog - I mod most games I play and write down notes as I go, so I got the idea of turning those notes into some kind of AAR; these are probably not the most fascinating AARs unless you also mod stuff; unfortunately Google deleted most of the old screenshots there when Google+ died, so most of the old posts are text only
  • youtube channel - I was posting gaming content there for a few years, I haven't done that in 3 years, but I keep thinking about resuming that
  • Google+ - well, that died, still miss it
  • twitter - I keep posting there out of habit, but really twitter is a sad shadow of its former lively self, and I keep thinking about just dropping that
  • twitch - I tried streaming there for a bit, but I never really got into this, as I don't watch live streams myself, I only watch gaming content on my own time, at 200% speed

Tuesday, July 13, 2021

Password hiding policy is insane

Ysabel by Daniel Panev from flickr (CC-SA)

Imagine you find yourself in some part of the world where people can still go to cafes, so you get a coffee, take out your laptop, and begin working on your spreadsheets or fanfics or whatever people do these days.

Then you remember that you're low on cat food. Let's see how security of that works out.

First you need to type the cafe's WiFi "password". This password - a completely worthless secret - will be turned into a bunch of ****** by your computer, so that none of the people in the same cafe can even dare to see it.

Not to mention WiFi "passwords" are a conspiracy by broadband companies to sell more broadband, and really all WiFi should be completely passwordless and only prioritize the paying user over passers-by. Most networks have very low utilization almost all the time, so it costs nobody anything. That's how we used to roll back in the '90s, before big broadband successfully destroyed this social sharing model.

Anyway, once you get on the WiFi, you log into Instagram to check out some cat memes. Your social media password is of slightly more value than the cafe's WiFi password, and your browser has the decency to ****** that too, so this part makes the most sense.

And then you remember the cat food. You go to a cat food website and enter your credit card number - all in plain sight of everyone in the cafe. Then the secret three-digit number on the back of the card - also in plain sight of everyone. Your name, address, and everything else one would need to literally steal your money - why, also all in plain sight of everyone in the same cafe. And of every employee too, as the cafe is fairly likely to have some cameras around, and it's really not hard to see what's on your screen with modern cameras.

How is this not utter insanity? Criminals don't care for your dinner photos, or your Instagram posts, or even really for your nudes or medical history (unless you're famous). For sure they don't care for WiFi passwords the tiniest bit.

The only thing they want is access to your money, and they can easily get it by just looking at your damn screen in a public place if you buy anything. And this is just one thing we in our utter madness decided to not hide, while we ****** every worthless WiFi password.

And it's not one online shop, or one browser doing this insanity. Everyone collectively decided to just be batshit insane about this for some reason. Did Russian mafia infiltrate Netscape back in the '90s and then we never fixed it, or are we all just so damn stupid?

Updates Hypocrisy

Gaby by DeGust from flickr (CC-NC-ND)

Tech companies just love forcing updates upon regular users. The idea that someone, somewhere, might be using a version of their software that's a few months old, stops them from sleeping well at night.

This goes all the way from operating systems, through big applications, down to the tiniest utilities. They will force that update down the user's throat, and the most freedom they allow is a 24h snooze button. Some, like Microsoft, will not even bother asking, and will just reboot the user's computer in the middle of whatever they were doing.

And it's not like they make any guarantees about it - if updates break things - and they absolutely will do that - there's usually no way to roll back, and you must be living on a different planet if you imagine you'll get any tech support whatsoever.

So if updates are so important, you'd think at least tech companies would be updating things automatically themselves? Nothing could be further from the truth. They built entire systems like npm's package-lock.json and its equivalents for literally every other programming environment to prevent any updates forever.

Even the idea of operating system updating some shared library dependency is too much, and nowadays everyone bundles all dependency libraries with every application, builds fully static binary, or just puts them in some sort of a fully no-update virtual machine like Docker container.

And it's not just minor packages - tech companies will happily run Python 2 or Java 8 or Debian "stable" a decade after release of their official successors.

So for all that talk about the importance of updates, it only seems to apply when the costs are borne by someone else.

I believe what tech companies do, not what they say, and I therefore believe that forcing users to update their machines, either by automated updates, or by endless popups without a No button, should be literally illegal. We don't allow manufacturers of physical things to invade your home to "update" your microwave or a book, why should we allow software manufacturers to invade our computers?

Saturday, July 10, 2021

Total War UI layout to XML converter

Chatons, Juin 2021 by Isabelle + Stéphane Gallay from flickr (CC-BY)

My most recent coding project was decoding UI layout files for all 10 Total War games from Empire to Three Kingdoms, and writing a converter that translates them to XML and back.

Here's a quick writeup of what I did, and how that went.

UI Layout files

Layout in the games is controlled by UI Layout files. They all helpfully start with a version number header - currently from Version025 to Version129. After that follows the top-level UI element, and within it are nested child UI elements and many other things like UI states, transitions, events, and so on.

Basic building blocks

Basic building blocks of the format were fairly easy to understand, mainly:

  • booleans as 00 or 01
  • integers as int32
  • floats as float32
  • colors as BGRA32 (that is - one byte per component, in this order)
  • ASCII strings as int16 character count, followed by that many characters
  • Unicode strings as int16 character count, followed by that many UTF16 characters
  • various data structures had their fields in a specific order, without any headers or delimiters
  • for arrays of data structures there was generally int32 element count, then followed by each element in succession, without any headers or delimiters
There were also a few other patterns used less often, like:
  • optional fields - either 01 followed by some data structure, or just a 00
  • 128-bit uuids (weirdly with no specific version, but still marked as uuids in the variant bits)
  • occasional int8s and int16s
  • arrays of elements repeating until some special value like events_end
  • 2D arrays of elements prefixed by xsize and ysize
  • and so on
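To make the list above concrete, here's a minimal Ruby sketch of readers for these building blocks. It's an illustration rather than the converter's actual code: it assumes little-endian byte order throughout and UTF-16LE for Unicode strings, and the `LayoutReader` class and its method names are made up.

```ruby
require "stringio"

# Reads the basic building blocks of a UI layout file from a binary string.
# Assumes little-endian byte order and UTF-16LE for Unicode strings.
class LayoutReader
  def initialize(data)
    @io = StringIO.new(data.b)
  end

  # Reads exactly n bytes, failing loudly instead of returning short data.
  def get(n)
    bytes = @io.read(n)
    raise "reading past end of file at offset #{@io.pos}" if bytes.nil? || bytes.bytesize < n
    bytes
  end

  # 00 or 01; anything else means parsing has already gone off the rails.
  def bool
    byte = get(1).unpack1("C")
    raise "invalid boolean #{byte} at offset #{@io.pos - 1}" if byte > 1
    byte == 1
  end

  def int32
    get(4).unpack1("l<")
  end

  def float32
    get(4).unpack1("e")
  end

  # One byte per component, stored in B, G, R, A order.
  def color
    b, g, r, a = get(4).unpack("C4")
    { r: r, g: g, b: b, a: a }
  end

  # int16 character count, then that many single-byte characters.
  def ascii
    get(get(2).unpack1("v"))
  end

  # int16 character count, then that many UTF-16 code units (2 bytes each).
  def unicode
    count = get(2).unpack1("v")
    get(count * 2).force_encoding("UTF-16LE").encode("UTF-8")
  end

  # Optional field: 01 followed by the data, or just 00.
  def optional
    bool ? yield : nil
  end

  # int32 element count, then each element back to back.
  def array
    Array.new(int32) { yield }
  end
end
```

Raising on short reads and on boolean bytes other than 00/01 is a cheap way to notice that decoding has gone wrong close to where it happened, rather than much later at the end of the file.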

Manual decoding with hex editor

Most formats are quite easy to decode with a hex editor. This one wasn't - there were far too many versions, no data structure headers, no separators between data structures, and since pretty much everything was optional, there were huge blocks of zeroes.

For example a block of 20 zero bytes could be any of:
  • 20 booleans false
  • 5 floats 0.0
  • 5 ints 0
  • 10 empty ASCII strings
  • 10 empty Unicode strings
  • 5 empty nested arrays of some child elements
  • or most likely some combinations of all of them
And there were such huge blocks of zeroes everywhere.
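A small Ruby sketch that makes the ambiguity concrete: it decodes the same 20 zero bytes under several of those interpretations with `String#unpack`. Every one of them consumes exactly 20 bytes and succeeds, so the bytes alone can't tell you which is right.

```ruby
zeroes = "\x00".b * 20

# The same 20 bytes read several ways; every interpretation uses all 20 bytes.
interpretations = {
  "20 booleans"             => zeroes.unpack("C20").map { |byte| byte == 1 },
  "5 float32s"              => zeroes.unpack("e5"),
  "5 int32s"                => zeroes.unpack("l<5"),
  # An empty string is nothing but its int16 length prefix of 0.
  "10 empty string lengths" => zeroes.unpack("v10"),
  # An empty array is nothing but its int32 element count of 0.
  "5 empty array counts"    => zeroes.unpack("l<5"),
  # Or any mix: bool, bool, string length, int, float, array count, bool, bool, string length.
  "a mix"                   => zeroes.unpack("C2 v l< e l< C2 v"),
}

interpretations.each { |name, values| puts "#{name}: #{values.inspect}" }
```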

Decoding it without tool assist would be just too difficult, especially doing it over and over for every single version.

Original converter

Once upon a time alpaca wrote a Python converter for Napoleon Total War (the second game on the engine). I inherited it, and extended it backwards to Empire and forwards to Shogun 2.

Even with all the fixes it had only maybe 90% support for those three games.

The most obvious approach would be fixing remaining issues and extending it further.

Unfortunately that would be a very difficult approach.

Internal Representation Pattern

The converter was based on the principle of Internal Representation. Every structure has a class. That class basically has five methods:

  • initialize empty data structure with default values
  • read from binary file
  • write to XML
  • read from XML
  • write to binary file
This works well enough when there's one version of every structure, and it's fully understood. Unfortunately we have 62 different versions (some numbers between 25 and 129 were skipped), and we have very limited idea how things are represented.
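For contrast, here's a hypothetical Ruby sketch of the Internal Representation pattern for one tiny structure, a BGRA color. It's not code from the old converter (which was Python), and the XML shape is made up; the point is that the field list gets spelled out in all five methods, so every version-specific difference would have to be handled in each of them.

```ruby
require "stringio"

# One class per structure, with five methods that each know the full field list.
class Color
  attr_accessor :b, :g, :r, :a

  # 1. Initialize with default values.
  def initialize(b: 0, g: 0, r: 0, a: 0)
    @b, @g, @r, @a = b, g, r, a
  end

  # 2. Read from binary (BGRA, one byte per component).
  def self.read_binary(io)
    b, g, r, a = io.read(4).unpack("C4")
    new(b: b, g: g, r: r, a: a)
  end

  # 3. Write to XML.
  def to_xml
    %(<color b="#{b}" g="#{g}" r="#{r}" a="#{a}"/>)
  end

  # 4. Read from XML (a tiny regex parser, just enough for to_xml's output).
  def self.from_xml(xml)
    attrs = xml.scan(/(\w+)="(\d+)"/).to_h { |k, v| [k.to_sym, v.to_i] }
    new(**attrs)
  end

  # 5. Write to binary.
  def write_binary(io)
    io.write([b, g, r, a].pack("C4"))
  end
end
```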

The old converter tried to ignore many of those issues. For example, writing to XML was just one hardcoded template string per data structure, so if the layout file's version lacked some fields, it would just write default values anyway. Then on converting back it would read them and throw them away. This specific issue was partly a limitation of Python, which is bad at DSLs, and this XML output really wanted a DSL.

A bigger problem was that if it didn't work for any reason, I got nothing. I'd get some "reading past end of file" error without any context whatsoever, and the actual point where parsing derailed was long before that crash.

Data gathering

Before I even started, I took the latest versions of all 10 Total War games using the current engine, extracted all UI layout files, and put them together as a test set.

Analysis tool

Then I wrote an analysis tool. The formats were really complicated, but there were some obvious things in them, especially strings. Basically the analysis tool went over the file and identified every ASCII or Unicode string. Then it printed any undecoded data in a nice ASCII + hex format.
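A simplified Ruby sketch of the string-spotting idea, not the actual tool: at every offset, try reading an int16 length and check whether that many following bytes are all printable ASCII. The minimum length of 3 and the "printable ASCII" test are assumptions to keep noise down; a real tool would also need to look for UTF-16 strings and resolve overlapping matches. The `hexdump` helper imitates the ASCII + hex layout of the listings in this post.

```ruby
# Finds candidate ASCII strings: a little-endian int16 length followed by
# that many printable characters. Returns [first_offset, last_offset, text].
def find_ascii_strings(data, min_length: 3)
  data = data.b
  found = []
  offset = 0
  while offset + 2 <= data.bytesize
    length = data.byteslice(offset, 2).unpack1("v")
    text = data.byteslice(offset + 2, length)
    if length >= min_length && text.bytesize == length && text.match?(/\A[\x20-\x7e]+\z/)
      found << [offset, offset + 1 + length, text]
      offset += 2 + length # skip the string so its bytes aren't rescanned
    else
      offset += 1
    end
  end
  found
end

# Formats bytes 16 per line: printable characters (others shown as "."),
# padded to 16 columns, then the same bytes in hex.
def hexdump(bytes)
  bytes.bytes.each_slice(16).map do |row|
    ascii = row.map { |b| (0x20..0x7e).cover?(b) ? b.chr : "." }.join
    hex = row.map { |b| format("%02x", b) }.join(" ")
    "  #{ascii.ljust(16)} #{hex}"
  end
end

data = "\x00\x00".b + [8].pack("v") + "NewState" + [624].pack("l<")
find_ascii_strings(data).each do |first, last, text|
  puts format("%06d-%06d StringBlock %p", first, last, text)
end
puts hexdump(data.byteslice(12, 4))
```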

That was a good starting point, but there was more I could do. Not only could I see the strings, it was really easy to guess which string meant what. A string with a font name was always followed by some ints controlling text display. A string with a shader name was followed by shader variables. Strings with image names were used in a few ways, but some simple heuristics could guess which was which.
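The guessing step can be sketched as a handful of pattern rules applied to each string found. The rules below are hypothetical, inferred only from the sample listing (font names like "Ingame 12, Normal", shader names like "normal_t0", image paths ending in .tga); the tool's real heuristics aren't shown here and presumably covered more cases.

```ruby
# Hypothetical rules: [block type, pattern the string must match,
# labels for the int32 fields assumed to follow it].
RULES = [
  [:FontNameBlock,   /\A[\w ]+ \d+, \w+\z/, %i[LineHeightBlock FontLeadingBlock FontTrailingBlock]],
  [:ShaderNameBlock, /\A\w+_t\d+\z/,       []],
  [:ImagePathBlock,  /\.(tga|png|dds)\z/i, []],
].freeze

# Returns the guessed block type for a string, plus labels for the fields after it.
def classify(string)
  RULES.each do |type, pattern, following|
    return [type, following] if string.match?(pattern)
  end
  [:StringBlock, []]
end

p classify("Ingame 12, Normal")
p classify("normal_t0")
p classify("government_screens")
```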

So I soon had listings along these lines:
000129-000147 FontNameBlock "Ingame 12, Normal"
000148-000151 LineHeightBlock 2
000152-000155 FontLeadingBlock 1
000156-000159 FontTrailingBlock 255
000160-000174 DataBlock
  ...............  00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
000175-000185 ShaderNameBlock "normal_t0"
000186-000189 ShaderVariableBlock 0.0 (0)
000190-000193 ShaderVariableBlock 0.0 (0)
000194-000197 ShaderVariableBlock 0.0 (0)
000198-000201 ShaderVariableBlock 0.0 (0)
000202-000270 DataBlock
  ........0....... 00 00 00 00 01 00 00 00 30 12 00 09 00 00 00 00
  ................ 00 00 00 00 00 05 00 00 00 04 00 00 bb ff be ff
  ................ 00 00 00 00 00 00 00 01 01 00 00 00 00 00 00 00
  ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  .....            00 00 00 00 00
000271-000282 EventListBlock []
000283-000294 DataBlock
  ........ .<.     00 00 00 00 01 00 00 00 20 b3 3c 0b
000295-000314 StringBlock "government_screens"
000315-000346 DataBlock
  H............... 48 01 00 00 8e 00 00 00 01 01 00 01 00 00 00 00
  ................ 00 ff ff ff ff 00 00 00 00 05 00 00 00 00 00 00
000347-000421 ImageListBlock 1 elements:
  000351-000421 ImageBlockGen1 id=163829448 xsize=256 ysize=256 path="data\\UI\\Campaign UI\\Skins\\fill 2 leather 256 tile.tga" unknown=4294967295
000422-000433 DataBlock
  ............     00 00 00 00 00 00 00 00 01 00 00 00
000434-000437 StateIDBlock 162986096
000438-000447 StateNameBlock "NewState"
000448-000451 XSizeBlock 624
000452-000455 YSizeBlock 720
000456-000484 DataBlock
  ................ 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00
  .............    01 00 00 00 01 00 00 00 00 00 00 00 00
000485-000503 FontNameBlock "Ingame 12, Normal"
000504-000507 LineHeightBlock 2
000508-000511 FontLeadingBlock 1
000512-000515 FontTrailingBlock -16777216
000516-000530 DataBlock
  ...............  00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
000531-000541 ShaderNameBlock "normal_t0"
000542-000545 ShaderVariableBlock 0.0 (0)
000546-000549 ShaderVariableBlock 0.0 (0)
000550-000553 ShaderVariableBlock 0.0 (0)
000554-000557 ShaderVariableBlock 0.0 (0)
000558-000565 DataBlock
  ........         00 00 00 00 01 00 00 00
000566-000589 ImageUseBlock id=163829448 xofs=0 yofs=0 xsize=624 ysize=720 bgra=bgra(255,255,255,255)
000590-000626 DataBlock
  ................ 01 00 00 00 00 00 00 01 01 00 00 00 00 00 00 00
  ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  .....            00 00 00 00 00
000627-000693 EventListBlock ["OnUpdatePulse", "OnUpdatePulse", "OnDock", "DockHudRelative"]
000694-000705 DataBlock
  ............     0a 00 00 00 0e 00 00 00 e8 db b8 09

And you can probably already notice the huge blocks of zeroes I mentioned before, even though some of the zeroes have already been decoded from context and no longer show up as raw data.

Direct Conversion Pattern

Now that I wasn't going in completely blind, I started writing a converter. In Ruby, as there was a lot of DSLing to do. But mostly it was based on a completely different principle - Direct Conversion.

Direct Conversion doesn't bother with any classes, or internal representations. It has methods such as (not actual code, just the general idea):
# get(n) is assumed to read the next n bytes of the input file
def convert_int
  value = get(4).unpack1("l<") # signed little-endian int32
  puts "<i>#{ value }</i>"
end

def convert_string
  size = get(2).unpack1("v") # little-endian int16 character count
  str = get(size)
  puts "<s>#{ str.encode(xml: :text) }</s>" # escapes &, < and >
end

def convert_color
  b, g, r, a = get(4).unpack("CCCC")
  puts "<color>"
  puts "  <byte>#{b}</byte><!-- blue -->"
  puts "  <byte>#{g}</byte><!-- green -->"
  puts "  <byte>#{r}</byte><!-- red -->"
  puts "  <byte>#{a}</byte><!-- alpha -->"
  puts "</color>"
end
But bigger methods can be composed from smaller ones (also not actual code):
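# indent is the current nesting depth, comment an optional annotation
def output(str, comment=nil)
  print "  " * indent
  print str
  print "<!-- #{comment} -->" if comment
  print "\n"
end

# get_int reads an int, the same way convert_int above does
def convert_int(comment=nil)
  output "<i>#{ get_int }</i>", comment
end

# tag wraps everything output inside the block in <color>...</color>
def convert_color
  tag "color" do
    convert_byte "blue"
    convert_byte "green"
    convert_byte "red"
    convert_byte "alpha"
  end
end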
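To make the idea concrete, here is a small runnable sketch of the Direct Conversion style. It is not the converter's actual code: the class name `DirectConverter`, the `StringIO`-based `get`, and the `xml_escape` helper are assumptions made for illustration. It also reads ints as signed (`l<`), since the dump above shows values like -16777216, whereas the pseudocode's `V` reads them as unsigned.

```ruby
require "stringio"

# Hypothetical Direct Conversion sketch: each method reads its bytes and
# writes XML immediately, with no intermediate data model.
class DirectConverter
  def initialize(data, out = $stdout)
    @io = StringIO.new(data)
    @out = out
    @indent = 0
  end

  # Return exactly the next n bytes, or fail with the current offset.
  def get(n)
    bytes = @io.read(n)
    if bytes.nil? || bytes.bytesize < n
      raise "Unexpected end of data at offset #{@io.pos}"
    end
    bytes
  end

  # Write one indented line, with an optional XML comment after it.
  def output(str, comment = nil)
    line = "  " * @indent + str
    line += "<!-- #{comment} -->" if comment
    @out.puts line
  end

  # Wrap everything the block outputs in <name>...</name>, one level deeper.
  def tag(name)
    output "<#{name}>"
    @indent += 1
    yield
    @indent -= 1
    output "</#{name}>"
  end

  def convert_byte(comment = nil)
    output "<byte>#{get(1).unpack1("C")}</byte>", comment
  end

  # Signed 32-bit little-endian int.
  def convert_int(comment = nil)
    output "<i>#{get(4).unpack1("l<")}</i>", comment
  end

  # 16-bit little-endian length prefix, then that many bytes of text.
  def convert_string(comment = nil)
    size = get(2).unpack1("v")
    output "<s>#{xml_escape(get(size))}</s>", comment
  end

  # Colors are stored as four bytes in BGRA order.
  def convert_color
    tag "color" do
      convert_byte "blue"
      convert_byte "green"
      convert_byte "red"
      convert_byte "alpha"
    end
  end

  private

  def xml_escape(str)
    str.gsub(/[&<>"]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;")
  end
end
```

Feeding it the bytes of the int -16777216, the length-prefixed string NewState, and a color produces `<i>-16777216</i>`, `<s>NewState</s>`, and an indented `<color>` element with four commented `<byte>` children. Because every line is written as soon as it's decoded, a crash leaves all the output produced so far in place.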

Advantages of Direct Conversion

The nice thing about this is that the conversion back doesn't need to have any idea whatsoever what tags like color even are. Other than the most basic data types like strings, ints, floats, and booleans, the converter from XML back to binary needs nearly zero awareness of what those formats are.

So instead of describing every data structure 5 times, we do it just once. And any version-specific logic can be handled by a single if @version >= 74 or similar.
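A hypothetical sketch of the reverse direction shows why it stays so simple. It knows only leaf tags and skips structural tags like `<color>` or `<model>`, as well as all comments. The encodings are assumptions: `<i>` as a signed little-endian int32, `<s>` as a u16 length-prefixed string, `<byte>` as one byte, `<data>` as a hex dump, and `<yes />`/`<no />` as one-byte booleans (`<yes />` is assumed by analogy with the `<no />` tag in the output below). Floats would need one more case, but their tag name isn't shown here, so they're left out.

```ruby
# Hypothetical reverse converter: only leaf tags produce bytes; structural
# tags such as <color> or <model> and all comments are simply skipped.
LEAF = %r{<(yes|no)\s*/>|<(i|s|byte|data)(?:\s[^>]*)?>(.*?)</\2>}m

def xml_unescape(str)
  str.gsub(/&(?:amp|lt|gt|quot);/, "&amp;" => "&", "&lt;" => "<", "&gt;" => ">", "&quot;" => '"')
end

def xml_to_binary(xml)
  out = String.new(encoding: Encoding::BINARY)
  body = xml.gsub(/<!--.*?-->/m, "") # comments carry no data
  body.scan(LEAF) do |flag, name, text|
    if flag
      out << (flag == "yes" ? "\x01".b : "\x00".b) # assumed one-byte booleans
      next
    end
    case name
    when "i"    then out << [Integer(text, 10)].pack("l<")
    when "byte" then out << [Integer(text, 10)].pack("C")
    when "s"
      str = xml_unescape(text).b
      out << [str.bytesize].pack("v") << str
    when "data" then out << [text.split.join].pack("H*")
    end
  end
  out
end
```

Note that nothing in `xml_to_binary` mentions colors, models, or any other structure: all of that knowledge lives only in the binary-to-XML direction.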

But there's more. Since we never need to construct any internal representation, if conversion crashes, the converter will give us full context of the error!
  <model>
    <s>composite_scene/porthole/troy_advisor_test.csc</s><!-- mesh path? -->
    <s>standard_advisor</s><!-- mesh name? -->
    <!-- some model data or anim header or sth -->
    <data size="1">
      01
    </data>
    <i>0</i><!-- 00:00:00:00 --><!-- anim count or something? -->
    <s></s><!-- anim name? -->
    <s></s><!-- anim path? -->
    <!-- rest of anim stuff or sth -->
    <data size="4">
      00 80 3f 00
    </data>
    <!-- 2900 - end of model data -->
  </model>
</models>
<no /><!-- end of uientry flag 5B? -->
<no /><!-- end of uientry flag 6B? -->
<error msg="Invalid boolean value: got 63" version="121">
  Data before fail:
  ne/porthole/troy 6e 65 2f 70 6f 72 74 68 6f 6c 65 2f 74 72 6f 79
  _advisor_test.cs 5f 61 64 76 69 73 6f 72 5f 74 65 73 74 2e 63 73
  c..standard_advi 63 10 00 73 74 61 6e 64 61 72 64 5f 61 64 76 69
  sor...........?. 73 6f 72 01 00 00 00 00 00 00 00 00 00 80 3f 00
  Data from fail 2900:
  ..?...?.....9..p 00 00 3f 00 00 00 3f 00 00 98 d1 bd 39 10 00 70
  ortrait_minspec. 6f 72 74 72 61 69 74 5f 6d 69 6e 73 70 65 63 00
  ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 01 01 00
  ................ 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
Then all I need to do is look back from the point of the crash to the last definitely correctly decoded part (in this case those two strings look perfectly fine), and then find the first definitely incorrectly decoded part (in this case 00 80 3f is clearly the last 3 bytes of a float, as 00 00 80 3f is 1.0, so the converter was already off by one at this point).
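The off-by-one diagnosis is easy to check by reinterpreting the bytes as a little-endian IEEE 754 single-precision float, which Ruby's `unpack1("e")` does. The helper name `le_float` is just for illustration.

```ruby
# Reinterpret space-separated hex bytes as a little-endian IEEE 754
# single-precision float (the "e" directive of String#unpack1).
def le_float(hex)
  [hex.delete(" ")].pack("H*").unpack1("e")
end

aligned = le_float("00 00 80 3f") # starting one byte earlier
shifted = le_float("00 80 3f 00") # the bytes the converter actually grabbed
puts aligned # => 1.0
puts shifted # a tiny denormal, nowhere near a plausible value
```

Shifted by one byte, the same data decodes to a number around 1e-39, which is exactly the kind of value that gives away a misaligned decoder.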

Then I can adjust that specific data structure's method. I don't even need to guess what that extra data is. If I see five zeroes I don't have decoding for, I just tell the converter to expect five zero bytes.

Then if some other file has non-zeros at that position, I'll get a nice exception like "Zero data expected, got 05 00 00 00 00", and I can pretty clearly see that the first four bytes are an int32 and the last remaining one is likely a boolean (but I'd still leave it as an undecoded zero for now).
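Here's a hypothetical sketch of such strict readers (the class and method names are made up). Undecoded bytes are asserted to be zero, and booleans must be exactly 0 or 1, producing error messages in the same shape as the ones quoted above.

```ruby
require "stringio"

# Hypothetical strict readers in the Direct Conversion style: anything not
# understood yet must match an exact expectation, or conversion stops.
class StrictReader
  def initialize(data)
    @io = StringIO.new(data)
  end

  def get(n)
    bytes = @io.read(n)
    raise "Unexpected end of data" if bytes.nil? || bytes.bytesize < n
    bytes
  end

  # Placeholder for undecoded bytes that so far have always been zero.
  def expect_zeros(n)
    bytes = get(n)
    return if bytes.bytes.all?(&:zero?)
    raise "Zero data expected, got #{bytes.unpack1("H*").scan(/../).join(" ")}"
  end

  # Booleans are a single byte that must be 0 or 1.
  def get_bool
    value = get(1).unpack1("C")
    raise "Invalid boolean value: got #{value}" unless value <= 1
    value == 1
  end
end
```

The point of failing loudly is that every wrong guess shows up immediately, at the exact offset where it happened, instead of silently corrupting everything decoded after it.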

Debug mode

At some point I implemented a small modification to the direct conversion process. There's a debug flag controlling the printing of various extra information like structure offsets, hex values of ints and floats, and so on.

The converter first converts binary to XML with the debug flag off. If that process crashes, it turns the debug flag on and starts all over. This way the normal XML isn't polluted by extra information useful only for debugging the converter, but in case of a crash I get tons of extra information.
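A minimal sketch of that two-pass approach, assuming a toy converter that reads a stream of int32 values (`convert`, `convert_file`, and the debug comment format are invented for illustration). The quiet pass runs first, and only if it raises does the debug pass start over, recording the error after the annotated partial output.

```ruby
# Toy converter (invented for illustration): the input is a sequence of
# signed little-endian int32 values, with offsets and raw hex added in debug mode.
def convert(data, lines, debug:)
  offset = 0
  while offset < data.bytesize
    chunk = data.byteslice(offset, 4)
    raise "Unexpected end of data at offset #{offset}" if chunk.bytesize < 4
    line = "<i>#{chunk.unpack1("l<")}</i>"
    line += "<!-- offset #{offset}, hex #{chunk.unpack1("H*")} -->" if debug
    lines << line
    offset += 4
  end
end

# Quiet pass first; if it crashes, start over with debug output enabled.
def convert_file(data)
  lines = []
  convert(data, lines, debug: false)
  lines
rescue StandardError
  lines = []
  begin
    convert(data, lines, debug: true)
  rescue StandardError => e
    # The debug pass is expected to fail at the same spot; keep the
    # annotated partial output and append the error after it.
    lines << %(<error msg="#{e.message}" />)
  end
  lines
end
```

Running the whole conversion twice costs almost nothing for files this small, and it keeps the debug annotations out of every successful conversion.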

First three games

The first three games were easy enough. I already had a mostly working decoder, so I used it as a starting point, and used the procedure described here to fix any issues.

Initially I thought about backporting fixes to the old converter, but I quickly gave up on this idea when I discovered just how extensive the changes would need to be.

In any case, I got the converter working far better than the old one without any major difficulty.

Next seven games

This is where my plan ran into its first problems. Starting from a working converter for version X and adding support for version X+1 is easy:
  • run the conversion anyway, ignoring that the version is wrong
  • identify where exactly it crashes (based on <error> tags and my analysis tool)
  • try to fix those crashes, gated by some if @version >= X+1 checks
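A hypothetical illustration of the version gating in the last step: one method describes the structure for all versions, and the version-specific field sits behind a single check. The field layout is invented (loosely based on the XSizeBlock and YSizeBlock values in the dump above), including the v74+ field.

```ruby
require "stringio"

# Hypothetical example of version gating: a single method describes the
# structure for every version, and version-specific fields sit behind checks.
class VersionedConverter
  attr_reader :lines

  def initialize(data, version)
    @io = StringIO.new(data)
    @version = version
    @lines = []
  end

  def get(n)
    bytes = @io.read(n)
    raise "Unexpected end of data" if bytes.nil? || bytes.bytesize < n
    bytes
  end

  def convert_int(comment = nil)
    line = "<i>#{get(4).unpack1("l<")}</i>"
    line += "<!-- #{comment} -->" if comment
    @lines << line
  end

  def convert_state
    convert_int "xsize"
    convert_int "ysize"
    # Invented field, assumed here to exist only from version 74 onwards.
    convert_int "unknown, v74+" if @version >= 74
  end
end
```

Supporting the next version then means running the converter on its files anyway, seeing where it crashes, and adding another `if @version >= ...` branch at that spot.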
Unfortunately, the first three games used versions 25 to 54, and the next seven games used versions 74 to 129. So I had a 20-version gap with nothing in between, and it really looked like I'd need to decode pretty much from scratch.

Cpecific's decoder

I'm sure I'd have been able to figure out the decoding myself, but I found unexpected help. It turns out Cpecific wrote a PHP-based UI layout decoder. It doesn't actually convert anything; it just prints JSON-style output describing the contents of various UI layout files.

I tried running it on a bunch of files, and it seemed to have 80%-ish support for the newer 7 games, similar to how well the old decoder supported the older 3 games.

The main weakness of Cpecific's decoder is that, since it doesn't convert anything, you're expected to do hex editing and then check in the decoder that the results are what you expected. Not exactly an ideal workflow, but it beats hex editing blind by a mile.

I also couldn't fully trust its decoding, and it crashed on many files, but it was definitely a huge help in crossing the gap between Version054 and Version074. Once I crossed it, it was easy going, one version at a time.

I also used it to annotate some fields with comments on what they could likely be.

I don't plan any further development of the old converter, but in case Cpecific wants to continue with his, at some point I should write down a list of the issues I found and their fixes.

Warhammer III

A new Total War game is coming out soon, so the converter will likely need an update. I don't expect this to be difficult, as Rome 2 was the last time they did a major format update.