It was reverse engineered from partial documentation by various people, including yours truly, but so far there's no complete documentation available anywhere. This post is meant to be such documentation.
Obvious data types
Most data types used in ESFs are straightforward - almost everything is little-endian unless noted otherwise, data sizes are standard powers of two, negative numbers are two's-complement, the character set is Unicode, floating point numbers are IEEE 754, etc.
- int8, int16, int32, int64 - little-endian signed integers
- uint8, uint16, uint32, uint64 - little endian unsigned integers
- float32, float64 - single and double precision IEEE 754 floating point numbers
- char8 - ASCII character
- char16 - little-endian UTF-16 codepoint
- bool8 - boolean (00 false, 01 true, other values never observed)
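As a rough illustration, here is how these fixed-size types could be read with Python's standard struct module. The FORMATS table and the read_value helper are names I made up for this sketch, not part of any existing ESF tool.

```python
import struct

# struct format strings for the fixed-size types listed above.
# The "<" prefix selects little-endian byte order with no padding.
FORMATS = {
    "int8": "<b", "int16": "<h", "int32": "<i", "int64": "<q",
    "uint8": "<B", "uint16": "<H", "uint32": "<I", "uint64": "<Q",
    "float32": "<f", "float64": "<d",
    "bool8": "<?",  # any non-zero byte decodes as True
}


def read_value(type_name, data, offset=0):
    """Return (value, next_offset) for one fixed-size value."""
    fmt = FORMATS[type_name]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)
```

Returning the next offset along with the value makes it easy to chain reads while walking through a file.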
Strings
There are two kinds of strings used in ESF files - ASCII strings (used mostly for internal identifiers), and Unicode strings (used mostly for displayable text).
Both are prefixed by a character count as uint16. For ASCII strings (ca_ascii):
- uint16 count, char8[count] characters
For Unicode strings (ca_unicode):
- uint16 count, char16[count] codepoints
There are no observed instances of ASCII strings having high bit set - so it's not possible to tell if that's meant as ISO-Latin-1, UTF-8, an error, or something else.
There are no observed instances of Unicode planes other than Basic Multilingual Plane being used.
There are no observed instances of 0 character/codepoint being present in either of them.
Neither string format is ever zero-terminated.
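A minimal Python sketch of reading both string kinds follows. The function names are mine. Decoding ca_ascii strictly as ASCII is a choice made for this sketch, since the meaning of high-bit bytes is unknown; such bytes would raise an error here rather than being silently guessed at.

```python
import struct


def read_ca_ascii(data, offset=0):
    """Read a ca_ascii string; return (text, next_offset)."""
    (count,) = struct.unpack_from("<H", data, offset)
    start = offset + 2
    end = start + count  # one byte per character
    return data[start:end].decode("ascii"), end


def read_ca_unicode(data, offset=0):
    """Read a ca_unicode string; return (text, next_offset)."""
    (count,) = struct.unpack_from("<H", data, offset)
    start = offset + 2
    end = start + 2 * count  # two bytes per char16
    return data[start:end].decode("utf-16-le"), end
```

Note that the count is in characters, not bytes, so the Unicode reader has to double it.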
Exotic encodings for numbers
Before I get to the format itself, there are a few exotic encodings used in ESFs.
int24be and uint24be are big-endian (that is, reversed relative to everything else) 24-bit (3-byte) signed and unsigned integers. So the byte sequence 01 00 00 means 65536.
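In Python the 24-bit types can be decoded with int.from_bytes. This is only a sketch: the helper names are mine, and treating int24be as two's-complement is an assumption carried over from the general rule stated earlier.

```python
def read_uint24be(data, offset=0):
    """Return (value, next_offset) for a big-endian unsigned 24-bit integer."""
    raw = data[offset:offset + 3]
    return int.from_bytes(raw, "big", signed=False), offset + 3


def read_int24be(data, offset=0):
    """Return (value, next_offset) for a big-endian signed 24-bit integer.

    Assumes two's-complement, like the other signed types.
    """
    raw = data[offset:offset + 3]
    return int.from_bytes(raw, "big", signed=True), offset + 3
```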
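uintvar is a variable-length encoding for unsigned integers: each byte carries 7 bits of the value, most significant group first, and the high bit is set on every byte except the last. In C-like pseudocode:
result = 0; while (data[i] & 0x80) { result = (result << 7) | (data[i] & 0x7f); i += 1; } result = (result << 7) | data[i]; i += 1;
Or in diagrams: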
- 0XXXXXXX → 0bXXXXXXX
- 1YYYYYYY 0XXXXXXX → 0bYYYYYYYXXXXXXX
- 1ZZZZZZZ 1YYYYYYY 0XXXXXXX → 0bZZZZZZZYYYYYYYXXXXXXX
- 1WWWWWWW 1ZZZZZZZ 1YYYYYYY 0XXXXXXX → 0bWWWWWWWZZZZZZZYYYYYYYXXXXXXX
- 1VVVVVVV 1WWWWWWW 1ZZZZZZZ 1YYYYYYY 0XXXXXXX → 0bVVVVVVVWWWWWWWZZZZZZZYYYYYYYXXXXXXX
Some examples:
- 0 is encoded as 00
- 1 is encoded as 01
- 127 is encoded as 7f
- 128 is encoded as 81 00
- 255 is encoded as 81 7f
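Here is a small Python sketch of uintvar decoding and encoding that can be checked against the examples above. The function names are mine, and the encoder always produces the shortest form, which is an assumption about what the games write rather than something I have verified.

```python
def decode_uintvar(data, offset=0):
    """Return (value, next_offset) for a uintvar starting at offset."""
    result = 0
    while True:
        byte = data[offset]
        offset += 1
        result = (result << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, offset


def encode_uintvar(value):
    """Encode a non-negative integer as the shortest uintvar."""
    if value < 0:
        raise ValueError("uintvar cannot encode negative numbers")
    groups = [value & 0x7F]  # last byte: high bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(groups))
```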
Format variants
There are 4 variants of the ESF format, usually referred to by their magic number:
- ABCD - used in Empire Total War and Napoleon Total War
- ABCE - used in Empire Total War and Napoleon Total War
- ABCF - used in Shogun 2 Total War
- ABCA - used in Shogun 2 Total War
Header
ESF files start with the following header:
- uint32 - magic number (0xABCD, 0xABCE, 0xABCF, 0xABCA)
- uint32 - 4 bytes, always zeros - not present in ABCD format
- uint32 - 4 bytes, look like Unix timestamp - not present in ABCD format
- uint32 - offset where footer starts
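The header layout can be sketched in Python roughly like this. The read_header name and the dictionary keys are made up for illustration. The data_start value is simply the offset right after the header, which is where I would expect the root node to begin.

```python
import struct

KNOWN_MAGICS = (0xABCD, 0xABCE, 0xABCF, 0xABCA)


def read_header(data):
    """Parse the ESF header from the start of a file's bytes."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic not in KNOWN_MAGICS:
        raise ValueError("not an ESF file: magic 0x%X" % magic)
    header = {"magic": magic}
    offset = 4
    if magic != 0xABCD:
        # Two extra uint32 fields that the ABCD format does not have.
        zeros, timestamp = struct.unpack_from("<II", data, offset)
        header["zeros"] = zeros
        header["timestamp"] = timestamp
        offset += 8
    (footer_offset,) = struct.unpack_from("<I", data, offset)
    header["footer_offset"] = footer_offset
    header["data_start"] = offset + 4  # first byte after the header
    return header
```

So the header is 8 bytes long for ABCD and 16 bytes long for the other three variants.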
Footer
ESF format consists of nodes similar to XML tags. The first thing in the footer is a lookup table for names of these tags:
- uint16 number of tag types
- ca_ascii name of tag 0
- ca_ascii name of tag 1
- ca_ascii name of tag 2
- ...
For example, a table with two tags, "kittens" and "pandas", is encoded as:
- 02 00 - number of tag types as uint16
- 07 00 - size of string "kittens" as uint16
- 6b 69 74 74 65 6e 73 - ASCII encoding of "kittens"
- 06 00 - size of string "pandas" as uint16
- 70 61 6e 64 61 73 - ASCII encoding of "pandas"
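The "kittens" and "pandas" bytes above can be fed straight into a small Python reader. This is only a sketch, and read_tag_names and EXAMPLE are names I picked for it.

```python
import struct


def read_tag_names(data, offset=0):
    """Read the tag name table; return (list_of_names, next_offset)."""
    (count,) = struct.unpack_from("<H", data, offset)
    offset += 2
    names = []
    for _ in range(count):
        # Each entry is a ca_ascii string: uint16 length, then the bytes.
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        names.append(data[offset:offset + length].decode("ascii"))
        offset += length
    return names, offset


# The example bytes from the text.
EXAMPLE = bytes.fromhex("0200" "0700" "6b697474656e73" "0600" "70616e646173")
```

A record's uint16 tag name is then just an index into the returned list.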
The string lookup tables come after the tag name table. First the Unicode one:
- uint32 size of Unicode string lookup table
- ca_unicode string A
- uint32 index of string A
- ca_unicode string B
- uint32 index of string B
- ...
Then the ASCII one:
- uint32 size of ASCII string lookup table
- ca_ascii string A
- uint32 index of string A
- ca_ascii string B
- uint32 index of string B
- ...
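A possible Python reader for the two string lookup tables is sketched below. It assumes that the uint32 "size" is the number of entries rather than a byte count, which is my reading of the layout and not something the list above spells out. The function name and the wide flag are made up for this sketch.

```python
import struct


def read_string_table(data, offset, wide):
    """Read one string lookup table; return ({index: text}, next_offset).

    wide=True reads ca_unicode entries, wide=False reads ca_ascii entries.
    Assumes the leading uint32 is an entry count.
    """
    (count,) = struct.unpack_from("<I", data, offset)
    offset += 4
    table = {}
    for _ in range(count):
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        nbytes = 2 * length if wide else length
        raw = data[offset:offset + nbytes]
        offset += nbytes
        # The index is stored after the string it belongs to.
        (index,) = struct.unpack_from("<I", data, offset)
        offset += 4
        table[index] = raw.decode("utf-16-le" if wide else "ascii")
    return table, offset
```

Keying the result by the stored index, instead of by position, avoids assuming that entries appear in index order.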
Data nodes - numbers
Between header and footer there are data nodes - or, to be more precise, exactly one data node, which may contain nested nodes - very much like XML has one root element. Node type can be determined by its first byte. Number nodes have a very simple correspondence between encoding and encoded data:
- 01 bool8 - boolean
- 02 int8 - int8 (never observed in practice)
- 03 int16 - int16
- 04 int32 - int32
- 05 int64 - int64 (rarely observed in practice)
- 06 uint8 - uint8
- 07 uint16 - uint16
- 08 uint32 - uint32
- 09 uint64 - uint64 (rarely observed in practice)
- 0a float32 - float32
- 0b float64 - float64 (never observed in practice)
- 0c float32 float32 - XY coordinates
- 0d float32 float32 float32 - XYZ coordinates
- 10 uint16 - angle from 0 to almost-360 degrees
- 12 - boolean true
- 13 - boolean false
- 14 - uint32 zero
- 15 - uint32 one
- 16 uint8 - uint32
- 17 uint16 - uint32
- 18 uint24be - uint32
- 19 - int32 zero
- 1a int8 - int32
- 1b int16 - int32
- 1c int24be - int32
- 1d - float32 zero
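The table of number nodes maps naturally onto a lookup-driven decoder. Here is a hedged Python sketch of one. The names FIXED, CONSTANTS, INT24 and read_number_node are mine, the angle node (10) is returned as the raw uint16 because the scaling to degrees isn't specified here, and any code not in the list above raises KeyError.

```python
import struct

# Node code -> struct format of the payload that follows the code byte.
FIXED = {
    0x01: "<?", 0x02: "<b", 0x03: "<h", 0x04: "<i", 0x05: "<q",
    0x06: "<B", 0x07: "<H", 0x08: "<I", 0x09: "<Q",
    0x0A: "<f", 0x0B: "<d", 0x0C: "<ff", 0x0D: "<fff",
    0x10: "<H",                # angle, left as the raw uint16
    0x16: "<B", 0x17: "<H",    # uint32 stored in fewer bytes
    0x1A: "<b", 0x1B: "<h",    # int32 stored in fewer bytes
}
# Node codes with no payload at all.
CONSTANTS = {0x12: True, 0x13: False, 0x14: 0, 0x15: 1, 0x19: 0, 0x1D: 0.0}
# Node codes followed by a 3-byte big-endian integer: code -> signed?
INT24 = {0x18: False, 0x1C: True}


def read_number_node(data, offset=0):
    """Return (value, next_offset) for one number node."""
    code = data[offset]
    offset += 1
    if code in CONSTANTS:
        return CONSTANTS[code], offset
    if code in INT24:
        raw = data[offset:offset + 3]
        return int.from_bytes(raw, "big", signed=INT24[code]), offset + 3
    fmt = FIXED[code]
    values = struct.unpack_from(fmt, data, offset)
    offset += struct.calcsize(fmt)
    # Coordinates come back as tuples, everything else as a scalar.
    return (values[0] if len(values) == 1 else values), offset
```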
String nodes
In formats ABCD/ABCE string nodes are encoded as follows:
- 0e ca_unicode - unicode string node
- 0f ca_ascii - ASCII string node
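In the other formats the node stores only a uint32 that refers to an entry in the footer's string lookup tables:
- 0e uint32 - unicode string node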
- 0f uint32 - ASCII string node
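Both string node encodings can be handled by one small Python function. This sketch assumes that the uint32 form is an index into the footer's string lookup tables, which is my interpretation. The inline flag and the two table parameters (plain dicts from index to text) are hypothetical names for this example.

```python
import struct


def read_string_node(data, offset, inline, unicode_table=None, ascii_table=None):
    """Return (text, next_offset) for a 0e or 0f string node.

    inline=True:  the node carries the ca_unicode / ca_ascii string itself.
    inline=False: the node carries a uint32, assumed to be a lookup index.
    """
    code = data[offset]
    if code not in (0x0E, 0x0F):
        raise ValueError("not a string node: 0x%02x" % code)
    offset += 1
    wide = code == 0x0E
    if inline:
        (count,) = struct.unpack_from("<H", data, offset)
        offset += 2
        nbytes = 2 * count if wide else count
        raw = data[offset:offset + nbytes]
        return raw.decode("utf-16-le" if wide else "ascii"), offset + nbytes
    (index,) = struct.unpack_from("<I", data, offset)
    table = unicode_table if wide else ascii_table
    return table[index], offset + 4
```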
Array nodes
There are array node types corresponding to basic node types, their code is 40 + code of basic data type:
- 41 boolean array
- 42 int8 array
- 43 int16 array
- etc.
When the array length is given as an end offset, the layout is:
- uint8 node-type-code (40..5f)
- uint32 offset of first byte after end of array (this is a weird way of encoding size)
- element 0
- element 1
- ...
When the array length is given as a uintvar byte count, the layout is:
- uint8 node-type-code (40..5f)
- uintvar number of bytes in the array
- element 0
- element 1
- ...
For example, an array of the two uint32 values 100 and 200, starting at offset 0x6000 and using the end-offset encoding:
- [offset 0x6000] 48 - since 08 is node type code for uint32, 48 is node type code for array of uint32s
- [offset 0x6001..0x6004] 0d 60 00 00 - offset of first byte after end of array
- [offset 0x6005..0x6008] 64 00 00 00 - 100 encoded as uint32
- [offset 0x6009..0x600c] c8 00 00 00 - 200 encoded as uint32
The same two values using the uintvar byte-count encoding:
- 48 - code for array of uint32s
- 08 - size of the array as uintvar
- 64 00 00 00 - 100 encoded as uint32
- c8 00 00 00 - 200 encoded as uint32
Arrays of the compact uint32 encodings work the same way. The values 100 and 200 stored as uint8:
- 56 - since 16 is code of uint32-encoded-as-uint8
- 02 - size of the array as uintvar
- 64 - 100 encoded as uint8
- c8 - 200 encoded as uint8
And the values 0, 1 and 1000 stored as uint16:
- 57 - since 17 is code of uint32-encoded-as-uint16
- 06 - size of array as uintvar
- 00 00 - 0 encoded as uint16
- 01 00 - 1 encoded as uint16
- e8 03 - 1000 encoded as uint16
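To make the two array layouts concrete, here is a Python sketch that handles just the three array codes used in the examples (48, 56, 57). All names are mine. The end-offset reader assumes the offset is absolute, counted from the start of the file, so it expects to be handed the whole file's bytes.

```python
import struct

# Array node code -> struct format of one element (only the codes used above).
ELEMENT_FORMATS = {0x48: "<I", 0x56: "<B", 0x57: "<H"}


def decode_uintvar(data, offset):
    """Return (value, next_offset) for a uintvar."""
    result = 0
    while True:
        byte = data[offset]
        offset += 1
        result = (result << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return result, offset


def read_sized_array(data, offset=0):
    """Array whose length is a uintvar byte count; return (values, next_offset)."""
    fmt = ELEMENT_FORMATS[data[offset]]
    size, start = decode_uintvar(data, offset + 1)
    end = start + size
    values = [v for (v,) in struct.iter_unpack(fmt, data[start:end])]
    return values, end


def read_offset_array(data, offset=0):
    """Array whose length is a uint32 end offset; return (values, next_offset)."""
    fmt = ELEMENT_FORMATS[data[offset]]
    (end,) = struct.unpack_from("<I", data, offset + 1)
    start = offset + 5
    values = [v for (v,) in struct.iter_unpack(fmt, data[start:end])]
    return values, end
```

Note that neither layout stores an element count - it has to be derived from the byte length and the element size.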
Record node
The most important node type in ESF is record type, vaguely corresponding to tags in XML, or non-basic object types in Java. The root of ESF files is always a record type, and since records can contain any number of nodes in them, they provide basic structure to ESF files. For ABCD/ABCE/ABCF format record nodes are encoded as follows:
- uint8 80 - record node code
- uint16 tag name - it's index to table of tags in the footer
- uint8 version - version number - starts with 0, updated every time object format changes
- uint32 offset of first byte after end of record
- node 0
- node 1
- ...
The same layout with a uintvar size in place of the end offset:
- uint8 80
- uint16 tag name
- uint8 version
- uintvar size of data within record in bytes
- node 0
- node 1
- ...
With node code a0 instead of 80:
- uint8 a0
- uint16 tag name
- uint8 version
- uintvar size of data within record in bytes
- node 0
- node 1
- ...
With a compact two-byte header that packs version and tag together:
- 100vvvvt tttttttt - That's 4 bits for version number and 9 bits for node type.
- uintvar size of data within record in bytes
- node 0
- node 1
- ...
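The compact two-byte record header (100vvvvt tttttttt) is the only fiddly one, so here is a Python sketch of unpacking it together with the uintvar size that follows. The function names are mine, and I'm assuming the single t bit in the first byte is the most significant bit of the 9-bit tag index, which is the natural reading of the diagram.

```python
def decode_uintvar(data, offset):
    """Return (value, next_offset) for a uintvar."""
    result = 0
    while True:
        byte = data[offset]
        offset += 1
        result = (result << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return result, offset


def read_short_record(data, offset=0):
    """Parse a compact record header.

    Returns (tag_index, version, payload_start, payload_end).
    """
    first, second = data[offset], data[offset + 1]
    if first & 0xE0 != 0x80:  # top three bits must be 100
        raise ValueError("not a compact record header: 0x%02x" % first)
    version = (first >> 1) & 0x0F            # the vvvv bits
    tag_index = ((first & 0x01) << 8) | second  # t bit + tttttttt
    size, payload_start = decode_uintvar(data, offset + 2)
    return tag_index, version, payload_start, payload_start + size
```

The nodes inside the record are then whatever sits between payload_start and payload_end.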
Array of records node
The last type is arrays of records. All records in the array are the same type and version. For ABCD/ABCE/ABCF format it's encoded as follows:
- uint8 81 - array of records node code
- uint16 tag name - it's index to table of tags in the footer
- uint8 version - version number
- uint32 offset of first byte after end of array
- uint32 number of elements
- uint32 offset of first byte after end of record #0
- contents of record 0
- uint32 offset of first byte after end of record #1
- contents of record 1
- ...
With node code e0 and uintvar sizes:
- uint8 e0 - array of records node code
- uint16 tag name - it's index to table of tags in the footer
- uint8 version - version number
- uintvar size of array contents in bytes
- uintvar number of elements
- uintvar size of record 0 in bytes
- contents of record 0
- uintvar size of record 1 in bytes
- contents of record 1
- ...
With a compact two-byte header that packs version and tag together:
- 110vvvvt tttttttt - That's 4 bits for version number and 9 bits for node type.
- uintvar size of array contents in bytes
- uintvar number of elements
- uintvar size of record 0 in bytes
- contents of record 0
- uintvar size of record 1 in bytes
- contents of record 1
- ...
Putting it all together
So to put it all together, ESF files consist of:
- uint32 - magic number (0xABCD, 0xABCE, 0xABCF, 0xABCA)
- uint32 - 4 bytes, always zeros - not present in ABCD format
- uint32 - 4 bytes, look like Unix timestamp - not present in ABCD format
- uint32 - offset where footer starts
- root node (and more nodes within it)
- lookup table of tag names
- hash table for Unicode string lookup
- hash table for ASCII string lookup
- optional and completely ignored zero padding