DAT Format (Monster Bash)

From ModdingWiki
DAT Format (Monster Bash)
Format type: Archive
Max files: Unlimited
File Allocation Table (FAT): Embedded
Filenames?Yes, 30 chars
Metadata?None
Supports compression?Yes
Supports encryption?No
Supports subdirectories?No
Hidden data?No
Games: Monster Bash

The DAT group format used by Monster Bash stores a number of related game files together in a single file, much like a .zip archive. The basic format has been known for some time, but files could only be decompressed once Gerald Lindsly (the Monster Bash lead programmer) released some example decompression code.

File format

Signature

The Monster Bash .DAT file has no signature, so the only known way to detect one is with sanity checks, such as ensuring that no file entry declares a file size larger than the .DAT file itself. A stricter (and possibly too sensitive) check is to assume that a compressed file will always be smaller than its size when uncompressed.

File entry

A file entry consists of a header containing info about the file, followed by the file data itself, as shown by this structure:

Data type                   Description
UINT16LE iType              File type
UINT16LE iFileSize          Size of the file in bytes
char cFilename[31]          Filename, NULL-terminated (not sure if it's exactly 31 chars long though.)
UINT16LE iDecompressedSize  Size of the file once it has been decompressed, or zero if the file isn't compressed.
BYTE cData[iFileSize]       File data

The very first file entry is at offset zero, and the above structure is repeated back-to-back until there is no more data in the .DAT file. If iDecompressedSize is zero, that particular file isn't compressed.
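
As a rough illustration, the entry layout above could be walked like this in Python. This is only a sketch: the names Entry, iter_entries and looks_like_dat are made up for this example, and the 37-byte header relies on the 31-byte filename field from the table, which is itself unconfirmed. The looks_like_dat helper applies the sanity checks described under Signature, with the stricter size comparison left optional.

```python
import struct
from typing import Iterator, NamedTuple

# Header layout from the table above: iType, iFileSize, cFilename[31],
# iDecompressedSize, all little-endian. The 31-byte filename length is the
# table's own uncertain figure, so treat this layout as an assumption.
HEADER = struct.Struct("<HH31sH")  # 37 bytes


class Entry(NamedTuple):
    itype: int              # iType
    name: str               # cFilename up to the first NUL
    stored_size: int        # iFileSize
    decompressed_size: int  # iDecompressedSize, zero if not compressed
    data: bytes             # cData


def iter_entries(dat: bytes) -> Iterator[Entry]:
    """Yield each file entry in turn, starting at offset zero."""
    pos = 0
    while pos < len(dat):
        if pos + HEADER.size > len(dat):
            raise ValueError(f"truncated header at offset {pos}")
        itype, size, raw_name, dsize = HEADER.unpack_from(dat, pos)
        pos += HEADER.size
        if pos + size > len(dat):
            raise ValueError(f"file data runs past end of archive at offset {pos}")
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        yield Entry(itype, name, size, dsize, dat[pos:pos + size])
        pos += size


def looks_like_dat(dat: bytes, strict: bool = False) -> bool:
    """Sanity-check a candidate archive, since the format has no signature."""
    count = 0
    try:
        for entry in iter_entries(dat):
            count += 1
            # Optional stricter check: compressed data should be smaller
            # than the decompressed size it claims.
            if (strict and entry.decompressed_size
                    and entry.stored_size >= entry.decompressed_size):
                return False
    except ValueError:
        return False
    return count > 0
```

A caller would typically read the whole .DAT file into memory and loop over iter_entries(), checking decompressed_size to decide whether an entry's data still needs to be decompressed.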

Filenames can contain paths, e.g. "digi\bark.voc".

Not all filenames have extensions, and the same name is often used multiple times with a different "type", as shown in this table:

iType  Fake extension  Description                                            Format
0      .mif            Map description                                        Monster Bash Level Format
1      .mbg            Map background layer                                   Monster Bash Level Format
2      .mfg            Map foreground layer                                   Monster Bash Level Format
3      .tbg            Background tiles                                       Monster Bash Tileset Format
4      .tfg            Foreground tiles                                       Monster Bash Tileset Format
5      .tbn            Bonus tiles                                            Monster Bash Tileset Format
6      .sgl            List of sprite filenames                               List of 31-byte null-padded fixed length strings
7      .msp            Map sprite layer                                       Monster Bash Level Format
8      -               PC speaker sound effects (already has .snd extension)  Inverse Frequency Sound format
12     .pbg            Background tile properties                             Monster Bash Tileset Format
13     .pfg            Foreground tile properties                             Monster Bash Tileset Format
14     .pal            Palette                                                EGA Palette
16     .pbn            Bonus tile properties                                  Monster Bash Tileset Format
32     -               A normal file                                          Various
64     .spr            A sprite                                               Monster Bash Sprite Format

The fake extensions are made up to make it easier to refer to files elsewhere. Camoto automatically applies the fake extensions when extracting files, and removes them (setting iType appropriately) when inserting files.
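
The mapping in the table above can be written down as a small lookup, sketched here in Python. The function names are made up for this example and this is not Camoto's actual code. In particular, the reverse direction has to guess an iType for names without a fake extension: treating ".snd" as type 8 and everything else as type 32 is an assumption of this sketch.

```python
from typing import Tuple

# iType -> fake extension, copied from the table above. Types 8 and 32 have
# no fake extension, so they are deliberately absent.
FAKE_EXTENSIONS = {
    0: ".mif", 1: ".mbg", 2: ".mfg", 3: ".tbg", 4: ".tfg", 5: ".tbn",
    6: ".sgl", 7: ".msp", 12: ".pbg", 13: ".pfg", 14: ".pal", 16: ".pbn",
    64: ".spr",
}
_TYPE_BY_EXTENSION = {ext: itype for itype, ext in FAKE_EXTENSIONS.items()}


def extracted_name(itype: int, filename: str) -> str:
    """Name to use when extracting: append the fake extension, if any."""
    return filename + FAKE_EXTENSIONS.get(itype, "")


def archive_name(extracted: str) -> Tuple[int, str]:
    """Return (iType, cFilename) for a file being inserted into the DAT."""
    stem, dot, ext = extracted.rpartition(".")
    key = (dot + ext).lower()
    if dot and key in _TYPE_BY_EXTENSION:
        return _TYPE_BY_EXTENSION[key], stem
    if dot and key == ".snd":
        return 8, extracted   # assumption: real .snd extension means type 8
    return 32, extracted      # assumption: anything else is a normal file
```

For instance, extracted_name(0, "level1") gives "level1.mif", and archive_name("level1.mif") gives (0, "level1").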

Compression format

The compression algorithm is LZW, with a few minor differences due to the particular implementation. The lead programmer, Gerald Lindsly, has released some decompression code on his web site which is able to fully restore extracted files (i.e. it also handles the RLE decoding mentioned below.)

LZW

If iDecompressedSize in a file's header is nonzero, the file data is compressed using the LZW algorithm.

The first valid codeword is 0x101. Codeword 0x100 is reserved for reset and EOF. The codeword length is dynamic, starting at nine bits and increasing until it has reached 12 bits. Once 12-bit codewords are in use and the dictionary is full, the dictionary is not reset and compression/expansion continues without adding any new entries to the dictionary.

If codeword 0x100 is encountered in the middle of the data, the dictionary is reset and the codeword length is also reset to 9 bits. This case never seems to arise in the stock Monster Bash data, even though the decompression code can handle it. If codeword 0x100 is encountered at the very end of the data (according to iFileSize in the file entry header) then it signals the end of the data. All compressed data must end with this codeword or the decompressor will get stuck in an infinite loop.

The codewords are split across byte boundaries in little-endian order, so a nine-bit codeword of 511 would be stored in two bytes as FF 01. (As opposed to big endian which would store it as FF 80.)
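
The rules above can be put together into a decoder along the following lines. This is a hedged Python sketch and not Gerald Lindsly's code: BitReader and lzw_decode are names invented here, and two details are assumptions because the text above does not specify them. First, consecutive codewords are assumed to be packed back-to-back, least-significant bit first. Second, the codeword length is assumed to grow as soon as the next dictionary code would no longer fit in the current length. Unlike the original decompressor, the sketch simply stops when it runs out of input.

```python
RESET = 0x100       # reset / EOF codeword
FIRST_CODE = 0x101  # first dictionary codeword
MAX_WIDTH = 12


class BitReader:
    """Reads codewords least-significant bit first (FF 01 -> 511 at 9 bits)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.bitpos = 0

    def read(self, width: int) -> int:
        """Return the next `width` bits, or -1 if not enough data is left."""
        if self.bitpos + width > len(self.data) * 8:
            return -1
        value = 0
        for i in range(width):
            pos = self.bitpos + i
            value |= ((self.data[pos >> 3] >> (pos & 7)) & 1) << i
        self.bitpos += width
        return value


def lzw_decode(data: bytes) -> bytes:
    """Undo the LZW layer only; the result is still RLE-encoded."""
    reader = BitReader(data)
    out = bytearray()
    table = {}              # codeword -> byte string, for codes >= 0x101
    width = 9
    next_code = FIRST_CODE
    prev = None             # byte string produced by the previous codeword
    while True:
        code = reader.read(width)
        if code < 0:
            break           # out of data (only padding bits remain)
        if code == RESET:
            # Mid-stream this resets the dictionary; at the end it is EOF,
            # because the next read finds no more codewords.
            table.clear()
            width = 9
            next_code = FIRST_CODE
            prev = None
            continue
        if code < 0x100:
            entry = bytes([code])
        elif code in table:
            entry = table[code]
        elif code == next_code and prev is not None:
            entry = prev + prev[:1]   # code refers to the entry being built
        else:
            raise ValueError(f"invalid codeword {code:#x}")
        out += entry
        # Once the dictionary is full, no further entries are added.
        if prev is not None and next_code < (1 << MAX_WIDTH):
            table[next_code] = prev + entry[:1]
            next_code += 1
            # Assumed timing of the codeword length change (see lead-in).
            if next_code == (1 << width) and width < MAX_WIDTH:
                width += 1
        prev = entry
    return bytes(out)
```

The timing of the length change in particular should be checked against real game data or the released decompression code before relying on this for large files.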

RLE

Once the data has been decompressed it is still encoded with a form of run-length encoding (RLE). This must be undone to restore the original data, and likewise must be applied before compression when inserting compressed files into the DAT. However, as compression is optional, it may be easier to store files uncompressed when modifying the DAT file.

Byte 0x90 is used as the RLE indicator byte. When it is encountered, the previous byte (before 0x90) is repeated a specific number of times. The exact count is stored in the byte following the 0x90. For example:

 12 34 AB 90 04 56  ->  12 34 AB AB AB AB 56

Note that the original byte is included in the repetition count.

If the repetition count is zero, the 0x90 byte is treated as normal data:

 12 34 AB 90 00 56  ->  12 34 AB 90 56

When performing expansion, be sure to correctly track the 'previous byte' before the 0x90. The previous byte is not the previous byte in the input data, but the previous byte in the output data (after any expansions and escaping.) For example it is possible to have multiple expansions in a row, such as when there are more than 255 identical bytes in a row:

 12 34 AB 90 FF 90 04 -> 12 34 AB AB AB ...

In this example the 90 FF "sees" the AB byte just before the 90 byte, and expands it to 255 AB bytes total (the original AB plus another 254.) The following 90 04 "sees" the 255th AB byte from the previous expansion, and expands that to four AB bytes (the 255th AB byte plus another three.) This brings the total number of AB bytes to 258.

It is also possible to expand an escape sequence:

 12 34 90 00 90 05 56 -> 12 34 90 90 90 90 90 56

Here the 90 00 escape sequence is replaced with a single 90 in the output data, and so when the next 90 05 repeat sequence is read, the 'previous byte' to repeat is actually 90, because it is the last byte in the output data (not 00 which is the previous byte in the input data.)
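
The rules and examples above translate fairly directly into code. The following Python sketch uses made-up function names and is not taken from the game or from Camoto. The decoder follows the description above; the encoder is just one valid way to produce data the decoder accepts (it only uses a repeat sequence where that is not longer than plain bytes), and the game's own encoder may well choose differently.

```python
RLE_MARKER = 0x90


def rle_decode(data: bytes) -> bytes:
    """Expand 90 nn sequences, tracking the previous byte in the OUTPUT."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte != RLE_MARKER:
            out.append(byte)
            continue
        if i >= len(data):
            raise ValueError("0x90 marker without a count byte")
        count = data[i]
        i += 1
        if count == 0:
            out.append(RLE_MARKER)            # 90 00 is a literal 0x90
        elif not out:
            raise ValueError("repeat sequence with no previous byte")
        else:
            # The count includes the byte already written, so add count - 1.
            out.extend(out[-1:] * (count - 1))
    return bytes(out)


def rle_encode(data: bytes) -> bytes:
    """Produce RLE data that rle_decode() turns back into `data`."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == byte:
            run += 1
        i += run
        out.append(byte)
        if byte == RLE_MARKER:
            out.append(0)                     # escape a literal 0x90
        remaining = run - 1                   # copies still to be written
        while remaining > 0:
            if remaining < 3 and byte != RLE_MARKER:
                out += bytes([byte]) * remaining   # too short for a marker
                break
            chunk = min(remaining, 254)       # count byte holds at most 255
            out += bytes([RLE_MARKER, chunk + 1])
            remaining -= chunk
    return bytes(out)
```

Running rle_decode() on the four example sequences above reproduces the outputs shown, including the 258 AB bytes of the chained expansion.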

Tools

The following tools are able to work with files in this format.

Name    Platform       Extract files?  Decompress on extract?  Create new?  Modify?  Compress on insert?  Access hidden data?  Edit metadata?  Notes
Camoto  Linux/Windows  Yes             Yes                     Yes          Yes      No                   N/A                  N/A
Wombat  Windows        Yes             Yes                     No           No       No                   N/A                  N/A

Credits

This file format was reverse engineered by Malvineous. The decompression code was posted by Gerald Lindsly (the Monster Bash lead programmer) on his web site. Some of the file types were figured out by Szevvy. If you find this information helpful in a project you're working on, please give credit where credit is due. (A link back to this wiki would be nice too!)