Carmack compression

From ModdingWiki
Jump to: navigation, search
Carmack compression
Format typeCompression algorithm
TypeStream
I/O unit size1-byte
Games

Carmack compression is used in the GAMEMAPS file in a number of games to further shrink the levels down beyond what RLEW compression can achieve. Its basic idea is somewhat like LZ (Lempel-Ziv) compression in that it contains pointers back to previous data.

As in RLEW compression, the first word in the Carmack compressed data is the number of bytes (not words) in the decompressed data. This is typically the number of bytes in the compressed RLEW data, as Carmack compression is performed after RLEW compression.

The format specifies some values in words - these are 16-bits in length.

Carmack compression contains two types of references to previous data: near pointers and far pointers.

Near Pointers

Near pointers occupy three bytes in the compressed data. The first is the number of words in the referenced sequence, the second is the signal byte of xA7, and the third is the number of words to the start of the reference (counting backwards from the current location). As a concrete example, the three bytes x05 xA7 x0A mean 'repeat the 5 words starting 10 words ago'.

Notice that near pointers only let one refer to the last 255 words. To refer to sequences further back, one must use far pointers.

Far Pointers

Far pointers occupy four bytes in the compressed data. The first is, again, the number of words in the referenced sequence, the second is xA8, and the third and fourth are interpreted as a word - a 0-based pointer to the start of the reference in words, so the address must be multiplied by two to reach the correct byte location. As a concrete example, the four bytes x10 xA8 x01 x20 mean 'repeat the 16 words starting at word number 513'.

Words with a high byte of $A7 or $A8

Words whose high (second) byte is xA7 or xA8 would appear to be issue, as they would be confused with near or far pointers. These are handled by representing them as the three bytes: $00, $Ax $xx, this is recognized as an exception (Repeating zero words would make no sense.)