Softdisk Library Format
SLIB, or Softdisk LIBrary, is a container file format used by Softdisk to compress various files used by their games, most notably the Commander Keen Dreams series of games (including Dangerous Dave 3 and Dangerous Dave 4), where it stores the title images shown at the beginning of each game. It was created in 1992 by Jim Row.
The data held in the file can be stored in any one of three ways: uncompressed, LZW or LZH. The compression used is primitive and rather different from later or traditional versions of LZW/LZH. SLIB files were created by the program SOFTLIB.EXE, and as such any game that uses this format shares various segments of decompression code with SOFTLIB.
There is a closely related format, the SHL or Softdisk Help Library format. SHL files contain only a single file. Their header is slightly different: its file signature is 'CMP1' (CoMPression of 1 file), while that of SLIB files is 'SLIB'. The validity of both files can be confirmed by checking for a word of value 2 at offset 4 in the file. The actual files have been given a number of extensions: .CMP (CoMPressed), .SHL (Softdisk Help Library) or the game extension.
The SLIB file can roughly be broken into two parts: the header, which contains data about the various data chunks, and the data chunks themselves, each containing a single file. Each chunk also has a short header.
The file header is found only in SLIB files and is absent in SHL files, which are loaded into memory in their entirety. The SLIB header allows individual data chunks to be loaded into memory separately.
The SLIB header is a variable-length header that records how many chunks there are in a file, as well as their location in the file and their size. It is used by the game to load chunks into memory and by SOFTLIB to extract compressed files.
The first part of the header has a fixed length of 8 bytes. It allows the game to identify the file as SLIB and to work out the total length of the header (which will be 30 * the value at offset 6, plus 8). The second part is a series of chunk headers that hold information about the file held in each chunk. (The last six bytes of each chunk header are repeated in the header of the data chunk itself.)
FILE HEADER:
Offset  Type       Name           Description
0       CHAR[4]    fID            Signature, 'SLIB' (Softdisk LIBrary)
4       UINT16LE   Version        Version number, always $0002
6       UINT16LE   Chunks         Number of data chunks in file
8       CHAR[30x]  Chunk headers  Chunk headers (30 bytes per chunk)

IMAGE HEADERS:
Offset  Type       Name           Description
+0      CHAR[16]   Name           Name of compressed image (max length 12 chars), padded with nulls
+16     UINT32LE   Dat st         Start of image data in file (from start of first image chunk)
+20     UINT32LE   Dat end        End of image data in file (from start of first image chunk)
+24     UINT32LE   iOriginalSize  Decompressed data size
+28     UINT16LE   iCompression   Compression used, 0 = none, 1 = LZW, 2 = LZH
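As a rough illustration of the two tables above, the following Python sketch reads the fixed 8-byte header and the 30-byte chunk headers with the standard struct module. The names ChunkEntry and parse_slib_header are invented for this example, and the sample bytes are synthetic rather than taken from a real game file.

```python
import struct
from typing import List, NamedTuple


class ChunkEntry(NamedTuple):
    """One 30-byte chunk header (hypothetical name; fields follow the table)."""
    name: str
    data_start: int      # measured from the start of the first chunk
    data_end: int        # measured from the start of the first chunk
    original_size: int
    compression: int     # 0 = none, 1 = LZW, 2 = LZH


def parse_slib_header(buf: bytes) -> List[ChunkEntry]:
    # Fixed part: 4-byte signature, version word, chunk count word.
    signature, version, count = struct.unpack_from("<4sHH", buf, 0)
    if signature != b"SLIB" or version != 2:
        raise ValueError("not an SLIB file")
    entries = []
    for i in range(count):
        # Each chunk header is 16 + 4 + 4 + 4 + 2 = 30 bytes.
        name, start, end, size, comp = struct.unpack_from(
            "<16sIIIH", buf, 8 + 30 * i)
        text = name.split(b"\0", 1)[0].decode("ascii", "replace")
        entries.append(ChunkEntry(text, start, end, size, comp))
    return entries


# Synthetic one-chunk header, built only to exercise the parser.
sample = struct.pack("<4sHH", b"SLIB", 2, 1)
sample += struct.pack("<16sIIIH", b"TITLE.LBM", 0, 1234, 64000, 1)
entries = parse_slib_header(sample)
header_length = 8 + 30 * len(entries)   # 30 * chunk count, plus 8
print(entries[0], header_length)
```

Since the start and end values are described as counted from the first image chunk, a reader would presumably add header_length to them to get absolute file offsets; that step is an interpretation and is left out of the sketch.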
The data chunk has a short header followed by the actual data itself. The format differs slightly between SLIB chunks and SHL files. Both formats must identify themselves to the game, and they do so in different ways.
Notice that in the case of SLIB chunks bytes 4-10 are identical to bytes 24-30 of that chunk's header.
SLIB CHUNK FORMAT:
Offset  Type       Name           Description
+0      CHAR[4]    cID            Signature, 'CUNK' (Chunk UNKompressed size)
+4      UINT32LE   iOriginalSize  Decompressed data size
+8      UINT16LE   iCompression   Compression method
+10     CHAR[?]    Data           Image data
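The chunk layout above can be read in much the same way. This is only a sketch: parse_chunk_header is a made-up name, and the chunk below is synthetic test data.

```python
import struct


def parse_chunk_header(buf: bytes, offset: int = 0):
    """Return (original_size, compression, data_offset) for one chunk."""
    signature, original_size, compression = struct.unpack_from(
        "<4sIH", buf, offset)
    if signature != b"CUNK":
        raise ValueError("missing 'CUNK' signature")
    # The data begins right after the 10-byte chunk header.
    return original_size, compression, offset + 10


# Synthetic chunk: five bytes stored with compression method 0 (none).
chunk = struct.pack("<4sIH", b"CUNK", 5, 0) + b"hello"
size, method, data_at = parse_chunk_header(chunk)
payload = chunk[data_at:data_at + size]   # valid only because method == 0
print(size, method, payload)
```

For compressed chunks the stored length cannot be taken from this header alone; it would have to come from the start and end values in the SLIB file header.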
SHL files are slightly more tricky. An uncompressed SHL file is simply raw data, whereas compressed data must identify itself as such, so the header is slightly longer. It can be considered a combination of the chunk header and the SLIB header in the section above.
SHL FILE FORMAT:
Offset  Type       Name             Description
0       CHAR[4]    cID              Signature, 'CMP1' (CoMPression of 1 file)
4       UINT16LE   iCompression     Compression method (as above)
6       UINT32LE   iOriginalSize    Decompressed data size
10      UINT32LE   iCompressedSize  Compressed data size
14      CHAR[?]    Data             Image data
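A minimal sketch of how a reader might tell the two SHL cases apart, assuming that any file not starting with 'CMP1' is raw data. The function name read_shl is hypothetical and the inputs are synthetic.

```python
import struct


def read_shl(buf: bytes):
    """Return (compression, original_size, payload) for an SHL file image."""
    if buf[:4] != b"CMP1":
        # Per the description, an uncompressed SHL file is just raw data.
        return 0, len(buf), buf
    _, compression, original_size, compressed_size = struct.unpack_from(
        "<4sHII", buf, 0)
    # The compressed data follows the 14-byte header.
    return compression, original_size, buf[14:14 + compressed_size]


# Synthetic examples: a fake compressed file with 18 payload bytes, and a raw one.
packed = struct.pack("<4sHII", b"CMP1", 1, 19, 18) + bytes(18)
print(read_shl(packed)[:2])
print(read_shl(b"plain text"))
```

A raw file that happened to begin with the bytes 'CMP1' would be misread by this check, so it should be treated as a heuristic.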
As noted above, there are three forms of compression. The first is simple enough: no compression at all, the data is simply stored in the file. (Actually increasing the size!) The other two are called 'LZW' and 'LZH' by the programmers, but both differ from what are now standard implementations of those formats.
A compression value of 1 means the chunk is LZW compressed. The format of the compression used is different from the more 'usual' implementation of LZW. Most LZW works by building a 'dictionary', but this LZW is in essence just referring back to data that has already been read.
The core of the implementation is that if a sequence is encountered that has already been read, it is replaced with a pointer to it. There are three types of data: flag bytes, pointers and literals.
Flag bytes divide the datastream into segments of eight 'values', each of which can be either a literal or a pointer. Pointers are 2 bytes long, literals 1 byte. (Therefore there will be a flag byte every 8 to 16 bytes of data.) The bits are read from the least significant bit upwards, and each bit indicates whether the corresponding value is a literal (1) or a pointer (0). Thus a value of 199 (11000111 in binary) indicates three literals, three pointers and two literals in that order (a total of 11 bytes).
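The way a flag byte is unpacked can be sketched as follows, reading bit 0 first and treating 1 as a literal, which is how the worked example further down reads $FF and $2B. The helper names are invented for illustration.

```python
def flag_kinds(flag: int):
    """List the eight value kinds for a flag byte, lowest bit first."""
    return ["literal" if (flag >> bit) & 1 else "pointer" for bit in range(8)]


def segment_length(flag: int) -> int:
    """Bytes that follow the flag byte when all eight values are present."""
    return sum(1 if kind == "literal" else 2 for kind in flag_kinds(flag))


print(flag_kinds(0x2B))      # literal, literal, pointer, literal, pointer, literal, ...
print(segment_length(0xFF))  # eight literals, one byte each
print(segment_length(0x00))  # eight pointers, two bytes each
```

In the last flag byte of a stream the trailing zero bits may simply be unused, so segment_length is an upper bound there.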
Literals are sequences that have never been seen in the datastream before. They cannot be compressed and are thus the same in the compressed and decompressed datastreams. (If the data is text they become quite obvious.) Any string that is less than 3 bytes long, has not been read before, or cannot be pointed to (see below) is stored as literals.
Pointers are references to data that has already been read. They are two bytes long, with 12 bits giving the location to read data from and the remaining 4 bits giving the length of data to read.
The lower nybble (4 bits) of the second pointer byte holds the length of repeat data to read, minus three. (This makes sense: the shortest sequence worth coding is three bytes, which can be given the value 0.) It follows that the maximum length of repeated data that can be stored as a pointer is 15 + 3 = 18 bytes.
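The high nybble of the second byte supplies the top four bits of the location: the second byte, with its low nybble masked off, is multiplied by 16 and then added to the first byte (so the bytes F2 F0 give $F00 + $F2 = $FF2). The result is the location of the data to read in the 'sliding window', minus 19. (This is due to the way the decompression is set up in memory.)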
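The pointer arithmetic can be sketched as below. The conversion to a position in the data treats the 12-bit location as a signed number and adds 19, which is how the worked example further down handles $FF2; it is only meant for data smaller than the window, and both function names are hypothetical.

```python
def decode_pointer(b1: int, b2: int):
    """Split a two-byte pointer into (location, length)."""
    location = ((b2 & 0xF0) << 4) | b1   # high nybble of b2 becomes bits 8-11
    length = (b2 & 0x0F) + 3             # stored as length minus three
    return location, length


def data_index(location: int) -> int:
    """Position in the decompressed data, for data shorter than the window."""
    signed = location - 0x1000 if location >= 0x800 else location
    return signed + 19


location, length = decode_pointer(0xF2, 0xF0)
print(hex(location), length, data_index(location))
```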
The 'sliding window' in this case is the region of decompressed data that the compressed data can point to. It will be immediately obvious that the pointers can encode values between +-2048, or about 2KB. If the decompressed data is less than 2KB in size then zero is the start of the data; if it is larger, then it is 2048 bytes from the end of the data. (This is the origin of the term 'sliding window': it is a window of data that can be slid along the datastream as it gets bigger.)
It is probable that the values in the compressed datastream will not exactly fill the final flag byte. In this case the unused bits are set to 0. The decompressor stops when the decompressed data size is equal to the value given in the chunk header.
As a simple example the sentence 'I am Sam. Sam I am!' will be compressed to:
FF                        Flag byte, 8 literals follow
49 20 61 6D 20 53 61 6D   'I am Sam' as literals
2B                        Flag byte: 2L, P, L, P, L, 2 blanks ($2B = 43 = 00101011)
2E 20                     '. ' as literals
F2 F0                     Pointer, read 0 + 3 = 3 bytes from $FF2, or -14 + 19 = 5 in the data. This is 'Sam'
20                        ' ' as literal
ED F1                     Pointer, read 1 + 3 = 4 bytes from $FED, or -19 + 19 = 0 in the data. This is 'I am'
21                        '!' as literal
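Putting the pieces together, the example above can be decoded with a short Python sketch. It is not Softdisk's code: it assumes a conventional 4096-byte ring buffer addressed by the 12-bit location, with the first output byte written at $FED (inferred from the example, where 'I am' is read from $FED), and it pre-fills the buffer with spaces, which is a guess that does not affect this example. The function name is hypothetical.

```python
def softdisk_lzw_decompress(data: bytes, original_size: int,
                            start: int = 0xFED) -> bytes:
    ring = bytearray(b" " * 4096)   # assumed window; initial fill is a guess
    r = start                       # assumed write position of output byte 0
    out = bytearray()
    pos = 0
    while len(out) < original_size and pos < len(data):
        flags = data[pos]
        pos += 1
        for bit in range(8):        # lowest bit first
            if len(out) >= original_size or pos >= len(data):
                return bytes(out)
            if (flags >> bit) & 1:  # 1 = literal, copied through unchanged
                value = data[pos]
                pos += 1
                out.append(value)
                ring[r] = value
                r = (r + 1) & 0xFFF
            else:                   # 0 = two-byte pointer
                if pos + 2 > len(data):
                    return bytes(out)   # truncated input
                b1, b2 = data[pos], data[pos + 1]
                pos += 2
                location = ((b2 & 0xF0) << 4) | b1
                length = (b2 & 0x0F) + 3
                for k in range(length):
                    value = ring[(location + k) & 0xFFF]
                    out.append(value)
                    ring[r] = value
                    r = (r + 1) & 0xFFF
                    if len(out) >= original_size:
                        break
    return bytes(out)


example = bytes.fromhex("FF 49 20 61 6D 20 53 61 6D 2B 2E 20 F2 F0 20 ED F1 21")
print(softdisk_lzw_decompress(example, 19))   # b'I am Sam. Sam I am!'
```

Copying pointer data one byte at a time lets a pointer overlap the bytes it is currently writing, which is the usual way this family of schemes encodes short repeating runs.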
LZH is a combination of LZW and Huffman compression. In its simplest implementation it is two levels of compression: first the data is LZW compressed, then it is Huffman compressed. This is used, for example, in GAMEMAPS. However, the compression used in this case seems to be similar to that used in Keen 1: pure LZW using a dictionary.
The format is far from elucidated, but a table in the executables involved can be used to convert data into the 8 and 9 bit codes used for perfectly incompressible data. (Once compression starts being used this changes, indicative that this table is used to build the initial LZW dictionary.)
It is also notable that it seems to use both 8 and 9 bit codes, which is very difficult to do without some table to indicate the length of the codes. This variable codeword length may be why the compression is said to be 'Huffman'.
The Planet Strike source code release contains C code that handles LZH compression and decompression in JM_LZH.C. The code was written by Jim T. Row, who apparently also wrote the Softlib utility, so chances are it is the same implementation of LZH.
Softlib (Softdisk Library Creator) is a DOS program that can be used to create SLIB files or extract files from them. It can work with both forms of compression used in SLIB files. It is notable that, for some reason, files shorter than 24 bytes often fail to be compressed correctly. (Softlib outputs a library with an empty chunk in it.) Softlib can be downloaded from the tools section of this page.
Data contained in libraries
Keen Dreams uses SLIB to compress its title screen and also comes with a number of LZH compressed .SHL files containing text. The title screen is in LBM Format. It is notable that the game does not read most of the LBM chunks, focusing instead on the FORM, BMHD and BODY chunks. This is because, while the compressed files were designed to be viewed and edited in a standalone program, the game did not need things such as the LBM palette.
Dangerous Dave 3 and 4 use an additional SLIB file to store their digital sound effects, which are separate from their PC/AdLib sounds.
- SOFTLIB.EXE - original DOS program that can create and modify Softdisk Libraries
- TITLEBUILD - Windows program to turn a 320x200 bitmap into KDREAMS.CMP for Keen Dreams