The EPF file format is used in many games, like:
Some (all?) of these games are made by East Point Software, so EPF (a shortened version of "EPFS" from the file header) probably stands for East Point File System or something similar.
The first four bytes in the file are "EPFS", followed by a UINT32LE offset to the FAT (so that value must be smaller than the total file size).
|UINT32LE fatOffset||Offset of the FAT|
|UINT8 unknown||Unknown, always zero - flags?|
|UINT16LE numFiles||Number of files|
The data for the first file starts immediately after the header (i.e. file offset 11.)
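The header layout above can be read with a few lines of Python. This is a minimal sketch rather than a reference implementation: the names `EPFHeader` and `parse_epf_header` are made up for illustration, and the size check assumes `data` holds the entire archive.

```python
import struct
from typing import NamedTuple


class EPFHeader(NamedTuple):
    fat_offset: int   # UINT32LE fatOffset
    unknown: int      # UINT8, always zero in known files
    num_files: int    # UINT16LE numFiles


# "EPFS" signature + UINT32LE + UINT8 + UINT16LE, no padding: 11 bytes.
HEADER_STRUCT = struct.Struct("<4sIBH")


def parse_epf_header(data: bytes) -> EPFHeader:
    """Parse the 11-byte EPF header from the start of `data`.

    Assumes `data` is the whole archive, so fatOffset can be
    checked against the total file size.
    """
    if len(data) < HEADER_STRUCT.size:
        raise ValueError("too short to be an EPF file")
    signature, fat_offset, unknown, num_files = HEADER_STRUCT.unpack_from(data, 0)
    if signature != b"EPFS":
        raise ValueError("missing EPFS signature")
    if fat_offset >= len(data):
        raise ValueError("FAT offset lies beyond the end of the file")
    return EPFHeader(fat_offset, unknown, num_files)
```

The two checks (signature and FAT offset) are also a cheap way to decide whether an arbitrary file is likely to be an EPF archive.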
At offset fatOffset sits a list of file entries. The following structure is repeated until the end of the file.
|BYTE filename||Filename (NULL-terminated)|
|UINT8 compressionFlag||0 for "file is not compressed", 1 for "file is compressed"|
|UINT32LE compressedSize||size of the compressed file|
|UINT32LE decompressedSize||size of the file after decompression|
The FAT does not store file offsets, so they must be calculated with a running total. The total starts at offset 11 (just after the header), which is the offset of the first file; adding each file's compressedSize to it gives the offset of the next file.
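The running total can be sketched as follows. The function name `file_offsets` is hypothetical, and the input is assumed to be the list of compressedSize values in the order they appear in the FAT.

```python
HEADER_SIZE = 11  # signature (4) + fatOffset (4) + unknown (1) + numFiles (2)


def file_offsets(compressed_sizes):
    """Return the starting offset of each file, in FAT order.

    `compressed_sizes` is the compressedSize field of every FAT entry.
    """
    offsets = []
    offset = HEADER_SIZE  # data for the first file follows the header
    for size in compressed_sizes:
        offsets.append(offset)
        offset += size  # the next file starts where this one ends
    return offsets
```

For example, three files with compressed sizes of 100, 50 and 7 bytes would start at offsets 11, 111 and 161.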
The compression scheme is LZW, with a dynamic bit length from 9 to 14.
The bits are stored in big-endian byte order, in contrast to e.g. Apogee/id LZW. For example:
 12       34       56          // bytes in compressed data file
 00010010 00110100 01010110    // converted to 8-bit binary
 000100100 011010001 010110    // bits read in big-endian order (Lion King)
 000010010 100011010 010101    // bits read in little-endian order (id/Apogee)
 ^...go here      ^bits from here...
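The difference between the two bit orders can be reproduced with a pair of small readers. This is an illustrative sketch with a fixed codeword length; the function names are made up, and any trailing bits that do not fill a whole codeword are simply dropped.

```python
def read_codes_be(data, bits):
    """Yield `bits`-wide codewords, most significant bit first (EPF style)."""
    acc = 0      # bit accumulator
    avail = 0    # number of unread bits held in acc
    for byte in data:
        acc = (acc << 8) | byte
        avail += 8
        while avail >= bits:
            avail -= bits
            yield (acc >> avail) & ((1 << bits) - 1)
        acc &= (1 << avail) - 1  # discard bits already consumed


def read_codes_le(data, bits):
    """Yield `bits`-wide codewords, least significant bit first (id/Apogee style)."""
    acc = 0
    avail = 0
    for byte in data:
        acc |= byte << avail
        avail += 8
        while avail >= bits:
            yield acc & ((1 << bits) - 1)
            acc >>= bits
            avail -= bits


sample = bytes([0x12, 0x34, 0x56])
print([format(c, "09b") for c in read_codes_be(sample, 9)])
print([format(c, "09b") for c in read_codes_le(sample, 9)])
```

With the bytes 12 34 56 and 9-bit codewords, the big-endian reader produces 000100100 and 011010001, while the little-endian reader produces 000010010 and 100011010, matching the example above.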
Although it uses a "normal" LZW algorithm, there are a few differences:
- There are no reserved codewords at the beginning of the dictionary, so the first valid (9-bit) codeword is 0x100.
- When the dictionary is reset, the bit length is unchanged.
- The two largest possible codewords (at the current bit length) are reserved. The largest codeword is used to signify the end of the data, and the second largest codeword is used to reset the dictionary.
- Once the third-largest codeword is encountered and processed (this is the largest valid/non-reserved codeword, e.g. 1021 for 10-bit codes), the bit length is incremented.
The reserved codewords deserve a little extra explanation. For a 10-bit codeword length, the largest possible value that can fit in ten bits is (1<<10)-1 == 1023, so a codeword of 1023 signifies the end of the data. A codeword of 1022 (one less than the maximum) resets the dictionary but leaves the codeword length unchanged (10 bits in this example). In practice it would not make sense to encounter a dictionary-reset codeword until the dictionary has reached its maximum size, i.e. until 14-bit codewords are in use.
Keep in mind that these reserved codewords must be handled before the LZW decoder sees them, otherwise it will treat them as lookups into the dictionary, resulting in out-of-range accesses. Also remember that once the bit length increases, the two reserved codewords change, because the maximum codeword value has increased. This frees up the "old" reserved codewords, which are then used as normal codewords.
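One way to picture the rules above is a scanner that reads codewords in big-endian bit order, filters out the reserved ones, and grows the bit length as described. This is only a sketch under stated assumptions, not a full decompressor: it does not build the LZW dictionary, the names `reserved_codewords` and `scan_codewords` are hypothetical, and it assumes the bit length stops growing at 14.

```python
MIN_BITS = 9
MAX_BITS = 14  # assumption: the bit length never grows past 14


def reserved_codewords(bits):
    """Return (reset_code, eof_code) for the given codeword length."""
    max_code = (1 << bits) - 1
    return max_code - 1, max_code


def scan_codewords(data):
    """Yield (bits, code, kind) with kind in {"data", "reset", "eof"}.

    Only "data" codewords should ever be passed on to the LZW dictionary.
    """
    bits = MIN_BITS
    acc = 0      # bit accumulator, most significant bit first
    avail = 0    # number of unread bits held in acc
    pos = 0
    while True:
        while avail < bits:
            if pos >= len(data):
                return  # ran out of input without seeing the EOF codeword
            acc = (acc << 8) | data[pos]
            pos += 1
            avail += 8
        avail -= bits
        code = (acc >> avail) & ((1 << bits) - 1)
        acc &= (1 << avail) - 1
        reset_code, eof_code = reserved_codewords(bits)
        if code == eof_code:
            yield bits, code, "eof"
            return
        if code == reset_code:
            # Dictionary reset: the bit length stays unchanged.
            yield bits, code, "reset"
            continue
        yield bits, code, "data"
        # The largest non-reserved codeword triggers a longer bit length,
        # which also moves the two reserved codewords.
        if code == reset_code - 1 and bits < MAX_BITS:
            bits += 1
```

For a 10-bit codeword length, `reserved_codewords(10)` gives 1022 for the reset and 1023 for the end of data, in line with the explanation above. A real decoder would feed each "data" codeword into its dictionary logic and clear the dictionary on "reset".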
At least one EPF file (OVER.EPF from the game "Overdrive") contains corrupted files (4X43.MAP and 4X44.MAP), apparently due to a bug in the original compression algorithm when the files were created. No tools are able to extract these files, and the levels are unplayable in the game (which crashes with an error).
The following tools are able to work with files in this format.
|Name||Platform||Extract files?||Decompress on extract?||Create new?||Modify?||Compress on insert?||Access hidden data?||Edit metadata?|