EPF Format

From ModdingWiki
Format type: Archive
Max files: 65,535
File Allocation Table (FAT): End
Filenames? Yes, 8.3
Metadata? None
Supports compression? Yes
Supports encryption? No
Supports subdirectories? No
Hidden data? Yes
Games

The EPF file format is used in many games, like:

Some (possibly all) of these games were made by East Point Software, so EPF (a shortened form of the "EPFS" signature in the file header) probably stands for "East Point File System" or something similar.

File format

Signature

The first four bytes in the file are "EPFS", followed by a UINT32LE offset to the FAT (so that value must be smaller than the total file size).
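The signature check can be sketched in a few lines of Python. The function name looks_like_epf is hypothetical; it applies only the two rules stated above (the "EPFS" signature and a FAT offset that lies inside the file), so it is a quick heuristic rather than full validation.

```python
import struct


def looks_like_epf(data: bytes) -> bool:
    """Heuristic EPF detection (hypothetical helper).

    Checks the 'EPFS' signature and that the UINT32LE FAT offset
    following it is smaller than the total file size.
    """
    if len(data) < 8 or data[:4] != b"EPFS":
        return False
    (fat_offset,) = struct.unpack_from("<I", data, 4)  # little-endian UINT32
    return fat_offset < len(data)
```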

Header

Data type   Name           Description
BYTE        signature[4]   "EPFS"
UINT32LE    fatOffset      Offset of the FAT
UINT8       unknown        Unknown, always zero (flags?)
UINT16LE    numFiles       Number of files

The data for the first file starts immediately after the header, i.e. at file offset 11.
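Because every header field is little-endian and the fields are packed with no padding, Python's struct module can read the whole 11-byte header in one call. This is a sketch; EpfHeader and read_header are hypothetical names, and the unknown byte is returned as-is since its meaning is not documented.

```python
import struct
from typing import NamedTuple

# '<' = little-endian, no alignment padding:
# signature[4], fatOffset (UINT32LE), unknown (UINT8), numFiles (UINT16LE)
HEADER_FORMAT = "<4sIBH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 4 + 1 + 2 = 11 bytes


class EpfHeader(NamedTuple):
    fat_offset: int
    unknown: int  # always zero in known files; possibly flags
    num_files: int


def read_header(data: bytes) -> EpfHeader:
    """Parse the EPF header at the start of the archive (hypothetical helper)."""
    signature, fat_offset, unknown, num_files = struct.unpack_from(HEADER_FORMAT, data, 0)
    if signature != b"EPFS":
        raise ValueError("not an EPF archive: bad signature")
    return EpfHeader(fat_offset, unknown, num_files)
```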

File entry

At offset fatOffset sits a list of file entries. The following structure is repeated until the end of the file.

Data type   Name               Description
BYTE        filename[13]       Filename (NULL-terminated)
UINT8       compressionFlag    0 if the file is not compressed, 1 if it is compressed
UINT32LE    compressedSize     Size of the compressed file
UINT32LE    decompressedSize   Size of the file after decompression
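The FAT entries can be read the same way: each entry is 13 + 1 + 4 + 4 = 22 bytes. The sketch below (FatEntry and read_fat are hypothetical names) follows the rule above and reads entries from fatOffset until the end of the file, ignoring any trailing partial entry. It assumes filenames are ASCII, which suits 8.3 names but is not stated by the format.

```python
import struct
from typing import List, NamedTuple

# filename[13], compressionFlag (UINT8), compressedSize, decompressedSize (UINT32LE)
ENTRY_FORMAT = "<13sBII"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 22 bytes


class FatEntry(NamedTuple):
    name: str
    compressed: bool
    compressed_size: int
    decompressed_size: int


def read_fat(data: bytes, fat_offset: int) -> List[FatEntry]:
    """Read FAT entries from fat_offset to the end of the file (hypothetical helper)."""
    entries = []
    pos = fat_offset
    while pos + ENTRY_SIZE <= len(data):
        raw_name, flag, csize, dsize = struct.unpack_from(ENTRY_FORMAT, data, pos)
        # The filename is NULL-terminated inside its 13-byte field.
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        # Only 0 and 1 are documented; this sketch treats 1 as "compressed".
        entries.append(FatEntry(name, flag == 1, csize, dsize))
        pos += ENTRY_SIZE
    return entries
```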

The FAT does not store file offsets, so they must be calculated with a running total. It starts at offset 11 (just after the header), and adding each file's compressedSize gives the offset of the next file.
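The running total can be sketched as follows; file_offsets is a hypothetical helper that takes the compressedSize values in FAT order.

```python
from typing import Iterable, List

FIRST_FILE_OFFSET = 11  # size of the header: 4 + 4 + 1 + 2 bytes


def file_offsets(compressed_sizes: Iterable[int]) -> List[int]:
    """Return the start offset of each file, in FAT order (hypothetical helper)."""
    offsets = []
    offset = FIRST_FILE_OFFSET
    for size in compressed_sizes:
        offsets.append(offset)
        offset += size  # the next file starts right after this one
    return offsets
```

For example, three files with compressed sizes 100, 20 and 5 start at offsets 11, 111 and 131.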

Compression

The compression scheme is LZW, with a dynamic bit length from 9 to 14.

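The codewords are packed in big-endian bit order (most significant bit first), in contrast to e.g. Apogee/id LZW, which reads bits starting from the least significant bit of each byte. For example: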

 12          34         56         // bytes in compressed data file
 00010010    00110100   01010110   // converted to 8-bit binary
 000100100    011010001   010110   // bits read in big-endian order (Lion King)
000010010  100011010    010101     // bits read in little-endian order (id/Apogee)
^...go here         ^bits from here...
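A minimal sketch of reading codewords in this big-endian bit order follows. BigEndianBitReader is a hypothetical helper written for illustration; it reproduces the "Lion King" row of the example above.

```python
from typing import Optional


class BigEndianBitReader:
    """Read variable-width codewords MSB-first (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.bit_pos = 0  # absolute bit index from the start of data

    def read(self, nbits: int) -> Optional[int]:
        """Return the next nbits as an int, or None if too few bits remain."""
        if self.bit_pos + nbits > len(self.data) * 8:
            return None
        value = 0
        for _ in range(nbits):
            byte = self.data[self.bit_pos >> 3]
            # Take bits from the top of each byte downwards.
            bit = (byte >> (7 - (self.bit_pos & 7))) & 1
            value = (value << 1) | bit
            self.bit_pos += 1
        return value
```

With the bytes 12 34 56 from the example, two 9-bit reads return 0x024 (000100100) and 0x0D1 (011010001), leaving the six bits 010110.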

Although it uses a "normal" LZW algorithm, there are a few differences:

  • There are no reserved codewords at the beginning of the dictionary: codewords 0x00-0xFF are literal bytes, so the first dictionary entry (a 9-bit codeword) is 0x100.
  • When the dictionary is reset, the bit length is unchanged.
  • The two largest possible codewords (at the current bit length) are reserved. The largest codeword is used to signify the end of the data, and the second largest codeword is used to reset the dictionary.
  • Once the third-largest codeword (the largest non-reserved one, e.g. 1021 for 10-bit codes) has been encountered and processed, the bit length is incremented.

The reserved codewords deserve a little extra explanation. With 10-bit codewords, the largest value that fits in ten bits is (1<<10)-1 == 1023, so codeword 1023 signifies the end of the data. Codeword 1022 (one less than the maximum) resets the dictionary but leaves the codeword length unchanged (10 bits in this example). In practice, though, a dictionary-reset codeword would only be expected once the dictionary has reached its maximum size, i.e. once 14-bit codewords are in use.
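The arithmetic above generalises to every codeword width. The sketch below (reserved_codewords is a hypothetical helper) returns the end-of-data codeword, the dictionary-reset codeword and the last normal codeword for a given width.

```python
from typing import Tuple

MIN_BITS, MAX_BITS = 9, 14


def reserved_codewords(bits: int) -> Tuple[int, int, int]:
    """Return (end_of_data, dictionary_reset, last_normal) for a width (hypothetical helper)."""
    if not MIN_BITS <= bits <= MAX_BITS:
        raise ValueError(f"codeword width must be {MIN_BITS}..{MAX_BITS} bits")
    largest = (1 << bits) - 1
    return largest, largest - 1, largest - 2
```

For 10-bit codewords this gives (1023, 1022, 1021). Once the width grows to 11 bits, 1022 and 1023 become ordinary codewords and the reserved pair moves to (2047, 2046).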

Keep in mind that these reserved codewords must be handled before the LZW decoder sees them; otherwise it will treat them as dictionary lookups, resulting in out-of-range accesses. Also remember that once the bit length increases, the two reserved codewords change, because the maximum codeword value has increased. This frees up the "old" reserved codewords, which are then used as normal codewords.
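Putting these rules together, a minimal decoder sketch in Python might look like the following. epf_lzw_decompress is a hypothetical name, and this is not the code used by East Point Software or by Camoto. The exact moment the width increases is an assumption: this sketch widens right after the dictionary entry at the third-largest codeword is assigned. LZW variants often differ by one entry at this point (the "early change" off-by-one), so verify the timing against real EPF files before relying on it.

```python
from typing import Dict, Optional

MIN_BITS, MAX_BITS = 9, 14


def epf_lzw_decompress(data: bytes) -> bytes:
    """Sketch of an EPF-style LZW decoder (hypothetical; width timing assumed)."""
    bit_pos = 0
    total_bits = len(data) * 8

    def read_code(nbits: int) -> Optional[int]:
        # Big-endian bit order: most significant bit of each byte first.
        nonlocal bit_pos
        if bit_pos + nbits > total_bits:
            return None
        value = 0
        for _ in range(nbits):
            byte = data[bit_pos >> 3]
            value = (value << 1) | ((byte >> (7 - (bit_pos & 7))) & 1)
            bit_pos += 1
        return value

    bits = MIN_BITS
    dictionary: Dict[int, bytes] = {}  # codes >= 0x100; 0x00-0xFF are literals
    next_code = 0x100
    prev: Optional[bytes] = None
    out = bytearray()

    while True:
        code = read_code(bits)
        if code is None:
            break  # input ended without an end-of-data codeword
        largest = (1 << bits) - 1
        # Reserved codewords are intercepted before any dictionary lookup.
        if code == largest:  # end of data
            break
        if code == largest - 1:  # dictionary reset; width is left unchanged
            dictionary.clear()
            next_code = 0x100
            prev = None
            continue
        if code < 0x100:
            entry = bytes([code])
        elif code in dictionary:
            entry = dictionary[code]
        elif code == next_code and prev is not None:
            entry = prev + prev[:1]  # KwKwK case: code not yet in the dictionary
        else:
            raise ValueError(f"invalid codeword {code} at {bits} bits")
        out += entry
        # New entry = previous string + first byte of the current one, while room remains.
        if prev is not None and next_code <= largest - 2:
            dictionary[next_code] = prev + entry[:1]
            next_code += 1
            # Assumption: widen once the last normal codeword has been assigned.
            if next_code > largest - 2 and bits < MAX_BITS:
                bits += 1
        prev = entry
    return bytes(out)
```

At 14 bits the dictionary simply stops growing once it is full, and the encoder is expected to send the reset codeword.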

At least one EPF file (OVER.EPF from the game "Overdrive") contains corrupted files (4X43.MAP and 4X44.MAP), apparently due to a bug in the original compression code used to create them. No known tool can extract these files, and the corresponding levels are unplayable (the game crashes with an error).

Hidden data

Because the FAT offset is stored in the header, nothing requires the FAT to sit directly after the last file; it can be placed much further along. The block of data between the end of the last file and the start of the FAT is then effectively hidden, as it does not belong to any file in the archive.
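Locating this hidden block only needs the header and the file sizes. The sketch below uses a hypothetical hidden_data helper that takes the compressedSize values from the FAT and returns whatever lies between the end of the last file and fatOffset (an empty result means there is no gap).

```python
import struct
from typing import Sequence

HEADER_SIZE = 11  # 4 + 4 + 1 + 2 bytes


def hidden_data(data: bytes, compressed_sizes: Sequence[int]) -> bytes:
    """Return the bytes between the last file and the FAT (hypothetical helper)."""
    (fat_offset,) = struct.unpack_from("<I", data, 4)
    end_of_files = HEADER_SIZE + sum(compressed_sizes)
    if end_of_files > fat_offset:
        raise ValueError("file data overlaps the FAT")
    return data[end_of_files:fat_offset]
```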

While none of the official files seem to do this, it is nonetheless an interesting possibility. The Camoto tool makes this space available as a "comment" field.

Tools

The following tools are able to work with files in this format.

Name | Platform | Extract files? | Decompress on extract? | Create new? | Modify? | Compress on insert? | Access hidden data? | Edit metadata? | Notes
Camoto | Linux/Windows | Yes | Yes | Yes | Yes | No | Yes | N/A |