Inverse Frequency Sound format

From ModdingWiki

The Inverse Frequency Sound format is the format used by Apogee/id Software for the PC sounds in many of their games. It is so named because the bulk of the data is stored as 'inverse frequency' values: the higher the value, the lower the tone it produces. The data is most commonly stored in a SOUNDS.xxx file, though it may also be embedded in the game executable.

Later games, such as Wolfenstein 3-D, use a modified form of this format for their (less important) PC speaker and AdLib sounds.

File format

The file is divided into three sections: a 16-byte header, a list of sound definitions (16 bytes each, including the sound name), and the actual sound data. The start of the sound data should thus be (count + 1) * 16 bytes from the file start.

Header

| Data type | Name | Description |
|-----------|------|-------------|
| char[4] | signature | "SND" + terminating null. Indicates the start of a sound file, whereas "SPK" + $00 indicates the start of the older 'speaker' format, which has very few differences. "SSE" + terminating null in Space Pizza. |
| UINT16LE | size | Size of file |
| UINT16LE | unknown | Usually 0x003C, but doesn't appear to do anything. |
| UINT16LE | count | Number of sounds. For SPK files this is blank as the number of sounds is always 63. |
| BYTE[6] | pad | Nulls to pad structure up to 16 bytes. |
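
As an illustration of the header layout above, here is a minimal Python sketch that unpacks the 16 bytes. The names `SoundHeader` and `parse_header` are made up for this example, and rejecting unknown signatures is an assumption (the games themselves reportedly ignore the header).

```python
import struct
from typing import NamedTuple

# Little-endian: char[4] signature, UINT16 size, UINT16 unknown,
# UINT16 count, BYTE[6] pad. "<" disables alignment padding.
HEADER_FORMAT = "<4sHHH6s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


class SoundHeader(NamedTuple):  # hypothetical helper type
    signature: bytes
    size: int
    unknown: int
    count: int


def parse_header(data: bytes) -> SoundHeader:
    """Unpack the 16-byte header from the start of the file contents."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short for a 16-byte header")
    signature, size, unknown, count, _pad = struct.unpack_from(
        HEADER_FORMAT, data, 0
    )
    # Assumption: treat anything other than the three documented
    # signatures as an error.
    if signature not in (b"SND\x00", b"SPK\x00", b"SSE\x00"):
        raise ValueError(f"unexpected signature {signature!r}")
    if signature == b"SPK\x00":
        count = 63  # SPK files leave the count blank; it is always 63
    return SoundHeader(signature, size, unknown, count)
```

With the count known, the sound data would be expected to start at `(header.count + 1) * 16`.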

Sound definitions

This structure is repeated once per sound.

| Data type | Name | Description |
|-----------|------|-------------|
| UINT16LE | offset | Offset of the sound from the beginning of the file. |
| UINT8 | priority | Determines whether this sound can be interrupted by another sound that starts while it is playing. A sound can only be interrupted by a sound with an equal or higher priority value. 255 is the maximum; 0 is inadvisable. |
| UINT8 | rate | Defines the update rate of the timer generating the sound interrupt. Usually set to 8. |
| char[12] | name | Null-padded sound name |
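
The definition list can be read the same way. The sketch below assumes the definitions start immediately after the 16-byte header and that names are plain ASCII; `SoundEntry` and `parse_entries` are hypothetical names used only here.

```python
import struct
from typing import List, NamedTuple

# Little-endian: UINT16 offset, UINT8 priority, UINT8 rate, char[12] name.
ENTRY_FORMAT = "<HBB12s"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 16 bytes
HEADER_SIZE = 16


class SoundEntry(NamedTuple):  # hypothetical helper type
    offset: int
    priority: int
    rate: int
    name: str


def parse_entries(data: bytes, count: int) -> List[SoundEntry]:
    """Read `count` sound definitions that follow the header."""
    entries = []
    for index in range(count):
        position = HEADER_SIZE + index * ENTRY_SIZE
        offset, priority, rate, raw_name = struct.unpack_from(
            ENTRY_FORMAT, data, position
        )
        # Names are null-padded; keep only the part before the first null.
        # Assumption: ASCII names, with undecodable bytes replaced.
        name = raw_name.split(b"\x00", 1)[0].decode("ascii", errors="replace")
        entries.append(SoundEntry(offset, priority, rate, name))
    return entries
```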

Sound data

Sound data is divided into words (UINT16LE), with the word value being inversely proportional to the sound frequency. The sound frequency in Hz can be calculated as follows:

frequency = 1193181 / value

These word values are written directly to PIT Channel 2. The PC Speaker is usually updated at a rate of 140 Hz, so each word value represents about 1/140th of a second of tone. Most files contain a few seconds of sound. The value $FFFF signals the end of a sound, and most other values fall in the range $0100-$5000. $0000 is silence and causes the PC Speaker to be turned off.
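
A small Python sketch of how one sound might be decoded into a list of frequencies, one entry per timer tick. The function name `decode_sound` and the use of `None` for silence are choices made for this example, not part of the format.

```python
import struct
from typing import List, Optional

PIT_CLOCK_HZ = 1193181   # divisor base from the formula above
END_OF_SOUND = 0xFFFF    # terminator word
SILENCE = 0x0000         # speaker is turned off


def decode_sound(data: bytes, offset: int) -> List[Optional[float]]:
    """Return one frequency in Hz per tick, or None for a silent tick."""
    frequencies: List[Optional[float]] = []
    position = offset
    # Stop at the terminator, or at the end of the data if it is missing.
    while position + 2 <= len(data):
        (value,) = struct.unpack_from("<H", data, position)
        if value == END_OF_SOUND:
            break
        if value == SILENCE:
            frequencies.append(None)
        else:
            frequencies.append(PIT_CLOCK_HZ / value)
        position += 2
    return frequencies
```

At the usual 140 Hz update rate, the approximate duration in seconds would be `len(frequencies) / 140`.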

Depending on the implementation, the update rate might differ. Catacomb will set the speed for each sound according to the sound's rate value (rates 0 and 1 are about 18.18 Hz, rate = 8 is 140 Hz). Hovertank 3-D is hard-coded to use a fixed rate of 140 Hz and ignore each sound's rate value. Duke Nukem uses a fixed rate of 144 Hz.

Later games use a similar format for PC sounds, in which the sound data is stored as bytes with values of 0-255. Multiplying a byte value by 60 converts it to a word value that can be written directly to PIT Channel 2. The loss of fine-tuning can probably be attributed to the lesser role of PC sounds in the AudioT Format, which has a totally different way of reading the sounds and also contains AdLib sound effects and music.
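
The byte-to-word conversion is simple enough to sketch directly. The function names below are hypothetical, and treating a zero result as silence is an assumption carried over from the word-based format described above.

```python
from typing import Optional

PIT_CLOCK_HZ = 1193181


def byte_to_pit_value(sample: int) -> int:
    """Scale a later-format byte sample (0-255) to a PIT Channel 2 word."""
    if not 0 <= sample <= 255:
        raise ValueError("sample must fit in one byte")
    return sample * 60


def byte_to_frequency(sample: int) -> Optional[float]:
    """Frequency in Hz for a byte sample, or None for silence."""
    value = byte_to_pit_value(sample)
    # Assumption: a zero word means silence, as in the older format.
    return None if value == 0 else PIT_CLOCK_HZ / value
```

The largest byte value, 255, maps to the word 15300, so the lowest tone this variant can express is roughly 78 Hz, and tones can only be selected in steps of 60 divisor units.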

Notes

Most games load the entire file into memory and ignore all values in the header. The sound names are ignored, too. Since the games use hard-coded sound numbers (usually a 1-based index), reading the 16-byte entry at file offset 16 * soundnumber yields the sound's data offset, priority and rate.
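
The direct lookup described above might look like this in Python. It is a sketch assuming a 1-based sound number and the whole file already loaded into `data`; `lookup_sound` is a hypothetical name.

```python
import struct
from typing import Tuple


def lookup_sound(data: bytes, sound_number: int) -> Tuple[int, int, int]:
    """Return (offset, priority, rate) for a 1-based sound number.

    The header occupies the first 16 bytes, so sound 1 sits at offset 16,
    sound 2 at offset 32, and so on. The header and names are not read.
    """
    if sound_number < 1:
        raise ValueError("sound numbers are assumed to be 1-based")
    offset, priority, rate = struct.unpack_from("<HBB", data, 16 * sound_number)
    return offset, priority, rate
```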

Due to a bug in the implementation, Duke Nukem uses the low byte of the sound's offset value as the priority for the sounds read from DUKE1-B.DN?.

Credits

This format has been reverse engineered many times, most often in the Commander Keen 1-3 fan community, and probably first by Anders Gavare. If you find this information helpful in a project you're working on, please give credit where credit is due. (A link back to this wiki would be nice too!)