RLE compression

From ModdingWiki
Jump to: navigation, search
RLE compression
Format typeCompression algorithm
I/O unit size1-byte

RLE Compression is a form of lossless compression and one of the simplest ways of compressing and storing data. It is used, in various forms, by nearly every platform game. The essence of this compression method is to compress data by storing it as a collection of length/value pairs. To take a simple case, the string 'aaaaaaaabbb' could be compressed to '8a3b' (8 bytes of a, 3 bytes of b). The trick with RLE is to make sure the data does not end up becoming larger after processing (for example, 'abracadabra' could become '1a1b1r1a1c1a1d1a1b1r1a')

Most systems also tell the program beforehand the size of the decompressed data, stored as a word or dword at the start of the data block. This is usually done so that the correct amount of memory can be allocated before the data is processed. In modern programs this is an obvious security risk (it is trivial to write past the end of the allocated buffer) however this was not an issue at the time the games were written.

Length/value vs Flag Systems

There are two main systems of RLE divided by how length is differentiated from value.

In the first system, the first data is assumed to be a length, the next data(s) to be values. So using the simple system above the string '3a3b2c' would be read as 'Length 3 of a, length 3 of b...' Any individual piece of data may be a length or a value and can only be told apart by what type came before it. This system is simplest, but has the disadvantage that the data must be read from the start and that any error will ruin all data that follows it.

In the second system a 'flag' is used. This is a special value that tells a program to do something whenever it is encountered. So the string '*3abcd*3e' could be decompressed to 'aaabcdeee' where '*' is the flag value, indicating that a length/value pair is coming up next. Flags may indicate a number of things (Usually a value/length pair) and be words, bytes or bits (E.g all values over 128 are length bytes, all under value bytes.) This is more complex and is less efficient space-wise than the first system, but has the advantage that the data can usually be read from multiple points and an error may not corrupt all following data. (Since ever flag 'resets' the decompression.)


A literal is a piece of data that is uncompressed and should be copied directly to output. If RLE is used to compress something like text for example, there will be a large amount of data (Words) with no pattern that cannot readily be compressed and so would appear 'as-is' in the compressed data.

While simpler systems must encode everything as length/value pairs, flag systems especially assume that all data is literal and just copy it to output until they come across a flag. This saves a lot of space since incompressible data won't increase in size. This is why flag systems are the more common way of compressing data (Especially graphics.) and are more common in older games.

Compression units

RLE compression can occur at several levels, bits, bytes or words (RLEB, RLE, RLEW) Most common is at the byte level. The type used often depends on what data is being compressed. For example, most tilesets work on the byte level (1 pixel = 1 byte) and so will be compressed at the byte level, while most levels are made of two byte values for tiles and will be compressed at the word level.

RLEW compression has its own page which describes how it it commonly implemented.

Types of RLE