SkyRoads compression

From ModdingWiki
Jump to navigation Jump to search
SkyRoads compression
Format typeCompression algorithm
TypeStream
I/O unit size1-bit
Games

SkyRoads compresses some of its data files with this algorithm, based on LZSS.

Header

The compressed data starts with a short header.

Data type Name Description
UINT8 width1 Number of bits in the LZSS "count" fields
UINT8 width2 Number of bits in the short LZSS "dist" fields
UINT8 width3 Number of extra bits in the long LZSS "dist" fields

The compressed data follows. It is in big-endian order, meaning the first bit read comes from a byte's MSB (most significant bit), while each byte's LSB is the last bit read before moving on to the next byte.

Algorithm

  1. Read the three header bytes
  2. Read one bit
    • If it's 0:
      1. Read width2 bits, add 2 to the value, and use it as the LZSS dist value
      2. Read width1 bits, add 2 to the value, and use it as the LZSS count value
      3. Look back dist bytes in the output data, and copy count bytes to the end of the output data
    • Otherwise: (it's a 1)
      1. Read another single bit
      2. If this bit is 0:
        1. Read width3 bits, add (1 << width2) + 2 to the value, and use it as the LZSS dist value
        2. Read width1 bits, add 2 to the value, and use it as the LZSS count value
        3. Look back dist bytes in the output data, and copy count bytes to the end of the output data
      3. Otherwise: (it's a 1 again)
        • Read 8 bits, and write them as a byte on the end of the output data
  3. If there is more input data, go back to step 2

Pseudocode

width1 = getbyte()
width2 = getbyte()
width3 = getbyte()
index = 0
while (index < length) {
  if (getbits(1) == 0) {
    dist = 2 + getbits(width2)
    count = 2 + getbits(width1)
    copy count bytes from index - dist to index, add count to index
  }
  else if(getbits(1) == 0) {
    dist = 2 + (1 << width2) getbits(width3)
    count = 2 + getbits(width1)
    copy count bytes from index - dist to index, add count to index
  }
  else copy 8 bits from input to index, add 1 to index
}

MATLAB Source Code

Compression

%LZSS compressor
%compresses array of bytes using modified LZSS scheme

function cbytes = compressLZSS(bytes)

%define compression parameters
crossover = 2;
width1 = 5;
width2 = 8;
width3 = 10;

%calculate ranges of dist and count values
maxsmdist = bitshift(1, width2) + crossover - 1;
maxdist = bitshift(1, width3) + maxsmdist;
minlen = crossover;
maxlen = bitshift(1, width1) + crossover - 1;

%open temp file for writing
fid = fopen('temp.lzs', 'w+');

%write compression parameters
fwrite(fid, width1, 'ubit8');
fwrite(fid, width2, 'ubit8');
fwrite(fid, width3, 'ubit8');

%loop through input bytes
i = 1;
while i <= length(bytes)
    match = 0;
    if i > crossover
        %generate look-ahead buffer
        lookahead = bytes(i:min(i+maxlen-1, end));
        
        %generate scan buffer
        scan = bytes(max(i-maxdist+1, 1):i-1);
        
        %loop through all possible lengths, match longest first
        for len = min(maxlen, min(length(scan), length(lookahead))):-1:minlen
            %loop through scan buffer looking for a match
            complook = lookahead(1:min(len, end));
            for dist = len:min(maxdist, length(scan))
                %compare bytes "dist" back, length "len" to lookahead
                compscan = scan(end-dist+1:end-dist+len);
                
                if complook == compscan
                    %check if the distance can be coded with the small bit
                    if dist <= maxsmdist
                        %write signal 0 bit, followed by width2 bits of dist and width1 bits of len
                        fwrite(fid, 0, 'ubit1');
                        fwrite(fid, bitsbe(dist-crossover, width2), 'ubit1');
                        fwrite(fid, bitsbe(len-crossover, width1), 'ubit1');
%                         fprintf('Copy from small -%d, %d bytes\n', dist, len);
                    else
                        %write signal 10 bits, followed by width3 bits of dist and width1 bits of len
                        fwrite(fid, [1 0], 'ubit1');
                        fwrite(fid, bitsbe(dist-bitshift(1, width2)-crossover, width3), 'ubit1');
                        fwrite(fid, bitsbe(len-crossover, width1), 'ubit1');
%                         fprintf('Copy from large -%d, %d bytes\n', dist, len);
                    end
                    match = 1;
                    break
                end
            end
            if match
                break
            end
        end
    end
    
    %write the byte to the output if no match was made
    if ~match
        %write signal 11 bits, followed by byte of data
        fwrite(fid, [1 1], 'ubit1');
        fwrite(fid, bitsbe(bytes(i), 8), 'ubit1');
%         fprintf('Write byte %d: %d\n', i, bytes(i));
        i = i+1;
    else
        %don't write the next len # of bytes by incrementing loop variable
        i = i+len;
    end
end

%read back the file contents and store in cbytes
% frewind(fid);
% bits = fread(fid, inf, 'ubit1');
% fprintf('%d', (bits(1:end).'))
% fprintf('\n')
frewind(fid);
cbytes = fread(fid, inf, 'ubit8');

%flip the bits in the compressed section
cbytes(4:end) = bin2dec(fliplr(dec2bin(cbytes(4:end))));

%close temp file and delete
fclose(fid);
delete('temp.lzs');

end

%convert unsigned decimal value to BE bits
function bits = bitsbe(byte, n)

bits = bitget(byte, n:-1:1);

end

Decompression

%decompress LZSS bytes knowing resulting bytes should be length N
function dbytes = decompressLZSS(bytes, N)

%flip the bit order within the compressed section
bytes = [bytes(1:3); bin2dec(fliplr(dec2bin(bytes(4:end))))];

%save bits to temp file
ftemp = fopen('temp.lzs', 'w+');
fwrite(ftemp, bytes, 'ubit8');
frewind(ftemp)
fout = fopen('out.lzs', 'w+');

% bits = fread(ftemp, inf, 'ubit1');
% fprintf('%d', (bits(1:end).'))
% fprintf('\n')
% frewind(ftemp);

%read compression header bytes
width1 = fread(ftemp, 1, 'uint8');
width2 = fread(ftemp, 1, 'uint8');
width3 = fread(ftemp, 1, 'uint8');

%step through compressed bits and decompress
%stop once N bits have been written to output
while ftell(fout) < N
    b = fread(ftemp, 1, 'ubit1');
    if ~b
        %read width2 bits, add 2 -> dist
        dist = uintbe(fread(ftemp, width2, 'ubit1'))+2;
        %read width1 bits, add 2 -> count
        count = uintbe(fread(ftemp, width1, 'ubit1'))+2;
        %rewind dist bytes in output
        fseek(fout, -dist, 'eof');
        %copy count bytes from output to end
        copy = fread(fout, count, 'ubit8');
        fseek(fout, 0, 'eof');
        fwrite(fout, copy, 'ubit8');
%         fprintf('Copy from small -%d, %d bytes\n', dist, count);
    else
        b = fread(ftemp, 1, 'ubit1');
        if ~b
            %read width3 bits, add 2+1<<width2 -> dist
            dist = uintbe(fread(ftemp, width3, 'ubit1'))+bitshift(1, width2)+2;
            %read width1 bits, add 2 -> count
            count = uintbe(fread(ftemp, width1, 'ubit1'))+2;
            %rewind dist bytes in output
            fseek(fout, -dist, 'eof');
            %copy count bytes from output to end
            copy = fread(fout, count, 'ubit8');
            fseek(fout, 0, 'eof');
            fwrite(fout, copy, 'ubit8');
%             fprintf('Copy from large -%d, %d bytes\n', dist, count);
        else
            %read a byte and copy to end of output
            copy = uintbe(fread(ftemp, 8, 'ubit1'));
            fseek(fout, 0, 'eof');
            fwrite(fout, copy, 'ubit8');
%             fprintf('Read byte %d: %d\n', ftell(fout)-1, copy);
        end
    end
end

%rewind output and read it to get all bytes
frewind(fout);
dbytes = fread(fout, inf, 'uint8');

%close files and delete temps
fclose(ftemp);
fclose(fout);
delete('temp.lzs');
delete('out.lzs');

end

%convert BE bits to unsigned int
function i = uintbe(bits)

i = (2.^(length(bits)-1:-1:0))*bits(:);

end

Credits

This compression algorithm was reverse engineered by Tahg on the Xentax forum. The MATLAB code was submitted by skaterzero807. If you find this information helpful in a project you're working on, please give credit where credit is due. (A link back to this wiki would be nice too!)