Barmy Bitstream Bashing

TAD

This is kind of like an update to my old Hugi 15 article called (yeah, you've probably already guessed) "Optimising 'Getbits' part 1". I guess most seasoned 80x86 coders might want to skip this as because its not a ground breaking article and only contains one 'new' trick for the masking out of unwanted bits from the bitstream. I say 'new' to mean, 'it does not use an AND instruction for the masking'. Interested? then keep reading. ;)

Introduction

Firstly, a 'getbits' routine (or similarly named one) is used to read a varying number of bits quickly from a bitstream. It is often needed in decompression (or 'depacker') programs in order to read back Huffman prefix-codes or in .GIF picture loaders (damn those patents).

The code in here is mostly 32-bits, so you will have to re-code certain parts to get a pure 16-bit version.

The Bitstream

First, let's describe how the bitstream is stored. Sorry if this is boring, but it does help to explain things later on.

Bits are stored starting at Bit #0 and ending at Bit #7 of each byte. This is the same format as the .GIF bitstream and seems to be the best suited for 80x86 low-high data storage.

If we read the first 32-bits from the bitstream into EAX then it would look like this:

Extracting bits

Say we have already used the first 5 bits ('a' to 'e') of the bitstream and want to get the next 21 bits ('f' to 'z').

After loading EAX with 4 bitstream bytes, we can simply shift EAX right to remove the already 'seen' bits then mask out those extra, unwanted bits.

There are a number of ways to perform the masking. Here are some:

        (CL) = Number_Of_Bits_Wanted

method 1:
        mov     edx, 1
        shl     edx, cl
        dec     edx             ; mask = (1 << BitsWanted) - 1

method 2:
        sub     edx, edx
        bts     edx, ecx
        dec     edx             ; mask = (2 ^ BitsWanted) - 1

method 3:
        movzx   edx, cl
        mov     edx, table[edx*4] ; look-up mask from pre-built table

method 4:
        mov     ch, 32
        sub     ch, cl
        mov     cl, ch
        mov     edx, 0FFFFFFFFh
        shr     edx, cl         ; mask = 0FFFFFFFFh >> (32 - BitsWanted)

Advance vs. Random-access

After getting N bits from the bitstream we need to update the position. There are two ways to do this. The first is the most widely used one. It is the 'advance' or 'increment' method. This involves updating both the Start-Bit-Pos and the Start-Byte-Pointer variables.

The following code will allow upto the maximum of 32-bits to be extracted from the bitstream and returned in the EAX register.

        .DATA?
StartBitPos     db      ?

        .CODE
;
; (EAX) = Get (CH) bits from bitstream[DS:ESI]
;
GetBits32:
        mov     cl,[StartBitPos]
        mov     eax, [esi]      ; \ DL.EAX = read 32+8 bits
        mov     dl, [esi+4]     ; /
        shrd    eax, edx, cl    ; align with bit #0
        xchg    cl, ch          ; CL = BitsWanted,  CH=StartBitPos
        ;; create and use mask ;;
        mov     edx, 1
        shl     edx, cl
        dec     edx             ; mask = (1 << BitsWanted) - 1
        and     eax, edx        ; mask out unwanted bits
        ;;  update position ;;
        add     cl, ch          ; EndBit = StartBitsPos + BitsWanted
        movzx   edx, cl
        shr     edx, 3
        add     esi, edx        ; advance ESI by (EndBit DIV 8)
        and     cl, 7
        mov     [StartBitPos], cl
        ret

Now the second way. The 'random-access' method. Instead of updating the pointer ESI after each byte, we recalculate the StartByte and StartBitPos each time. It has the advantage of allowing random-access in the bitstream (and, as you will soon see, slightly shorter code :). Also, if your bitstream is in a static place then, you could hardcode the base address into the [EDX+nnnnnnnn] and free up a 32-bit register (namely ESI).

Again this method allows for up to a maximum of 32 bits.

        .DATA?
BitSeekPos      dd      ?               ; *** note: 32-bit variable ***

        .CODE
;
; (EAX) = get (CH) bits from bitstream[DS:ESI]
;
GetRandBits32:
        mov     edx, [BitSeekPos]
        mov     cl, 7
        and     cl, dl                  ; StartBitPos = (BitSeekPos MOD 8)
        shr     edx, 3                  ; StartOffs = (BitSeekPos DIV 8)
        mov     eax, [esi+edx]
        mov     dl, [esi+edx+4]
        shrd    eax, edx, cl
        xchg    ch, cl
        ;; create and use mask ;;
        sub     edx, edx
        bts     edx, ecx
        dec     edx                     ; mask = (2 ^ BitsWanted) - 1
        and     eax, edx                ; mask out unwanted bits
        ;; advanced position ;;
        add     cl, ch
        movzx   edx, cl
        add     [BitSeekPos], edx       ; += BitsWanted
        ret

"The King of the Bits"

Okay, you have waited long enough. Here is the 'new' method I promised many ASCII lines ago. This version is restricted to a maximum of 24-bits (due to the ROR masking method). The 24-bit limit should be enough for most LZ-style packers.

        .DATA?
BitSeekPos      dd      ?

        .CODE
;
; (EAX) = Get (CH) bits from bitstream[DS:ESI]
;
GetRandBits24:
        mov     edx, [BitSeekPos]       ;
        mov     cl, 7                   ;
        and     cl, dl                  ; StartBitPos = (BitSeekPos MOD 8)
        shr     edx, 3                  ; StartOffs = (BitSeekPos DIV 8)
        mov     eax, [esi+edx]          ; read 24+8 bits
        add     cl, ch                  ; LastBit = StartBitPos + BitsWanted
        ror     eax, cl                 ; # 1
        sub     cl, 32                  ; \ #2
        neg     cl                      ; /
        shr     eax, cl                 ; # 1 Mask & align with Bit #0  nice :)
        movzx   edx, ch                 ;
        add     [BitSeekPos], edx       ; advance by BitsWanted
        ret

There are two tricks in the above code snippet. Firstly it aligns the last bit in EAX with bit #31 then performs the SHR EAX, CL to both mask and align the LSB with bit #0 in EAX. This saves having to build the mask and then perform the AND operation. The only downside is having to calculate the correct SHR value (read on...).

The second trick is used to work out the SHR value. The formula is:

        (32 - (StartBitPos + BitsWanted))

At first glance this would appear to need a temporary register/variable to hold the 32, then subtract the StartBitPos + BitsWanted from it. But you don't. Check out this useful optimisation trick.

        A - B                   ; e.g.   20 - 5   = +15

can be done:

        - (B - A)               ;        (5 - 20) = -15
                                ; negate          = +15

More news at 11.

You can optimise the 'GetRandBits24' further by removing another instruction. It's to do with the 32-(StartBitPos+BitsWanted) part and the rather nice barrel-shifter (ah, you've gotta love it).

        mov     edx, [BitSeekPos]
        mov     cl, 7
        and     cl, dl
        shr     edx, 3
        mov     eax, [esi+edx]
        add     cl, ch
        ror     eax, cl
        neg     cl              ;** note: no "SUB CL,32" !!
        shr     eax, cl
        movzx   edx, ch
        add     [BitSeekPos], edx

Unless I've made a complete fool of myself (and that happens often) the built-in MOD 32 feature of the barrel-shifter removes the need for the SUB CL,32 (or a 'AND AL,31' instruction.

        sub     cl, 32          ; (A)
        neg     cl
equals:
        neg     cl              ; (B)
;**     and     cl, 31          ;; <<-- done by barrel-shifter =)

If you run through the numbers you will hopefully see that both give the same results, except for CL=0. In case (A) it will give CL=32 whereas case (B) will give CL=0. "HAHA, TAD you fool!!!" you might cheer. But remember, the barrel-shifter will turn that 32 into 0.

In any case, send me your flamez and call me 'Susan' if it isn't so.

Closing words.

Oh well.... that's all folks. Btw, it was called "King of the Bits" as a play on the phrase "King of the beasts". Which of course is a Lion. And the sound a Lion makes is "RO(a)R...."

I'm sorry. That was a poor joke even by my very low standard. ;)

Who says coders need sleep?

TAD