Gettin jiggy with the Mr. Bit biggy
This is another old Hugi article update inspired by my updated GetBits article. This time its the 'PutBits' routine that gets some 80x86 attention in terms of optimisation.
Introduction
As you all know already, 'putbits' writes a varying number of bits into a bitstream so they can be later read back out using a 'getbits' routine. In many ways writing bits into a bitstream can be easier than reading 'em back out, as you will soon see.
Bit-blast from the past.
Okay, here is a snippet of code from my old Hugi 16 article. You pass it the number of bits you want to write along with the bits themselves. It will only handle a maximum of 24-bits, but this should be enough for 'most' packers.
; (EAX) = value to write (the 'data')
; (CH) = number of bits to write
;
PutBits:
mov cl, [BitPos]
mov edi, [BitPtr]
shl eax, cl ; shift token to BitPos
or [edi], eax ; combine with Bitstream
add cl, ch ; BitPos + NumBits
movzx eax, cl
shr eax, 3 ; EAX = BitPos / 8
and cl, 7 ; BitPos MOD 8
add [BitPtr], eax ; Advance BitPtr by EAX bytes
mov [BitPos], cl
(double) words of warning.
Like I mentioned in that old article, the 'OR [EDI],EAX' assumes that the bitstream buffer has already been cleared (at init time). Also allow for an extra 3 bytes (or better still 4 bytes) at the end of a bitstream.
Anyway, here is some init/pre-clear code to deal with that possible problem.
.DATA?
BitPtr dd ?
BitPos db ?
bitstream db 16384 dup (?)
.CODE
Init:
mov [BitPtr], offset bitstream
mov [BitPos], 0
mov dword ptr [bitstream], 00000000h
ret
; (EAX) = value to write (the 'token')
; (CH) = number of bits to write
;
PutBits24:
mov cl, [BitPos]
mov edi, [BitPtr]
shl eax, cl ; shift token to BitPos
or [edi], eax ; combine with Bitstream
add cl, ch ; BitPos + NumBits
movzx eax, cl
shr eax, 3 ; EAX = BitPos / 8
*** mov dword ptr [edi+4], 0
and cl, 7 ; BitPos MOD 8
add [BitPtr], eax ; Advance BitPtr by EAX bytes
mov [BitPos], cl
*** = clear ahead 4 bytes
The 'Init' part clears the first dword of the bitstream, this allows upto 32-bits to be output before more of the buffer needs to be cleared. The rest of the buffer is pre-cleared inside the 'PutBits24' routine.
Of course it would be much quicker to clear the entire bitstream buffer at 'Init' time rather than inside the 'PutBits24' routine. I used this method to show an easy size-optimization method.
Random-access
Now we get to the fun part. The improved PutBits routine, which as you've already guessed it is based on random-access rather than incremental or advancing the address-pointer and bit-position variables.
.DATA?
BitSeekPos dd ?
bitstream db 16384 dup (?)
.CODE
Init:
mov [BitSeekPos], 00000000h
mov dword ptr [bitstream], 00000000h
ret
; (EAX) = value to write (the 'data')
; (CH) = number of bits to write
;
PutBits24:
mov edx, [BitSeekPos]
mov cl, 7
and cl, dl
shr edx, 3
*** mov dword ptr bitstream[edx+1], 0
or bitstream[edx], eax
movzx edx, ch
add [BitSeekPos], edx
ret
*** = clear ahead 4 bytes
Closing words.
Easy Huh?
An improvement to both the PutBits and GetBits routine would be to handle data in dword chunks rather than bytes. This would speed up access (no mis-aligned dword memory accessing) but would require slightly more code. Maybe some total 80x86 nutter might like to code a MMX or FPU version of PutBits or GetBits? hehe..
Just in case you don't understand Roman Numerals, 'Part IV' means the same as Part X-VI. Hope that helps ;)