FIG - loading a GIF file

TAD/Hugi

Introduction

This article briefly describes the GIF file format, how the LZW decompression works and gives an overview of the supplied 80x86 source-code. Yep, for all you 80x86 people out there... here is finally some code to load a GIF!!

For such a widely used graphics file format the GIF it is still a mysterious Pandora's box of CompuServe Patents and lack of 80x86 friendly info. I see the question "How do I load a GIF?" very often in the various assembly newsgroups, so hopefully this will address the problem.

The ZIP files

No, not staring Gillian Anderson... The ZIP file contains source code for 32-bit Flat/P-mode. I've tried to keep it all simple and give plenty of comments were possible. The code was written for TASM and tested on both TASM 5.0 and MASM 6.11 (sorry, no N-ASM version).

	  GIF32.ASM	  - 32-bit source code for Flat/pmode
     GIF.INC	     - include file (contains GIF structures & equates)
     TINYHUGI.GIF    - 640x480x16bit piccy I drew resized to 160x100x8

I used the nice WDOSX dos extender by Michael Tippach for the "GIF32.ASM" 32-bit version of the GIF loader, you can find this at:

http://www.wuschel.demon.co.uk/index.html

You MAY need the DLINK.EXE program by Adam Seychell (depending on your TASM and possibly MASM version). This can be found in the DOS32B35.ZIP at:

http://www.programmersheaven.com/zone5/cat19/index.htm

The GIF file format

The supplied code was written to only load and decompress a single image from a GIF file, so it does NOT deal with all the other extension sub-blocks (like comments, graphic-control data, plain-text or application data). So for animated GIFs you will need to adapt the source code to suit your own needs.

This shouldn't be as bad as it first sounds. The code already skips past these sub-block extensions and does the core work of decompressing the LZW bitstream (which is THE hardest part, IMHO).

A basic GIF87a/GIF89a file looks something like this:

gifhdr	  - the GIF header (Date/Version & "FIG" signature ;)

giflsd	     - the LSD (Logical Screen Descriptor)
	     - the GCT (Global Color Table, ie. the palette)

gifxxx	     - the various extension blocks (AEX, CEX, PTE etc..)

giftid	     - the TID (The Image Descriptor)
	     - the LCT (Local Color Table, ie. one image's palette)
	     - the TBI (Table Based Image - LZW bitstream) 

As far as I know the extensions and TID/LCT/TBI sections can be freely repeated for GIF files with multiple images. The extensions supply timing information and a method to intergrate text information between images.

A loader overview

The GIF32 works like this:

1) Open the file

2) Read the gifhdr (header) and check the "G" "I" "F" signature)

3) Read the LSD (Logical Screen Descriptor) (you may wish to extract this screen size information from here so you can allocate/select the correct screen size & resolution)

4) Read the GCT (Global Color Table) if any (the GCT_FLAG in the [giflsd.LsdFlags] indicates if GCT is present)

5) Read the next byte and check for extension OR TID signature (02C hex) (you can skip the extension sub-blocks by sitting in a loop and looking for a BlockSize = 0, which marks the end of a section)

6) Read the TID (The Image Descriptor) (this begins with the 02C hex Separator and contains info about the LZW compressed image which follows any optional LCT)

7) Read the LCT (Local Color Table) (this is a local palette used for the next image(s) only, its in the same format as the GCT which is 8-bits per R,G,B component. The LCT_FLAG in the [giftid.TidFlags] denotes if a LCT is present.)

8) Read the LZW code size, then decompress the TBI data sub-blocks. (The first byte, the ImageBits code size, is used to init the LZW decompression code token values. The actual LZW bitstream is broken into sub-blocks of between 1 and 255 bytes, just like the extension sections. Each block is preceeded by a byte count, a 00 means the end of the bitstream.)

The LZW algorithm

Sorry, but I'll skip most of the explaination about how the LZW works (there is plenty of information available on the net), besides you already have some 80x86 source-code. ;)

Oh, okay... here is a rough decompressor..

	  ImageBits = 2 ... 8	  ; This is the minimum LZW code size 
			     ; (i.e. number of bits in image pixels) 

     phraze[4097] bytes      ; used to build up a reversed LZW phraze 
     suffix[4097] bytes      ; last byte of each phraze 
     prefix[4097] words      ; token of the sub-phraze before the suffix 

Init:
     FutureCode = 0
     OldCode = 0
     BytesLeft = 0

     CodeBits = ImageBits + 1
     TopSlot = 1 << CodeBits

     FlushCode = 1 << ImageBits
     EndCode = FlushCode + 1
     Slot = FlushCode + 2

     BytesLeft = 0
     BitPos = 8 		     ; any number > 7 !! 

Depack:
     Token = ReadGIFCode()	     ; get the next LZW token from the 
				     ; bitstream. 

     if Token = EndCode then done..  ; end of image ? 

     if Token = FlushCode then ...
		     CodeBits = ImageBits + 1
		     TopSlot = 1 << CodeBits
		     Slot = EndCode + 1

		     while Token = FlushCode
			     Token = GetGIFCode()
		     wend

		     if Token = EndCode then done ...

		     if Token >= Slot then Token = 0

		     OldCode = Token
		     FutureCode = Token
		     OutputByte(Token)

     else..

	     Code = Token

	     Rev = 4097

	     if Code >= Slot then ...
			     OldCode = Code
			     FutureCode = Code
			     Rev - 1
			     phraze[Rev] = Code

	     while Code > EndCode ...
		     Rev - 1
		     phraze[Rev] = suffix[Code]
		     Code = prefix[code]
	     wend

	     Rev - 1
	     phraze[Rev] = Code

	     if Slot < TopSlot then ...
		     FutureCode = Code
		     suffix[Slot] = Code
		     prefix[Slot] = OldCode
		     OldCode = Code
		     Slot + 1

	     if Slot >= TopSlot and CodeBits < 12 then ...
						     TopSlot << 1
						     CodeBits + 1

	     while Rev < 4097
		     OutputByte( phraze[Rev] )
		     Rev + 1
	     wend

     goto Depack 

That's the main decompression, the only thing missing is the "ReadGIFCode" routine which constructs a N-bit token code from the TBI bitstream.

The GIF Bitstream

The bitstream is stored in an Intel friendly low, high-byte style with bits stored in a bit-0 first, bit-7 last order. The actual bytes which make up the entire bitsteam is broken up into smaller blocks, each block contains between 1 and 255 bytes. Each block is preceeded by a length count, exactly like the other extension blocks. A length count = 0 means the end of the bitstream.

	   byte 1    byte 2    byte 3

     76543210  76543210  76543210
     hgfedcba  ponmlkji  --vutsrq    <-- 22 bits abcdefghijklmnopqrstuv 

The above 22 bits (3 bytes) could be stored in a block like this:

		    76543210
     byte 0    00000011      = 3     the length byte

     byte 1    hgfedcba 	     bits  0...7
     byte 2    ponmlkji 	     bits  8...15
     byte 3    --vutsrq 	     bits 16...21 

A LZW token consists of between 3 and 12 bits (depending on both the number of ImageBits and the number of phrazes stored in the dictionary). A "GetBits" routine has to deal with both constructing a token code from a number of bitstream bytes and also dealing with crossing the TBI block boundaries.

ReadGIFCode:

     if BitPos > 7 then ...
		     BitRack = ReadGIFByte()
		     BitPos = 0

     Token = BitRack >> BitPos
     GotBits = -BitPos + 8

     while GotBits < CodeBits
	     BitRack = ReadGIFByte()
	     Token = Token or (BitRack << GotBits)
	     GotBits + 8
     wend

     BitPos = 8 - (GotBits - CodeBits)
     Token = Token and (TopSlot - 1) 

The following code will read a bitstream byte and deals with the caching of a single TBI block (which is between 1 and 255 bytes in length).

ReadGIFByte:
	     if BytesLeft = 0 then ...
			     BytesLeft = ReadFileByte()
			     for n = 1 to BytesLeft
				     cache[n-1] = ReadFileByte()
			     next n
			     BytePos = 0

	     n = cache[BytePos]
	     BytePos + 1
	     return (n) 

Closing words

I think I've said enough here. You know where the source code is and you can always fire up your favourite search engine if you still need some more info about the LZW algorithm or the GIF87a/GIF89a specs...

Happy viewing...

TAD/Hugi