Cowboys & Endians
Introduction
While trying my best to draw some decent quality graphics for Hugi 19 my paint program (Picture Publisher) crashed a number of times, and yep, it was at some crucial moment. It was one of those "I haven't saved for 2 hours.. I will just do this.. then save it to disk..." kind of user moments.
But why does software crash?
Blame the OS..
Okay, so this is an obvious and rather lame scapegoat, but sometimes the company who created the OS must take some of the blame. On many occasions the actual design is as much the problem as the programming/implementation of it. We all know the classic statement attributed to Bill Gates about 640Kb being enough for everyone. There was a demo some years back which used this to form the background of a nice 'twirl' effect (like a sink plug-hole). This is a good example of a design assumption. Another one is the old FAT file system which only allowed 8.3 filenames, and we still have the old disk partition size problem with some BIOSes...
The old legacy functions from previous incarnations of an OS are like a ball and chain around the legs of a new version. Windoze, just like the Intel 80x86 cpus, is held back in many ways because it must be backwards compatible (or 'almost' compatible in many cases) with its predecessors. A number of years ago I heard from an old electronics friend that Intel could have designed a cpu which was 2 to 5 times faster if they started again from scratch, but of course the business world would have 'gone ape' if they had to scrap all their software. So Intel continues to this day to support the 8086, 80186, 80286, 80386 etc.. etc..
It's the same problem with M1cro$oft, only their previous versions of the OS had bugs and design flaws which can make the Pentium FDIV bug look like a sub-atomic straw particle hidden in a universe haystack. ;) Hackers are only too keen (along with the rest of us users) to point out the flaws and bugs in software. Some are minor little niggles, but more and more seem to be very large holes in the design and/or implementation. Look at the world of internet security and the blurb from a certain company.
Blame the coder...
The user's second reaction once a program crashes is to blame the coder and say rude things about his/her family... A user's first reaction is mostly along the lines of "You stoopid F***ing machine!! ARRGGGGGHHH!!"
But violence is a bad thing...
especially against a poor, innocent PC. It might feel good throwing your monitor from a tenth-storey window but think of the cost of replacing it.
There is NO easy way to write fast, reliable code. No matter what language you choose, what IDE or RAD kits you use, or what visual applications a mystical programming guru has trained you to use during the past twenty-plus years, there is NO guaranteed way to produce bug-free code!!
Blame the compiler...
No matter what the companies say, compilers, assemblers and linkers all contain bugs. Coders try to find the one with the least number of 'quirks'. Let's face it, it's pointless trying to write bug-free software if your compiler generates the wrong instructions.
NASA and some of the more bug-sensitive companies use some really old compilers and linkers for their programs. Sometimes their development tools are decades old. Why? Well, those early compilers have been tested for years and years, so most of the bugs/quirks have already been found. In their case having a new patch on the internet in a week's time isn't good enough when you're flying at 10,000 feet!! They also use a number of different compilers from different companies to ensure that the final system is as bug-free as humanly possible.
Blame the user...
This is another classic in the customer support handbook. "No, it's not a bug... but a 'quirk' or undocumented 'feature'..", then soon after this comes the "this has been fixed in our new, 50 dollar upgrade.. do you have a credit card to hand?" line.
Blame the configuration...
Yeah, it's another 'pass the buck' excuse. Some people will try to ignore the problem and are happy to let some poor user spend days or even weeks trying a billion combinations of drivers, settings and desktop color schemes (heheh.. so I made the last one up). This is a good example of poor quality control and lazy beta-testing.
The bad old days of mystery devices (like S-VGA cards) are thankfully coming to an end, or they will once your OS has good, solid drivers.
Blame the PC hardware...
Now that new PCs are moving towards a completely programmable system, the possibilities of having an incorrectly set up BIOS and motherboard are in some ways improving and in other ways getting worse. No longer do you have to open your case and mess around with jumper settings. Either let the BIOS auto-config itself, or select the desired settings manually.
The bad news: because most BIOSes are now flashable to allow for 'an easy upgrade path' they are also open to virus attack and user stupidity. I speak about the latter from experience. I tried to upgrade my Dell BIOS with what was 100% the correct BIOS, but after flashing successfully and rebooting, my machine failed to work.. Boo.. Hoo.. Boo...Hoo... After more than 10 attempts to reboot.. nothing.. no beeps.. nothing!
Then I remembered reading about a jumper on the motherboard which can be used as an emergency re-init mode of the BIOS. It allows all the user settings to be zapped and replaced with the default ones.. anyway it worked. :) The PC rebooted with a working BIOS (Yahhooo..!!).
So if you are thinking about flashing your BIOS...
read your motherboard manual and make a note of any CONFIG jumpers!!
Blame yourself...
When you're coding you often take shortcuts and make assumptions about certain inputs. You assume that a file's integrity is still intact just because the magic signature at the start is correct, then go on to accept the values in each field without clipping them against your program's limits. The reason why some of these techniques were used in the past was limited CPU speed and/or memory space. We all have megs of memory and more Hz than an S&M hedgehog during a combined pin-cushion and nail-gun accident (i.e. a lot ;)). So there should be no real reason why good error checking/clipping is not performed. Even when you know that procedure X will only be called with parameters of Y and Z it is still good practice to add some redundant error checking.
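Here is a rough sketch of the file-checking idea in Python. The header layout, the "HUGI" signature, the field names and the limits are all made up for illustration and do not come from any real file format. It also rejects bad values outright rather than clipping them, which is just one possible policy.

```python
import struct

# Hypothetical image header: 4-byte magic, then width, height and
# bits-per-pixel as little-endian 16-bit values. All assumptions.
MAGIC = b"HUGI"
HEADER = struct.Struct("<4sHHH")
MAX_WIDTH, MAX_HEIGHT = 4096, 4096
VALID_BPP = (8, 16, 24, 32)


def parse_header(data):
    """Return (width, height, bpp) or raise ValueError on any bad field."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for a header")
    magic, width, height, bpp = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic signature")
    # A correct signature proves nothing about the rest of the file,
    # so every field is tested against the program's own limits.
    if not 1 <= width <= MAX_WIDTH:
        raise ValueError("width out of range: %d" % width)
    if not 1 <= height <= MAX_HEIGHT:
        raise ValueError("height out of range: %d" % height)
    if bpp not in VALID_BPP:
        raise ValueError("unsupported bits per pixel: %d" % bpp)
    # The pixel data promised by the header must actually be present.
    needed = width * height * (bpp // 8)
    if len(data) - HEADER.size < needed:
        raise ValueError("pixel data truncated")
    return width, height, bpp
```

A caller would catch ValueError, tell the user the file is damaged and carry on, instead of allocating a buffer based on whatever numbers happened to be in the file.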
Why add redundant checks?
Well, suppose in the future you decide to modify it or reuse those procedures for something else, but forget about the assumed limits or vital parameters. In the case of a code library this could mean disaster and lots of flames in your InBox. So isn't it better to put some extra tests in your procedures now to save you hours of debugging later?
Those extra checks do waste a little coding time, space and CPU clocks...
...but a few clock-cycles here and there are 100000x faster than a reboot!
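To give a feel for how cheap such checks are, here is a small Python sketch of a pixel plotter that does not trust its caller. The routine name, its parameters and the flat 8-bit buffer layout are assumptions invented for this example.

```python
def put_pixel(buffer, width, height, x, y, colour):
    """Write one 8-bit pixel into a flat bytearray of width*height bytes.

    Returns True if the pixel was written, False if it was off-screen.
    """
    # Redundant today, a life-saver when somebody reuses this routine
    # later and forgets the limits it was written for.
    if len(buffer) != width * height:
        raise ValueError("buffer size does not match width*height")
    if not (0 <= x < width and 0 <= y < height):
        return False  # off-screen: skip instead of writing somewhere wrong
    buffer[y * width + x] = colour & 0xFF  # clip the colour to one byte
    return True
```

The tests cost a couple of comparisons per call. If that ever shows up in a profile, the checks can be hoisted out of the inner loop and done once per line or per sprite rather than deleted.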
What's the solution?
Err.. that's a good question. Currently there is no guaranteed way to write good, reliable code. Our only option is to have a clean design, follow the K.I.S.S. (Keep It Simple, Stoopid!) philosophy and do as much beta-testing on as many machines by as many users as possible. High level languages can be useful (and are almost mandatory in many huge projects/companies) but they just enable us to write bigger and bigger software. Most/all programs are based on a building-block scheme where there are clear layers of function, sub-functions and high/low-level procedures. We build on the knowledge and code which worked in the past to create more complex, intertwined progs.
But there is a hidden danger here. Coders are human (or semi-human at least) so they often fall into bad coding habits: if something worked in the past then it should work in the future. But we all know how difficult predicting the future is, and a bad or incomplete design could lead to disaster later on. Taking shortcuts is another source of 'legacy bugs'.
My own approach to coding is to keep everything as simple and short as possible. Rather than create a huge, pan-dimensional, multiplex routine which tries to do everything, I prefer small custom routines with as few inputs and outputs as possible. If tasks MUST be performed in a certain order then add some flags/checks to make sure they are... Also take a tip from those NASA people, and consider the worst possible scenario where EVERYTHING fails: memory-allocation, memory-release, open-file, close-file, failed searches/replacements etc... and make sure you have correctly coded the proper error clean-up functions too. For example if you are reading from a file and you encounter an error, then remember to close the file and release any memory which was being used. If something has failed then remember to tell the user and if possible give them an option to retry the task.
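The file-reading case might look something like this in Python. It is only a sketch: load_file, load_with_retry, the ask_retry callback and the 1 MB size limit are hypothetical names and values chosen for the example.

```python
def load_file(path, max_size=1 << 20):
    """Read a whole file, always closing it, even when something fails."""
    handle = open(path, "rb")  # may raise OSError: nothing to clean up yet
    try:
        data = handle.read(max_size + 1)
        if len(data) > max_size:
            raise ValueError("file larger than %d bytes" % max_size)
        return data
    finally:
        handle.close()  # runs on success AND on every error path


def load_with_retry(path, ask_retry, loader=load_file):
    """Tell the user what failed and let them decide whether to try again.

    ask_retry is a hypothetical callback: it receives the error message
    and returns True to retry or False to give up.
    """
    while True:
        try:
            return loader(path)
        except (OSError, ValueError) as err:
            if not ask_retry("Could not load %s: %s" % (path, err)):
                return None  # the user gave up; the caller must handle None
```

In a real program ask_retry would pop up a dialog or print a prompt. Keeping it as a parameter also means the failure path can be exercised in testing without a flaky disk.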
There's an old programming phrase which sums up poor design and coding: Garbage in, garbage out. But enough about my productions. ;)
Happy beta-testing!