F***ing Learn To Code Again

KB

While reading Hugi 13, I ran across many articles speculating about whether the scene is dying, whether people are becoming inactive, why they do so and how much innovation is lacking in demos.

But, despite all this, even the TECHNICAL quality of demos is becoming steadily worse. Lately, I downloaded tons of productions from demo.cat.hu, looked through all of them, and almost started to cry and wonder whether it wouldn't be better if I quit this mess calling itself "scene" nowadays.

What I had to see were first of all many demos which didn't even start, but crashed or failed for this and that reason (and I don't think a 72meg machine with GUS and SB and a Trio64V+ in it - yes, UniVBE is loaded - doesn't cover enough standards to have at least one of them supported) - and the rest was mainly reincarnation of Mode 13h with effects I had seen many times before and BETTER. So what's up? Have people entirely forgotten that apart from all that design-hype nowadays coding should still be kind of an art? Or does just nobody care anymore? I don't know.

The most astonishing thing about it is that in former times, people were actually ABLE to produce elegant code and good-looking fast demos - and all this with PCs we would just laugh at today. And look at all those Amiga and C64 sceners still producing stuff which some of "us" PC guys 'n grrrls just envy. And do you know what? It's friggin' easy. Just CARE about what you code and don't lean back once your desired effect shows the first correct-looking still picture on your screen.

On the other hand, most of that 'leet underground knowledge of how to code demos is lost. There is no tutorial at all on how to code demo effects in a way that they look good, work on more than your machine and maybe run at the same overall speed (not frame rate of course) independent of the viewer's CPU. And sadly, nobody seems to be able to find out how all this worked and still works (ok, maybe, as I said, people don't care).

So, I'll just state a set of rules or proposals now on how to make better demos (or demos at all)...

--- HOW TO BECOME AN ELITE DEMO CODER ---
--- PART I - BASIC PRINCIPLES AND PRACTICE OF DEMO CODING ---

... and I'll start right with the thing none of you seems to want to hear:

1. OPTIMIZE YOUR CODE

["Optimize? Are you NUTS? We have Pentium IIs, we don't need to optimize!"]

Yes, I admit it, if you have a P2 machine with at least 300MHz, completely losing the overview of what you do is a permanently imminent danger. Whatever mess you type in your favourite editor or IDE, the result will most probably be what is called "fucking fast". One good example of this is that thread about rotozoomers on news.scene.org's coders newsgroup where a guy posted a rotozoomer which was about 10 times slower than a rotozoomer (which is a 1992 effect, just by the way) should normally be. And you know what? He didn't even notice. No, in fact, he considered the rotozoomer I sent him back (1 hour of coding, some not-so-well-optimized written-down ASM inner loops, rest C++, ran 70fps on a P60, instead of HIS 10fps) buggy - and another guy even dared to answer me things like "your ASM code will lead you nowhere". Nowhere.

In fact, "nowhere" is exactly where I want to be if THIS attitude towards DEMO CODING (let's repeat this: DEMO CODING) is the common one. We have such powerful machines nowadays, capable of displaying tens of thousands of particles, thousands of triangles and an unmeasurable number of shadebobs per video frame (read: 70fps), and trying to actually DO something with this power instead of repeating the same old effects again and again (just worse each time) leads me or any other person doing this NOWHERE? Get a life.

Now, how does one write optimized code? Well, it's easy if you think of a few things.

* CHOOSE THE RIGHT LANGUAGE:

Demo effects are supposed to run fast (at least I hope so). So, the FIRST choice at least for your inner loops is of course Assembler. I don't say that 100% ASM is necessary at all, in fact it causes more problems than it solves, but a good trick is to count how often a part of code is called per calculated frame - and if this number exceeds 1000, better use ASM code for this. And mostly, those portions are small inner loops and don't take more than one or two screen pages of code, so it isn't that much work actually.
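If you'd rather measure than guess how often something is called per frame, a counter is enough. The following is only a sketch: the names CallCounter, count_call and end_frame are made up for this example, and the threshold is simply the rule of thumb from above.

```cpp
// Hypothetical helper: counts calls per frame and compares them
// against the "more than 1000 calls per frame" rule of thumb.
const unsigned long ASM_THRESHOLD = 1000;

struct CallCounter {
    unsigned long calls;  // calls seen in the current frame
    unsigned long peak;   // highest per-frame count seen so far
};

// Put this at the top of the routine you are suspicious of.
inline void count_call(CallCounter &c)
{
    c.calls++;
}

// Call once per frame, e.g. right after flipping the screen.
// Returns true if the routine was "hot" in the frame that just ended.
inline bool end_frame(CallCounter &c)
{
    bool hot = c.calls > ASM_THRESHOLD;
    if (c.calls > c.peak)
        c.peak = c.calls;
    c.calls = 0;
    return hot;
}
```

A per-pixel routine in 320x200 will report 64000 calls per frame, so it is obviously a candidate; your once-per-frame setup code is not. Remove the counters again before release.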

- Know How To Code In Assembler (or: what is the difference between hand-written and compiled code): Don't try to code your ASM routine by simply converting your C++ or whatever prototype line by line. No, try to use all the registers the CPU has, avoid variables, avoid jumps and restructure your loop in a way that it fits the CPU's structure perfectly (this includes rearranging all your data types etc., too). Simple Pentium optimization doesn't help at all in most cases if the algorithm itself is not optimized (the P2 does all that pairing etc. by itself, so the speed gain is about 0%) - apart from one thing: USING MEMORY IS BAD. Don't mess around with tables, 32bit image data and whatever: now that we have that thing called "cache", simply calculating things is often faster.

- and Use The Right Language For The Rest: Do NOT bother with Borland Pascal, Visual Basic and all those other toys. Don't say "C++ looks too complicated for me": imperative programming languages are basically all the same, and once you get used to the new syntax, it's just as easy as before. So: use C or C++ (for DOS, preferably Watcom, for Win, VC++) - and don't be too afraid of OOP: if you use it wisely and don't throw around abstract classes or virtual functions, it isn't noticeably slower at all (just mind one thing: In demo coding, there is no such thing as "code reusability" or that crap ;).

* DON'T CALCULATE THINGS YOU SHOULD ALREADY KNOW

This one is a bit tougher, let me just explain, look e.g. at this rectangle fill routine (a bad example, but I just want to show the principles):

   for (int y=y1; y<y2; y++)
     for (int x=x1; x<x2; x++)
       vidmem[320*y+x]=color;

In its inner loop, the CPU has to calculate vidmem+320*y each time, though this value NEVER changes within a line (and in fact, it's at least 20 CPU cycles you waste per pixel). Why not use THIS version:

   whatever *vptr=vidmem+320*y1;
   for (int y=y1; y<y2; y++) {
     for (int x=x1; x<x2; x++)
       vptr[x]=color;
     vptr+=320;
   }

Isn't this a BIT better? Well, in my opinion (and in the opinion of every other 'leet demo coder of course) not really. Still, the CPU has to compare the x value with the right border every pixel. So (if we have a well optimizing compiler), the following version is again a bit faster:

   whatever *vptr=vidmem+320*y1+x1;
   int xwidth = x2-x1;
   for (int y=y2-y1; y; y--) {
     for (int x=xwidth; x; x--)
       *vptr++=color;
     vptr+=320-xwidth;
   }

and if you remember the things I said above (considering an 8bit mode):

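   char *vptr=vidmem+320*y1+x1;
   _asm {
     mov edi, [vptr]     ; edi = address of the first pixel
     mov al,  [color]    ; al  = fill value
     mov ebx, [x2]
     sub ebx, [x1]       ; ebx = width in pixels
     mov edx, [y2]
     sub edx, [y1]       ; edx = number of lines (must be > 0)
     yloop:
       mov ecx,ebx       ; ecx = pixels left in this line
       rep stosb         ; store al ecx times, advancing edi
       sub edi, ebx      ; back to the start of the line...
       add edi, 320      ; ...and down to the next one
       dec edx
     jnz yloop
   }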

(Now, is this too long or too complicated or do you need too much effort for this? Or does this lead you nowhere?)

Notice that I didn't write EVERYTHING in ASM (it isn't necessary, as said) and that I got completely rid of the variables x,y and xwidth, as well as the whole inner loop, which I cut down to ONE ASM instruction. So, this version should be about ten times faster than the first C++ one (and all you ASM haters out there, tell me ONE compiler which would be able to optimize the C++ code THIS way).

Ok, "rep stosb" isn't the fastest command, there are means of making such routines still MUCH faster, but I don't want to go into that deep level of detail now.
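For those who want the same structure without inline ASM, here is a hedged sketch that hands each line to memset(), which on most compilers ends up as a fill wider than one byte. It assumes an 8bit mode with exactly 320 bytes per line; fill_rect, SCREEN_W and SCREEN_H are names invented for this example, and whether it beats "rep stosb" on a given machine is something you have to measure yourself.

```cpp
#include <cassert>
#include <cstring>

const int SCREEN_W = 320;  // assumed: 320 bytes per line, 8 bits per pixel
const int SCREEN_H = 200;

// Fills the rectangle [x1,x2) x [y1,y2) with one memset() per line.
void fill_rect(unsigned char *vidmem, int x1, int y1, int x2, int y2,
               unsigned char color)
{
    // No clipping here: the caller has to pass a rectangle on the screen.
    assert(0 <= x1 && x1 <= x2 && x2 <= SCREEN_W);
    assert(0 <= y1 && y1 <= y2 && y2 <= SCREEN_H);

    unsigned char *row = vidmem + SCREEN_W * y1 + x1;  // first pixel
    const std::size_t width = (std::size_t)(x2 - x1);

    for (int rows = y2 - y1; rows; rows--) {
        std::memset(row, color, width);  // the whole inner loop
        row += SCREEN_W;                 // next line
    }
}
```

The principle is the same as in the ASM version: everything that doesn't change is calculated once, outside the loops.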

* KNOW WHAT U WANT:

Don't try to code your routines as universally usable as possible, don't think of huge data structures, hundreds of abstraction layers and all this other crap CS students become flooded with during their studies - when it comes to reality (the thing we exist in), no demo effect code will ever be reused for other purposes. If you want to code a rotozoomer, don't code a texture mapper and think "hey, I can use the code in my engine, too" - you WON'T. So just code a rotozoomer which does what you want it to do (in this case, zoom and rotate ;) and optimize it. Your engine (which you won't use in your next demo and you DEFINITELY won't use in your upcoming game) will need a completely different approach for its texture mappers anyway (apart from the fact we have those nifty 3d cards in our computers. Hello scene.).

Many of you will now probably realize that this problem would not exist without optimization, as optimized routines aren't really open to changes if you want to reuse them and realize you have to change them here and there. But this is (at least in my completely unhumble opinion) the fate of every "true" demo coder and this is the point where you HAVE to care and HAVE to put MUCH work into your productions. Or you can just stay what people like me call lame. It's up to you.

INCOMING MESSAGE: Hello Java coders. Thanks again for showing me that the definition of platform independence is that the code won't run at all, independent of the platform. EOT.

2. MAKE YOUR CODE RUN ON MORE THAN YOUR PC

["It doesn't work? That's strange, HERE it does... maybe you should buy a better PC!"]

A common practice is that demos nowadays run on exactly two computers:

1. The computer of the coder
2. The compo machine (needs to reboot after the end)

Sometimes the demo will even run on one or two other group members' PCs, but this is rather rare. Some good examples of this were that Windows 3DFX demo at Evoke (which didn't even run on ANY machine in its "final" version) and of course "Perfect Drug" by Elitegroup which was a brilliant example of code and design, but is known to run on almost no PC people have, for whatever reason (I was in the lucky situation to watch it on Dominator/Elitegroup's PC once, but it never worked for me either).

Anyway, what causes this dilemma? There is only one answer for this - and it's an answer we know: LAZINESS.

You know Second Reality, don't you? Of course you do. And I can safely say you have watched it, probably several times. How come that demo works on every PC from 386SX16 to the latest Xeon machines and nowadays' demos DON'T? Isn't there something wrong with that?

Definitely there is. People code something which SEEMS to work on their own PC, call it "demo" and send it out to the world without having tested it on ONE other computer or even realizing that their hardware dependent DOS code is what it is called: hardware dependent.

So what are the reasons why a demo only works on specific machines? And what can be done about it?

* BUGZ 'N MEMORY LEAKZ:

Ever thought that your code might not really work, and that having it run on your PC is pure LUCK? No? Then think about the following: Let's say your effect works correctly. At least it seems so. Most probably, you have allocated plenty of memory for your tables, textures, virtual screens and all that stuff. Now, what happens if some code writes beyond the boundaries of those memory chunks? If you're lucky, nothing - at least nothing noticeable. But in fact, with most compilers and operating systems, those things destroy vital information the OS needs for heap management. And guess what MAY happen if you then try to free that memory again or allocate more of it... right. Our (Smash Designs) Demo wasn't released at TP8 just because of THOSE bugs.

So, watch your steps. Or better, know what you do and what your code does. If you have the time for it, a VERY good idea is to redefine malloc(), free(), new, delete and whatever you use and make it monitor what happens and how much you alloc and free. And if you don't see the beautiful number "0" at the end of whatever you did, you can be sure there is something wrong. And if this doesn't suffice to track your code's quirks, set up a so-called "memwall" - modify your malloc/new routines so that they allocate some more mem, put the desired memory chunk in the middle of the bigger area and fill the rest with a special sequence of bytes - thus, another routine can simply test if this sequence is still intact and you instantly know WHICH memory boundaries get overwritten.
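A minimal sketch of such a monitored allocator with a "memwall" could look like the following. Everything in it is an assumption made for illustration: the names dbg_alloc, dbg_check and dbg_free, the 16 byte wall and the 0xA5 fill pattern. In a real demo you would route malloc/free and new/delete through it in your debug builds.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

const std::size_t HEADER = 16;         // stores the block size; 16 keeps
                                       // the user data nicely aligned
const std::size_t WALL = 16;           // guard bytes on each side
const unsigned char WALL_BYTE = 0xA5;  // arbitrary "special sequence"

static long g_live_blocks = 0;         // should be 0 again when you exit

// Memory layout: [header][wall][user data][wall]
void *dbg_alloc(std::size_t size)
{
    unsigned char *raw =
        (unsigned char *)std::malloc(HEADER + WALL + size + WALL);
    if (!raw)
        return 0;
    std::memcpy(raw, &size, sizeof size);  // remember size
    std::memset(raw + HEADER, WALL_BYTE, WALL);  // front wall
    std::memset(raw + HEADER + WALL + size, WALL_BYTE, WALL);  // back wall
    g_live_blocks++;
    return raw + HEADER + WALL;
}

// Returns true if both walls around the block are still intact.
bool dbg_check(const void *ptr)
{
    const unsigned char *user = (const unsigned char *)ptr;
    const unsigned char *raw = user - WALL - HEADER;
    std::size_t size;
    std::memcpy(&size, raw, sizeof size);
    for (std::size_t i = 0; i < WALL; i++) {
        if (raw[HEADER + i] != WALL_BYTE) return false;  // underrun
        if (user[size + i] != WALL_BYTE) return false;   // overrun
    }
    return true;
}

// Frees the block; returns false if somebody wrote past its borders.
bool dbg_free(void *ptr)
{
    if (!ptr)
        return true;
    assert(g_live_blocks > 0);  // more frees than allocs?
    bool intact = dbg_check(ptr);
    std::free((unsigned char *)ptr - WALL - HEADER);
    g_live_blocks--;
    return intact;
}
```

Call dbg_check() on your buffers after each effect while debugging, and print g_live_blocks when the demo exits: anything but 0 there means you leak.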

And if you have then tracked down all those bugs and everything finally works, you may realize that there is no such thing as occasional crashes ["Your Windows drivers seem to suck"].

* THE CHOICE OF THE OS:

Ok, I know, this question is widely regarded as a religious one. If someone sold weapons to the demo scene, the Gulf War would look like a bad joke compared to the carnage which would arise just because of this rather unimportant point. So, I'll just try to explain my thoughts from the viewpoint of a coder who desires that his demo will be seen by as many people as possible.

Today, there are three possibilities: DOS, Linux and Windows. Let's just discuss all of them.

DOS is still the most often used "Operating System" for demos. But sadly, it is also most often the cause of all the problems we have, as there are no drivers for your hardware and no standard API to access it. Therefore, each demo has to support different standards to work on more than the coder's PC. On the audio side, there are fortunately libraries like MIDAS (uhm no, forget that), USMP and IMS (which I'd highly recommend, despite its bitch-ness), so there are only problems with newer PCI cards which most often don't work at all with DOS programs. On the video side, though, there is the VESA standard and almost no one manages to support it correctly (or rather no one wants to ["HERE it works..."]). Every card manufacturer brings his own quirks and bugs into the VESA standard - and the widespread UniVBE isn't any better concerning this point. Speaking of UniVBE: What the heck is the point in requiring VBE2.0 anyway? If you use virtual screens in the computer's main mem (which is a necessary thing anyway, I'll come to this point later), writing banked blit routines is no problem at all. It only takes some time. If you don't have this time, go play Quake again, you're not worth being here. But even if you support banked modes, the VESA standard offers MANY traps for you, like modes whose x-resolution differs from the actual number of words per line - and if you want to support more than ONE mode (which is much better, do you REALLY know every graphics card supports the mode you've chosen?), the REAL problems have just begun.
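To show that a banked blit really is no rocket science, here is a hedged sketch which copies a virtual screen from main memory through a 64K window and respects the line pitch (the BytesPerScanLine value from the VBE mode info) instead of assuming it equals the x-resolution. The assumptions: window size and granularity are both 64K, and set_bank is a hypothetical callback standing in for your own wrapper around VBE function 4F05h. Real cards may report other granularities, so read them from the mode info instead of hardcoding them.

```cpp
#include <cassert>
#include <cstring>

const std::size_t BANK_SIZE = 65536;  // assumed window size AND granularity

// Hypothetical callback: maps bank number `bank` into the window.
typedef void (*SetBankFn)(int bank);

// Copies `height` lines of `width_bytes` bytes each from the virtual
// screen `src` to video memory, `pitch` bytes apart, via `window`.
void blit_banked(const unsigned char *src, int width_bytes, int height,
                 int pitch, unsigned char *window, SetBankFn set_bank)
{
    assert(width_bytes <= pitch);
    int current_bank = -1;  // nothing mapped yet

    for (int y = 0; y < height; y++) {
        const unsigned char *s = src + (std::size_t)y * width_bytes;
        std::size_t offset = (std::size_t)y * pitch;  // pitch, NOT width
        std::size_t left = (std::size_t)width_bytes;

        while (left) {  // a line may straddle a bank border
            int bank = (int)(offset / BANK_SIZE);
            std::size_t in_bank = offset % BANK_SIZE;
            std::size_t chunk = BANK_SIZE - in_bank;
            if (chunk > left)
                chunk = left;
            if (bank != current_bank) {  // switch banks only when needed
                set_bank(bank);
                current_bank = bank;
            }
            std::memcpy(window + in_bank, s, chunk);
            s += chunk;
            offset += chunk;
            left -= chunk;
        }
    }
}
```

Note that the bank is only switched when a line actually crosses into the next 64K, so a full-screen blit costs a handful of bank switches and not one per line.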

To come to a conclusion, as cool as DOS is regarding things like stable timers and enough CPU time, the hardware and API chaos is just a mess. It takes MONTHS to get a universal VBE code working on, let's say, 90% of all PCs. And in the current era of PCI sound cards, you can be sure that plenty of people won't be able to enjoy the cool music of your demo at all (or only in rare cases in more than kewl 22khz 8bit super-duper hifi sound).

Ok, Linux could be the answer. A free (and therefore extremely scene-friendly) OS, stable, great multitasking and well optimizing compilers... if there were a decent standard to get your demo onto the screen. Admit it, nobody wants demos running in a small X11 window (preferably only in exactly that color depth you never use) - and svgalib only works with, uhrm, nothing, and needs to be SUID 0. Hooray. There are things like libggi etc., but as long as there is no decent standard means of accessing the graphics card's frame buffer, Linux is simply unacceptable as a demo OS (not to mention ["you need libvgagl 100.14, kernel 2.7.444pre3, X11R6.27, TCL8.4++ and to be precise a complete sunsite image taken on December 12th, 2014AD, 5:37pm to run this demo"]).

So, as sad as it is, and as much as it hurts to admit it, today the only REAL choice for demos is Windows. At least, it is the only OS which provides a rather nice, fast, standardized and WORKING way of bringing your code to the screen and the speakers - and this is DirectX. If you just set up a DSound primary buffer and a DirectDraw primary surface, you have exactly what you had under DOS with VESA and your sound drivers. And all that setup code, fiddling with COM interfaces, messaging etc. is about two days of work - and then you have your wrapper, your FlipScreen() and everything goes on as usual, with a BIG difference: Your demo will most probably work on MORE than your hardware. Ok, there are still some quirks and inconsistencies in the DirectX API, but the problems you encounter when you want your code to run on another PC are very small compared to those you get when trying the same under DOS.

So, Windows is unstable, bad and EVIL - but for demo coding, it's at the moment the only choice. And once the hurt stops, think about all those nice image and sound codecs and other libraries you have in a standard Win95/98/NT which happily do all those things for you which were such a torture to code under DOS. With the right wrappers (max one week of work), you can finally concentrate on what you really want - coding effects.

* TEST DA THANG OUT:

Yes, you are one of those persons who finish a demo three minutes before the compo deadline and don't have the time to test it on any other PC... Come on. Don't say you coded your whole VBE code five minutes before the deadline and did all the effects at home without actually seeing them... Ok, maybe this explains how ugly some demos are, but every coder should have and in fact HAS enough time to test his production on other computers. And "other" computers are not only your Dad's one, but also those of your real-life and scene friends, your girlfriend's brother and (yes) e.g. random people on IRC. Also, small scene meetings (do you still have them?) are an IDEAL place to test out your code and talk to others if it doesn't work and you don't know why.

(Hm, maybe I should consider wearing one of those neat "Hello - are YOU the PC scene?" shirts at parties, too.)

To come to the end, I'll finally ask you one question: Is all this really too hard or too much work? What's the problem with optimizing code or making your demo run on other computers than yours? Face it - it's only your own laziness. None of this is impossible, many groups showed us that it works. So why doesn't it work for you? Don't you feel ashamed? You'd better.

If I ever write a next issue of this article, its content will most probably be "how to make your demo look good", covering topics from how to query the escape key ["escape key? Isn't the reset key enough?"] to DOS related ones like how to synch your effects to the retrace ["you mean, those stripes aren't necessary?"] and maybe some Win95 coding issues. We'll see.

- Tammo 'KB' 'Ja, ich BIN arrogant' Hinrichs