The Future of 80x86 and Coding?
Written by TAD
The day I wrote these lines was the deadline for Hugi articles, so I thought I would write something about 80x86 assembly language and PC coding in general, and along the way make some predictions (well, ideas really) about how the 80x86 and coding could develop in the near future. No doubt some of these daft ideas will be laughed at a few years from now (if that long... heh heh).
I'm not waving, I'm drowning... in the C
I am sure that over the past few years many other 80x86/low-level coders have felt like me, being a member of a dying breed... someone who still uses an assembler, and not a high-level compiler.
I have heard all the arguments before about how easy and "portable" the C language is, but I am still not convinced, because I feel that 80x86 and assembly in general still have so much to offer to the "real" coder. By this I mean someone who wants complete control over the CPU and hardware to produce something truly new, rather than a collection of other people's library routines with a lame copyright message bolted to the start-up routine.
I can imagine that thousands of irate C programmers are jumping up and down producing some colourful expressions as they read this (grin).
There must be thousands or millions of other coders out there who have gone into a book shop to look for an 80x86 or 68000 book and seen nothing but hundreds of C++ guides, books and dictionaries. For every 1000 C++ books there is perhaps 1 80x86 book, if that! You only have to look in the newsgroups to see newbies trying to find assembly language tuts or example programs.
In the old days of 8-bit and 16-bit machines the situation seemed better. Perhaps this was due to the lack of any real horse-power in the CPU, so coders were forced into using Z80, 6502, 68000 or 8086 simply because C gave results which were sadly not up to the task at hand. Either the code was slow, or the final program was far too big.
So is the problem that we now have far too much CPU power and can't think of ways to utilise it?
Well, maybe.
Is Optimizing an option?
Sorry to harp on about the "good old days" so much, but we seem to be losing some of the basic skills and algorithms which the likes of Knuth, Bresenham and other coding-gods sweated blood and tears over all those years ago.
Ask any newbie how to divide one number by another and they will say use the floating-point instructions, or the DIV/IDIV instructions... okay.
But ask them how to divide 128 bits by 64 bits or multiply 64 bits by 64 bits and they will look very puzzled. "It can't be done... the CPU can only handle 32 bits!"
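To show that it can be done, here is a minimal sketch of the 64-bit by 64-bit multiply, written in C only so that it can be compiled and checked anywhere. The function name mul64x64 is my own invention. Each of the four partial products is a 32-bit by 32-bit multiply giving a 64-bit result, which is exactly what a single 80x86 MUL instruction produces in EDX:EAX, so the same scheme maps more or less directly onto four MULs plus some ADD/ADC.

```c
#include <stdint.h>

/* Multiply two 64-bit numbers into a 128-bit result (hi:lo) using only
   32x32 -> 64-bit partial products, schoolbook style.
   mul64x64 is a hypothetical name used for illustration. */
void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    /* Split both operands into 32-bit halves. */
    uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
    uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);

    /* Four partial products, each one a single 32-bit MUL on the 80x86. */
    uint64_t p00 = (uint64_t)a0 * b0;   /* weight 2^0  */
    uint64_t p01 = (uint64_t)a0 * b1;   /* weight 2^32 */
    uint64_t p10 = (uint64_t)a1 * b0;   /* weight 2^32 */
    uint64_t p11 = (uint64_t)a1 * b1;   /* weight 2^64 */

    /* Sum everything that lands in bits 32..63; the total is at most
       3 * (2^32 - 1), so it cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

Multiplying 0xFFFFFFFFFFFFFFFF by itself, for example, gives hi = 0xFFFFFFFFFFFFFFFE and lo = 1.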
It seems that newbies are missing out on some of the most important algorithms and techniques simply because they use high-level languages only, and so they miss out on the low-level tricks.
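The 128-bit by 64-bit division can be sketched in the same spirit. What follows is the plain old shift-and-subtract (restoring) long division, one quotient bit per loop pass, again in C so it can be tested. The name div128by64 and its error convention are assumptions of mine, loosely modelled on how DIV refuses a quotient that will not fit. It is not the fastest method by a long way, just the easiest to follow.

```c
#include <stdint.h>

/* Divide the 128-bit value hi:lo by the 64-bit value d using
   shift-and-subtract long division.
   Returns 0 on success, or -1 if d is zero or the quotient would not
   fit in 64 bits (hi >= d), similar to a DIV overflow.
   div128by64 is a hypothetical name used for illustration. */
int div128by64(uint64_t hi, uint64_t lo, uint64_t d,
               uint64_t *quotient, uint64_t *remainder)
{
    if (d == 0 || hi >= d)
        return -1;

    uint64_t rem = hi;   /* running remainder, always kept below d */
    uint64_t quo = 0;

    for (int i = 0; i < 64; i++) {
        /* Bit shifted out of rem: the true remainder is 65 bits wide here. */
        int carry = (int)(rem >> 63);

        /* Shift the next dividend bit into the remainder. */
        rem = (rem << 1) | (lo >> 63);
        lo <<= 1;
        quo <<= 1;

        /* If the (65-bit) remainder is at least d, subtract and set the bit.
           With carry set the subtraction wraps, but the result is still right. */
        if (carry || rem >= d) {
            rem -= d;
            quo |= 1;
        }
    }

    *quotient = quo;
    *remainder = rem;
    return 0;
}
```

A translation to 80x86 would keep the value in register pairs and use SHL/RCL with the carry flag doing the job of the carry variable here.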
Optimizing is probably the area of coding which has changed the most. Many seem to be happy to just enable all the options and switches in their compiler, or put a higher spec on the application's packaging. These days more memory is thrown at the problem using greedy algorithms which require more and more memory for something as simple as a text editor.
The CPU problem.
Gone are the days of counting clock-cycles to discover if an algorithm can be made faster. Now more complicated schemes and software profilers (not to mention CPU hardware registers) are used in order to save a few seconds here and there.
But one of the main problems of the PC is the fact that no two machines behave in the same way, even with the same clock speed. One instruction on an Intel CPU will give good results, but on an AMD or Cyrix it will give lousy results. Nowhere is this more clear than in the compact instructions like LOOP, STOSB or in the use of displacements instead of registers. On one type of CPU you should avoid these old instructions and instead go for larger, quicker ones, but on the other types the opposite is true.
So what happens? How does a coder produce good results for all these different CPUs?
Well, they can't.
The only sensible solution is to use some average instructions which give reasonable results on most types of CPU. So any advantages of using the optimization tricks of one CPU are lost. Even chip makers like Intel keep changing their minds about whether RISC or CISC instructions are better. So with each new CPU comes a completely new set of rules regarding optimizing (different instruction clock cycles, prefetch, pipelines etc.).
This is probably why C has become the main language: simply upgrade your compiler to take into account the latest CPU and recompile the same old C source code.
This obviously is impossible to do with the 80x86 and other low-level languages. Just think about AGI stalls, prefetch and alignment and you will soon see that almost all of your code would need to be totally re-written to take advantage of the latest CPU design.
I don't know if compilers support options for different CPU makers like AMD and Cyrix, but they probably do, or at least have an add-on library. Again this is something which the humble 80x86 coder doesn't have.
Syntax (T)errors
I don't know the history of the C language, but it is no doubt much younger than assembly language. As far as I know the basic format and syntax of 80x86, Z80, 6502 etc. hasn't really changed since the 1940s, with its
<label>: <mnemonic> <operand1>, <operand2> <comment>
format.
When learning to code for the very first time this must appear like a nightmare trying to remember the syntax of each instruction, the operand order, having to move items around in registers, etc., etc. I guess that's why most people are directed into C, Pascal and (worst of all) BASIC.
I personally don't mind BASIC that much (I learned programming using the ancient ZX81 BASIC). In "some" ways it looks neater than C does. I have seen some neat BASIC code and some really, really bad C source code.
Now that the floating-point instructions are as fast as (if not faster than) integer ones and have more precision, and the use of complex formulas in everyday programs is growing, the old syntax of assembly seems terrible. This is another reason why high-level languages are so appealing: their easy-to-read format for formulae and expressions.
The future of 80x86 and assemblers?
This all seems like I am shooting assembly language in the foot in favour of C and other high-level stuff. Well, no. I am an 80x86/68000 junkie. On my tombstone I don't want an inscription, but a hex-dump!
The future is not the death of assembly, but a redesign of syntax and instruction format. There is at least one program which I believe may give a good indication of the next generation of 80x86 and 68000 etc.
It is the Terse programming language. Well, actually it's only a pre-assembler and turns a C-like source code into 80x86 source code. At first this may sound very stupid. Why not just use C from the start? Well, because Terse is NOT a C language, but a new, (and in my opinion) more logical format for instructions, formulas and expressions. Also it greatly cuts down the amount of typing, and that is never a bad thing.
So instead of this:
MOV AL, [EBX]
you write something like this:
AL = [EBX] ;
And instead of:
MOV BX, [SI]
ADD BX, 5
SUB DX, CX
use this:
BX = [SI] ; BX+5 ; DX-CX ;
I kinda like this much shorter format and the fact that more than one statement can be written on a single line. As I hope you can see it is more formulaic and closer to the normal way you use arithmetic.
I have only seen the brief examples on the http://www.terse.com web-site and a short description of it. If the sub-routine calls are handled more like functions instead of the usual 80x86 CALL + RET instructions and there is a clean way to pass and return values between calls in registers or variables, then this looks like a very interesting new way to code 80x86.
Clean, easy-to-read source code not only makes coding quicker, it also takes up far less space in memory and on disk, which means quicker compile times and the chance to make some monster-sized programs.
There are of course problems with the current Terse language, it being only a pre-assembler for a start and, as it seems, the lack of floating-point numbers. Another problem might be the sheer number of CPU instructions which need to be available to the programmer, but I think this can be overcome by using them as operators.
E.g.
eax = edx XOR ebx ;
would produce:
mov eax, edx
xor eax, ebx
The current assembly-language format appears to be a prefix (Polish notation) type, where the operator is written in the mnemonic column with the destination and source operands after it.
E.g.
add edx, 4
I know a large number of newbie coders find this difficult to understand as it should be read as "ADD 4 to edx". In this case the 68000 syntax looks more logical with its
add.b #4, d0
This will add the byte 4 to the d0 register.
This is one of the main reasons why I hope other coders (especially assembler/compiler makers) will take time to explore a new, more logical format. If it saves coding time and makes other people's code easy to read then it can't be bad, can it?
But why should we use other people's code? Why not write our own?
Well, the fact is that projects are going to get far, far, far bigger and coders will have to start working in huge teams, possibly with other coders from the four corners of the planet (it's started already!). Just look at the huge capacity of DVD and the growth in VR technology and you will realise that 64kb of code won't be enough.
Compile time.
Perhaps in the future source code will either be compiled just before being run on the target machine, or as it is being installed.
Why?
Well, most games are moving towards complete customization, allowing people to edit and create new features. DOOM and Wolfenstein were of course some of the pioneers in this field; just take a look at the vast number of WAD files and add-ons.
Another important reason would be optimization. As I have already said, each CPU and chip maker has its own tricks and pit-falls to optimization. One idea could be to run a series of tests as the program is being installed and then compile the source code using these features. It "could" help to reduce the optimization problems between Intel, AMD and Cyrix machines.
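As a rough illustration of the "run a series of tests at install time" part, here is a small C sketch. It is a simplification of the idea: instead of recompiling source code it times two ready-made variants of the same routine on the machine it is running on and keeps whichever came out quicker. All the names (fill_fn, fill_bytes, fill_memset, pick_fastest_fill) are made up for this example, and the buffer size and repeat count are arbitrary guesses.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Common signature for every candidate routine (hypothetical). */
typedef void (*fill_fn)(unsigned char *dst, unsigned char value, size_t n);

/* Variant A: simple byte-at-a-time loop. */
void fill_bytes(unsigned char *dst, unsigned char value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}

/* Variant B: hand the job to the C library's memset. */
void fill_memset(unsigned char *dst, unsigned char value, size_t n)
{
    memset(dst, value, n);
}

/* Time one variant over a scratch buffer; returns elapsed clock ticks. */
static clock_t time_variant(fill_fn fn, unsigned char *buf, size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(buf, (unsigned char)r, n);
    return clock() - start;
}

/* Run the "install-time" test and return the quicker variant.
   On a tie (or when both are too fast to measure) variant A is kept. */
fill_fn pick_fastest_fill(void)
{
    static unsigned char scratch[65536];
    clock_t a = time_variant(fill_bytes, scratch, sizeof scratch, 200);
    clock_t b = time_variant(fill_memset, scratch, sizeof scratch, 200);
    return (b < a) ? fill_memset : fill_bytes;
}
```

The installer (or the program's start-up code) would call pick_fastest_fill() once, store the returned pointer and call through it from then on. A real scheme would need far better timing than clock() and many more routines, but the shape would be similar.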
Hey, it's only an idea.
Closing words
I hope to look into the re-design of the 80x86 assembler format sometime in the near future. I hope other coders will consider this too. One thing's for sure... the current syntax from the 1940s and 1950s is pretty bad, even for those of us who have used it for years or decades.
Some will say, "Hey, why not just learn and use C++?" but they would be missing the point. I still want to code in 80x86, but have it in a more readable format. Also, because I am really lazy, I want less typing.
I think it is time for something new to happen in assembly-language. Every other language has been redesigned, updated or has been invented from scratch so I feel it's about time for 80x86 to become 80x86++ or 8++ (grin).
Check out the Terse programming language site. It's not a perfect solution, but it is perhaps the first prototype solution to a new 80x86 format.
No, there isn't a shareware or trial version. Unfortunately.
http://www.terse.com
Happy coding.
Regards
TAD #:o)