Quick Sort, A Fast Sorting Algorithm
Introduction_____
The quick sort algorithm, presented by C.A.R. Hoare, is used to sort an array and is recognized as one of the better sort algorithms. The other great sorting algorithm is radix sort, but it's used for small or medium-sized data. Almost every coder has implemented radix sort for his 3D-engine.
Quick sort is the best algorithm for big data, ASCII strings... Yep! Wait a moment, I need to sort strings! Then, quick sort is what you need.
This algorithm can be used for different things, but in this article I'll focus on the application that it has in BWT. You don't know yet what that is? If you want to know, visit some links from my homepage:
http://www.geocities.com/SiliconValley/Byte/6789/
However, I'll present you the basic algorithm, and how to extend it so that it fits your data. If anyone has a better implementation or algorithm, let me know.
Here I'll present C code, which isn't mine at all, and Asm code for PC, under Protected Mode. (You know, those big pointers.) And that's mine.
The theory behind the algorithm_____
What do we need? To sort a given set of data. And what do we primarily want? Speed. The main idea of qsort is "Split and you'll win", so we just treat the data every time in a smaller range. We get a number, and then compare both numbers from the right and the left, avoiding the numbers that are 'sorted', i.e. are greater or lower respectively than our first number. Then for the split the only thing we have to do is to call the qsort again, but this time with a smaller range, and so on till we have all the array, the data set, sorted. That's why qsort is recursive, it calls itself.
Qsort in C_____
Here is the code in C:
/* Just some data */
unsigned char Vektor[]; //the declaration of the array
/* The quick sort itself */
void QuickSort(int left, int right) //if we have 6 elements, then (0,5)
{
int i; //the pivot from the left
int j; //the pivot from the right
int x; //the number used to compare
i = left; //initalise both pivots
j = right;
x = Vektor[left+right / 2]; //and the number, with the byte in the
do //the main loop |-> half of the range
{
while ( Vektor[i] < x && i < right )
i++; //we avoid the numbers that are lower than x
while ( x < Vektor[j] && j > left )
j--; //the same, but with the greater numbers
if (i <= j) //if we are still in a valid range
{
SwapNumbers(i,j); //just swap both bytes
i++; //and update both pointers
j--;
}
}while (i <= j); //repeat if still in valid range
if (left < j) QuickSort(left,j);
//continue scanning in the left range
if (i < right) QuickSort(i,right);
//continue scanning in the right range
} //of course only if there's still data to scan
/* The function to swap two elements in the array */
void swap (a,b)
{
{
unsigned char aux; //help for swapping them
aux=Vektor[a]; //I think you don't need comments here
Vektor[a]=Vektor[b];
Vektor[b]=aux;
}
For checking it, just call it with quicksort(0,ARRAY_MAX-1); OK? Now go to the debug mode and understand it.
Qsort in Asm_____
Here is where I do the real job.
First we must see if we have enough registers for ALL the data. We will have good luck if we have enough. Here it is how we'll use them:
ESI - Base pointer, pointing to the start of the vector (the data to sort)
EAX - Index pointer, i
EBX - Index pointer, j
ECX - X, the data that was in the half of the vector
EDX - free, for making comparisons and as 'aux'
EBP - left
EDI - right
We can't use ESP 'coz we'll need it for the push'n'pops. Remember that this function is recursive, so we need to allocate the memory 'dynamically'. We cannot (and we shouldn't) use variables (like j, i, x...) for it. Well, let's start with the code:
qsort:
mov eax,ebp ;i=left initialise them
mov ebx,edi ;j=right
mov ecx,eax ;Get the X
add ecx,ebx
shr ecx,1
mov cl,[esi+ecx] ;x WARNING: it's a byte.
So now we have initialised the data and can start with the loop:
@@qs_inner:
;-> while ( Vektor[i] < x && i < right ) i++;
@@qs_while1:
mov dl,[esi+eax] ;vector[i]
cmp dl,cl ;vector[i] < x
jae short @@qs_no_1 ;jump if it's not true
cmp eax,edi ;i<right
jge short @@qs_no_1 ;WARNING: The dwords are SIGNED
;-> once there, we have to execute the body of the while
inc eax ;WARNING, add to EAX the
;difference from one element
;and the other
jmp short @@qs_while1 ;we are simulating a while
@@qs_no_1:
This was the first while. Note again that the dwords are signed. I needed two hours of debugging to notice that. In fact I was searching the bug in the rest of the function. ;-(
@@qs_while2:
;->while ( x < Vektor[j] && j > left ) j--;
mov dl,[esi+ebx] ;vector[j]
cmp cl,dl ;x < vector[j]
jae short @@qs_no_2 ;as the other
cmp ebx,ebp ;j > left
jle short @@qs_no_2
dec ebx ;--j
jmp short @@qs_while2
@@qs_no_2:
We have half of the code. And now the swap part comes:
;-> if (i <= j) swap()
cmp eax,ebx ;if (i <= j)
jg short @@qs_no_if1
;-> swap them
mov dl,[esi+eax] ;vector[i]
push dx
mov dl,[esi+ebx] ;vector[i]=vector[j]
mov [esi+eax],dl ;vector[i]
pop dx
mov [esi+ebx],dl ;vector[j]=vector[i]
;-> i++ j--
inc eax ;update both pointers,
dec ebx ;remember that we comp. bytes
@@qs_no_if1:
cmp eax,ebx ;let's repeat everything
jle short @@qs_inner ;if needed
It's not my intention to explain the whole algorithm, but note that we swap the registers when i=j. But then, you may ask, we are swapping the same element? Yes, we do that 'coz then the pointers begin updated, and we don't fall into an infinite loop. Now we must see if we have to recursive recall qsort, with different ranges:
;-> if(left < j) qsort(left,j) new range, the left one
cmp ebp,ebx ;(left<j)
jge short @@qs_no_if2
;'-> qsort(left,j)
push eax ;save ONLY the registers
push ebp ;that we'll need
push edi
mov edi,ebx ;right=j left=left
call qsort
pop edi
pop ebp
pop eax
@@qs_no_if2:
;'-> if(i < right) qsort(i,right)
;new range, the right one
cmp eax,edi ;(i < right)
jge short @@qs_no_if3
;we don't really need to push any register
mov ebp,eax ;left=i right=right
call qsort
@@qs_no_if3:
ret ;the end of the qsort
Ok, you've understood everything? I hope so, 'coz it's not very difficult.
Extending it_____
For example if you need to extend it to ints, the only thing you have to change is the way the comparations are done. Or if you have more complicated data structures, the sort algorithm is the same. Say we have this data:
<int><pointer>
When comparing we'll compare the <int>, and when needed to swap, we'll copy <int> and <pointer>. This is an idea for extending the qsort.
Sorting strings_____
Now let's do something we'll need, sorting strings. But we'll not directly sort strings, we'll sort pointers. The actual comparison will be done with the strings themselves, but we'll swap the pointers. As you see this is totally BWT focused. Did you ever have a look at the C qsort? It calls a function of your choice and returns a value depending if it was greater, lower... This is the way we'll do it, a function that compares strings.
Ok, now the values to compare will be strings (we'll now do the function) and the element to swap will be the pointers.
The conversion is easy, or at least it seems so. When incrementing i:
AFTER: inc eax NOW: add eax,4
That is 'coz the difference beetwen one element and another is 4 bytes. Remember that we are using PMode pointers (dd).
When copying:
AFTER: mov dl,[esi+eax] NOW: mov edx,[esi+eax]
And a warning about a bug I had (if your qsort hangs, then first check that): When choosing the middle element, we must first divide (shr) by 4 (2) and then multiply (shl) by 2 (1). Note that we CAN'T do shr 1!!
Example:
0Bh - shr2 -> 02h - shl1 -> 04h ;Ok
0Bh - shr1 -> 05h ;WRONG
Remember that the value x (ESC) will now store a POINTER. We can't compare it directly. Once you have implemented these little things, we can do the string comparison. We MUST scan the whole string if needed, we MUST stop at the end of the string, and we MUST return some value that tells when the string is greater, lower or equal. We must use the registers in a way that we don't have to push a lot of them and it can be easily called from qsort. Let's think...
ECX - pointer to the first string
EDX - pointer to the second string
EAX - used for the comparations
EBX - counter, so we'll neither increment ECX nor EDX
Then the code starts:
qs_comp_strings:
push ax ebx ;Push what we'll use
mov ebx,-1 ;to properly enter the loop
This is when we initialise the registers. Now the main loop:
@@qscs_more:
inc ebx ;next byte in the string
cmp ebx,qscs_string_max ;Don't comp. beyond the end
je short @@qscs_equal ;if it's equal, then exit
I have to say something: qscs_string_max is a VARIABLE, and it must be equal to string_lenght+1.
mov al,[ecx+ebx] ;get byte from the first
cmp al,[edx+ebx] ;compare with the second
je short @@qscs_more ;while equal repeat
Once there, they are different, so we check the flags and return.
ja short @@qscs_above
jb short @@qscs_below
Quite easy, isn't it? If it doesn't jump, they are equal:
@@qscs_equal:
xor edx,edx ;they are equal, return 0
jmp short @@qscs_end
And if they aren't equal:
@@qscs_above:
mov edx,1 ;if ECX is greater, then 1
jmp short @@qscs_end
@@qscs_below:
mov edx,-1 ;if ECX is lower, then -1
@@qscs_end:
pop ebx ax
ret
This is the string comparison function, quite easy. Now let's use it. I'll give you an example, and you will do the rest. I think I have put too much Asm code here. For example in the first comparison.
- Vektor[i] < x
- mov dl,[esi+eax]
cmp dl,cl
jae short @@qs_no_1
We do:
mov edx,[esi+eax] ;the pointer from [i]
call qs_comp_strings ;ecx already has the pointer
cmp edx,1 ;X must be above, so it should return 1
jne short @@qs_no_1 ;if not, then jump out of the while
OK? Then do your (own?) implementation.
Using it with BWT_____
You have 90% of the comparison function. Now you only have to care about keeping track of what was the number of the pointer first. I recommend you that you have in memory:
- pointer1 (dd) - position pointer1 (dd) - pointer2 - position pointer2
So when swapping, you just swap two consecutive dwords, and nothing else. Remember that in BWT, you don't do 'length' strings, you only have pointers. But what happens when you rotate them?
String: abc
Rotated: bca cab
Beyond the position of 'c' there's nothing else, so you may do two things:
1- Substract the string lenght from the pointer: (string rotated: cab)
You read 'c'
Then you are in the end, substract 3 from the pointer
Read 'a'
Read 'b'
2- Write the file to memory twice:
Real string in memory: abcabc
Read 'c'
End, no problem, the pointer is in the 'start' of the string
Read 'a'
Read 'b'
I had the second idea while thinking about qsort. It has some drawbacks. Well it has only ONE drawback: You will need twice the mem for the input file. I think this is not a real problem. For example, bzip has 500k as the maximum block size, so we'll only need 500k more, but nothing else, the rest of the data structures remains unchanged. And now the good things: It will make sorting FASTER, and implementing it will be easier. So, and having in mind that sorting a lot of strings is slow, I think it's a good solution. I'll use that in my own implementation of BWT.
Closing words_____
Now you should implement it. Well, in fact you only have to cut'n'paste the code, but I know you will try to understand the code - at least I hope so.
Now I must implement BWT. In the three last days, I've learned BWT, RLE, Mtf and qsort. I've implemented all of them but BWT. Now I'm tired of writing. So I better code more.
Don't forget to visit my home page for more articles and links at:
http://www.geocities.com/SiliconValley/Byte/6789/
If you have any doubt, email me at dario_phong@mixmail.com.