I agree that the x86 architecture is not all that it could be, but IBM, the computer giant of the age, adopted the 8088 (a close sibling of the 8086) for its first PC, and that really put Intel on the inside track. Intel's continued success has come from the fact that its ramped-up family of CPUs can run legacy code: the old architecture is sustained with very few modifications, and the various extensions that have been added do not affect the existing registers and instructions.
The original justifications for the x86 are probably all gone, but what locks us into this antiquated design is the operating system, first DOS, then Windows. Linux has also focused on the x86 platform because it is the de facto standard. And of course the hardware and OS together define the environment where your existing applications and new development must live and work.
You could break away and find a new architecture and OS, new applications, new development tools, and start over if you like. Really, the only thing that is holding you back is what's available, what you are willing to put up with and do for yourself, and the very limited market space that you would be entering at that point.
I'm going to assume that most of you realize that is too great a journey to embark on, so like it or not, you are going to stay with the prevalent hardware and software combinations currently available, which justifies the continuance of this discussion.
There are separate conventions for handling two types of data, numeric and string, as well as another convention for processing the contents of memory.
With numeric data, we read from the most significant bit or byte to the least significant bit or byte, left to right. We do this even with decimal numbers. Thus, 1057 is read as one thousand and fifty-seven, not seven ones, five tens, and one thousand. That is a matter of convention. With words, any combination of letters and digits, and text in general, we follow two rules: first we attempt to read left-to-right, then we attempt to read from top-to-bottom. With column data, we read left-to-right, top-to-bottom, then left-to-right again. In moving through pages of text, we turn the pages from right to left. These are conventions adopted for most western languages, but they do vary in other languages.
According to these conventions, we would look at a 32-bit register as having its most significant bit, representing 2^31, situated at the left side of the register, and the least significant bit, representing 2^0, situated at the right side of the register. Numbering the corresponding bit positions across, we would see:
3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1                     \ Powers of
1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 /    2
Bytes are 8-bit representations, and to represent the four possible bytes that
could be loaded into this register of 32 bits, we would see them organized like this:
|3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1    |               | \ Powers of
|1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0| /    2
|    Byte 4     |    Byte 3     |    Byte 2     |    Byte 1     | Char. Rep.
If we were to express the first four letters of the alphabet in these four bytes,
we would have to show them this way, in order to be consistent with the numeric or byte representation:
|       D       |       C       |       B       |       A       |
Now this would seem backwards from the ABCD order of the left-to-right rule.
It is, but it is consistent with the major-to-minor rule, if the first byte is considered the minor byte. And that goes along with the idea that the least
significant byte occupies a lower address in memory than the next most significant byte.
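The byte layout above can be sketched in C. This is only an illustration, not anything the CPU or PowerBasic provides: the names pack4 and byte_of are made up for this example, and ASCII character codes are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four characters into a 32-bit value, with the first character in
   Byte 1 (bits 7..0) and the fourth in Byte 4 (bits 31..24).
   pack4 is a hypothetical helper name. */
uint32_t pack4(char b1, char b2, char b3, char b4)
{
    return  (uint32_t)(unsigned char)b1
         | ((uint32_t)(unsigned char)b2 << 8)
         | ((uint32_t)(unsigned char)b3 << 16)
         | ((uint32_t)(unsigned char)b4 << 24);
}

/* Return Byte n of value, where n runs from 1 (least significant)
   to 4 (most significant), matching the numbering in the diagram. */
unsigned char byte_of(uint32_t value, int n)
{
    assert(n >= 1 && n <= 4);
    return (unsigned char)((value >> (8 * (n - 1))) & 0xFFu);
}
```

With ASCII codes, pack4('A', 'B', 'C', 'D') gives the hexadecimal value 44434241, which reads D C B A from left to right, and byte_of on that value returns 'A' for Byte 1 and 'D' for Byte 4.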
To put this another way, the original 8088 chip design read memory one byte at a time, and advanced through memory from a lower byte address to the next higher byte address. For a 16-bit value in memory, it read the low order byte first, so the low order byte always had the lower address. Reading the low order byte first simplified the process of performing arithmetic operations, and also made it easy to increment and decrement register or memory contents. So if you had
the whole range of capital letters in memory, it would appear in this order:
low address --> ABCDEFGHIJKLMNOPQRSTUVWXYZ <-- high address
If you then read the first four bytes into EAX, the second four into EBX, the third
four into ECX, and the fourth four into EDX, this is the byte arrangement in those four
registers:
EAX: DCBA
EBX: HGFE
ECX: LKJI
EDX: PONM
It is still in the same sequence, but now looks backward in each register, because of the convention of the low order byte appearing on the right. If you
stored these registers back into memory, then you would see this:
low memory --> ABCDEFGHIJKLMNOP <-- high memory
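The round trip above can be modeled in C without depending on the byte order of the machine that runs it. This is a sketch under that assumption: load_le32, store_le32, and reg_text are hypothetical helper names that imitate a 32-bit load, a 32-bit store, and the way a register is drawn on paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read four bytes the way the x86 loads a 32-bit register:
   the byte at the lowest address becomes the low order byte. */
uint32_t load_le32(const unsigned char *mem)
{
    assert(mem != NULL);
    return  (uint32_t)mem[0]
         | ((uint32_t)mem[1] << 8)
         | ((uint32_t)mem[2] << 16)
         | ((uint32_t)mem[3] << 24);
}

/* Write a register back to memory, low order byte to the lowest address. */
void store_le32(unsigned char *mem, uint32_t reg)
{
    assert(mem != NULL);
    for (int i = 0; i < 4; i++)
        mem[i] = (unsigned char)((reg >> (8 * i)) & 0xFFu);
}

/* Spell out a register's bytes as text, most significant byte first,
   which is how a register is conventionally drawn. out needs 5 chars. */
void reg_text(uint32_t reg, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((reg >> (8 * (3 - i))) & 0xFFu);
    out[4] = '\0';
}
```

Loading from the start of the alphabet and passing the result to reg_text produces "DCBA", while store_le32 puts the bytes back into memory in their original A, B, C, D order.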
Now let's examine the EAX register briefly. The EBX, ECX, and EDX registers would be arranged the same way:
|   Upper 16-bit word   | AH (8 bits) | AL (8 bits) |
|    "D"    |    "C"    |     "B"     |     "A"     |
As explained earlier, getting to the bytes that represent D and C requires rotating the register 16 bits to the left or right, then treating them as AH and
AL respectively.
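As a rough C model of that rotation (the helper names rotl32, al_of, and ah_of are made up for this sketch, and the register is just an ordinary 32-bit variable):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by count bits, the job ROL does. */
uint32_t rotl32(uint32_t x, unsigned count)
{
    assert(count < 32);
    if (count == 0)
        return x;
    return (x << count) | (x >> (32 - count));
}

/* AL is bits 7..0 of the register. */
unsigned char al_of(uint32_t eax)
{
    return (unsigned char)(eax & 0xFFu);
}

/* AH is bits 15..8 of the register. */
unsigned char ah_of(uint32_t eax)
{
    return (unsigned char)((eax >> 8) & 0xFFu);
}
```

Starting from the DCBA arrangement, al_of and ah_of give 'A' and 'B'. After rotl32 by 16 the two halves have traded places, so al_of gives 'C' and ah_of gives 'D'. A rotate by 16 in either direction produces the same result, and a second rotate by 16 restores the original contents.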
The Carry flag performs another important function with regard to shift operations. First, when you perform a shift or rotate operation, the last bit moved out to the left or right is copied into the Carry bit in Flags. You then have the option to retain that bit and include it in some other operation, such as testing it with a JC or JNC branch instruction, or combining it in an add or subtract operation using ADC or SBB (Add with Carry or Subtract with Borrow).
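C has no Carry flag, so the following is only a model of the behavior just described. The type shift_result and the functions shl32 and adc32 are hypothetical names; the carry is returned as an ordinary value instead of living in Flags.

```c
#include <assert.h>
#include <stdint.h>

/* Result of a modeled SHL: the shifted value plus the Carry it leaves. */
typedef struct {
    uint32_t value;
    unsigned carry;
} shift_result;

/* Shift left by count (1..31). The carry is the last bit pushed out
   on the left, which is bit (32 - count) of the original value. */
shift_result shl32(uint32_t x, unsigned count)
{
    shift_result r;
    assert(count >= 1 && count <= 31);
    r.carry = (x >> (32 - count)) & 1u;
    r.value = x << count;
    return r;
}

/* Add with carry: a + b + carry, the job ADC does (result wraps at 32 bits). */
uint32_t adc32(uint32_t a, uint32_t b, unsigned carry)
{
    assert(carry <= 1u);
    return a + b + carry;
}
```

Shifting hexadecimal 80000001 left by one bit leaves 00000002 in the value and 1 in the carry. Feeding that carry to adc32 with 10 and 20 gives 31, which is the kind of follow-up use the paragraph above has in mind.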
Second, using rotate or shift (or even ADC) with other registers or memory, you can take the carry bit and merge it with the contents of that register or memory, effectively creating a long shift function that affects two or more registers or memory addresses. Thus, it is possible to perform quad (64-bit) operations within either a 32-bit or 16-bit processor. You can extend this basic capability to handle much larger integer types as well.
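Here is a minimal C sketch of the idea, using two 32-bit halves to stand in for a pair of registers. The type pair64 and the function names are assumptions made for this example; in assembly the same steps would typically be a SHL followed by an RCL, and an ADD followed by an ADC.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit quantity held as two 32-bit halves (hypothetical type). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} pair64;

pair64 make_pair64(uint32_t lo, uint32_t hi)
{
    pair64 p;
    p.lo = lo;
    p.hi = hi;
    return p;
}

/* Shift the pair left one bit: the bit leaving the low half plays the
   part of the Carry flag and is merged into bit 0 of the high half. */
pair64 shl64_by1(pair64 v)
{
    unsigned carry = v.lo >> 31;
    return make_pair64(v.lo << 1, (v.hi << 1) | carry);
}

/* Add two pairs: add the low halves, then add the high halves plus
   the carry out of the low addition. */
pair64 add64(pair64 a, pair64 b)
{
    uint32_t lo = a.lo + b.lo;
    unsigned carry = lo < a.lo;   /* unsigned wraparound means a carry */
    return make_pair64(lo, a.hi + b.hi + carry);
}

/* Small self-check showing the carry crossing from low to high half. */
void check_pair64_examples(void)
{
    assert(shl64_by1(make_pair64(0x80000000u, 1u)).hi == 3u);
    assert(add64(make_pair64(0xFFFFFFFFu, 0u), make_pair64(1u, 0u)).hi == 1u);
}
```

The same pattern extends to three or more words: each step passes its carry to the next higher word.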
To use the carry bit effectively, you have to be aware of which operations change the state of the carry flag. There are times when you have to preserve the state of the carry flag before carrying out further operations. A JC or JNC branch serves the purpose of remembering a prior state by the branch taken, or you can use the ADC or SBB instructions to capture the carry in a register or memory location, or you can save all the flag states before continuing what you are doing.
Handling Flags is simplified by two instructions:
LAHF, which stands for Load AH register from Flags, and
SAHF, which stands for Store AH into Flags. Both work only on the low byte of Flags, which holds five of the flags: SF, ZF, AF, PF, and CF.
You also have the option to save the flags onto the stack with PUSHF, and to return the saved flags from the stack with POPF.
One of the things you might want is an extensive help file on the Assembly instruction set. You can look for a file named ASM.HLP, which I find quite useful. I'm not sure where I originally found mine, but I have it associated with the PureBasic product, so it might be on that web site (www.purebasic.com).
My previous post, where I identified the flag bits in the Flags register, is somewhat expanded on by the information in the ASM.HLP file. There, the following breakdown is available:
|11|10|F|E|D|C|B|A|9|8|7|6|5|4|3|2|1|0|
  |  | | | | | | | | | | | | | | | | '--- CF Carry Flag
  |  | | | | | | | | | | | | | | | '--- 1
  |  | | | | | | | | | | | | | | '--- PF Parity Flag
  |  | | | | | | | | | | | | | '--- 0
  |  | | | | | | | | | | | | '--- AF Auxiliary Flag
  |  | | | | | | | | | | | '--- 0
  |  | | | | | | | | | | '--- ZF Zero Flag
  |  | | | | | | | | | '--- SF Sign Flag
  |  | | | | | | | | '--- TF Trap Flag (Single Step)
  |  | | | | | | | '--- IF Interrupt Flag
  |  | | | | | | '--- DF Direction Flag
  |  | | | | | '--- OF Overflow flag
  |  | | | '----- IOPL I/O Privilege Level (286+ only)
  |  | | '----- NT Nested Task Flag (286+ only)
  |  | '----- 0
  |  '----- RF Resume Flag (386+ only)
  '------ VM Virtual Mode Flag (386+ only)
- see PUSHF POPF STI CLI STD CLD
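For anyone who wants to pick a saved Flags value apart in a higher-level language, the breakdown above translates into bit masks. The sketch below is in C; the FLAG_ names and the three helper functions are made up for this example, and the Flags value is assumed to have been obtained some other way (for instance with PUSHF and a POP into a register).

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks taken from the breakdown above. */
enum {
    FLAG_CF   = 1 << 0,   /* Carry                          */
    FLAG_PF   = 1 << 2,   /* Parity                         */
    FLAG_AF   = 1 << 4,   /* Auxiliary                      */
    FLAG_ZF   = 1 << 6,   /* Zero                           */
    FLAG_SF   = 1 << 7,   /* Sign                           */
    FLAG_TF   = 1 << 8,   /* Trap (single step)             */
    FLAG_IF   = 1 << 9,   /* Interrupt                      */
    FLAG_DF   = 1 << 10,  /* Direction                      */
    FLAG_OF   = 1 << 11,  /* Overflow                       */
    FLAG_IOPL = 3 << 12,  /* two-bit field, bits 12 and 13  */
    FLAG_NT   = 1 << 14,  /* Nested Task                    */
    FLAG_RF   = 1 << 16,  /* Resume                         */
    FLAG_VM   = 1 << 17   /* Virtual Mode                   */
};

/* Nonzero if any bit selected by mask is set in flags. */
int flag_is_set(uint32_t flags, uint32_t mask)
{
    assert(mask != 0);
    return (flags & mask) != 0;
}

/* Extract the two-bit I/O Privilege Level as a number from 0 to 3. */
unsigned iopl_of(uint32_t flags)
{
    return (unsigned)((flags & (uint32_t)FLAG_IOPL) >> 12);
}

/* The low byte of Flags (bits 7..0), which is the part LAHF copies to AH. */
unsigned char low_byte_of_flags(uint32_t flags)
{
    return (unsigned char)(flags & 0xFFu);
}
```

As a usage example, the hexadecimal value 0246 has bits 1, 2, 6, and 9 set, so flag_is_set reports PF, ZF, and IF as set and CF as clear, and low_byte_of_flags returns hexadecimal 46.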
One of the properties of the 286+ architecture is what is known as Protected mode. Protected mode has to be switched on explicitly by software, so existing 16-bit real-mode code and data keep working unchanged until that happens. On the 286, Protected mode gave access to memory above 1 MB along with memory protection, but the registers stayed 16 bits wide; the 32-bit registers and 32-bit addressing arrived with the 386. In the 286 design, they forgot to include an instruction to turn Protected mode off. Once turned on, the only way to turn it off was to power off or reset the CPU. Some people referred to the 286 as having half a brain, or even being brain dead. This is an exaggeration, and the 386 was introduced to correct this deficiency and add some improvements, primarily the 32-bit registers and addressing, along with an updated companion FPU (Floating Point Unit), the 387, which was slightly different than the original FPU. The 286 has since been largely ignored.
The 486 then integrated the CPU and FPU together on one chip. However, the programming features introduced with the 386 have remained essentially the same in later designs. The key differences have been in graphical extensions, high-speed instruction caches, and execution pipelines that make the present designs more efficient and much faster. Multiple processing cores are the present vogue, but it is the OS that decides how your program will be processed internally.
While the PowerBasic compilers put a few restrictions on you with regard to programming in assembly language, they alleviate much of the headache that goes with writing assembly code from scratch. And since PowerBasic also creates a sandbox (a reasonably safe place) for your assembly code to run in, the limitations involved are reasonably nonintrusive, insignificant, and immaterial. You just need to adapt your coding style accordingly. If that does not satisfy you, you can use a tool like MASM32, which is capable of generating DLLs of assembly routines that can be called from PowerBasic (or another programming language of your choice). PowerBasic also automatically provides you with Protected mode access to 32-bit registers, memory, and extended instructions.