Author: Greg Wolffe and Zack Kurmas, with some modifications by Andrew Kalafut
Objective: The purpose of this lab is to explore an instruction set and machine language that is different from MIPS. In particular, we will be exploring the Intel 64/IA-32 machine language. As you work through this lab, pay attention to the differences between MIPS and IA-32 including instruction length, addressing modes, and the variety of instructions included in the instruction sets.
Deliverables: Submit in hardcopy a lab report that includes answers to the numbered questions.
This lab should preferably be done in groups of 2. If there are sufficient lab computers available, you may work alone.
The easiest way to write correct assembly code is to let a compiler do
it for us! Then we just have to figure out what it's doing and
why. A quick look at the man pages for the gcc
compiler under Linux shows that the '-S'
switch directs the compiler to generate assembly and stop (without
producing an executable). We can use this feature to learn the
nature of a particular instruction set by writing simple,
understandable programs in a high-level language (like C) and studying
what the compiler produces.
A brief note about assembler notation in the interest of making the programs easier to read:
(). For example,
movl %esp, %ebp means to take the value of %esp and put it in %ebp. In contrast, movl (%esp), %ebp means to take the value at the top of the stack and put it in %ebp.
As an example, begin with the program exampleIML-1b.c.
Run gcc -S exampleIML-1b.c to produce assembly code for
the native machine. (I compiled this code on arch01, a 64-bit AMD machine.
If you use a different machine, your code may look different,
especially if the machine is a 32-bit machine. Assembly code from
32-bit machines won't have any instructions that end with "q".)
Look at the resulting code.
subq) set up the stack and frame pointers.
printf_answer. Notice that variable names do not appear in the assembly code. Instead, each local variable is assigned to a memory location referenced as an offset from the frame pointer.
movl $.LC0, %eax set up the parameters, call
printf, and save the return value. Notice how the parameters are passed in spare registers.
The leave instruction restores the stack and frame pointers, and the method ends.
exampleIML-1b.s does and why. (A couple of the "whys" aren't obvious, so don't hesitate to ask for help.) Note that the syntax used here for arithmetic is the opposite of that used in class. The second register is both a source and a destination.
In comparison to MIPS, the Intel machine language is extremely complex: instructions can be anywhere from 1 to 15 bytes long, and each instruction supports many different addressing modes. Fortunately, most of the extremely complex instructions are very specialized and rarely used (e.g., the MMX instructions). The instructions you will examine today are much simpler.
Figure 2-1 on page 2-1 of Volume 2A of the Software Developer's Manual shows the basic format of an Intel instruction. Some instructions have a prefix of up to four bytes. The next 1 to 3 bytes contain the op code. The instructions we will examine all have one-byte opcodes. The byte after the opcode describes the operands and the addressing modes of the operands. As shown in Figure 2-1, this byte is divided into three fields:
The Mod field applies to the parameter in memory.
ebx, etc.) or registers that are used with the addressing mode specified by bits 6 and 7. Note that as shown in Table 2-2 on page 2-5, this is not a simple selection of one of the 8 general purpose registers. Some values of this byte may specify different registers depending on the addressing mode.
Look at Table 2-2 on page 2-5. The y-axis lists the possible values for
an instruction's "memory" operand (the operand that may access memory).
The brackets signify a memory access. For example
means that the operand is the data in the memory location whose address
is stored in the register
. In contrast,
identifies a simple, register-direct access to
. The x-axis lists the registers that can serve as the second operand.
Thus, according to this table, an instruction whose first operand is
, and whose second operand is
would have a ModR/M byte of
0x10. Notice that some R/M bits
0x8C, etc.) indicate that the operands are listed in a second addressing byte, the SIB. Table 2-3 lists the meaning of values in the SIB.
Now, let's look at some real Intel machine code:
Compile exampleIML-1b.c down to assembly code.
Compile exampleIML-1b.s with the debug flag (i.e.,
gcc -g exampleIML-1b.s -o ex1).
Run gdb on your compiled file (e.g.,
gdb ex1). (
gdb is the GNU debugger.)
Run disassemble main. You should see output that looks something like the sample below. (If it looks drastically different, let me know.) The first column lists the address of each instruction in
main. The second column lists the address of the instruction relative to the beginning of main. The third and fourth columns contain the assembly instruction. Notice that the instruction
movq %rsp, %rbp has been replaced with a simple mov.
(7)  0x00000000004004c4 <+0>: push %rbp
(6)  0x00000000004004c5 <+1>: mov %rsp,%rbp
     0x00000000004004c8 <+4>: push %rbx
(ex) 0x00000000004004c9 <+5>: sub $0x18,%rsp
(ex) 0x00000000004004cd <+9>: movl $0x52c,-0x20(%rbp)
(1)  0x00000000004004d4 <+16>: movl $0x1619,-0x1c(%rbp)
     0x00000000004004db <+23>: movl $0x2694,-0x18(%rbp)
     0x00000000004004e2 <+30>: movl $0x8ad,-0x14(%rbp)
(2)  0x00000000004004e9 <+37>: mov -0x1c(%rbp),%eax
     0x00000000004004ec <+40>: mov -0x20(%rbp),%edx
     0x00000000004004ef <+43>: mov %edx,%ecx
(ex) 0x00000000004004f1 <+45>: sub %eax,%ecx
     0x00000000004004f3 <+47>: mov %ecx,%eax
(3)  0x00000000004004f5 <+49>: mov %eax,-0x18(%rbp)
     0x00000000004004f8 <+52>: mov $0x400628,%eax
     0x00000000004004fd <+57>: mov -0x18(%rbp),%ecx
     0x0000000000400500 <+60>: mov -0x1c(%rbp),%edx
     0x0000000000400503 <+63>: mov -0x20(%rbp),%ebx
     0x0000000000400506 <+66>: mov %ebx,%esi
     0x0000000000400508 <+68>: mov %rax,%rdi
(4)  0x000000000040050b <+71>: mov $0x0,%eax
     0x0000000000400510 <+76>: callq 0x4003b8 <printf@plt>
     0x0000000000400515 <+81>: mov %eax,-0x14(%rbp)
     0x0000000000400518 <+84>: mov -0x18(%rbp),%eax
     0x000000000040051b <+87>: add $0x18,%rsp
(8)  0x000000000040051f <+91>: pop %rbx
(5)  0x0000000000400520 <+92>: leaveq
     0x0000000000400521 <+93>: retq
End of assembler dump.
Use x main+45 to look at the machine code for the twelfth instruction (sub %eax, %ecx). Bytes with lower addresses are displayed to the right; therefore, the instructions will look as if they are printed "backwards". In this example, the first byte of the instruction is 0x29; the second is 0xc1. The remaining two bytes are part of the next instruction. If you look on page 4-424 in the Developer's Guide, you will see that 0x29 is one of many op-codes for the sub instruction. If you look in Table 2-2, you will see that a ModR/M byte of 0xc1 indicates that the parameters are registers %eax and %ecx.
Next, use x/2 main+9. The /2 tells gdb to print two four-byte words. (We need to print two words because the instruction is 7 bytes long.) The words are displayed left-to-right in increasing order; but, within each word, the byte with the lowest address appears on the right. Thus, you would read this seven-byte instruction as
0xc745e02c050000. (I know it's confusing. Remember, I didn't design it, I'm just showing you how it works.)
The first byte of this instruction is
0xc7. If you look on page 3-644 of the Developer's Guide,
you will see that
0xc7 is an opcode for
mov. (The instruction appears as
movl in your assembly code because AT&T syntax appends an operand-size suffix, here l for a 32-bit operand; you look up
mov in the Developer's Guide.) Notice the
/0 after the opcode.
If you look on page 3-2 of the Developer's Guide, you will see that the
/0 tells you to ignore the
reg field of the
ModR/M byte (i.e., ignore the x-axis and look only at the y-axis.)
Looking at the y-axis tells us that one of the operands is memory
[rbp] plus some immediate value. In fact, the
destination of this instruction is memory location
%RBP - 0x20;
and, as luck would have it, the next byte,
0xe0, happens to
be the two's complement, hexadecimal representation of -0x20.
Finally, the last four bytes are the immediate value being stored.
When read "first-to-last", which is low-to-high, the last four bytes are 0x2c050000.
However, we conventionally write numbers high-to-low. Thus, when you reverse the order of these bytes you get
0x0000052c, which is the immediate value being moved. (Does your head hurt yet?)
Use x main+5 to look at the sub instruction. When you look on page 4-424 of the Developer's Guide (Volume 2B), you will notice that the first byte, 0x48, does not correspond to any of the sub op codes. However, the second byte, 0x83, does. In addition, one of the choices for op code is REX.W + 83. A trip back up to the top of Chapter 2 (specifically, Section 2.2.1 beginning on page 2-9) tells us that the REX prefixes are used to indicate that the instruction takes at least one 64-bit parameter. In particular, Table 2-4 tells us that the prefix 0x48 indicates that the operand size is 64 bits. All REX prefixes begin with a 4. At this point, we can now note the /5 after the op-code and look up the third byte, 0xec, in Table 2-2 to find that one of the parameters is %rsp. Page 4-424 tells us that the final parameter is an 8-bit immediate value, in this case 0x18.
Now, it's your turn:
movl" means "
+rw in the opcode.
Your answers should look something like this (note that the page numbers in these examples are incorrect for the current version of the Intel manual):
|assembly instruction||add %edx,%eax|
|Machine instruction (hex)||0x01d0|
|field name||op code||Mod R/M|
|Field meaning||add||source: %edx, destination %eax|
|Info source||Page 3-37||Table 2.2.|
You may find this template helpful.
|assembly instruction||movl $1324,-0x10(%rbp)|
|Machine instruction (hex)||0xc745f02c050000|
|field name||op code||Mod R/M||offset||immediate value|
|Field meaning||mov||destination [RBP] + offset||subtract 0x10 from RBP||immediate value 1324 (0x52c)|
|Info source||Page 3-640||Table 2.2||Table 2.2||page 3-640|
|assembly instruction||sub $0x10, %rsp|
|Machine instruction (hex)||0x4883ec10|
|field name||prefix||op code||Mod R/M||immediate value|
|Field meaning||64-bit operands||subtract||destination %rsp||immediate value 0x10|
|Info source||Table 2-4||Page 4-411||Table 2.2||Page 4-411|
The push instruction is only one byte long. How did the designers squeeze both the opcode and the operand into one byte?
main+16 encode that one of the parameters is an immediate value? How is the R/M byte for this instruction used?