CIS 451 Lab 7: Intel Machine Language

Authors: Greg Wolffe and Zack Kurmas, with some modifications by Andrew Kalafut

Objective: The purpose of this lab is to explore an instruction set and machine language that is different from MIPS. In particular, we will be exploring the Intel 64/IA-32 machine language. As you work through this lab, pay attention to the differences between MIPS and IA-32 including instruction length, addressing modes, and the variety of instructions included in the instruction sets.

Deliverables: Submit in hardcopy a lab report that includes answers to the numbered questions.

This lab should preferably be done in groups of 2. If there are sufficient lab computers available, you may work alone.

Generating and Understanding Assembler Code

The easiest way to write correct assembly code is to let a compiler do it for us! Then we just have to figure out what it's doing and why. A quick look at the man pages for the gcc compiler under Linux shows that the '-S' switch directs the compiler to generate assembly and stop (without producing an executable). We can use this feature to learn the nature of a particular instruction set by writing simple, understandable programs in a high-level language (like C) and studying what the compiler produces.
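For instance, a program in this spirit might be as small as the sketch below. It is a hypothetical example (the function name `scale_sum` is illustrative, not the lab's exampleIML-1b.c), and the exact assembly gcc produces for it depends on the compiler version, optimization level, and target machine.

```c
/* Hypothetical example program: each C statement is simple enough
 * to map onto only a handful of assembly instructions. */
int scale_sum(int a, int b)
{
    int sum = a + b;   /* likely compiled to an add instruction */
    return sum * 4;    /* multiplying by 4 is often compiled to a shift
                          (e.g. sall) or an lea rather than a multiply */
}
```

Because `-S` stops before linking, a file containing only this function (say, a hypothetical `scale.c`) compiles with `gcc -S scale.c` even though it has no `main`, and the assembly is written to `scale.s`.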

A brief note about assembler notation in the interest of making the programs easier to read:

As an example, begin with the program exampleIML-1b.c. Run gcc -S exampleIML-1b.c to produce assembly code for the native machine. (I compiled this code on arch01, a 64-bit AMD machine. If you use a different machine, your code may look different, especially if it is a 32-bit machine: assembly code from 32-bit machines won't have any instructions that end with "q".)

Look at the resulting code.

  1. Explain what each of the assembly language instructions in exampleIML-1b.s does and why. (A couple of the "whys" aren't obvious, so don't hesitate to ask for help.) Note that the syntax gcc uses here for arithmetic (AT&T syntax) is the opposite of that used in class: the destination is written last, so the second operand is both a source and the destination.
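The operand-order convention in the note above can be modelled in C. In this sketch the function names are hypothetical; each function takes the source operand first and returns the new value of the destination, mirroring how `addl %ebx, %eax` and `subl %ebx, %eax` are written (source first, destination last).

```c
#include <stdint.h>

/* Models AT&T "addl src, dst": dst <- dst + src.
 * Unsigned 32-bit arithmetic mimics the hardware's wraparound. */
uint32_t att_addl(uint32_t src, uint32_t dst)
{
    return dst + src;
}

/* Models AT&T "subl src, dst": dst <- dst - src.
 * The FIRST operand is subtracted FROM the second. */
uint32_t att_subl(uint32_t src, uint32_t dst)
{
    return dst - src;
}
```

For example, with 3 in %ebx and 10 in %eax, `subl %ebx, %eax` leaves 7 in %eax, which corresponds to `att_subl(3, 10)`.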

Intel Machine Language

In comparison to MIPS, the Intel machine language is extremely complex: instructions can be anywhere from 1 to 15 bytes long, and most instructions support many different addressing modes. Fortunately, the most complex instructions are highly specialized and rarely used (e.g., the MMX instructions). The instructions you will examine today are much simpler.

Figure 2-1 on page 2-1 of Volume 2A of the Software Developer's Manual shows the basic format of an Intel instruction. Some instructions begin with up to four one-byte prefixes. The next 1 to 3 bytes contain the opcode; the instructions we will examine all have one-byte opcodes. The byte after the opcode, called the ModR/M byte, describes the operands and their addressing modes. As shown in Figure 2-1, this byte is divided into three fields: Mod (bits 7-6), Reg/Opcode (bits 5-3), and R/M (bits 2-0).
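The field layout can be checked with a few shifts and masks. The sketch below splits a ModR/M byte into its Mod (bits 7-6), Reg/Opcode (bits 5-3), and R/M (bits 2-0) fields. The struct and function names are hypothetical, and the sketch deliberately says nothing about what each field value means; that is what Table 2-2 is for.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of an x86 ModR/M byte. */
struct modrm_fields {
    unsigned mod; /* bits 7-6: together with R/M, selects the addressing mode */
    unsigned reg; /* bits 5-3: a register number, or an opcode extension */
    unsigned rm;  /* bits 2-0: register or memory operand (interpreted with Mod) */
};

struct modrm_fields decode_modrm(uint8_t byte)
{
    struct modrm_fields f;
    f.mod = (byte >> 6) & 0x3;
    f.reg = (byte >> 3) & 0x7;
    f.rm  = byte & 0x7;
    return f;
}
```

For instance, `decode_modrm(0x10)` yields Mod = 0, Reg = 2, R/M = 0, matching the binary 00 010 000.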

Look at Table 2-2 on page 2-5. The y-axis lists the possible values for an instruction's "memory" operand (the operand that may access memory). The brackets signify a memory access. For example, [EAX] means that the operand is the data in the memory location whose address is stored in the register eax. In contrast, EAX identifies a simple, register-direct access to eax. The x-axis lists the registers that can serve as the second operand. Thus, according to this table, an instruction whose first operand is [EAX] and whose second operand is EDX would have a ModR/M byte of 0x10. Notice that some ModR/M values (0x04, 0x0C, 0x84, 0x8C, etc.) indicate that the operands are described further in a second addressing byte called the SIB (Scale-Index-Base) byte. Table 2-3 lists the meaning of the values in the SIB byte.
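Going the other direction, reading a ModR/M value out of Table 2-2 amounts to concatenating three bit fields. The sketch below (all names hypothetical) packs Mod, Reg, and R/M into a byte using Intel's register numbering (EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7), flags the ModR/M values that call for a SIB byte (R/M = 100 with any Mod other than 11, under 32- or 64-bit addressing), and splits a SIB byte into the Scale, Index, and Base fields of Table 2-3. Special cases such as Index = 100 (no index register) are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Intel register numbers as used in the Reg and R/M fields (32-bit names). */
enum reg32 { EAX = 0, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Pack Mod (2 bits), Reg (3 bits), and R/M (3 bits) into one ModR/M byte. */
uint8_t encode_modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)(((mod & 0x3) << 6) | ((reg & 0x7) << 3) | (rm & 0x7));
}

/* With 32- or 64-bit addressing, R/M = 100 (the ESP slot) in any
 * memory form (Mod != 11) means "a SIB byte follows". */
bool modrm_needs_sib(uint8_t modrm)
{
    return ((modrm >> 6) & 0x3) != 0x3 && (modrm & 0x7) == 0x4;
}

/* Fields of a SIB byte: Scale (bits 7-6), Index (bits 5-3), Base (bits 2-0).
 * The index register's value is multiplied by 1 << scale (1, 2, 4, or 8). */
struct sib_fields {
    unsigned scale;
    unsigned index;
    unsigned base;
};

struct sib_fields decode_sib(uint8_t byte)
{
    struct sib_fields s = { (byte >> 6) & 0x3, (byte >> 3) & 0x7, byte & 0x7 };
    return s;
}
```

With these helpers, `encode_modrm(0, EDX, EAX)` returns 0x10, the value the text reads from Table 2-2 for [EAX] with EDX, and `modrm_needs_sib` is true for 0x04, 0x0C, 0x84, and 0x8C.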

  1. Using Table 2-2, identify the addressing mode that corresponds (in general) to each of the four possible values of Mod.

Now, let's look at some real Intel machine code:

Now, it's your turn:

  1. List the machine instruction for each of the instructions marked with a number, and identify the meaning of each byte. Clearly indicate how the source(s) and destination are specified. Some hints and sample output appear below.

    Your answers should look something like this (note that the page numbers in these examples are incorrect for the current version of the Intel manual):

  2. Notice that the push instruction is only one byte long. How did the designers squeeze both the opcode and the operand into one byte?
  3. When using Table 2-2, sometimes the y-axis refers to the source operand, and sometimes it refers to the destination. How can you tell whether the y-axis refers to the first or the second operand? Hint: Compare instructions main+40 and main+49.
  4. How and where does instruction main+16 encode that one of its operands is an immediate value? How is the ModR/M byte for this instruction used?
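As a worked illustration of the byte-by-byte reasoning these questions ask for, using an instruction that is not taken from the lab's listing: the two bytes 01 10 decode, in 64-bit mode with no prefixes, as opcode 0x01 (listed in the Intel manual as "ADD r/m32, r32") followed by ModR/M byte 0x10. In 64-bit mode the default operand size is 32 bits and the default address size is 64 bits, so the source prints as %edx and the address register as %rax. The sketch below handles only this one opcode and only the plain register-indirect case (Mod = 00, R/M not 100 or 101); the function name and output format are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static const char *const reg32_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};
static const char *const reg64_names[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

/* Hypothetical mini-decoder: handles only opcode 0x01 (ADD r/m32, r32)
 * with Mod = 00 and a plain register-indirect R/M, in 64-bit mode.
 * Writes AT&T syntax (source first) into out; returns 0 on success. */
int decode_add01(const uint8_t *code, char *out, size_t outlen)
{
    if (code[0] != 0x01)
        return -1;                        /* not the opcode this sketch knows */
    unsigned mod = (code[1] >> 6) & 0x3;
    unsigned reg = (code[1] >> 3) & 0x7;  /* source register (r32) */
    unsigned rm  = code[1] & 0x7;         /* register holding the address */
    if (mod != 0 || rm == 4 || rm == 5)
        return -1;                        /* SIB, RIP-relative, other forms: unhandled */
    /* For ADD r/m32, r32 the R/M operand is the destination. */
    snprintf(out, outlen, "add %%%s,(%%%s)", reg32_names[reg], reg64_names[rm]);
    return 0;
}
```

Here `decode_add01` turns the bytes { 0x01, 0x10 } into "add %edx,(%rax)": the opcode byte selects the operation, the Reg field (010) names the source register, and the R/M field (000) with Mod = 00 names the register holding the destination's address.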