Author: Zack Kurmas, with some very minor modifications by Andrew Kalafut
Objective: The purpose of this lab is to explore the different behaviors of different CPUs that implement the Intel instruction set. To put it another way, we want to observe an AMD processor behaving differently from an Intel processor --- especially with respect to the implementation of their superpipelines.
Deliverables: Submit in hardcopy a lab report that includes answers to the numbered questions.
This lab should preferably be done in groups of 2. If there are sufficient lab computers available, you may work alone.
Simplescalar is a suite of programs that simulate the execution of programs compiled using a MIPS-like instruction set called PISA. You can simulate the execution of any program using Simplescalar by simply re-compiling the program using a version of gcc that knows how to generate PISA instructions as well as x86 instructions.
sim-cache. This tool takes as input a description of a machine's memory hierarchy (i.e., cache levels) and reports on the number of hits and misses in each cache. Section 4.2 of The Simplescalar Tech Report explains how to describe the cache setup you want to simulate. When you read through this section, take note of three things:
sim-cache, you don't specify the size of the cache directly. Instead you specify (1) the number of lines, (2) the block size, and (3) the associativity of the cache. The size of the cache is the product of these three numbers. Thus, a 4-way set associative cache with 1024 lines of 16 bytes each is 4*1024*16 = 65536 bytes (64 kilobytes).
dl1" will appear twice when configuring the L1 data cache.)
sim-cache frequently uses both "1" (the numeral "one") and "l" (a lower-case letter "L"). Watch carefully because the differences in print between the two can be subtle.
For example, to configure a machine with an 8KB, direct-mapped L1 data cache with 32-byte blocks, use this command: dl1:256:32:1:l. Notice that 256 blocks times 32 bytes per block equals 8192 bytes.
sim-cache, I recommend sending the output
directly to a file using the command line
will contain the results of the simulation (i.e., cache hit and miss
file2 will contain the output produced by
the program simulated. This data is generally not interesting.
Your first task is to examine the effects of block size on a "toy"
blockSize1.c. This small
program iterates through each byte in a large array. Begin by
compiling this C program for simplescalar using the following
~kurmasz/public/CS451/bin is in your path, you can use
~kurmasz/public/Simplescalar/bin/ss_gcc.) Make sure you are running the Simplescalar version. If you can run
./a.out from the command line, you used the wrong version.
Running this command will produce a file named
(As with the normal version of
gcc, you can specify the
name of the executable generated using the
This file will not run by itself. It will run only as input to one of
the Simplescalar programs. If it does, you generated it using the wrong version of
sim-cache to determine the miss rates of an 8KB, direct-mapped cache with the following block sizes: 8 bytes, 16 bytes, 32 bytes, and 64 bytes. To do so, use commands that look like this:
dl1:line:block:1:l -redir:prog /dev/null -redir:sim
Where block ranges from 8 to 64, and line is set such that the product of block times line is 8192.
Hints for running sim-cache:
1 followed by the letter l (as in "lru").
sim-cache with varying block sizes and present the results. The line below gives an example of how to perform arithmetic in a bash script:
After you have run
sim-cache for each block size,
grep each output file (
etc.) for the line "
dl1.miss_rate". List the miss rate
for each block size tested.
array is an array of characters; therefore, each item in the cache is exactly 1 byte. As a result, it is easy to identify data items that will or will not conflict in the cache. For example, in an 8KB direct-mapped cache, array bytes 0 and 8192 will conflict. Your job is to find sets of array elements that conflict with a 16-byte block, but not an 8-byte block.
gcc is a C compiler. Your code must be straight C: no iostreams, no "//"-style comments, and all variables must be declared at the beginning of each function.
array tends to be mapped to cache slot 0. Sometimes, it gets mapped somewhere else. For Fall, 2009
array mapped to the middle of a 16-byte block. To account for this, you can add the following line of code to
register char* arraym = array + 8;
qsort given a 1KB, 4KB, and 16KB cache. Present your results using a graph with block size on the x-axis and the miss rate on the y-axis. Please generate one graph with three lines: one each for 1KB, 4KB, and 16KB. Valid block sizes are powers of 2 from 8 to 64. Your graph should have a form similar to Figure 8.18 from the textbook.
input_1e4 for input. (It contains 50,000 randomly generated integers.)
qsort executable and sample inputs are found in
~kurmasz/public/Simplescalar/Tests/qsort/input_1e4 into your current directory.
~kurmasz/public/Simplescalar/bin/sim-cache -cache:dl1 dl1:64:16:1:l -redir:prog opt -redir:sim output_dl1:64:16:1:l ~kurmasz/public/Simplescalar/Tests/qsort/ss_qsort input_1e4
opt may give you a clue. If not, ask the instructor for help.
qsort (or another program of your choice) and a cache size. Produce a graph showing miss rates as associativity ranges over 1, 2, 4, 8, 16, and fully associative. Your graph should have associativity on the x-axis, and miss rate on the y-axis. It should also contain four lines: one for each block size. Be sure to clearly label your graph with the cache size. Your graph should have a form similar to Figure 5.30 in Patterson and Hennessy (4th edition, revised).