CS552 Course Wiki: Fall 2020: Homework 5

Homework 5

Be sure that you have:

Read the Command-line Verilog Simulation Tutorial. Additional references are in the Tools page.
Read the Verilog Cheatsheet. Everything you need to know about Verilog is in this document.
Read the Verilog file naming conventions and the Verilog rules, and adhere to these rules.
See Handin Instructions before getting started with this homework.

For each problem, follow these steps:

Break down your design into sub-modules.
Define interfaces between these modules.
Draw paper and pencil schematics for these modules.
Then start writing Verilog.

What to submit:

Problems 1 & 2:
1. Submit all the Verilog files. See instructions here.
2. Make sure you run the Verilog rules check on all the files. You do not need to run it on your testbench.

Problems 3 - 10 are optional and will not be submitted. These are recommended for a better understanding of course material.

Download Files

Required files in this tar file <<<DOWNLOAD THIS FIRST
Problems 1 and 2 are each in their own directory, called hw5_1 and hw5_2 respectively.
Do not edit the provided *_hier.v files.

Problem 1

a) Write an assembly program to demonstrate forwarding in a pipelined processor implementation. Write your code in p1.asm.

b) Also write an explanation of your program including where and why forwarding takes place. Write your answer in p1.txt.

Problem 2

a) Write an assembly program to demonstrate why branch prediction is necessary and useful. Write your code in p2.asm.

b) Write an explanation of your program and how branch prediction helps in p2.txt.

c) Will branch prediction always take only 1 cycle? Include your answer in p2.txt as well.

The remaining problems will not be submitted but are recommended for a better understanding of course material.

Problem 3

Given a 2 KB, 2-way set-associative cache with 16-byte lines and the following code:

for (int i = 0; i < 1000; i++)
  {
    A[i] = 40 * B[i];
  }

a) Compute the overall miss rate. Assume that each array entry is one word, that each word is 4 bytes, and that the base address of each array is aligned to a cache line boundary.

b) What kind of cache locality is being exploited?
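One way to check a hand calculation for part (a) is a small LRU cache simulation. The sketch below is only an illustration under stated assumptions: the base addresses `A_BASE` and `B_BASE` are hypothetical, every read and write counts as one access, and the cache is assumed to be write-allocate (the problem does not specify a write policy).

```python
from collections import OrderedDict

def count_misses(addresses, capacity=2048, ways=2, line=16):
    """Count misses for byte addresses in an LRU set-associative cache."""
    num_sets = capacity // (line * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line              # block (line) number
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[block] = True
    return misses

def loop_trace(a_base, b_base, n=1000, word=4):
    """Accesses made by: A[i] = 40 * B[i], for i in 0..n-1."""
    for i in range(n):
        yield b_base + i * word  # read B[i]
        yield a_base + i * word  # write A[i] (assumed write-allocate)

# Hypothetical line-aligned base addresses, not given in the problem.
A_BASE = 0x10000
B_BASE = 0x20000

misses = count_misses(loop_trace(A_BASE, B_BASE))
accesses = 2 * 1000
miss_rate = misses / accesses
print(misses, accesses, miss_rate)
```

Changing `ways` or the base addresses shows how sensitive the result is to the mapping of the two arrays.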

Problem 4

Consider a direct-mapped cache with 32-byte blocks and a total capacity of 512 bytes in a system with a 32-bit address space. Assume the cache is byte-addressable.

Indicate which bits of an address in this machine correspond to the tag, index, and offset, respectively.
For the sequence of addresses below, indicate which references will result in cache hits and which will result in cache misses. For each miss, mark whether it is a compulsory, capacity, or conflict miss. Assume the cache is initially empty (all valid bits are set to 0).
Show the final contents of the address tags at the end of execution.
Explain what can be done to improve each type of miss.
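The tag/index/offset split follows directly from the block size and the number of sets. The sketch below shows the arithmetic for the cache parameters stated above; the function name `address_fields` and the sample address `0x1234` are hypothetical and are not part of the assignment's address sequence.

```python
def address_fields(addr, block_bytes=32, capacity_bytes=512, ways=1, addr_bits=32):
    """Split an address into (tag, index, offset) and report the field widths."""
    num_sets = capacity_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = num_sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits

    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    widths = {"tag": tag_bits, "index": index_bits, "offset": offset_bits}
    return (tag, index, offset), widths

# Hypothetical sample address, used only to illustrate the split.
fields, widths = address_fields(0x1234)
print(widths)   # field widths in bits
print(fields)   # (tag, index, offset)
```

Two addresses hit the same cache line only when their index matches, and they refer to the same block only when the tag matches as well.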

Problem 5

Redo Problem 4, but using a two-way set-associative cache. When replacing a block, the least-recently-used block is chosen to be replaced. Everything else (block size and total capacity) remains the same.

Determine the speedup over the direct-mapped cache in Problem 4. Assume that both caches can be accessed in 1 cycle, that the CPI without misses is 1.0, and that the miss penalty is 25 cycles.
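Once the miss counts for both caches are known, the speedup is a ratio of total cycle counts. The sketch below assumes one memory reference per instruction, which is an interpretation rather than something the problem states, and the reference and miss counts in the example are made-up placeholders, not the answer.

```python
def total_cycles(references, misses, base_cpi=1.0, miss_penalty=25):
    """Cycles = base cycles for every reference plus a penalty for every miss."""
    return references * base_cpi + misses * miss_penalty

def speedup(references, misses_direct_mapped, misses_two_way):
    """Speedup of the two-way cache over the direct-mapped cache."""
    return (total_cycles(references, misses_direct_mapped)
            / total_cycles(references, misses_two_way))

# Placeholder counts for illustration only: 20 references,
# 12 misses direct-mapped versus 8 misses two-way.
example = speedup(20, 12, 8)
print(round(example, 3))
```

Substitute the miss counts obtained from the hit/miss trace of each cache.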

Problem 6

Consider a cache with the following characteristics:

32-byte blocks
5-way set associative
1024 sets
47-bit addresses
writeback
LRU replacement policy

How many bytes of data storage are there?
What is the total number of bits needed to implement the cache?
Make a picture similar to the one on page 486 of the text. (As with the picture in the text, include the hit and data logic.)

Problem 7

How many storage bits are required to implement a 256 KB cache with 16-byte blocks that is 4-way set-associative and uses a write-back policy and LRU replacement, assuming a byte-addressable address space of 2^36 bytes?

Bits are required for:
1. The data
2. The tags
3. The valid bits
4. The dirty bits
5. The LRU bits
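The five components can be tallied mechanically once the tag width is known. The sketch below is one possible accounting; the LRU cost is an assumption (log2(ways) bits per block by default), since the problem does not specify an LRU encoding, so `lru_bits_per_set` is left as a parameter.

```python
def cache_storage_bits(capacity_bytes, block_bytes, ways, addr_bits,
                       lru_bits_per_set=None):
    """Break down the storage bits of a write-back, set-associative cache."""
    blocks = capacity_bytes // block_bytes
    sets = blocks // ways
    offset_bits = block_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    if lru_bits_per_set is None:
        # Assumed encoding: log2(ways) bits of age per block.
        lru_bits_per_set = ways * (ways.bit_length() - 1)
    parts = {
        "data": capacity_bytes * 8,
        "tags": blocks * tag_bits,
        "valid": blocks,       # one valid bit per block
        "dirty": blocks,       # one dirty bit per block (write-back)
        "lru": sets * lru_bits_per_set,
    }
    parts["total"] = sum(parts.values())
    return parts

bits = cache_storage_bits(256 * 1024, 16, 4, 36)
for name, count in bits.items():
    print(f"{name:>5}: {count}")
```

A more compact LRU encoding changes only the `lru` entry, for example by passing `lru_bits_per_set=5` to encode one of the 4! orderings per set.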

Problem 8

Do problems 5.4.1 to 5.4.3 on page 551 of the textbook.

Problem 9

Do problems 5.7.1 to 5.7.3 on page 554 of the textbook.

Problem 10

Consider a processor running at 2 GHz with a base CPI of 1.0 (the CPI without considering memory access delay, stalls, etc.). About 30% of the instructions in a program access data memory. The access delay of instruction memory is ignored. The data memory access time is 100 ns, including miss handling. The primary (L1) cache has a hit rate of 99% and no access penalty on a hit. Now consider adding an L2 cache between the L1 cache and main memory. Suppose the L2 cache has a miss ratio of 20% and an access delay of 5 ns. How much does performance improve with the L2 cache compared to without it?
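The comparison can be set up as two effective-CPI calculations. The sketch below is one reading of the problem, not a definitive solution: it assumes the 20% L2 miss ratio is local (a fraction of L1 misses), and that an L2 miss pays the 5 ns L2 access plus the full 100 ns memory access. Other readings give slightly different numbers.

```python
CLOCK_GHZ = 2.0      # cycles per nanosecond
BASE_CPI = 1.0
MEM_FRACTION = 0.30  # instructions that access data memory
L1_MISS_RATE = 0.01  # 99% hit rate
L2_MISS_RATE = 0.20  # assumed to be a local miss ratio

def effective_cpi(l1_miss_penalty_cycles):
    """Base CPI plus average memory stall cycles per instruction."""
    return BASE_CPI + MEM_FRACTION * L1_MISS_RATE * l1_miss_penalty_cycles

mem_cycles = 100 * CLOCK_GHZ  # 100 ns main memory access
l2_cycles = 5 * CLOCK_GHZ     # 5 ns L2 access

# Without L2: every L1 miss goes to main memory.
cpi_without_l2 = effective_cpi(mem_cycles)

# With L2: every L1 miss pays the L2 access; L2 misses also pay memory.
cpi_with_l2 = effective_cpi(l2_cycles + L2_MISS_RATE * mem_cycles)

speedup = cpi_without_l2 / cpi_with_l2
print(f"CPI without L2: {cpi_without_l2:.2f}")
print(f"CPI with L2:    {cpi_with_l2:.2f}")
print(f"Speedup:        {speedup:.2f}")
```

Because the clock rate and instruction count are the same in both configurations, the ratio of CPIs equals the ratio of execution times.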