CS552 Course Wiki: Fall 2020 : Homework 4 browse

Homework 4

This semester we will not do assignments in partners. Please ignore the downloaded partners.txt file.

Be sure that you have:

Read the Command-line Verilog Simulation Tutorial. Additional references are in the Tools page.
Read the Verilog Cheatsheet Verilog cheat. Everything you need to know about Verilog is in this document.
Read the Verilog file naming conventions and the Verilog rules, and adhere to these rules.
See Handin Instructions before getting started with this homework.

For each problem, follow these steps:

Break down your design into sub-modules.
Define interfaces between these modules.
Draw paper and pencil schematics for these modules.
Then start writing Verilog.

What to submit:

Problem 1 - 3:
1. Submit all the verilog files. See instructions here.
2. Make sure you run the Verilog rules check on all the files. Not necessary to run it on your testbench.

Problems 4 - 11 are optional and will not be submitted. These are recommended for a better understanding of course material.

Download Files

Required files in this tar file <<<DOWNLOAD THIS FIRST
Problem 1, 2, and 3 are in its own directory, called hw4_1, hw4_2, hw4_3.
Do not edit the provided *_hier.v files.

Problem 1

In this problem, you must design a FIFO called fifo in fifo.v. It holds 64-bit data values and has a capacity of 4. The FIFO should implement the functionality of a conventional first-in-first-out data structure. You should use the dff module provided.

Ports:

The FIFO accepts new input each cycle if the data_in_valid is asserted, unless it is full (which should be indicated on fifo_full). Data inserted into a full FIFO should be ignored.
The data_out signal is always driven by the data currently at the head of the FIFO (the oldest data). The head of the FIFO is popped when pop_fifo is asserted.
An empty FIFO drives zeros on data_out and asserts fifo_empty. Popping an empty FIFO has no effect.
Asserting reset (high) clears the FIFO to empty.
All outputs should change only in response to the rising clock edge.
The err output is a standard way of indicating hardware errors or illegal states. Assert it (1'b1) for states which are supposed to be impossible to get into.

Notes:

If the FIFO is full, fifo_full should be held high (irrespective of whether pop_fifo is asserted or not).
If the FIFO is empty, fifo_empty should be held high (irrespective of whether data_in_valid is asserted or not).
Handling the case when both data_in_valid and pop_fifo are asserted:
1. CASE I: The FIFO is neither empty nor full - Both the requests should be serviced, i.e., the appropriate elements should be inserted/popped to/from the FIFO.
2. CASE II: The FIFO is empty - Accept the incoming element and write it into the FIFO. Do not pop the element in the same cycle (continue to assert fifo_empty in that cycle). You should not bypass the element to FIFO output directly.
3. CASE III: The FIFO is full - Ignore the data being inserted, pop the correct element from the FIFO and drive data_out. Again, do not insert anything into the FIFO in the same cycle.

Hint: Consider building your FIFO as a circular list. Besides data registers, also have two registers remembering the current list head & list tail index. You may or may not need an extra register (e.g., for storing the current number of elements in list) depending on your design.

The testbench for this problem (fifo_bench.v) consists of 145 randomly generated test cases. Each test case asserts a set of input signals to your module, and after one cycle, compares outputs from your module with the outputs that are expected from a perfect FIFO implementation.

If there are no errors in your design you will see a "TEST PASSED" message. If the testbench failed with a "TEST FAILED" message, there are 3 possible reasons:

Error in fifo_empty signal: You will see an error message similar to "MINORCHECK : In cycle xx - EMPTY logic : empty = xx, expected empty = xx" in the testbench output.
Error in fifo_full signal: You will see an error message similar to "MINORCHECK : In cycle xx - FULL logic : full = xx, expected full = xx" in the testbench output.
Error in data_out signal: You will see an error message similar to "ERRORCHECK : In cycle xx - Data out error. data_out = xx, expcted data_out = xx" in the testbench output.

Above each of these error messages, you will see the inputs to your module, your outputs, and the expected outputs for that cycle which can help you debug. If you have only "MINORCHECK" errors in your submission, you will get a maximum of 85% for this problem.

Problem 2

Read the synthesis tutorial on the Synthesis page. Synthesize your FIFO from problem 1.

Synthesis will create a synth directory which will include fifo.syn.v, area report, timing report, etc.
Make sure that:
- NO cells in the synth/cell_report.txt has zero area in it. If you see a zero area for a module name, then that means that module had some kind of Verilog coding error and did NOT get synthesized. You must fix this problem. Look in synth.log, search for that module name, and look for warnings/errors.
- NO cells in the synth/reference_report.txt has zero total area. If it does, then some sub modules underneath have failed to synthesize. Check cell_report.txt.

Move that synth folder from hw4_1/ to hw4_2/.

Problem 3

Develop instruction-level tests for your processor. In this problem, each one of you will develop a set of small programs that are meant to test whether your processor implements an instruction correctly. You will write these programs in assembly and run them on an instruction emulator to make sure what you wrote is indeed testing the right thing. The eventual goal is to run these programs on your processor's Verilog implementation and use them to test your hardware design.

Info about how to write assembly code and also about how to use the assembler can be found in the Using the assembler page. Details about what each instruction means are available in the ISA specification page.

Each of you will be responsible for one randomly assigned instruction (along with common instructions jal, jalr) and must develop a set of simple programs for those instructions. The table below gives the assignment of instructions.

Last name (family name) starts with	Instruction
Ag / Al	`j`
A* (2nd char not 'g' or 'l')	`add`
Ba	`sle`
Bo	`st`
B* (2nd char not 'a' or 'o')	`slt`
Che	`addi`
Ch* (3rd char not 'e')	`srli`
C* (2nd char not 'h')	`jr`
D	`andni`
E	`xor`
F	`slbi`
G	`sub`
He	`ld`
H* (2nd char not 'e')	`stu`
I	`sll`
J	`seq`
K	`btr`
Liu	`rol`
Li* (3rd char not 'u')	`beqz`
L* (2nd char not 'i')	`slli`
Na / Ne	`lbi`
N* (2nd char not 'a' or 'e')	`andn`
P	`subi`
R	`rori`
Si	`srl`
Sa / Sc	`ror`
S* (2nd char not 'i', 'a' or 'c')	`roli`
T	`sco`
V	`bgez`
W	`bnez`
X	`bltz`
Z	`xori`
Not listed	`add`

This table is assigned to give a decent coverage over all instructions in the WISC-SP13 ISA.

To get you started, below are two example tests for the add instruction:

// add_0.asm

lbi r1, 255
lbi r2, 255
add r3, r1, r2
halt

// add_1.asm

lbi r1, 255
lbi r2, 0
add r3, r1, r2
halt

You will notice one thing. The add test uses the lbi instruction also! Your goal while writing these tests is to isolate your instruction as much as possible and minimize the use of the other instructions.

The workflow we will follow is:

Write these programs in WISC-SP13 assembly language (a simplified version of MIPS; see the ISA specification page).
Assemble using the assembler assemble.sh. It will produce .img memory image which represents the assembled machine code of your program. See Using the Assembler page.
Simulate the test with the wisccalculator simulator and make sure your test is doing what you thought it was doing. See WISC-SP13 Simulator-debugger.

prompt> assemble.sh add_0.asm
Created the following files
loadfile_0.img  loadfile_1.img  loadfile_2.img  loadfile_3.img  loadfile_all.img  loadfile.lst

prompt> wiscalculator loadfile_all.img

WISCalculator v1.0
Author Derek Hower (drh5@cs.wisc.edu)
Type "help" for more information

Loading program...
Executing...
lbi r1, -1
INUM:        0 PC: 0x0000 REG: 1 VALUE: 0xffff
lbi r2, -1
INUM:        1 PC: 0x0002 REG: 2 VALUE: 0xffff
add r3, r1, r2
INUM:        2 PC: 0x0004 REG: 3 VALUE: 0xfffe
halt
program halted
INUM:        3 PC: 0x0006
Program Finished

The simulator will print a trace of each instruction along with the state of the relevant registers. You should examine these to make sure that your test is indeed doing what is expected.

If you see dynamic library lib*.so.x load errors on a CSL machine when running wiscalculator, see the WISC-SP13 Simulator-debugger page to install those missing libs manually.

What to deliver:

Write a set of tests for your instruction. Name them <instruction>_[0,1,2,3,...].asm.
- Use your discretion to decide how many tests you need.
- Identify corner cases. Think about possible bugs in the hardware.
- Tests should be short and target specific cases, NOT all cases at once.
- Limit the number of other instructions used besides what you're testing. If the test fails it should ideally be from the instruction you're testing and not another that was used.
In addition to your assigned instruction, everyone must write tests for the jal and jalr instruction.
Write comments in your assembly code explain what the test is doing (comments use "//" for our assembler).
Make sure that all of your assembly files will assemble. You will get zero grade for this problem for malformed assembly.

Make sure the hw4_3 folder in your final submission contains those .asm assembly programs and nothing else.

The remaining problems will not be graded but are recommended for a better understanding of course material.

Problem 4

Indicate all of the true, anti-, and output-dependences in the following segment of MIPS assembly code:

    xor    $1, $2, $3
    and    $4, $5, $6
    sub    $7, $4, $5
    add    $5, $1, $5
    sw     $4, 100($7)
    or     $4, $7, $4

For the code above, which of the dependences will manifest themselves as hazards in the pipeline in Figure 4.41 on page 355 of COD4e? How are these hazards resolved in this pipeline? Assuming the 'xor' instruction enters fetch (F) in cycle 1, in what cycle does the 'or' instruction enter writeback (W)? Show your work in a pipeline diagram. (Assume that the register file cannot read and write the same register in the same cycle and get the new data.)

How does your answer change if you consider the pipeline in 4.60, on page 375 of COD4e? (Assume that the register file contains internal bypassing and can read and write the same register in the same cycle and get the new data.)

Problem 5

Consider the following assembly program to be executed in a MIPS ISA 5-stage(F,D,X,M,W) pipelined data path given in figure 4.51 on page 362 of COD4e:

    I1: add $3,$4,$6
    I2: sub $5,$3,$2
    I3: lw $6,100($5)
    I4: add $5,$6,$3

a) Identify every occurrence and every types of data dependencies True(RAW), Anti(WAR), Output(WAW) in the above problem. Also, indicate which register is involved in that data dependency.

b) If this program is to be executed in a pipelined data path, create a pipeline timing diagram table(clock cycle numbers as column and instructions as rows)assuming NO forwarding, except that register forwarding is available.

c) Identify all the data hazards that may occur as applicable. For each hazard, indicate whether data forwarding(including register forwarding) may be applied to eliminate that hazard. For each hazard, give the two instructions involved, the register involved, and the pipeline register(IF/ID, ID/EX, EX/MEM, MEM/WB)whose output will be used for data forwarding.

Problem 6

Consider the following program code:

    lw  $s1, 8($s0)
    sub $s0,$s1,$S2 
    add $s0,$s0,$s1

If the above program is to be executed in a pipelined datapath given in figure 4.51 on page 362 of COD4e equipped with full data forwarding (as well as register forwarding), complete the timing diagram table(clock cycle numbers as column and instructions as rows). Also mark the clock cycle when a data forwarding(F) takes place or a pipeline stall(S) is inserted.

Problem 7

Consider the following code sequence and the datapath in figure 4.51 on page 362 of COD4e. Assuming the first instruction is fetched in cycle 1 and the branch is not taken, in which cycle does the 'and' instruction write its value to the register file? What if the branch IS taken? (Assume no branch prediction). Show pipeline diagrams.


            beq    $2, $3, foo
            add    $3, $4, $5
            sub    $5, $6, $7
            or     $7, $8, $9
    foo:    and    $5, $6, $7

Problem 8

Consider the pipeline in Figure 4.51 on page 362; assume predict-not-taken for branches and assume a "Hazard detection unit" in the ID stage as shown on page 379. Can an attempt to flush and an attempt to stall occur simultaneously? If so, do they result in conflicting actions and/or cooperating actions? If there are any cooperating actions, how do they work together? If there are any conflicting actions, which should take priority? What would you do in the design to make sure this works correctly? You may want to consider the following code sequence to help you answer this question:


        beq $1, $2, TARGET  #assume that the branch is taken
        lw  $3, 40($4)
        add $2, $3, $4
        sw  $2, 40($4)
TARGET: or  $1, $1, $2

Problem 9

Consider the following MIPS assemble code segment:


         bne $s1,$s2,LABEL  // $s1 != $s2
         add $t2,$t1,$s1
         sw $t2,4($s1)
         j EXIT
  LABEL: lw $s1,4($s6)
  EXIT:  addi $s1,$s1,4

Assume this code segment on a pipelined data path with data forwarding depicted in figure 4.65 on page 384 of COD4e where the branch decision is made in ID stage.

Assuming $s1 != $s2, a control hazard will occur. Provide a timing diagram table (clock cycle numbers as column and instructions as rows), to show which instructions are running at which phase (F,D,X,M,W)at each clock cycle. If an instruction is flushed from the pipeline, then the remaining phases should not appear. If an instruction is stalled for one cycle, then the remaining phases will be pushed back by one cycle. Indicate on the clock cycle and corresponding instruction for any flush or stall action. (No branch predictors are used in this problem).

Problem 10

During the execution of a program, conditional branches have been executed 15 times. The traces of TAKEN(T) and NOT-TAKEN(N) of each branch instruction are listed below:

T-T-N-T-T-T-N-T-N-T-T-N-N-N-T

a) Prediction accuracy for "always NOT TAKEN" =

b) Prediction accuracy for "1 - bit predictor" =

   Indicate output of predictor for each instruction traced. Outcome = 1 if correct, and 0 if incorrect.

c) Prediction accuracy for "2 - bit predictor" =

   Indicate output of predictor for each instruction traced. Outcome = 1 if correct, and 0 if incorrect.

Note: For dynamic predictors (1 bit and 2 bit), assume the first predicted entry as TAKEN (T) and then proceed.

Problem 11

High-performance datapaths use bypass paths (also known as data forwarding logic) to reduce pipeline stalls. However, bypass paths are relatively expensive, especially in some wire constrained technologies. To reduce the cost (and potential cycle time impact), some architects have explored omitting some of the possible bypass paths. Consider the datapath illustrated below (note that the PC update logic and all control logic is intentionally omitted). This pipelined datapath is similar to the one in the book, but only has bypass paths on one side of the ALU. Assume that the register file intentionally bypasses the value, so that if register Si is read and written in the same cycle, then the read returns the new value. Assume that the control logic bypasses the data as soon as possible using the given forwarding data paths, and stalls in decode otherwise. You may NOT add additional data paths.

In this problem, you will look at how a program snippet performs on this pipeline. Recall that R-format instructions have the form: opcode rd, rs, rt

and I-format instructions have the form: opcode rt, imm(rs) or opcode rt, rs, imm

Use the table given below to show how the given instruction sequence flows through the pipeline and where stalls are necessary to resolve hazards.

Consider the code and pipeline above. Show the execution of this code on the pipeline above. Use the letters, F, D, X, M, and W.

For each cycle where a stall occurs, explain why.