CS552 Course Wiki: Spring 2021

Project Cache Design
1. Overview
The final processor you design for this course will use both instruction and data caches. For this stage of the project, you will design and test a cache that will ultimately be used in your final design. You must first design and verify a direct-mapped cache before proceeding to a two-way set-associative cache. The cache's storage as well as the memory has already been designed for you. You will implement the memory system controller that manages the cache.

All needed files are included in the original project tar file, in the cache directories.

This semester, we do the project individually and we do not require you to integrate the cache into your processor. You only need to finish the two standalone cache designs.
2. Cache Interface and Organization
The figure below shows the external interface to the base cache module. Each signal is described in the table below.
                     +-------------------+
                     |                   |
       enable >------|                   |
   index[7:0] >------|       cache       |
  offset[2:0] >------|                   |
         comp >------|     256 lines     |-----> hit
        write >------|    by 4 words     |-----> dirty
  tag_in[4:0] >------|                   |-----> tag_out[4:0]
data_in[15:0] >------|                   |-----> data_out[15:0]
     valid_in >------|                   |-----> valid
                     |                   |
          clk >------|                   |
          rst >------|                   |-----> err
   createdump >------|                   |
                     +-------------------+
This cache module contains 256 lines. Each line contains one valid bit, one dirty bit, a 5-bit tag, and four 16-bit words:
                V   D    Tag        Word 0           Word 1           Word 2           Word 3
               ___________________________________________________________________________________
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
Index-------->|___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
              |___|___|_______|________________|________________|________________|________________|
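As a cross-check on the organization above, the field widths in the interface diagram (5-bit tag, 8-bit index, 3-bit offset) add up to a 16-bit address. The sketch below splits an address that way and computes the data capacity of the cache. It is a Python illustration, not part of the project code; the field ordering (tag in the high bits, offset in the low bits) and the names split_address and CAPACITY_BYTES are assumptions made for illustration.

```python
# Illustrative model only: the names and the tag|index|offset ordering are assumptions.
TAG_BITS, INDEX_BITS, OFFSET_BITS = 5, 8, 3         # widths from the interface diagram
LINES, WORDS_PER_LINE, BYTES_PER_WORD = 256, 4, 2   # 256 lines of four 16-bit words

# Data capacity of one cache module, in bytes.
CAPACITY_BYTES = LINES * WORDS_PER_LINE * BYTES_PER_WORD


def split_address(addr):
    """Split a 16-bit address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


if __name__ == "__main__":
    print(CAPACITY_BYTES)            # data capacity in bytes
    print(split_address(0x015C))     # (tag, index, offset) for one address
```

Under these assumptions, address 0x015c maps to tag 0, line 43, offset 4, and the capacity works out to 2048 bytes.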
2.1 Four-banked Memory Module
Four-banked memory is a better representation of a modern memory system: it breaks the memory into multiple banks. The four-cycle, four-banked memory is broken into two Verilog modules.
                     +-------------------+
                     |                   |
   addr[15:0] >------|   four_bank_mem   |
data_in[15:0] >------|                   |
           wr >------|       64KB        |-----> data_out[15:0]
           rd >------|                   |-----> stall
                     |                   |-----> busy[3:0]
          clk >------|                   |-----> err
          rst >------|                   |
   createdump >------|                   |
                     +-------------------+
Timing:
    |            |            |            |            |            |
    | addr       | addr etc   | read data  |            | new addr   |
    | data_in    | OK to any  | available  |            | etc. is    |
    | wr, rd     |*different* |            |            | OK to      |
    | enable     | bank       |            |            | *same*     |
    |            |            |            |            | bank       |
                  <----bank busy; any new request to--->
                        the *same* bank will stall
This figure shows the external interface to the module. Each signal is described in the table below.
This is a byte-aligned, word-addressable 16-bit wide 64K-byte memory.
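The timing diagram above can be read as a simple rule: after a request is accepted, the same bank is busy for the next three cycles, while requests to a different bank are accepted right away. The Python sketch below models only that rule, as one reading of the diagram. It takes a bank number directly, because this page does not say which address bits select the bank; the class name, method name, and BUSY_CYCLES constant are hypothetical.

```python
# Illustrative timing model only; names are hypothetical and the busy window
# (3 cycles after the request cycle) is read off the timing diagram.
BUSY_CYCLES = 3


class BankTimingModel:
    def __init__(self, num_banks=4):
        # First cycle at which each bank can accept a new request.
        self.free_at = [0] * num_banks

    def request(self, bank, cycle):
        """Return True if the request is accepted, False if it would stall."""
        if cycle < self.free_at[bank]:
            return False                      # same bank still busy: stall
        self.free_at[bank] = cycle + 1 + BUSY_CYCLES
        return True


if __name__ == "__main__":
    mem = BankTimingModel()
    for bank, cycle in [(0, 0), (0, 1), (1, 1), (0, 3), (0, 4)]:
        print(bank, cycle, mem.request(bank, cycle))
```

In this model a request to bank 0 at cycle 0 blocks further bank 0 requests during cycles 1 through 3, while bank 1 can be used at cycle 1; bank 0 accepts a new request at cycle 4.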
On reset, the memory loads its contents from files. Format of each file:
    @0 <hex data 0> <hex data 1> ...etc

3. State Machine
You will need to determine how your cache is arranged and how it functions before starting implementation. Draw out the state machine for your cache controller, as the diagram is required. You may implement either a Mealy or a Moore machine, though a Moore machine is recommended as it will likely be easier. Be forewarned that the resulting state machine will be relatively large, so it is best to start early. An example cache control state machine is available from the original source: http://user.engineering.uiowa.edu/~hpca/lecturenotes/verilogcachelinesize2.pdf.
The state machine diagram is due several weeks before the cache demo.

4. Direct-mapped Cache
You will first need to implement a memory system consisting of a direct-mapped cache over the four-banked main memory. Make your changes for this problem in the direct-mapped cache directory.

4.1 Signal Interactions
Although there are a lot of signals for the cache, its operation is pretty simple. When "enable" is high, the two main control lines are "comp" and "write". The four cases for the behavior of the direct-mapped cache correspond to the four combinations of these two signals.
On a miss, the "valid" output indicates whether the block occupying that line of the cache is valid. The dirty bit is also read, and the "dirty" output indicates whether the block occupying that line is dirty. On the other hand, if "hit" is true while both "write" and "comp" are true, the "dirty" output is not meaningful and will remain zero (because the cache's dirty bit is being written during that access).
4.2 Perfbench Testing
To begin testing, you will create address traces that target the different aspects of cache behavior. An example address trace file (mem.addr) is provided and shows the format of the file.
Once you have created your address traces, the testbench can be run as follows:
    wsrun.pl -addr mem.addr mem_system_perfbench *.v
If it runs correctly, you will get output that looks like the following:
    # Using trace file mem.addr
    # LOG: ReQNum 1 Cycle 12 ReqCycle 3 Wr Addr 0x015c Value 0x0018 ValueRef 0x0018 HIT 0
    #
    # LOG: ReqNum 2 Cycle 14 ReqCycle 12 Rd Addr 0x015c Value 0x0018 ValueRef 0x0018 HIT 1
    #
    # LOG: Done all Requests: 2 Replies: 2 Cycles: 14 Hits: 1
    # Test status: SUCCESS
    # Break at mem_system_perfbench.v line 200
    # Stopped at mem_system_perfbench.v line 200
If your script stops at the "VSIM >" command line and does not terminate to your shell:
Be aware that a SUCCESS message does not guarantee that your cache is working correctly. You should use the cache simulator to verify that the correct behavior is happening. The cache simulator can be run as follows:
    cachesim <associativity> <size_bytes> <block_size_bytes> <trace_file>
So for this problem you would use:
    cachesim 1 2048 8 mem.addr
Here 2048 bytes is 256 lines of four 16-bit words, and each block is 8 bytes. This will generate output like the following:
    Store Miss for Address 348
    Load Hit for Address 348
The addresses are printed in decimal (348 is 0x015c). You should then compare this to the perfbench output to make sure they both exhibit the same behavior. The address traces you created should be put in the 'cache_direct/verification' directory and have the '.addr' extension.

4.3 Randbench Testing
Once you are confident that your design is working, you should test it using the random testbench. The random bench does the following:
Remember to get your perfbench tests working before attempting to debug the randbench. You can run the random testbench like this:
    wsrun.pl mem_system_randbench *.v
At the end of each section you will see a message showing the performance, like the following:
    LOG: Done two_sets_addr Requests: 4001, Cycles: 79688 Hits: 0
Notice that since we now only have a direct-mapped cache, the two_sets_addr section in this example reports zero hits.
This testbench will ultimately print a message saying either:
    # Test status: SUCCESS
or
    # Test status: FAIL
Keep in mind that the test is considered a success if the correct data is returned every time, but that doesn't mean your cache is necessarily working. If you have no hits or a very small number of them, something is still wrong. If you are seeing failures, try to isolate the case that is causing the issues and create a small trace that generates the same behavior to make debugging easier.

4.4 Synthesis
You will need to run synthesis on your direct-mapped cache and verify that it does not produce any errors. You should turn in all of the reports generated by synthesis. Your synthesis results should be placed in the 'cache_direct/synthesis' directory.

5. Two-way Set-associative Cache
You should not start this until you have implemented and fully verified your direct-mapped cache. Remember to change to the set-associative cache directory.

5.1 Signal Interactions
After you have a working design using a direct-mapped cache, you will add a second cache module to make your design two-way set-associative. The same four cases of "comp" and "write" apply again.
5.2 Replacement Policy
In order to make the designs more deterministic and easier to grade, all set-associative caches must implement the following pseudo-random replacement algorithm:
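One way to read the policy is through a small software model. The Python sketch below is inferred from the worked example in this section and is not the course's reference simulator. It assumes that victimway is inverted on every access (the example only shows loads), that a miss fills a free way if one exists (way 0 first), and that victimway picks the victim only when both ways are occupied. The address split assumes the 5-bit tag, 8-bit index, 3-bit offset layout of the cache interface; all names are hypothetical.

```python
# Illustrative model only, inferred from the worked example; names are hypothetical.
def split_address(addr):
    """Split a 16-bit address into (tag, index), assuming tag|index|offset order."""
    index = (addr >> 3) & 0xFF    # 8-bit index above the 3-bit offset
    tag = (addr >> 11) & 0x1F     # 5-bit tag in the top bits
    return tag, index


class TwoWayModel:
    def __init__(self):
        self.ways = [{}, {}]      # per way: index -> tag; a missing index means invalid
        self.victimway = 0

    def access(self, addr):
        """Return ("hit", way) or ("install", way) for one access."""
        tag, index = split_address(addr)
        self.victimway ^= 1       # assumed: inverted on every access, hit or miss
        for way in (0, 1):
            if self.ways[way].get(index) == tag:
                return ("hit", way)
        for way in (0, 1):        # miss: prefer a free way, way 0 first
            if index not in self.ways[way]:
                self.ways[way][index] = tag
                return ("install", way)
        way = self.victimway      # both ways occupied: evict the victimway
        self.ways[way][index] = tag
        return ("install", way)


if __name__ == "__main__":
    cache = TwoWayModel()
    for addr in (0x1000, 0x1010, 0x1000, 0x2010, 0x2000, 0x3000, 0x3010):
        result, way = cache.access(addr)
        print(hex(addr), "victimway=%d" % cache.victimway, result, "way", way)
```

Feeding the model the addresses from the example (0x1000, 0x1010, 0x1000, 0x2010, 0x2000, 0x3000, 0x3010) gives the same sequence of installs and hits as the example.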
Example, using two sets:
    start with victimway = 0
    load 0x1000    victimway=1; install 0x1000 in way 0 because both free
    load 0x1010    victimway=0; install 0x1010 in way 0 because both free
    load 0x1000    victimway=1; hit
    load 0x2010    victimway=0; install 0x2010 in way 1 because it's free
    load 0x2000    victimway=1; install 0x2000 in way 1 because it's free
    load 0x3000    victimway=0; install 0x3000 in way 0 (=victimway)
    load 0x3010    victimway=1; install 0x3010 in way 1 (=victimway)

5.3 Perfbench Testing
Your testing for the set-associative cache should be done in much the same way. You can either create more address traces or update your previous ones to reflect the differences in behavior of the new design. The cache simulator is now run with slightly different arguments to reflect your changes:
    cachesim 2 4096 8 mem.addr pseudoRandom
If you do not specify the pseudoRandom argument, it will use an LRU replacement policy instead of the pseudo-random policy you have implemented. The address traces you used should be put in the 'cache_assoc/verification' directory and have the '.addr' extension.

5.4 Randbench Testing
Now, run the randbench on your two-way set-associative cache. You must still pass all the result comparisons and get a SUCCESS message at the end. What's more, the two_sets_addr section should now report hits, for example:
    LOG: Done two_sets_addr Requests: 4001, Cycles: 79688 Hits: 562

5.5 Synthesis
You will also need to synthesize your set-associative cache. You should turn in all of the reports generated by synthesis. Your synthesis results should be placed in the 'cache_assoc/synthesis' directory.

6. Instantiating Cache Modules
If you are integrating the cache mem_system into your datapath pipeline, note that when instantiating the module, there is a parameter which is set for each instance. When you dump the contents of the cache to a set of files (e.g. for debugging), this parameter gives each instance a unique set of file names.
    Parameter Value    File Names
    ---------------    ----------
           0           Icache_0_data_0, Icache_0_data_1, Icache_0_tags, ...
           1           Dcache_0_data_0, Dcache_0_data_1, Dcache_0_tags, ...
           2           Icache_1_data_0, Icache_1_data_1, Icache_1_tags, ...
           3           Dcache_1_data_0, Dcache_1_data_1, Dcache_1_tags, ...
Here is an example of instantiating two modules with parameter values of 0 and 1:
    cache #(0) cache0 (enable, index, ...
    cache #(1) cache1 (enable, index, ...
Page last modified on November 23, 2020