CS552 Course Wiki: Spring 2021 | Synthesis
Synthesizing Your Design
1. Overview
In this class we will do some very simple synthesis of your designs. The primary goal of this exercise is to get a sense of the actual hardware your Verilog creates. Synthesizing your design will allow us to see:
There is a lot more to synthesis optimization than we will cover in this class. We will be using Synopsys Design Compiler (DC) and a 45nm gate library provided by FreePDK. Most of the details will be abstracted away, and you will be using a simple script called
To synthesize your design, several pieces of information are required:
For this class, you will *IGNORE* the constraints in the red box above and instead use the defaults we provide. All your designs will be synthesized to meet a 1 GHz clock frequency (1 ns clock period), with the goal of minimizing area. We will perform synthesis in the following three steps:
Before we can begin, we need to set up environment variables, just as we did for ModelSim.
2. Environment Setup
Add the following lines to your
source /s/synopsys-2020_06_08/bin/synopsys_env.sh
Download the file .synopsys_dc.setup and copy it into your home directory. Note that the first character of the filename is a dot, which marks it as a hidden UNIX file. Some browsers strip this leading dot when saving the file, so be careful: the copy in your home directory MUST have the dot as its first character.
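Because a missing leading dot is an easy mistake to overlook, a quick check of your home directory can save a confusing first synthesis run. The function below is a minimal POSIX-shell sketch; the name check_dc_setup is hypothetical and is not one of the course tools.

```shell
# Hypothetical helper (not part of the course tools): confirm that
# .synopsys_dc.setup sits in $HOME with its leading dot, and catch the
# common case where a browser saved it without the dot.
check_dc_setup() {
  if [ -f "$HOME/.synopsys_dc.setup" ]; then
    echo "ok: found $HOME/.synopsys_dc.setup"
    return 0
  fi
  if [ -f "$HOME/synopsys_dc.setup" ]; then
    # Saved without the leading dot: renaming the file fixes it.
    echo "rename needed: mv ~/synopsys_dc.setup ~/.synopsys_dc.setup"
    return 1
  fi
  echo "missing: download .synopsys_dc.setup into $HOME"
  return 1
}
```

Run check_dc_setup after downloading the file; a non-zero exit status means the file is missing or misnamed.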
IMPORTANT: Log out of the shell and log back in, or explicitly
3. The
Step 1 | Set up the environment.
Step 2 | Go to the directory that contains all your Verilog files:
prompt> cd hw2/hw2_2
Step 3 | Make a list of the Verilog files that are part of the design: create a text file called list.txt in hw2/hw2_2 containing one Verilog filename per line. An example:
16_4mux.v
16_8mux.v
16CLA.v
24_12mux.v
28_14mux.v
32_16mux.v
32_8mux.v
4_1mux.v
4CLA.v
8_2mux.v
alu.v
bshifter.v
bshift.v
where
Step 4 | a) First, check that the design is synthesizable:
prompt> synth.pl --list=list.txt --type=other --cmd=check --top=alu
b) Look at the output on the screen or in synth.log.
c) Checking done.
Step 5 | Synthesis:
prompt> synth.pl --list=list.txt --type=other --cmd=synth --top=alu
Wait; this may take a few minutes. Go look in the
Step 6 | Checking synthesis output.
Make sure that:
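Writing list.txt by hand for Step 3 is error-prone when a directory holds many files. The function below is a minimal POSIX-shell sketch that writes every .v file in the current directory to list.txt, one per line. The name make_list is hypothetical, and it assumes testbench filenames contain "bench"; adjust that pattern to your own naming, and review list.txt before running synth.pl.

```shell
# Hypothetical helper: write every .v file in the current directory to
# list.txt (one filename per line), skipping testbenches. The *bench*
# pattern is an assumption about how testbench files are named.
make_list() {
  for f in *.v; do
    [ -e "$f" ] || continue          # no .v files: the glob stays literal
    case "$f" in
      *bench*) ;;                    # testbench: not part of the design
      *) printf '%s\n' "$f" ;;
    esac
  done > list.txt
}
```

For example, run make_list inside hw2/hw2_2, then open list.txt and remove any other simulation-only files before Step 4.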
Synthesizing your processor works slightly differently. For all other problems, we allowed the synthesis tool to completely "flatten" the design. If we flattened the full processor, reasoning about it and applying optimizations would be hard. So we break it up into a few large pieces and preserve the hierarchy at those boundaries: specifically, we preserve the fetch, decode, execute, memory, and writeback modules of your processor. Within each of those modules, synthesis is free to flatten the design completely. This is why, when you specify the --type=proc option, you must also specify the fetch, decode, execute, memory, and writeback module names.
Step 1 | Set up the environment.
Step 2 | Go to the directory that contains all your Verilog files:
prompt> cd demo1
Step 3 | Create a list file list.txt with one Verilog filename per line.
For the project, the
Hence, in synthesis, we list
The following is an example:
alu.v
claadder16.v
claadder4.v
compless16.v
compless4.v
ctrl.v
decode.v
dff.v
exec.v
extender16.v
fetch.v
fulladder1.v
memory.v
memory2c.syn.v
proc.v
regfile.v
register.v
rf_bypass.v
rotaterl16.v
rotaterr16.v
shifterl16.v
shifterr16.v
writeback.v
Step 4 | a) First, check the design:
prompt> synth.pl --list=list.txt --type=proc --cmd=check --top=proc --f=fetch --d=decode --e=exec --m=memory --wb=writeback
b) Look at the output on the screen or in synth.log.
c) Checking done.
Step 5 | Synthesis:
prompt> synth.pl --list=list.txt --type=proc --cmd=synth --top=proc --f=fetch --d=decode --e=exec --m=memory --wb=writeback
Wait; this may take a few minutes. Go look in the
Step 6 | Checking synthesis output.
Make sure that:
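Since --type=proc needs the exact fetch, decode, execute, memory, and writeback module names, a typo in one of them is a likely cause of a failed check. The function below is a minimal sketch that looks for a matching "module <name>" declaration in the files named in list.txt. The name check_stage_modules is hypothetical, and the regular expression assumes each declaration starts a line with the keyword module followed by the name.

```shell
# Hypothetical helper: for each module name given after the list file,
# report it as missing unless some file in the list declares it.
check_stage_modules() {
  list="$1"; shift
  missing=0
  for mod in "$@"; do
    found=no
    while IFS= read -r f; do
      [ -n "$f" ] || continue        # ignore blank lines in list.txt
      # Match "module <name>" followed by a non-identifier character or EOL.
      if grep -qE "^[[:space:]]*module[[:space:]]+${mod}([^A-Za-z0-9_]|\$)" "$f"; then
        found=yes
        break
      fi
    done < "$list"
    if [ "$found" = no ]; then
      echo "missing module: $mod"
      missing=1
    fi
  done
  return "$missing"
}
```

For the example above you would run check_stage_modules list.txt fetch decode exec memory writeback; no output and a zero exit status mean every name was found.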
Just like in the processor demo, some basic memory modules we provided for the cache demo are not synthesizable because they contain simulation logic for testing your cache design. You may have noticed that we provided the following dummy module files:
final_memory.syn.v
memc.syn.v
memv.syn.v
For these three modules, list the .syn.v filename above in list.txt instead of the module's original .v file.
Also, do not include mem_system_ref.v, as it is a reference model invoked only by the testbench.
The remaining steps to synthesize your standalone cache are the same as for a normal homework module. The top-level module of your standalone cache system should be mem_system.
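The substitution rules above (use each .syn.v stand-in instead of its original .v, and leave out mem_system_ref.v) can be applied mechanically. The function below is a minimal sketch; the name make_cache_list is hypothetical, and it assumes testbench filenames contain "bench", which may not match your files.

```shell
# Hypothetical helper for the cache: build list.txt so that each module with
# a dummy .syn.v stand-in is listed by its .syn.v file, and the
# testbench-only reference model mem_system_ref.v is left out.
make_cache_list() {
  for f in *.v; do
    [ -e "$f" ] || continue
    case "$f" in
      *.syn.v) printf '%s\n' "$f" ;;   # dummy synthesizable stand-in
      mem_system_ref.v) ;;             # reference model, testbench only
      *bench*) ;;                      # assumed testbench naming
      *)
        # Skip the simulation-only original when a .syn.v stand-in exists.
        [ -e "${f%.v}.syn.v" ] || printf '%s\n' "$f"
        ;;
    esac
  done > list.txt
}
```

Matching *.syn.v first matters: otherwise final_memory.syn.v would fall through to the last branch and be checked for a nonexistent final_memory.syn.syn.v.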
synth/
hierarchy.txt
This file describes your design hierarchy in text format. It lists the top-level modules; for each module it lists its sub-modules, for each sub-module its sub-sub-modules, and so on. An example is shown below:
alu
    GTECH_AND2    gtech
    GTECH_NOT     gtech
    GTECH_OR2     gtech
    GTECH_XOR2    gtech
    barrelshifter
        bit1_shifter
            mux4_1
                mux2_1
                    GTECH_BUF    gtech
                    GTECH_NOT    gtech
        bit2_shifter
            mux4_1
                ...
        bit4_shifter
            mux4_1
                ...
        bit8_shifter
            mux4_1
synth.log
This is the log of all synthesis commands. If your design does not synthesize, look in this file for warnings and errors.
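When synth.log is long, a count of errors and warnings plus the line numbers of the errors is a quick starting point. The function below is a minimal sketch; log_summary is a hypothetical name, and it assumes Design Compiler messages begin with "Error:" or "Warning:" at the start of a line.

```shell
# Hypothetical helper: count errors and warnings in a synthesis log and
# print each error with its line number. Assumes messages start a line
# with "Error:" or "Warning:".
log_summary() {
  log="${1:-synth.log}"
  errors=$(grep -c '^Error:' "$log")
  warnings=$(grep -c '^Warning:' "$log")
  echo "errors=$errors warnings=$warnings"
  grep -n '^Error:' "$log"             # "line:message" for each error
  [ "$errors" -eq 0 ]                  # exit status 0 only if no errors
}
```

Warnings do not make the function fail, but they are still worth reading: they often point at unconnected ports or inferred latches.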
area_report.txt
This file includes a report on the area occupied by your design. The file is mostly self-explanatory. The cell area is expressed in square microns. An example file is shown below:
Library(s) Used:

    gscl45nm (File: /scratch/users/karu/courses/cs755/tools/Synopsys_Libraries/libs/gscl45nm.db)

Number of ports:           3
Number of nets:            660
Number of cells:           15
Number of references:      12

Combinational area:        17600.626691
Noncombinational area:     2433.320446
Net Interconnect area:     undefined  (No wire load specified)

Total cell area:           20033.947137
Total area:                undefined
The value on the "Total cell area:" line is the actual cell area of your design.
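To compare area across design changes, it helps to pull out just that one number. The function below is a minimal awk sketch; total_cell_area is a hypothetical name, and it assumes the report is at synth/area_report.txt unless you pass another path.

```shell
# Hypothetical helper: print the number on the "Total cell area:" line of
# an area report, with surrounding blanks removed.
total_cell_area() {
  awk -F: '/Total cell area:/ { gsub(/[ \t]/, "", $2); print $2 }' \
    "${1:-synth/area_report.txt}"
}
```

Running it before and after an optimization gives a one-line before/after comparison.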
timing_report.txt
This file lists the top 20 longest (slowest) paths in your design. For each path you will see its startpoint, its endpoint, and the list of gates that make up the path. Recall that all your designs are synthesized to meet a 1 GHz clock frequency (1 ns clock period). For example:
Startpoint: dx_reg/dff0[106]/dff0/state_reg
            (rising edge-triggered flip-flop clocked by clk)
Endpoint: xm_reg/dff0[62]/dff0/state_reg
          (rising edge-triggered flip-flop clocked by clk)
Path Group: clk
Path Type: max

Point                                                                  Incr    Path
-------------------------------------------------------------------------------------
clock clk (rise edge)                                                  0.00    0.00
clock network delay (ideal)                                            0.00    0.00
dx_reg/dff0[106]/dff0/state_reg/CLK (DFFPOSX1)                         0.00 #  0.00 r
dx_reg/dff0[106]/dff0/state_reg/Q (DFFPOSX1)                           0.13    0.13 f
dx_reg/dff0[106]/dff0/q (dff_264)                                      0.00    0.13 f
dx_reg/dff0[106]/q (dff_en_264)                                        0.00    0.13 f
dx_reg/Out<106> (register_N_N114)                                      0.00    0.13 f
ex_stage/reg_rs_dx<2> (Execute)                                        0.00    0.13 f
ex_stage/U225/Y (INVX1)                                                0.02    0.15 r
ex_stage/U224/Y (NAND2X1)                                              0.01    0.16 f
ex_stage/U227/Y (AND2X2)                                               0.04    0.20 f
ex_stage/U228/Y (INVX1)                                                0.00    0.19 r
ex_stage/U231/Y (AND2X2)                                               0.03    0.22 r
ex_stage/U242/Y (INVX1)                                                0.02    0.24 f
ex_stage/U232/Y (NOR2X1)                                               0.02    0.26 r
ex_stage/forward/C47/Z_0 (*SELECT_OP_4.1_4.1_1)                        0.00    0.26 r
ex_stage/U223/Y (OR2X1)                                                0.03    0.29 r
ex_stage/forward_a_mux/mux0[0]/mux2/C11/Z_0 (*SELECT_OP_2.1_2.1_1)     0.00    0.29 r
ex_stage/U538/Y (INVX1)                                                0.01    0.30 f
ex_stage/U537/Y (NAND2X1)                                              0.01    0.31 r
ex_stage/U448/Y (AND2X2)                                               0.04    0.35 r
ex_stage/U378/Y (XOR2X1)                                               0.03    0.38 f
ex_stage/U318/Y (INVX1)                                                0.00    0.39 r
ex_stage/U434/Y (AND2X2)                                               0.03    0.42 r
ex_stage/U435/Y (INVX1)                                                0.02    0.43 f
ex_stage/U584/Y (OAI21X1)                                              0.05    0.49 r
ex_stage/U601/Y (OAI21X1)                                              0.03    0.51 f
ex_stage/U390/Y (AND2X2)                                               0.03    0.55 f
ex_stage/U442/Y (INVX1)                                                0.00    0.54 r
ex_stage/U417/Y (AND2X2)                                               0.03    0.57 r
ex_stage/U418/Y (INVX1)                                                0.01    0.59 f
ex_stage/U276/Y (AND2X2)                                               0.04    0.63 f
ex_stage/U338/Y (XOR2X1)                                               0.02    0.65 r
ex_stage/alu/mux1/mux0[5]/mux0/C11/Z_0 (*SELECT_OP_2.1_2.1_1)          0.00    0.65 r
ex_stage/alu/mux1/mux0[5]/mux2/C11/Z_0 (*SELECT_OP_2.1_2.1_1)          0.00    0.65 r
ex_stage/alu/mux0/mux0[5]/C11/Z_0 (*SELECT_OP_2.1_2.1_1)               0.00    0.65 r
ex_stage/alu/mux10/mux0[5]/C11/Z_0 (*SELECT_OP_2.1_2.1_1)              0.00    0.65 r
ex_stage/U252/Y (OR2X2)                                                0.03    0.69 r
ex_stage/U251/Y (INVX1)                                                0.01    0.70 f
ex_stage/U250/Y (AND2X2)                                               0.03    0.74 f
ex_stage/U247/Y (AND2X2)                                               0.03    0.77 f
ex_stage/U246/Y (AND2X2)                                               0.03    0.80 f
ex_stage/U10/Y (AND2X2)                                                0.03    0.83 f
ex_stage/U244/Y (AND2X2)                                               0.03    0.87 f
ex_stage/U257/Y (AND2X2)                                               0.03    0.90 f
ex_stage/U258/Y (AND2X2)                                               0.03    0.93 f
ex_stage/U261/Y (AND2X2)                                               0.04    0.97 f
ex_stage/U265/Y (INVX1)                                                0.00    0.96 r
ex_stage/U262/Y (NAND2X1)                                              0.01    0.97 f
ex_stage/alu/mux7/C11/Z_0 (*SELECT_OP_2.1_2.1_1)                       0.00    0.97 f
ex_stage/alu/mux6/C11/Z_0 (*SELECT_OP_2.1_2.1_1)                       0.00    0.97 f
ex_stage/alu/mux5/mux0[0]/C11/Z_0 (*SELECT_OP_2.1_2.1_1)               0.00    0.97 f
ex_stage/alu/mux4/mux0[0]/C11/Z_0 (*SELECT_OP_2.1_2.1_1)               0.00    0.97 f
ex_stage/alu/mux3/mux0[0]/C11/Z_0 (*SELECT_OP_2.1_2.1_1)               0.00    0.97 f
ex_stage/ALU_out<0> (Execute)                                          0.00    0.97 f
xm_reg/In<62> (register_N_N92)                                         0.00    0.97 f
xm_reg/dff0[62]/d (dff_en_128)                                         0.00    0.97 f
xm_reg/dff0[62]/U3/Y (INVX1)                                           0.00    0.97 r
xm_reg/dff0[62]/U2/Y (MUX2X1)                                          0.02    0.99 f
xm_reg/dff0[62]/dff0/d (dff_128)                                       0.00    0.99 f
xm_reg/dff0[62]/dff0/U3/Y (AND2X1)                                     0.03    1.02 f
xm_reg/dff0[62]/dff0/state_reg/D (DFFPOSX1)                            0.00    1.02 f
data arrival time                                                              1.02

clock clk (rise edge)                                                  1.00    1.00
clock network delay (ideal)                                            0.00    1.00
xm_reg/dff0[62]/dff0/state_reg/CLK (DFFPOSX1)                          0.00    1.00 r
library setup time                                                    -0.06    0.94
data required time                                                             0.94
-------------------------------------------------------------------------------------
data required time                                                             0.94
data arrival time                                                             -1.02
-------------------------------------------------------------------------------------
slack (VIOLATED)                                                              -0.08
In the example above, there are roughly 40 to 50 gates on the path. Right at the end, notice the string slack (VIOLATED): the data arrives 0.08 ns later than required, so you should try optimizing this path. The hierarchical prefix of each cell name (for example ex_stage/ or xm_reg/) tells you which pipeline stage that logic belongs to.
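The slack figure follows directly from the last lines of the example report. Reading the ideal clock network delay (0.00 ns) and the library setup time (0.06 ns) off the report, the arithmetic is:

```latex
\begin{aligned}
t_{\text{required}} &= T_{\text{clk}} + t_{\text{clock network}} - t_{\text{setup}}
                     = 1.00 + 0.00 - 0.06 = 0.94\ \text{ns} \\
t_{\text{arrival}}  &= 1.02\ \text{ns} \\
\text{slack}        &= t_{\text{required}} - t_{\text{arrival}} = 0.94 - 1.02 = -0.08\ \text{ns}
\end{aligned}
```

A negative slack means the path must lose at least 0.08 ns of delay (data arriving at or before 0.94 ns) before it meets the 1 ns clock.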
reference_report.txt
This file lists all the low-level cells that ended up in your design and how many times each cell was instantiated. For example:
Reference          Library    Unit Area   Count    Total Area  Attributes
-----------------------------------------------------------------------------
AND2X1             gscl45nm    2.346500       5     11.732500
AND2X2             gscl45nm    2.815800      15     42.236999
BUFX2              gscl45nm    2.346500      15     35.197499
BUFX4              gscl45nm    2.815800       3      8.447400
INVX1              gscl45nm    1.407900      22     30.973799
INVX2              gscl45nm    1.877200       8     15.017600
INVX4              gscl45nm    3.285100       1      3.285100
INVX8              gscl45nm    3.285100       4     13.140400
NOR2X1             gscl45nm    2.346500       1      2.346500
OAI21X1            gscl45nm    2.815800       1      2.815800
OR2X1              gscl45nm    2.346500       1      2.346500
OR2X2              gscl45nm    2.815800      28     78.842399
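In this report each row's Total Area is Unit Area times Count, so totals over all rows give a gate count and an area figure for the listed cells. The awk sketch below assumes the column layout shown above (Reference, Library, Unit Area, Count, Total Area) and sums only rows whose Library column is gscl45nm; ref_totals is a hypothetical name and the default path is an assumption.

```shell
# Hypothetical helper: sum the Count and Total Area columns over the rows
# of a reference report whose Library column is gscl45nm.
ref_totals() {
  awk '$2 == "gscl45nm" { cells += $4; area += $5 }
       END { printf "cells=%d area=%.6f\n", cells, area }' \
    "${1:-synth/reference_report.txt}"
}
```

Header and separator lines are skipped automatically because their second column is not gscl45nm.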
cell_report.txt
This file lists the area of every synthesized module. If any module shows zero area in this file, that module was NOT synthesized correctly. The format of this file is similar to reference_report.txt.
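Scanning a long cell report by eye for zero-area modules is tedious. The function below is a heuristic sketch, not an exact parser: it prints every line containing a whitespace-separated field equal to zero (0, 0.0, 0.000000, and so on). The name zero_area_modules and the default path are assumptions.

```shell
# Hypothetical helper (heuristic): print lines of a cell report that contain
# a field equal to zero. grep exits 0 if any such line was found.
zero_area_modules() {
  grep -E '(^|[[:space:]])0(\.0+)?([[:space:]]|$)' "${1:-synth/cell_report.txt}"
}
```

For example, if zero_area_modules; then echo "check these modules"; fi flags the report before you move on.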
.syn.v file
This file contains the synthesized structural netlist of your design.
Thus far we have been synthesizing your design while preserving its hierarchy. That is, if you built a barrel shifter as mux -> shift -> mux -> shift, synthesis blindly creates a hardware module for each individual module you specified.
You can instead guide synthesis into "flattening" your design, i.e., treating everything between two flip-flops as raw combinational logic and creating the most efficient logic gates to implement it. When you do this, the hierarchy of your datapath disappears completely.
You can do this by adding the --opt option to synth.pl, using the module names from your own design. For example:
prompt> synth.pl --list=list.txt --type=other --cmd=synth --top=ALU --opt
prompt> synth.pl --list=list.txt --type=proc --cmd=synth --top=proc --f=fetch --d=decode --e=execute --m=memory --wb=write_back --opt
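One way to see the flattening is to count the module definitions in the synthesized .syn.v netlist with and without --opt. The function below is a minimal sketch; count_modules is a hypothetical name, and since the exact netlist filename is not given here, you pass the path in yourself.

```shell
# Hypothetical helper: count the module definitions in a Verilog netlist.
# Fewer modules after --opt means more of the hierarchy was flattened.
count_modules() {
  grep -cE '^[[:space:]]*module[[:space:]]' "$1"
}
```

Lines that begin with endmodule are not counted, because the pattern requires the keyword module at the start of the line.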
Page last modified on October 05, 2020