Lab #3: Pipelining your processor
João Lázaro
Fall 2005
In Lab 3, your group builds a pipelined processor like the one described in Chapter 6 of COD. You will use Verilog to specify the design. The processor will run in simulation (ModelSim) and on real hardware (Xilinx).
Lab 3 has several "checkoffs" and a final deadline:
- By Thursday, September 29 at 11:59 p.m., your group emails a preliminary design document (Problem 0a) to cs152-staff@cory. Team evaluations (Problem 0b) are also due by 11:59 p.m. on Thursday, September 29, at the same address. Each team member sends a separate email containing his or her confidential personnel evaluation. Please be sure to send the evaluations to the staff list, NOT the class list!
- On Friday 9/30 in lab section, your TA will review your design document with you and suggest changes. A final version of the document incorporating these changes should be emailed to cs152-staff@cory by Wednesday, October 5 at 11:59 p.m. (because Midterm I is on Tuesday evening). However, we recommend completing and submitting the design document well before the deadline.
- On Friday 10/7 in lab section, you will demonstrate your processor running on the Calinx board. For this demo, the TAs will give you test code whose instruction sequences do not need forwarding muxes to work correctly, and whose load instructions assume a load delay slot. These programs test basic pipeline operation, but not the "hard" pipeline features: stalling and forwarding. Apart from these exceptions, all instructions in the table in Problem 1a must be supported for this checkoff. Referring to COD/3e Figure 6.36, the "hazard detection unit" and the "forwarding unit" do not need to be implemented for this checkoff.
- On Friday 10/14 in lab section, you will demonstrate your processor running on the Calinx board. Unlike the 10/7 checkoff, this demo tests all aspects of the processor (including the hazards discussed in Problem 3). During this demo, your TA will give you secret test code. If your processor passes these tests on the first try, you will get bonus points. You can also get bonus points for fixing your processor so that it passes the tests within your section time. If your processor is not fixed by the end of the section, your TA will give you the source of the secret test code to use in your weekend debugging sessions.
- On Monday 10/17 at 11:59 p.m., the lab (including the lab report) is due.
On checkoff days, be ready to demo the moment your section begins! The earlier you start, the more time you have to recover from problems. Also make sure to follow the specs in this document (and in the MIPS ISA manual on the Resources page). A working processor that does not correctly implement all of the instructions will fail the checkoff.
Lab report submission guidelines: To submit your lab report, run m:\bin\submit-fall2005.exe, or type "submit-fall2005.exe" at the prompt, and then follow the instructions. The required format for lab reports is shown on the Resources page, as is the desired format for your design notebook.
Laboratory 3 document history:
- 8/30: Lab 3 released on the website.
Problem 0: Preflight
Before your group begins designing, you must complete several preparatory tasks.
Problem 0a: Preliminary design document
Your group must write a preliminary design document. The design document should be 2-4 pages long and contain:
- The name of the spokesperson for the lab and a list of group members. The spokesperson is responsible for directing questions to the TA and for answering the TA's questions. Choose a different spokesperson from the one you had for Lab 2.
- A brief description of the design. The description should be accompanied by preliminary high-level schematics of your datapath and a preliminary discussion of the controller.
- A description of the unit test benches and multi-unit test benches you plan to create for your processor, and a description of the machine language programs you plan to write to test the entire processor. Also include a test plan, using the epoch-chart method shown in the 9/8 lecture, that shows when you plan to run each type of test.
- A tentative division of labor, showing the tasks each group member intends to carry out.
- A "Paranoia Section": Discuss possible problem areas in the lab. An early assessment of the critical timing paths of the design should be part of this section.
Refer to the beginning of this document for the deadlines associated with the design document (preliminary submission, TA review, and final version).
Problem 0b: Team evaluation for Lab 2
Please rate yourself and the other members of your team so we can understand your team's performance.
Start with your self-assessment. Describe your personal work for Lab 2 and how well you think you did it. Also include what you originally committed to in your group's design document (we know that initial plans change and that you may have done different work).
Then, based on your own observations, rate the performance of the other members of your team during the last lab assignment. Do not rate yourself. Assume that an average team member would receive 20 points. Above-average performers get more points; below-average performers get fewer.
The maximum score for one person is 40 points. Each rating should include a one- or two-sentence justification.
Suppose you are on a team of 5 with the following other members: Sue Superstar, Teddy Tryhard, Annie Average, and Ned Neverthere. Your evaluation might look like this:
Name | Score | Justification |
Sue Superstar | 30 | Sue really helped the group. She figured out how to handle pipeline stalls and spent a lot of time last week fixing the last bug. |
Teddy Tryhard | 13 | Teddy was always at our meetings, had a very positive attitude, and would do whatever the group asked of him. However, he often made mistakes and needed help. |
Annie Average | 20 | Annie did a good job. |
Ned Neverthere | 5 | Ned never showed up for group meetings. We ended up re-implementing the one piece he gave us. |
You must re-evaluate your team members after each lab assignment, basing your evaluation solely on their performance during that lab assignment. These evaluations will be used for grading. Be as honest and fair as you would expect others to be.
Refer to the schedule at the beginning of this lab for when the evaluations are due. Note that each team member must email a separate evaluation.
Problem 0c: Design notebook
As part of this lab, your group will maintain an online design notebook. See the teamwork lecture notes (and the Lab 2 handout) for details on the notebook.
Problem 1: Implementing your design
Problem 1a: Pipelining
Implement the following five-stage pipeline for your processor:
IF (Instruction Fetch): Access the instruction memory to fetch the instruction.
ID (Instruction Decode): Decode the instruction and read the operands from the register file. For branch instructions, calculate the branch target address and compare the registers.
EX (Execute): Forward (bypass) operands from the other pipeline stages as needed. Then one of the following occurs:
- For ALU instructions, the ALU or shifter performs the arithmetic, logical, or shift operation.
- For load and store instructions, the data address is calculated.
MEM (Memory Access): Read or write the data memory for load and store instructions.
WB (Write Back): Write the result to the register file.
Add the appropriate pipeline registers to your single-cycle design.
This design requires you to use new RAM modules for your instruction and data memories. Note that the Verilog files referred to in this lab are all in the m:\lab3 directory. Start by reading the readme at m:\lab3\Lab3Help. The new RAMs (named sdatamem.v and sinstrlem.v) are fully synchronous. This means that you must set up the address and any data to be written before the clock edge. Both reads and writes are synchronous in this way. Keep this in mind as you design your pipeline. One way to look at it is that some of the registers shown in the pipeline diagrams (e.g. the PC or the S and D registers) are partially duplicated inside the RAM block. This means, for example, that you still need to keep a separate PC register, but you must also route the address value from before the PC register to the actual RAM block; at the clock edge, the new address is latched into both the PC register and the internal address register in the RAM.
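To see why the address has to be taken from before the PC register, here is a minimal Python behavioral sketch. It assumes the RAM latches its address (and any write data) on the rising edge and then presents the word at that latched address. The names `SyncRAM` and `fetch_trace` are hypothetical and do not describe the interface of sdatamem.v or sinstrlem.v.

```python
class SyncRAM:
    """Behavioral sketch of a fully synchronous RAM (hypothetical model,
    not the interface of the lab's RAM modules)."""

    def __init__(self, words):
        self.mem = list(words)
        self.addr_reg = 0  # internal address register, assumed reset to 0

    def clock(self, addr, write_enable=False, write_data=0):
        """Rising clock edge: address and write data must already be valid."""
        if write_enable:
            self.mem[addr] = write_data
        self.addr_reg = addr

    @property
    def read_data(self):
        # Read data follows the address latched at the most recent edge.
        # (This model returns newly written data; check the real RAM's
        # read-during-write behavior.)
        return self.mem[self.addr_reg]


def fetch_trace(imem, cycles):
    """Return (PC, instruction) pairs, routing next_pc (the value *before*
    the PC register) to the RAM so both registers load on the same edge."""
    pc = 0  # PC register, reset to 0 like the RAM's address register
    trace = [(pc, imem.read_data)]
    for _ in range(cycles):
        next_pc = pc + 1       # word addresses, for simplicity
        imem.clock(next_pc)    # RAM's internal address register loads next_pc...
        pc = next_pc           # ...on the same edge as the PC register
        trace.append((pc, imem.read_data))
    return trace
```

With this wiring, the instruction word is always the one at the current PC. If the RAM were fed from the output of the PC register instead, every fetched word would lag the PC by one cycle.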
Implement the following instructions on your processor:
Type | Instructions |
Arithmetic | addu, subu, addiu |
Logical | and, andi, or, ori, xor, xori, lui |
Shift | sll, sra, srl |
Comparison | slt, slti, sltu, sltiu |
Control | beq, bne, bgez, bltz, j, jr, jal |
Data transfer | lw, sw |
Miscellaneous | break |
Note that, unlike commercial implementations, your processor does not implement exception handling. Therefore, if an instruction other than those listed above appears in the instruction stream, your processor's behavior is not defined by this specification (a practical option is to treat undefined instructions as no-ops).
Implement your datapath in Verilog. Don't write your datapath as one giant Verilog module. Instead, first implement simpler modules as building blocks (registers, comparators, etc.) and then assemble these components to form your datapath cleanly. Use enough intermediate levels of modules to model the structure of the datapath, but don't use too many.
All clocked elements must be triggered on the rising edge of the clock.
@ event-control statements must have a single delay interval (i.e., the statement that follows the event control is always executed after it).
Be sure to look up the bit fields of instructions such as bgez in Appendix A of COD. Note that the rt field is what distinguishes bgez from bltz.
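As a decode reference, here is a small Python sketch. bltz and bgez share the primary opcode 000001 (REGIMM), and rt = 00000 selects bltz while rt = 00001 selects bgez. Both branches test only the sign bit of the rs value. The function names are illustrative.

```python
REGIMM = 0b000001  # primary opcode shared by bltz and bgez

def decode_regimm(instr):
    """Classify a 32-bit instruction word as 'bltz', 'bgez', or None."""
    opcode = (instr >> 26) & 0x3F
    rt = (instr >> 16) & 0x1F   # for REGIMM, rt selects the branch condition
    if opcode != REGIMM:
        return None
    return {0b00000: "bltz", 0b00001: "bgez"}.get(rt)  # other rt values: not required

def branch_taken(kind, rs_value):
    """bltz/bgez look only at the sign bit of the 32-bit rs value."""
    negative = bool((rs_value >> 31) & 1)
    return negative if kind == "bltz" else not negative
```

For example, `decode_regimm(0x04610004)` (opcode 1, rs = $3, rt = 1, offset 4) classifies the word as bgez.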
Note that your processor does not need to implement all of the ALU instructions in the MIPS instruction set, only the ALU instructions listed in the table above. As in Lab 2, there must be an arithmetic/logical barrel shifter (shift amounts 0-31) outside the ALU. Instructions like SLT must also be handled outside the ALU: SLT must subtract the two operands, use the ALU status flags (Zero, Neg, Ovf) to compute the result, and then write the correct value to the destination register.
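The following Python sketch shows how SLT can be derived from the subtraction flags. For a signed comparison, a < b exactly when Neg XOR Ovf after computing a - b. The helper names are illustrative. Note that sltu generally also needs the borrow (carry-out) of the subtraction, which this sketch does not model.

```python
MASK32 = 0xFFFFFFFF

def alu_sub_flags(a, b):
    """Compute a - b in 32 bits (a, b given as unsigned 32-bit words) and
    return (result, zero, neg, ovf) as the ALU would report them."""
    result = (a - b) & MASK32
    zero = result == 0
    neg = bool(result >> 31)
    # Signed overflow on subtraction: operand signs differ and the
    # result's sign differs from the sign of a.
    a_sign, b_sign = a >> 31, b >> 31
    ovf = (a_sign != b_sign) and (neg != bool(a_sign))
    return result, zero, neg, ovf

def slt(a, b):
    """Signed set-less-than, computed outside the ALU from its flags."""
    _, _, neg, ovf = alu_sub_flags(a, b)
    return 1 if neg != ovf else 0
```

The overflow term matters. For example, `slt(0x80000000, 1)` (the most negative number versus 1) produces a positive difference, so Neg alone would give the wrong answer.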
The break instruction is special. See COD for its bit fields. Although this is normally an exception-generating instruction, you should treat it more like a halt instruction. When decoded, the break instruction should prevent the pipeline from continuing. This means that the PC will not advance any further, the break instruction will remain in the decode stage, and the instructions ahead of it will drain out of the pipeline as they complete. The correct terminology for this is that the break instruction "stalls" in the decode stage. Assume there is a single input signal called "enable" coming from outside. When it goes high, you must release the stalled break instruction exactly once (you need to build a small circuit that generates a single one-clock-cycle pulse when enable is high, then ignores enable until it goes low again). When we map our pipeline to the board (Problem 2), the break instruction will stop the pipeline and possibly display its code on the LEDs. In addition, we will be able to "unfreeze" the pipeline with a debounced switch.
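The single-pulse circuit amounts to a rising-edge detector. In hardware, one flip-flop remembers the value of enable from the previous cycle, and the pulse is `enable AND NOT previous_enable`. Here is a cycle-level Python sketch (the class name is hypothetical). It assumes enable has already been debounced and synchronized to the processor clock.

```python
class OnePulse:
    """Cycle-level model of the single-pulse circuit: one clock cycle of
    output on the first edge where enable is high, then nothing until
    enable has gone low again."""

    def __init__(self):
        self.prev = False  # enable as sampled at the previous clock edge

    def clock(self, enable):
        pulse = enable and not self.prev  # rising edge of enable
        self.prev = enable                # the one flip-flop of state
        return pulse
```

Holding enable high for several cycles still releases the break instruction only once: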
Your processor should produce an 8-bit STAT output signal: if the processor is not halted, STAT = 0. Otherwise, the high bit (bit 7) of STAT is 1 and the low 7 bits of STAT are the low 7 bits of the break code (located in bits 6-25 of the break instruction). Whenever this signal changes, make sure your monitor prints "STATUS Changed: 0xvalue" to the console.
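Here is a Python sketch of the STAT computation. A break instruction is a SPECIAL instruction (opcode 0) with function field 001101, and its 20-bit code occupies bits 25-6. The function names are illustrative.

```python
BREAK_FUNCT = 0b001101  # function field of break (SPECIAL opcode 0)

def is_break(instr):
    """True if the 32-bit word is a break instruction."""
    return (instr >> 26) == 0 and (instr & 0x3F) == BREAK_FUNCT

def stat_value(halted, break_instr):
    """8-bit STAT: 0 while running; when halted, bit 7 is set and bits
    6..0 are the low 7 bits of the break code (instruction bits 25..6)."""
    if not halted:
        return 0
    code = (break_instr >> 6) & 0xFFFFF  # 20-bit break code
    return 0x80 | (code & 0x7F)
```

For example, `break 5` encodes as 0x0000014D, so a processor halted on it should show STAT = 0x85.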
Problem 1b: Memory Mapped I/O
Any processor is useless without I/O, and your processor is no exception. So you need to create a memory-mapped I/O module. All reads and writes to addresses 0x80000000-0xFFFFFFFC are accesses to the I/O space. Reads and writes to the I/O space must not go to your data memory; instead, they must be handled by your memory-mapped I/O module.
The I/O module's specifications are as follows. On the processor/memory side, it must have a 32-bit address input, a 32-bit data input, and a 32-bit data output. It must also have two I/O buses, a 32-bit input data bus and a 32-bit output data bus, plus a 1-bit output select input. Other control signals may also be required. Internally, the I/O module must have two 32-bit I/O registers.
The behavior is as follows: Reads and writes to 0xFFFFFFF0 go to one 32-bit register (call it DP0). Reads and writes to 0xFFFFFFF4 go to the other (call it DP1). Reads of 0xFFFFFFF8 come from the I/O input bus; writes to 0xFFFFFFF8 are ignored. Reads and writes to addresses 0x80000000-0xFFFFFFEC and 0xFFFFFFFC are ignored. The I/O output bus carries DP0 when the output select is 0 and DP1 when it is 1.
The I/O input bus is connected to the DIP switches on the board, and the I/O output bus connects to the hex LEDs on the board.
Address | Read | Write |
0x80000000-0xFFFFFFEC | Reserved for future use | Reserved for future use |
0xFFFFFFF0 | DP0 | DP0 |
0xFFFFFFF4 | DP1 | DP1 |
0xFFFFFFF8 | Input switches | Ignored |
0xFFFFFFFC | Reserved for future use | Reserved for future use |
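The address map above can be captured in a small Python behavioral model. A model like this is handy as a reference when checking your test bench output. Only the addresses and the DP0/DP1 behavior come from this handout. The class and method names are hypothetical, and returning 0 for reserved reads is an arbitrary choice, since the handout only says such accesses are ignored.

```python
class MemoryMappedIO:
    """Behavioral sketch of the I/O module's address decoding."""

    DP0_ADDR = 0xFFFFFFF0
    DP1_ADDR = 0xFFFFFFF4
    SWITCH_ADDR = 0xFFFFFFF8

    def __init__(self):
        self.dp0 = 0
        self.dp1 = 0
        self.switches = 0  # value currently on the 32-bit I/O input bus

    @staticmethod
    def is_io(addr):
        # Bit 31 set means 0x80000000-0xFFFFFFFC: route to I/O, not data memory.
        return bool(addr & 0x80000000)

    def write(self, addr, data):
        if addr == self.DP0_ADDR:
            self.dp0 = data & 0xFFFFFFFF
        elif addr == self.DP1_ADDR:
            self.dp1 = data & 0xFFFFFFFF
        # Every other I/O address (including 0xFFFFFFF8) ignores writes.

    def read(self, addr):
        if addr == self.DP0_ADDR:
            return self.dp0
        if addr == self.DP1_ADDR:
            return self.dp1
        if addr == self.SWITCH_ADDR:
            return self.switches
        return 0  # reserved addresses: "ignored"; 0 is an arbitrary choice

    def output_bus(self, select):
        """I/O output bus: DP0 when select is 0, DP1 when select is 1."""
        return self.dp1 if select else self.dp0
```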
Note that you can read and write the I/O range with ordinary loads and stores using negative offsets:
lw $1, -8($0)     # read input => $1
sw $7, -16($0)    # write $7 to DP0
sw $8, -12($0)    # write $8 to DP1
This works because the offsets are sign-extended. For example, -12($0) refers to address 0xFFFFFFF4.
Finally, add non-synthesizable code to this module (see the Synplify manual) that prints a message to the console each time a value is written (e.g., something like "I/O Write to DP0: 0x44455523"). This message should also be written to a file called "iooutput.trace". Also, whenever the module reads an input value in simulation, make sure the value comes from the next entry in an input file called "ioinput.trace".
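The address arithmetic behind those negative offsets can be checked with a few lines of Python. The sketch mirrors what the EX stage computes for lw/sw: the base register plus the sign-extended 16-bit offset, modulo 2^32. The function names are illustrative.

```python
def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a 32-bit unsigned word."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm

def effective_address(base, offset16):
    """lw/sw address: base register + sign-extended offset, mod 2**32."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF
```

For example, `hex(effective_address(0, -12))` gives `'0xfffffff4'`, which is the DP1 address.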
Problem 1c: Update the monitor module
Make sure to update your disassembly monitor module from Lab 2:
- Write the output to a file ("trace.txt") in addition to the console. This makes reading, debugging, and searching easier.
- Add the new instructions.
- Be sure to account for pipelining effects. For example, you can pass the instruction word down the pipeline along with the data:
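always @(posedge clk) begin
  EXinstruction  <= instruction;    // instruction word for the EX stage
  MEMinstruction <= EXinstruction;  // delayed again for the MEM stage
end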
This way, you will have the instruction word available when the instruction reaches the end of the memory stage. It is like a mini-pipeline. You may also need to keep the input register values around for a while, etc. Think about this carefully...
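The instruction-word mini-pipeline can be sketched in Python as a five-entry delay line. It shows which instruction a monitor should report at each stage. The sketch ignores stalls and flushes. In a real design, these instruction registers must hold or take a bubble exactly as the data pipeline registers do. The class name and the use of 0 (the nop encoding) for bubbles are assumptions for illustration.

```python
from collections import deque

STAGES = ("IF", "ID", "EX", "MEM", "WB")

class InstructionDelayLine:
    """Cycle-level model of carrying the instruction word down the pipeline
    so the monitor can report an instruction when it reaches a late stage."""

    def __init__(self, bubble=0):
        # One slot per stage; 0 (sll $0, $0, 0, i.e. a nop) marks a bubble.
        self.slots = deque([bubble] * len(STAGES), maxlen=len(STAGES))

    def clock(self, fetched):
        # On each edge every word moves one stage later; the new word enters IF
        # and the word leaving WB falls off the end.
        self.slots.appendleft(fetched)

    def at(self, stage):
        return self.slots[STAGES.index(stage)]
```

After five fetches, the first instruction is in WB and the fifth is in IF.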
Problem 1d: Top-level module integration: mapping to the board
As in Lab 2, you will have a top-level module that ties everything together. This time, however, it has several I/O pins: a clock net, 1 reset signal, 1 enable signal (for the break instruction), 1 output select signal (for the I/O module), 1 8-bit output (STAT, from the break logic), 1 32-bit output (I/O), and 1 32-bit input (I/O).
Use the FPGA_TOP2.v module in m:\lab3 as the top-level integration module for your design. Note that FPGA_TOP2.v is a test harness built by the TAs for the TFTP module; you must replace some of its code with your processor before mapping it to the board.
Skim the description of the Calinx board (see the Resources page) to see what the FPGA_TOP2.v pins mean. You will modify the Verilog so that this top-level module integrates all the pieces you need.
You should assume that the following is true:
There are 2 sets of 4 pushbuttons. We will only use set 1 (although you can use the others if you wish).
- Switch 5 should be the RESET signal.
- Switch 8 must be used as the enable signal that releases the break instruction.
- Switch 7 must be used to select the output of your I/O module.
- Switch 4 carries a special signal, "SINGLE_CLOCK", described below.
Since these are pushbuttons, they naturally bounce, so you will need to include the debounce module from m:\lab3, as in Lab 2.
The 8-bit STAT signal generated by the break instruction should drive the 8 individual LEDs (on the side of the board).
The output of your memory-mapped I/O module should go to the hex LEDs. To drive them correctly, you need either the bin2Hex or the ledtool module.
Input to the I/O module must come from the first set of 8 DIP switches (switch 9). Assume that the value of these switches drives the lower 8 bits of the input bus and that the upper 24 bits are zero. The hex LEDs can show something other than the I/O output when a switch is set; note that the second switch in the second set of 8 DIP switches controls this. You can use other switches to select what is displayed in addition to the normal I/O.
Finally, the CLOCK net for your pipeline must be driven either by the Xilinx board's clock DLL or by the debounced SINGLE_CLOCK signal:
assign procesador_reloj = CLK_SOURCE ? LAB_CLK : SINGLE_CLOCK;
For more information on using the DLL, read the documentation and examples in the m:\lab3\DLL directory. FPGA_TOP2.v is currently configured for a 9 MHz clock. This may be too fast for your design.
Problem 1e: Test plan
Write test benches to test your processor. These test benches include unit tests (for your processor components), multi-unit tests (for the datapath), and a set of machine language programs for testing the full processor.
To test the full processor before hazard handling is in place (see Problem 3), the programs you write should be similar to the "broken" spim programs you wrote in Lab 1. That is, these programs must not use values immediately after they are produced.
In addition to testing the processor, some test benches should also test the top-level module, to debug the interface between the processor and the Calinx board. These test benches make debugging much easier when you first map your pipelined processor to the board.
Build these test bench modules around the top-level FPGA module. The test bench should provide a clock (as in Lab 2), print to the console when the I/O changes, and perhaps "press" the buttons for testing.
Note that simulating the debounced pushbuttons is tricky because the debouncer interacts with the clock. Think carefully before attempting to test the single-step clock function.
Also think carefully about the I/O features. How will you test them in ModelSim? To exercise your pipeline in simulation, create a test module that detects when a break instruction has halted the processor, waits 10 cycles, and then triggers the release (enable) line, printing to the console as it does so. This lets you run programs in ModelSim that use the I/O features to produce output.
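The control logic of such a release test module is a small state machine. The Python sketch below shows that logic at the cycle level; a Verilog test bench would implement the same thing with a counter. It assumes that STAT is sampled once per cycle and that bit 7 of STAT signals "halted". It re-arms only after STAT drops, so one break is never released twice. The function name is hypothetical.

```python
def release_schedule(stat_trace, wait_cycles=10):
    """Given STAT sampled once per cycle, return the cycles on which the
    test bench pulses the release line: wait_cycles after it first sees
    the processor halted (STAT bit 7 set)."""
    releases = []
    state, countdown = "running", 0
    for cycle, stat in enumerate(stat_trace):
        halted = bool(stat & 0x80)
        if state == "running" and halted:
            state, countdown = "waiting", wait_cycles
            print(f"cycle {cycle}: break detected, STAT=0x{stat:02x}")
        elif state == "waiting":
            countdown -= 1
            if countdown == 0:
                releases.append(cycle)
                print(f"cycle {cycle}: pulsing release")
                state = "released"
        elif state == "released" and not halted:
            state = "running"  # re-arm only once the processor has resumed
    return releases
```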
Problem 2: Mapping to the Calinx board
Map your processor design to the Calinx board. This happens (at least) twice: once for the basic Xilinx checkoff and once for the full Xilinx checkoff. Some of the information in this problem applies only to the full checkoff. Be sure to read the information in Lab3Help about the changes needed to map the RAMs to the Xilinx part and about using the TFTP interface.
Note that you must be able to put the processor into single-step mode (by setting the first DIP switch of the second set to zero). Single-step through the instructions in your code and debug it that way. At the end of execution, you should be able to loop, using a combination of break instructions and I/O writes (address to 0xFFFFFFF0, data to 0xFFFFFFF4), to dump the contents of your memory to the hex display once the run completes. Make sure this works! For the first Xilinx checkoff, you need to write this loop so that it does not require the unimplemented parts of the controller (forwarding and hazard detection).
Make sure the RESET line resets the important state of the processor! Remember that the synthesizer ignores "initial" blocks in Verilog. Many bugs can be introduced when registers start in a random initial state! One obvious thing that needs to be reset is the PC. Are there other things?
Don't try to debug everything at once. Start with very simple examples. You may want to co-opt the hex display to show PC information while debugging (you can co-opt other outputs as well). For example, how about a simple program with break as the first instruction, followed by a few nops? Can you make that work? How about simple I/O examples? Once you are confident that your simpler tests work, you can move on to more complicated tests.
In your report, please state the total number of FPGA slices used by your design and the fraction of the Xilinx part that your design occupies. This information should be available in the place-and-route log files.
Problem 3: Hazards
Problem 3a: Hazard Management
Here is a list of the hazards you need to handle:
- Data hazards: Data hazards arise when data produced by one instruction is used as an operand by subsequent instructions.
  - The ALU is a source of data hazards. Add the forwarding logic and buses needed so that the result of an ALU operation can be used by the next three instructions without waiting for the data to be written to the register file.
  - Load instructions also create data hazards, since the loaded data is not available until the end of the MEM stage. The MIPS instruction set specifies that loads have a one-cycle delay: the compiler must not generate code sequences that use a load's data in the next instruction. Implement your datapath so that it has a load delay slot.
- Control hazards: Branch instructions are the usual source of control hazards. Follow the MIPS instruction set definition and implement your processor with a branch delay slot whose instruction always executes, regardless of the branch outcome.
- Structural hazards: Are there any in this design? If so, explain what they are and how you handle them. If not, why not?
In the following sequence, the ADD and SUB instructions have a data hazard on $1, even though an SLL separates them. Be sure to check for cases like this.
ADD $1, $2, $3
SLL $5, $6
SUB $6, $1, $7
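Here is a Python sketch of COD-style forwarding selection for one EX-stage operand, with the ADD/SLL/SUB sequence above as a test case. It assumes EX/MEM and MEM/WB pipeline registers that carry a RegWrite bit and a destination register number; those names are illustrative. This selection covers only the next two instructions. Because all state is written on the rising edge, the third instruction reads the register file in the same cycle the result is written back. It therefore needs a separate path, such as a bypass around the register file, which this sketch does not show.

```python
def forward_select(src_reg, exmem_regwrite, exmem_rd, memwb_regwrite, memwb_rd):
    """Choose the source of one ALU operand in EX:
    'EXMEM' = result of the previous instruction,
    'MEMWB' = result of the instruction two back,
    'REG'   = value read from the register file in ID."""
    if src_reg == 0:
        return "REG"    # $0 is hardwired to zero; never forward into it
    if exmem_regwrite and exmem_rd == src_reg:
        return "EXMEM"  # the most recent producer takes priority
    if memwb_regwrite and memwb_rd == src_reg:
        return "MEMWB"
    return "REG"
```

When SUB reaches EX, SLL (destination $5) is in EX/MEM and ADD (destination $1) is in MEM/WB. Operand $1 must therefore come from MEM/WB.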
Problem 3b: Interlocked pipelines / interlocked loads
Now that you have handled the basic hazards, you need to figure out how to handle pipeline stalls. Your current processor handles the load delay slot the same way as the original version of MIPS: if the compiler generates a code sequence in which a value is loaded from memory and used by the next instruction, that next instruction gets the wrong value. Of course, the instruction set specification explicitly disallowed such code sequences; if no other option was available, the compiler had to put a nop in the load delay slot to avoid getting an incorrect answer.
As a final exercise, add a pipeline interlock so that the compiler can use a value in the cycle immediately after it is loaded from memory. This feature was added in later versions of the MIPS instruction set. To be clear, we want the following code sequence to do the "obvious" thing, namely that the ADD uses the value loaded from memory:
LW $1, 4($2)
ADD $2, $1, $3
Be sure to rerun your ModelSim tests to make sure the processor still works correctly. Your test benches should include tests that exercise several different distances between loads and the instructions that use their values. Hint: The mechanism for this one-cycle stall is very similar to what you need for the break instruction...
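Here is a Python sketch of the textbook (COD-style) load-use interlock, using the LW/ADD pair above as a test case. When the load in EX writes a register that the instruction in ID reads, the PC and IF/ID register hold for one cycle and a bubble enters ID/EX. The function names, the pc + 4 next-PC (which ignores branches), and the use of 0 as the bubble are simplifying assumptions. Comparing against rt for every instruction is conservative: an I-type instruction whose rt is a destination may stall needlessly, which costs a cycle but is still correct.

```python
NOP = 0  # all-zero word (sll $0, $0, 0) used as the bubble

def load_use_stall(idex_mem_read, idex_rt, ifid_rs, ifid_rt):
    """Stall one cycle when the load now in EX writes a register that the
    instruction now in ID reads."""
    return bool(idex_mem_read) and idex_rt != 0 and idex_rt in (ifid_rs, ifid_rt)

def clock_front_end(pc, ifid, fetched, decoded, stall):
    """Next (PC, IF/ID, ID/EX) contents for one clock edge. On a stall the
    PC and IF/ID hold (as for break) and a bubble enters ID/EX."""
    if stall:
        return pc, ifid, NOP
    return pc + 4, fetched, decoded  # simplified: ignores branches
```

For `LW $1, 4($2)` followed by `ADD $2, $1, $3`, the load's rt ($1) matches the ADD's rs, so the pipeline stalls once.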
Problem 4: Pipelining performance
Calculate the cycle time of your pipelined processor. To do this, you will need to learn Xilinx's timing analysis tools. In your report, discuss what your critical path is and what steps you could take to shorten it. If we simply took the single-cycle processor from Lab 2 and added pipeline registers at key points, we would expect the cycle time to equal the delay of the slowest stage (ALU? Next PC? Memory?), so that the clock rate is the inverse of that delay. Is that the performance you achieved? Why or why not?
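The expected figures follow from simple arithmetic. Here is a Python sketch of it. All delays below are made-up illustration values, not measurements of any design. The pipeline-register overhead (clock-to-Q plus setup) is an extra term that a real pipeline pays on every stage.

```python
def cycle_times(stage_delays_ns, reg_overhead_ns):
    """Compare an ideal pipelined cycle time with a single-cycle one.

    Pipelined: slowest stage + pipeline-register overhead (clock-to-Q + setup).
    Single-cycle: roughly the sum of all stage delays.
    Returns (pipelined_ns, pipelined_mhz, single_cycle_ns).
    """
    pipelined = max(stage_delays_ns.values()) + reg_overhead_ns
    single_cycle = sum(stage_delays_ns.values())
    return pipelined, 1000.0 / pipelined, single_cycle

# Made-up delays for illustration only.
delays = {"IF": 9.0, "ID": 7.0, "EX": 12.0, "MEM": 9.0, "WB": 3.0}
print(cycle_times(delays, 1.5))
```

With these numbers, the pipelined cycle is 13.5 ns (about 74 MHz) versus roughly 40 ns for the single-cycle version. Unbalanced stages, register overhead, and forwarding or hazard logic added to a stage all explain why the measured speedup falls short of the stage count.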
Final step: Lab report
Submit a copy of your Verilog code (including test benches), schematics, test programs, updated versions of your design document and processor specifications, and your online notebook entries. Explain any changes you had to make to the specifications you submitted in your 10/5 design document.
Also provide simulation logs showing the correct operation of the processor. These logs should show the operations performed and then the contents of memory with the correct values. Also hand in your test bench files.
As part of your report, do a post-mortem of your test plan. Show bug curves and give examples of the kinds of bugs your test plan caught early (as well as "escaped" bugs that you found later than expected).
How much time did your team spend on this task?