Hakim Weatherspoon CS 3410 Computer Science Cornell University
1 The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer.
2 [Figure: single-cycle processor datapath: PC and new-PC logic, instruction memory, register file, control, sign extend, compare (=?), ALU, and data memory (addr, d in, d out)]
3 Single-cycle processor. Advantages: a single cycle per instruction makes the logic and clock simple. Disadvantages: since instructions take different amounts of time to finish, memory and functional units are not efficiently utilized, and the cycle time is set by the longest delay (the load instruction). Best possible CPI is 1 (actually < 1 with parallelism). However, lower MIPS and a longer clock period (lower clock frequency); hence, lower performance.
4 Multi-cycle processor. Advantages: better MIPS and a smaller clock period (higher clock frequency), hence better performance than the single-cycle processor. Disadvantages: higher CPI than the single-cycle processor. Pipelining: we want better performance, i.e., a small CPI (close to 1) with high MIPS and a short clock period (high clock frequency).
5 Parallelism Pipelining Both!
6 Alice and Bob: they don't always get along
7
8 Saw Drill Glue Paint
9 N pieces, each built following the same sequence: Saw, Drill, Glue, Paint
10 Alice owns the room. Bob can enter when Alice is finished. Repeat for the remaining tasks. No possibility for conflicts.
11 [Timeline figure] Elapsed time for Alice: 4 hrs. Elapsed time for Bob: 4 hrs. Total elapsed time: 4*N hrs. Latency: 4 hours/task. Throughput: 1 task/4 hrs. Concurrency: 1. CPI = 4. Can we do better?
12-17 Partition the room into stages of a pipeline. One person owns a stage at a time: 4 stages, 4 people working simultaneously, and everyone moves right in lockstep. It still takes all four stages for one job to complete. [Figure, built up over these slides: Alice, Bob, Carol, and Dave enter the pipeline one after another until all four stages are busy]
18 [Timeline figure] Latency: 4 hrs/task. Throughput: 1 task/hr. Concurrency: 4. CPI = 1.
19 What if drilling takes twice as long, but gluing and painting take ½ as long? Latency: ? Throughput: ? CPI = ?
20 What if drilling takes twice as long, but gluing and painting take ½ as long? [Timeline figure: tasks done at 4, 6, and 8 cycles] Latency: 4 cycles/task. Throughput: 1 task/2 cycles. CPI = 2.
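The done times on this slide can be checked with a short simulation. This is a sketch under one assumption: each room holds one piece at a time, and a finished piece may wait between rooms until the next room is free. `finish_times` is a hypothetical helper. The stage durations are the slide's: saw 1, drill 2, glue ½, paint ½.

```python
def finish_times(stage_times, n_tasks):
    """Return the time each task leaves the last stage.

    A task enters stage s once it has left stage s-1 and the previous
    task has left stage s (one worker per stage).
    """
    done = [0.0] * len(stage_times)  # when the previous task left each stage
    result = []
    for _ in range(n_tasks):
        t = 0.0  # when this task left the previous stage
        for s, duration in enumerate(stage_times):
            t = max(t, done[s]) + duration
            done[s] = t
        result.append(t)
    return result


balanced = [1, 1, 1, 1]          # saw, drill, glue, paint
unbalanced = [1, 2, 0.5, 0.5]    # drilling takes 2x, glue and paint take 1/2
print(finish_times(balanced, 3))    # [4.0, 5.0, 6.0]
print(finish_times(unbalanced, 3))  # [4.0, 6.0, 8.0]
```

Once the pipeline fills, one task completes every max(stage_times). The 2-cycle drilling stage therefore sets the pace, which is the "slowest stage dominates" effect.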
21 Principle: throughput is increased by parallel execution. A balanced pipeline is very important; otherwise the slowest stage dominates performance. Pipelining: identify pipeline stages, isolate stages from each other, resolve pipeline hazards (next lecture).
22 Single Cycle vs Pipelined Processor
23 Single-cycle: insn0.fetch, dec, exec; then insn1.fetch, dec, exec. Pipelined: insn0.fetch; then insn0.dec alongside insn1.fetch; then insn0.exec alongside insn1.dec; then insn1.exec.
24 5-stage Pipeline: Implementation, Working Example; Hazards: Structural, Data Hazards, Control Hazards
25 Review: Single cycle processor [Figure: single-cycle datapath: PC, instruction memory, register file, control, sign extend, compare, ALU, data memory, and new-PC logic]
26 [Figure: the single-cycle datapath (with +4 adder and jump/branch target computation) divided into five stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back]
27 [Figure: the datapath with the five stages separated: Fetch, Decode, Execute, Memory, WB]
28 [Figure: pipelined datapath. Pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separate Instruction Fetch, Instruction Decode, Execute, Memory, and Write-Back; control bits (ctrl) travel with each instruction]
29 [Pipeline diagram: add, nand, lw, add, sw each pass through IF ID EX MEM WB, each starting one cycle after the previous] Latency: 5 cycles. Throughput: 1 insn/cycle. Concurrency: 5. CPI = 1.
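With k stages and one instruction entering per cycle, the last of N instructions finishes at cycle k + N - 1. CPI therefore approaches 1 as N grows. A small sketch of that arithmetic, assuming an ideal pipeline with no hazards (`pipeline_cycles` is a hypothetical helper):

```python
def pipeline_cycles(n_insns, n_stages=5):
    """Cycles to finish n_insns on an ideal (hazard-free) pipeline."""
    if n_insns == 0:
        return 0
    # The first instruction takes n_stages cycles; each later one
    # completes one cycle after its predecessor.
    return n_stages + n_insns - 1


for n in (5, 100, 10_000):
    cycles = pipeline_cycles(n)
    print(n, cycles, cycles / n)  # effective CPI = cycles / instructions
```

For the five instructions on this slide, that gives 9 cycles. The fill-up cost fades as the instruction count grows.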
30 Break the datapath into multiple cycles (here 5). Parallel execution increases throughput. A balanced pipeline is very important: the slowest stage determines the clock rate, and imbalance kills performance. Add pipeline registers (flip-flops) for isolation: each stage begins by reading values from a latch and ends by writing values to a latch. Resolve hazards.
31 [Figure: pipelined datapath, as on slide 28]
32 Stage | Perform functionality | Latch values of interest
Fetch | Use PC to index program memory; increment PC | Instruction bits (to be decoded); PC + 4 (to compute branch targets)
Decode | Decode instruction, generate control signals, read register file | Control information, Rd index, immediates, offsets, register values (Ra, Rb), PC + 4 (to compute branch targets)
Execute | Perform ALU operation; compute targets (PC+4+offset, etc.) in case this is a branch; decide if branch taken | Control information, Rd index, etc.; result of ALU operation; value in case this is a store instruction
Memory | Perform load/store if needed (address is ALU result) | Control information, Rd index, etc.; result of load; pass result from execute
Writeback | Select value, write to register file | -
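The "latch values of interest" column can be made concrete by modeling each pipeline register as a record. This is an illustrative sketch only: the field names are hypothetical, real hardware stores fixed-width bit fields, and a Python `dict` stands in for the control bits.

```python
from dataclasses import dataclass, field


@dataclass
class IF_ID:
    inst: int = 0          # instruction bits (to be decoded)
    pc_plus_4: int = 0     # to compute branch targets


@dataclass
class ID_EX:
    ctrl: dict = field(default_factory=dict)  # control signals from decode
    rd: int = 0            # destination register index
    imm: int = 0           # sign-extended immediate / offset
    a: int = 0             # value read for Ra
    b: int = 0             # value read for Rb
    pc_plus_4: int = 0     # to compute branch targets


@dataclass
class EX_MEM:
    ctrl: dict = field(default_factory=dict)
    rd: int = 0
    alu_result: int = 0    # D: ALU result (also the memory address)
    b: int = 0             # value to store, if this is a store


@dataclass
class MEM_WB:
    ctrl: dict = field(default_factory=dict)
    rd: int = 0
    mem_data: int = 0      # M: result of a load
    alu_result: int = 0    # D: passed through from execute
```

Under this model, write-back would select `mem_data` for loads and `alu_result` otherwise, then write it to register `rd`.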
33 [Figure: fetch stage: PC, +4 adder, instruction memory] Possible sources of the new PC:
- PC+4
- pc-rel (PC-relative), e.g. BEQ, BNE
- pc-abs (PC-absolute), e.g. J and JAL: the upper 4 bits of PC+4, then target, then 00
- pc-reg (PC from a register), e.g. JR
34 [Figure: fetch stage: a mux driven by pc-sel picks the new PC from PC+4, pc-rel, pc-abs, and pc-reg; the instruction read from instruction memory and PC+4 go to IF/ID and the rest of the pipeline]
35 [Figure: fetch stage, as on slide 34] pc-sel chooses among PC+4, pc-reg (PC from a register: JR), pc-rel (PC-relative: BEQ, BNE), and pc-abs (PC-absolute: J and JAL).
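The fetch-stage mux can be sketched as a function. This assumes the standard MIPS encodings: a 16-bit branch offset counted in words, and a 26-bit jump target. The `pc_sel` strings are hypothetical labels borrowed from the slide's signal names.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc, pc_sel, imm16=0, target26=0, reg_value=0):
    """Choose the new PC the way the fetch-stage mux does (MIPS-style)."""
    pc_plus_4 = (pc + 4) & MASK32
    if pc_sel == "pc+4":
        return pc_plus_4
    if pc_sel == "pc-rel":   # BEQ, BNE: PC+4 + (sign-extended offset << 2)
        return (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32
    if pc_sel == "pc-abs":   # J, JAL: upper 4 bits of PC+4, then target, then 00
        return (pc_plus_4 & 0xF000_0000) | ((target26 & 0x03FF_FFFF) << 2)
    if pc_sel == "pc-reg":   # JR: address taken from a register
        return reg_value & MASK32
    raise ValueError(f"unknown pc_sel: {pc_sel}")
```

For example, a branch at 0x1C back to 0x14 encodes the offset -3 (0xFFFD), because the offset is counted in words from PC+4 = 0x20.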
36 Stage 1: Instruction Fetch [Figure: IF/ID (inst, PC+4) feeds the register file (Ra, Rb, Rd, D, WE); ctrl, A, B, imm, and PC+4 go to ID/EX and the rest of the pipeline]
37 Stage 1: Instruction Fetch [Figure: as on slide 36, adding the decode logic, the sign extender for imm, and the write-back inputs (result, dest) to the register file]
38 Stage 2: Instruction Decode [Figure: ID/EX (ctrl, PC+4, imm, A, B) feeds the ALU and the branch-target logic; ctrl, D, and B go to EX/MEM and the rest of the pipeline]
39 Stage 2: Instruction Decode [Figure: as on slide 38, adding the target adder (+), the branch? decision, and the pcrel, pcabs, pcreg, and pcsel signals sent back to fetch]
40 Stage 3: Execute [Figure: EX/MEM (ctrl, D, B, target) drives the data memory (addr, d in, d out); ctrl, M, and D go to MEM/WB and the rest of the pipeline]
41 Stage 3: Execute [Figure: as on slide 40, with pcsel, pcreg, pcrel, pcabs, and branch? fed back to fetch]
42 Stage 4: Memory [Figure: MEM/WB holding ctrl, M, and D]
43 Stage 4: Memory [Figure: MEM/WB (ctrl, M, D) selects the result and dest written back to the register file]
44 [Figure: complete pipelined datapath: instruction memory, register file, ALU, and data memory separated by IF/ID, ID/EX, EX/MEM, and MEM/WB; OP and Rd travel along in each pipeline register]
45 Consider a non-pipelined processor with clock period C (e.g., 50 ns). If you divide the processor into N stages (e.g., 5), your new clock period will be: A. C B. N C. less than C/N D. C/N E. greater than C/N
46 Answer: E, greater than C/N. The slowest stage sets the clock period, the stages are rarely perfectly balanced, and every pipeline register adds its own delay.
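A quick calculation shows why dividing the datapath rarely reaches C/N. The stage delays and the 1 ns register overhead below are made-up numbers, chosen only for illustration.

```python
def pipelined_period(stage_delays_ns, register_overhead_ns):
    """Clock period of a pipeline: slowest stage plus pipeline-register overhead."""
    return max(stage_delays_ns) + register_overhead_ns


C = 50.0
N = 5
# Hypothetical split of the 50 ns datapath into 5 unequal stages,
# with a hypothetical 1 ns setup/clock-to-q cost per pipeline register.
stages = [8.0, 12.0, 10.0, 12.0, 8.0]
assert sum(stages) == C
period = pipelined_period(stages, register_overhead_ns=1.0)
print(period, C / N)  # 13.0 10.0 -> greater than C/N
```

Even a perfectly balanced split (five 10 ns stages) would give 11 ns, because the register overhead never goes away.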
47 Pipelining is a powerful technique to mask latencies and increase throughput. Logically, instructions execute one at a time; physically, instructions execute in parallel (instruction-level parallelism). Abstraction promotes decoupling: interface (ISA) vs. implementation (pipeline).
48 All instructions are the same length (32 bits): easy to fetch and then decode. Only 3 types of instruction formats: easy to route bits between stages, and a register source can be read before even knowing what the instruction is. Memory access through lw and sw only: access memory after the ALU.
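Fixed field positions are what let the register file be read while decoding is still in progress. A sketch assuming the standard MIPS R/I/J layouts (`fields` is a hypothetical helper):

```python
def fields(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "op":     (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,   # first source register
        "rt":     (word >> 16) & 0x1F,   # second source (or I-type destination)
        "rd":     (word >> 11) & 0x1F,   # R-type destination
        "shamt":  (word >> 6) & 0x1F,
        "funct":  word & 0x3F,
        "imm":    word & 0xFFFF,         # I-type immediate / offset
        "target": word & 0x03FF_FFFF,    # J-type jump target
    }


add = fields(0x00221820)   # add $3, $1, $2
lw = fields(0x8C440014)    # lw $4, 20($2)
```

Because rs and rt always occupy the same bits, both register reads can start before the opcode has been decoded.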
49 5-stage Pipeline: Implementation, Working Example; Hazards: Structural, Data Hazards, Control Hazards
50 Example program (assume an 8-register machine):
add r3, r1, r2
nand r6, r4, r5
lw r4, 20(r2)
add r5, r2, r5
sw r7, 12(r3)
51 [Figure: datapath for the example: PC and +4 adder, instruction memory, register file R0-R7, sign extend, ALU, and data memory, with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB carrying op, dest, vala, valb, imm, target, ALU result, and mdata]
52 At time 1, fetch add r3, r1, r2. [Figure: pipeline state at Time 0]
53 [Figure: Time 1. Fetch: add r3, r1, r2; the later stages hold nops]
54 [Figure: Time 2. Fetch: nand r6, r4, r5; ID: add r3, r1, r2; the later stages hold nops]
55 [Figure: Time 3. Fetch: lw r4, 20(r2); ID: nand r6, r4, r5; EX: add r3, r1, r2]
56 [Figure: Time 4. Fetch: add r5, r2, r5; ID: lw r4, 20(r2); EX: nand r6, r4, r5; MEM: add r3, r1, r2]
57 [Figure: Time 5. Fetch: sw r7, 12(r3); ID: add r5, r2, r5; EX: lw r4, 20(r2); MEM: nand r6, r4, r5; WB: add r3, r1, r2]
58 [Figure: Time 6. No more instructions. ID: sw r7, 12(r3); EX: add r5, r2, r5; MEM: lw r4, 20(r2); WB: nand r6, r4, r5]
59 [Figure: Time 7. No more instructions; nops fill IF and ID. EX: sw r7, 12(r3); MEM: add r5, r2, r5; WB: lw r4, 20(r2)]
60 [Figure: Time 8. No more instructions. MEM: sw r7, 12(r3); WB: add r5, r2, r5; IF, ID, and EX hold nops] Slides thanks to Sally McKee
61 [Figure: Time 9. No more instructions. WB: sw r7, 12(r3); all earlier stages hold nops]
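When nothing stalls, the walkthrough follows a simple rule. Instruction i (counting from 0) is fetched at time i + 1 and advances one stage per cycle. The short sketch below reproduces the stage contents of each snapshot. `occupancy` is a hypothetical helper, and register values are not modeled.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

program = [
    "add r3, r1, r2",
    "nand r6, r4, r5",
    "lw r4, 20(r2)",
    "add r5, r2, r5",
    "sw r7, 12(r3)",
]


def occupancy(program, time):
    """Map each stage to the instruction in it at `time`.

    Assumes no stalls: program[i] is fetched at time i + 1 and
    moves one stage per cycle; empty stages hold a nop.
    """
    snapshot = {}
    for stage_index, stage in enumerate(STAGES):
        i = time - 1 - stage_index
        snapshot[stage] = program[i] if 0 <= i < len(program) else "nop"
    return snapshot


for t in range(1, 10):
    print(t, occupancy(program, t))
```

The last instruction leaves WB at time 9, matching the final snapshot.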
62 Pipelining is great because: A. You can fetch and decode the same instruction at the same time. B. You can fetch two instructions at the same time. C. You can fetch one instruction while decoding another. D. Instructions only need to visit the pipeline stages that they require. E. C and D
63 Answer: C. You can fetch one instruction while decoding another. (D does not hold here: every instruction passes through all five stages.)
64 [Figure: pipelined datapath, as on slide 28]
65 5-stage Pipeline: Implementation, Working Example; Hazards: Structural, Data Hazards, Control Hazards
66 Correctness problems associated with processor design:
1. Structural hazards: the same resource is needed for different purposes at the same time (possible: ALU, register file, memory)
2. Data hazards: an instruction's output is needed before it's available
3. Control hazards: the next instruction's PC is unknown at the time of fetch
67 Dependence: a relationship between two insns. Data: two insns use the same storage location. Control: one insn affects whether another executes at all. Not a bad thing; programs would be boring otherwise. Enforced by making the older insn go before the younger one: this happens naturally in single-/multi-cycle designs, but not in a pipeline. Hazard: a dependence plus the possibility of wrong insn order. Effects of wrong insn order must not be externally visible. Hazards are a bad thing: most solutions either complicate the hardware or reduce performance.
68 Data Hazards: register file (RF) reads occur in stage 2 (ID); RF writes occur in stage 5 (WB); the RF is written in the first half and read in the second half of the cycle.
x10: add r3, r1, r2
x14: sub r5, r3, r4
1. Is there a dependence? 2. Is there a hazard? A) Yes B) No C) Cannot tell with the information given.
69 Data Hazards: register file (RF) reads occur in stage 2 (ID); RF writes occur in stage 5 (WB); the RF is written in the first half and read in the second half of the cycle.
x10: add r3, r1, r2
x14: sub r5, r3, r4
1. Is there a dependence? 2. Is there a hazard? A) Yes for both B) No C) Cannot tell with the information given.
70 Which of the following statements is true? A. Whether there is a data dependence between two instructions depends on the machine the program is running on. B. Whether there is a data hazard between two instructions depends on the machine the program is running on. C. Both A & B D. Neither A nor B
71 Answer: B. A data dependence is a property of the program; whether it causes a hazard depends on the pipeline (for example, how many cycles separate the register read from the register write).
72 [Pipeline diagram over clock cycles: add r3, r1, r2; sub r5, r3, r4; lw r6, 4(r3); or r5, r3, r5; sw r6, 12(r3), each passing through IF ID EX MEM WB one cycle after the previous]
73 add r3, r1, r2; sub r5, r3, r4; lw r6, 4(r3); or r5, r3, r5; sw r6, 12(r3). How many data hazards due to r3 only? A) 1 B) 2 C) 3 D) 4 E) 5
74 [Pipeline diagram as on slide 72, with arrows from add's write of r3 to each later read of r3] Backwards arrows require time travel.
75 [Pipeline diagram as on slide 72]
76 [Pipeline diagram as on slide 72]
77 Data Hazards: register file reads occur in stage 2 (ID); register file writes occur in stage 5 (WB); the next instructions may read values about to be written, e.g. add r3, r1, r2 followed by sub r5, r3, r4. How to detect?
78 Detecting Data Hazards [Figure: sub r5, r3, r4 in ID while add r3, r1, r2 is further down the pipeline] Hazard if: IF/ID.Ra != 0 && (IF/ID.Ra == ID/Ex.Rd || IF/ID.Ra == Ex/M.Rd || IF/ID.Ra == M/W.Rd)
79 Detecting Data Hazards [Figure: a detect-hazard unit compares the IF/ID source registers (Ra, Rb) with the Rd fields in ID/EX, EX/MEM, and MEM/WB]
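The slide's condition translates directly into a small function. This sketch adds one assumption the slide leaves implicit: an older instruction only matters if it actually writes a register, i.e. its write enable (WE) is set. Stores and nops therefore never trigger a stall. The `Dest` record and `must_stall` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Dest:
    """Destination fields carried in a later pipeline register."""
    rd: int
    we: bool   # register-write enable; nops and stores have we=False


def must_stall(ra, rb, id_ex, ex_mem, mem_wb):
    """Stall the instruction in ID if a source register is still being produced.

    ra, rb: source register indices of the instruction in IF/ID.
    id_ex, ex_mem, mem_wb: Dest records of the three older instructions.
    """
    def pending(src):
        # r0 is hardwired to zero, so reading it never waits.
        return src != 0 and any(
            older.we and older.rd == src for older in (id_ex, ex_mem, mem_wb)
        )
    return pending(ra) or pending(rb)


# sub r5, r3, r4 in ID while add r3, r1, r2 sits in ID/EX:
print(must_stall(3, 4, Dest(3, True), Dest(0, False), Dest(0, False)))  # True
```

The check runs for both sources because either operand can depend on an older result.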
80 Data hazards occur when an operand (register) depends on the result of a previous instruction that may not be computed yet. A pipelined processor needs to detect data hazards.
81 What to do if data hazard detected?
82 What to do if data hazard detected? A) Wait/Stall B) Reorder in Software (SW) C) Forward/Bypass D) All the above E) None. We will use some other method
83 1. Do nothing: change the ISA to match the implementation. "Hey compiler: don't create code with data hazards!" (We can do better than this.) 2. Stall: pause the current and subsequent instructions until it is safe. 3. Forward/bypass: forward the data value to where it is needed (only works if the value actually exists already).
84 How to stall an instruction in the ID stage:
- prevent the IF/ID pipeline register update: stalls the ID-stage instruction
- convert the ID-stage instruction into a nop for later stages: an innocuous bubble passes through the pipeline
- prevent the PC update: stalls the next (IF-stage) instruction
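The three actions above amount to one rule per clock edge. The minimal sketch below makes two simplifying assumptions for illustration: instructions are strings, and pipeline registers are dictionary entries. `clock` is a hypothetical name.

```python
NOP = "nop"  # a bubble: its control bits have RegWr = 0 and MemWr = 0


def clock(state, fetch, stall):
    """Advance a 5-stage pipeline by one clock edge.

    state: dict with keys "pc", "if_id", "id_ex", "ex_mem", "mem_wb".
    fetch: function mapping a PC to the instruction stored there.
    stall: True when hazard detection flags the instruction in ID.
    """
    new = dict(state)                   # unchanged entries carry over as-is
    new["mem_wb"] = state["ex_mem"]     # later stages always advance
    new["ex_mem"] = state["id_ex"]
    if stall:
        new["id_ex"] = NOP              # insert a bubble into EX
        # "pc" and "if_id" keep their old values: IF and ID repeat next cycle
    else:
        new["id_ex"] = state["if_id"]
        new["if_id"] = fetch(state["pc"])
        new["pc"] = state["pc"] + 4
    return new
```

Only the front of the pipeline freezes. Everything already past ID keeps moving, which is how the bubble opens up behind the producing instruction.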
85 [Figure: add r3, r1, r2 ahead of sub r5, r3, r5 in ID, with or r6, r3, r4 and add r6, r3, r8 behind it; if the detect-hazard unit fires, PC and IF/ID get WE=0 and the instruction passed to ID/EX gets MemWr=0, RegWr=0]
86 [Pipeline diagram over clock cycles for: add r3, r1, r2; sub r5, r3, r5; or r6, r3, r4; add r6, r3, r8]
87 [Annotations in the figure: r3 = 10, r3 = 20]
add r3, r1, r2: IF ID Ex M W
sub r5, r3, r5: IF ID ID ID ID Ex M W (3 stalls)
or r6, r3, r4: IF IF IF IF ID Ex M
add r6, r3, r8: IF ID Ex
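A small scheduler reproduces this diagram. Its assumptions are:
- no forwarding;
- in-order issue;
- an instruction stalls in ID until each source register has been written back.

Matching the slide's 3 stalls requires the read to come in the cycle after the producer's WB. `rf_split=True` instead models the split-cycle register file described earlier, which allows the read in the same cycle as WB. `schedule` and its tuple format are hypothetical.

```python
def schedule(program, rf_split=False):
    """Return (text, row) pairs for an in-order 5-stage pipeline that resolves
    data hazards only by stalling in ID (no forwarding).

    program: list of (text, dest, sources); dest is None if nothing is written.
    row[c] is the stage the instruction occupies in cycle c + 1.
    rf_split: True if the register file writes in the first half of a cycle
              and reads in the second half, so ID may overlap the producer's WB.
    """
    read_delay = 3 if rf_split else 4   # cycles from producer's ID to consumer's ID
    writer_id = {}                      # register -> final ID cycle of latest producer
    rows = []
    prev_first_id, prev_id = 1, 0       # makes instruction 0 fetch in cycle 1
    for text, dest, sources in program:
        first_if = prev_first_id        # fetched when the previous insn entered ID
        first_id = max(first_if + 1, prev_id + 1)
        last_id = first_id
        for reg in sources:
            if reg != 0 and reg in writer_id:      # r0 is hardwired to zero
                last_id = max(last_id, writer_id[reg] + read_delay)
        row = ([""] * (first_if - 1) + ["IF"] * (first_id - first_if)
               + ["ID"] * (last_id - first_id + 1) + ["Ex", "M", "W"])
        rows.append((text, row))
        if dest:
            writer_id[dest] = last_id
        prev_first_id, prev_id = first_id, last_id
    return rows


program = [
    ("add r3, r1, r2", 3, [1, 2]),
    ("sub r5, r3, r5", 5, [3, 5]),
    ("or r6, r3, r4", 6, [3, 4]),
    ("add r6, r3, r8", 6, [3, 8]),
]
for text, row in schedule(program):
    print(f"{text:<16}" + " ".join(f"{stage:<2}" for stage in row))
```

The sub row shows ID four times (three stalls), and the or waits in IF for four cycles, as on the slide.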
88 [Figure: sub r5, r3, r5 held in ID and or r6, r3, r4 held in IF (PC and IF/ID WE=0), with add r3, r1, r2 ahead of them and one nop bubble (MemWr=0, RegWr=0) inserted] STALL CONDITION MET: NOP = if (IF/ID.rA != 0 && (IF/ID.rA == ID/Ex.Rd || IF/ID.rA == Ex/M.Rd || IF/ID.rA == M/W.Rd))
89 [Figure: sub r5, r3, r5 still held in ID and or r6, r3, r4 in IF (WE=0), with add r3, r1, r2 ahead of them and two nop bubbles (MemWr=0, RegWr=0) in the pipeline] STALL CONDITION MET: NOP = if (IF/ID.rA != 0 && (IF/ID.rA == ID/Ex.Rd || IF/ID.rA == Ex/M.Rd || IF/ID.rA == M/W.Rd))
90 [Figure: sub r5, r3, r5 still held in ID and or r6, r3, r4 in IF (WE=0), with add r3, r1, r2 ahead of them and three nop bubbles (MemWr=0, RegWr=0) in the pipeline] STALL CONDITION MET: NOP = if (IF/ID.rA != 0 && (IF/ID.rA == ID/Ex.Rd || IF/ID.rA == Ex/M.Rd || IF/ID.rA == M/W.Rd))
93 Data hazards occur when an operand (register) depends on the result of a previous instruction that may not be computed yet. A pipelined processor needs to detect data hazards. Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards. Stalling introduces NOPs ("bubbles") into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing the IF/ID registers from changing, and (3) preventing writes to memory and the register file. Bubbles in the pipeline significantly decrease performance.
94 1. Do nothing: change the ISA to match the implementation. "Compiler: don't create code with data hazards!" (Nice try; we can do better than this.) 2. Stall: pause the current and subsequent instructions until it is safe. 3. Forward/bypass: forward the data value to where it is needed (only works if the value actually exists already).
95 Forwarding bypasses some pipeline stages, forwarding a result to a dependent instruction's operand (register). Three types of forwarding/bypass: forwarding from the Ex/Mem registers to the Ex stage (M→Ex); forwarding from the Mem/WB register to the Ex stage (W→Ex); register file bypass.
96 [Figure: pipelined datapath with a detect-hazard unit and a forward unit; the forward unit compares the ID/Ex source registers (Ra, Rb) with the Rd and WE fields of Ex/Mem and Mem/WB]
97 [Figure: as on slide 96] Three types of forwarding/bypass: forwarding from the Ex/Mem registers to the Ex stage (M→Ex); forwarding from the Mem/WB register to the Ex stage (W→Ex); register file bypass.
98 Ex/Mem [Figure: add r3, r1, r2 in MEM, sub r5, r3, r1 in EX]
add r3, r1, r2: IF ID Ex M W
sub r5, r3, r1: IF ID Ex M W (one cycle later)
Problem: EX needs the ALU result that is in the MEM stage. Solution: add a bypass from EX/MEM.D to the start of EX.
99 Ex/Mem [Figure: add r3, r1, r2 in MEM, sub r5, r3, r1 in EX] Detection logic in Ex stage: forward = (Ex/M.WE && Ex/M.Rd != 0 && ID/Ex.Ra == Ex/M.Rd) (same for Rb)
100 Mem/WB [Figure: add r3, r1, r2 in WB, sub r5, r3, r1 in MEM, or r6, r3, r4 in EX]
add r3, r1, r2: IF ID Ex M W; sub r5, r3, r1: IF ID Ex M W; or r6, r3, r4: IF ID Ex M W (each one cycle after the previous)
Problem: EX needs the value being written by WB. Solution: add a bypass from the WB final value to the start of EX.
101 Mem/WB [Figure: or r6, r3, r4 in EX, sub r5, r3, r1 in MEM, add r3, r1, r2 in WB] Detection logic: forward = (M/WB.WE && M/WB.Rd != 0 && ID/Ex.Ra == M/WB.Rd && not (Ex/M.WE && Ex/M.Rd != 0 && ID/Ex.Ra == Ex/M.Rd)) (same for Rb)
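The two detection rules, including the priority of EX/MEM over MEM/WB when both hold the register, can be sketched as a mux-select function for one ALU operand. Call it once for Ra and once for Rb. The function name, tuple encoding, and return strings are illustrative assumptions.

```python
def forward_select(src, ex_mem, mem_wb):
    """Choose where the ALU operand for register `src` comes from.

    ex_mem, mem_wb: (we, rd) pairs from the EX/MEM and MEM/WB registers.
    Returns "EX/MEM", "MEM/WB", or "ID/EX" (the value read in decode).
    """
    ex_we, ex_rd = ex_mem
    wb_we, wb_rd = mem_wb
    if ex_we and ex_rd != 0 and ex_rd == src:
        return "EX/MEM"          # M->Ex: the youngest producer wins
    if wb_we and wb_rd != 0 and wb_rd == src:
        return "MEM/WB"          # W->Ex: only when EX/MEM did not match
    return "ID/EX"
```

Checking EX/MEM first implements the "and not (...)" term. If two older instructions both write the register, the more recent value is the correct one.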
102 [Figure: add r3, r1, r2 in WB, sub r5, r3, r1 in MEM, or r6, r3, r4 in EX, add r6, r3, r8 in ID] Problem: reading a value that is currently being written. Solution: just negate the register file clock: writes happen at the end of the first half of each clock cycle, and reads happen during the second half of each clock cycle.
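Modeling one cycle of such a register file makes the ordering explicit: the write from WB lands before the read for ID. The 8-register size matches the earlier example machine. The class is a hypothetical sketch, not a hardware description.

```python
class SplitCycleRegisterFile:
    """Register file that writes in the first half of a cycle and reads in the second."""

    def __init__(self, n_regs=8):
        self.regs = [0] * n_regs

    def cycle(self, write=None, reads=()):
        """Perform one clock cycle.

        write: optional (reg, value) from the instruction in WB.
        reads: register indices requested by the instruction in ID.
        Returns the values read, which already reflect this cycle's write.
        """
        if write is not None:
            reg, value = write
            if reg != 0:              # r0 stays zero
                self.regs[reg] = value
        return [self.regs[r] for r in reads]


rf = SplitCycleRegisterFile()
rf.regs[3] = 10
print(rf.cycle(write=(3, 20), reads=[3, 4]))  # [20, 0]
```

Because the read sees the new value, a producer three instructions ahead needs no forwarding path at all.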
103 [Figure: as on slide 102]
add r3, r1, r2: IF ID Ex M W; sub r5, r3, r1: IF ID Ex M W; or r6, r3, r4: IF ID Ex M W; add r6, r3, r8: IF ID Ex M W (each one cycle after the previous)
104 5-stage Pipeline: Implementation, Working Example; Hazards: Structural, Data Hazards, Control Hazards
105 [Pipeline diagram over clock cycles for: add r3, r1, r2; sub r5, r3, r4; lw r6, 4(r3); or r5, r3, r5; sw r6, 12(r3)]
106 [Pipeline diagram: add r3, r1, r2; sub r5, r3, r4; lw r6, 4(r3); or r5, r3, r6; sw r6, 12(r3), each passing through IF ID Ex M W one cycle after the previous]
107 [Pipeline diagram: add r3, r1, r2; sub r5, r3, r4; lw r6, 4(r3); or r5, r3, r5; sw r6, 12(r3), each passing through IF ID Ex M W one cycle after the previous] Backwards arrows require time travel.
108 [Figure: lw r4, 20(r8) followed by or r6, r3, r4 in the datapath] Data dependency after a load instruction: the value is not available until after the M stage, so the next instruction cannot proceed if it is dependent. THE KILLER HAZARD
109 [Figure: lw r4, 20(r8) followed by the dependent or in the datapath]
lw r4, 20(r8)
or r6, r3, r4
110 [Figure: lw r4, 20(r8) followed by the dependent or in the datapath] lw r4, 20(r8): IF ID Ex; or r6, r3, r4: IF ID (one cycle later)
111 [Figure: a NOP inserted between lw r4, 20(r8) and the dependent or] lw r4, 20(r8): IF ID Ex M W; or r6, r3, r4: IF ID* ID Ex M W (one stall cycle)
112 [Figure: a NOP between lw r4, 20(r8) and the dependent or] lw r4, 20(r8): IF ID Ex M W; or r6, r3, r4: IF ID* ID Ex M W (one stall cycle)
113 [Figure: datapath with the detect-hazard and forward units] Stall = ID/Ex.MemRead && IF/ID.Ra == ID/Ex.Rd (same for Rb)
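With forwarding in place, the only remaining data stall in this pipeline is the load-use case. The sketch below captures that timing rule, assuming the M→Ex and W→Ex paths from the earlier slides: an ALU result can feed the very next EX cycle, while load data arrives one cycle later. The tuple format and `stalls_with_forwarding` are hypothetical.

```python
def stalls_with_forwarding(program):
    """Count stall cycles per instruction on a 5-stage pipeline with M->Ex and
    W->Ex forwarding: only a use immediately after a load must wait one cycle.

    program: list of (op, dest, sources); dest is None if nothing is written.
    """
    stalls = []
    ex_cycle = {}               # register -> (EX cycle of producer, is_load)
    prev_ex = 2                 # so the first instruction is in EX in cycle 3
    for op, dest, sources in program:
        ex = prev_ex + 1
        for reg in sources:
            if reg != 0 and reg in ex_cycle:
                producer_ex, is_load = ex_cycle[reg]
                # ALU results can be forwarded to the next EX cycle;
                # load data only exists after MEM, one cycle later.
                ready = producer_ex + (2 if is_load else 1)
                ex = max(ex, ready)
        stalls.append(ex - (prev_ex + 1))
        if dest:
            ex_cycle[dest] = (ex, op == "lw")
        prev_ex = ex
    return stalls


print(stalls_with_forwarding([("lw", 4, [8]), ("or", 6, [3, 4])]))  # [0, 1]
```

Placing one independent instruction between the load and its use brings the stall count back to zero. That is the reordering the compiler aims for.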
114 [Figure: datapath with the detect-hazard and forward units] Most frequent 3410 non-solution to load-use hazards. Why is this solution so so so so so so awful?
115 Forwarding values directly from Memory to the Execute stage without storing them in a register first: A. Does not remove the need to stall. B. Adds one too many possible inputs to the ALU. C. Will cause the pipeline register to have the wrong value. D. Halves the frequency of the processor. E. Both A & D
117 Two MIPS solutions. MIPS 2000/3000: delay slot. The ISA says results of loads are not available until one cycle later; the assembler inserts a nop, or reorders to fill the delay slot. MIPS 4000 onwards: stall. But really, the programmer/compiler reorders to avoid stalling in the load delay slot.
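The simplest form of the assembler's job on the delay-slot machines is padding. This sketch only inserts a nop after a load whose next instruction reads the loaded register. Real assemblers also try to reorder and fill the slot with useful work, which this sketch does not attempt. The tuple format and function name are hypothetical.

```python
def fill_load_delay_slots(program):
    """Insert a nop after any load whose next instruction reads the loaded register.

    program: list of (op, dest, sources). Never reorders, only pads.
    """
    out = []
    for i, (op, dest, sources) in enumerate(program):
        out.append((op, dest, sources))
        nxt = program[i + 1] if i + 1 < len(program) else None
        if op == "lw" and nxt is not None and dest in nxt[2]:
            out.append(("nop", None, []))
    return out


code = [("lw", 4, [8]), ("or", 6, [3, 4])]
print(fill_load_delay_slots(code))
```

Each inserted nop costs a cycle just as a hardware stall would. The difference is that the ISA, not the pipeline, guarantees the wait.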
118 Data hazards occur when an operand (register) depends on the result of a previous instruction that may not be computed yet. A pipelined processor needs to detect data hazards. Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards. Stalling introduces NOPs ("bubbles") into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing the IF/ID registers from changing, and (3) preventing writes to memory and the register file. Bubbles (nops) in the pipeline significantly decrease performance. Forwarding bypasses some pipeline stages, forwarding a result to a dependent instruction's operand (register). Better performance than stalling.
119 Find all hazards, and say how they are resolved:
add r3, r1, r2
nand r5, r3, r4
add r2, r6, r3
lw r6, 24(r3)
sw r6, 12(r2)
120 Find all hazards, and say how they are resolved (5 hazards):
add r3, r1, r2
nand r5, r3, r4
add r2, r6, r3
lw r6, 24(r3)
sw r6, 12(r2)
121 Find all hazards, and say how they are resolved (5 hazards):
add r3, r1, r2
nand r5, r3, r4: r3 by forwarding from Ex/M to ID/Ex (M→Ex)
add r2, r6, r3: r3 by forwarding from M/W to ID/Ex (W→Ex)
lw r6, 24(r3): r3 by register file (RF) bypass
sw r6, 12(r2): r2 by forwarding from M/W to ID/Ex (W→Ex); r6 by stall + forwarding from M/W to ID/Ex (W→Ex)
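The per-pair reasoning in this exercise can be mechanized. The sketch below judges each read-after-write pair by its distance in program order, as the slide does:
- distance 1 is M→Ex forwarding, or a stall plus W→Ex when the producer is a load;
- distance 2 is W→Ex forwarding;
- distance 3 is the register file bypass.

Because it looks only at distance, it ignores how a stall shifts later instructions, so a cycle-accurate model could resolve some pairs differently. The tuple format and `classify_hazards` are hypothetical.

```python
def classify_hazards(program):
    """List (consumer, register, resolution) for each read-after-write hazard.

    program: list of (text, op, dest, sources); dest is None if nothing is written.
    """
    found = []
    for i, (text, _op, _dest, sources) in enumerate(program):
        for reg in sources:
            if reg == 0:
                continue                      # r0 always reads as zero
            # Only the closest earlier writer within 3 instructions matters.
            for distance in (1, 2, 3):
                j = i - distance
                if j < 0:
                    break
                _t, p_op, p_dest, _s = program[j]
                if p_dest == reg:
                    if distance == 1:
                        how = "stall + W->Ex" if p_op == "lw" else "M->Ex"
                    elif distance == 2:
                        how = "W->Ex"
                    else:
                        how = "RF bypass"
                    found.append((text, reg, how))
                    break
    return found


program = [
    ("add r3, r1, r2",  "add",  3, [1, 2]),
    ("nand r5, r3, r4", "nand", 5, [3, 4]),
    ("add r2, r6, r3",  "add",  2, [6, 3]),
    ("lw r6, 24(r3)",   "lw",   6, [3]),
    ("sw r6, 12(r2)",   "sw",   None, [2, 6]),
]
for hazard in classify_hazards(program):
    print(hazard)
```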
122 Find all hazards, and say how they are resolved:
add r3, r1, r2
sub r3, r2, r1
nand r4, r3, r1
or r0, r3, r4
xor r1, r4, r3
sb r4, 1(r0)
123 Find all hazards, and say how they are resolved:
add r3, r1, r2
sub r3, r2, r1
nand r4, r3, r1
or r0, r3, r4
xor r1, r4, r3
sb r4, 1(r0)
Hours and hours of debugging!
124 Delay slot(s): modify the ISA to match the implementation. Stall: pause the current and all subsequent instructions. Forward/bypass: try to steal the correct value from elsewhere in the pipeline; otherwise, fall back to stalling or require a delay slot. Tradeoffs?
125 5-stage Pipeline: Implementation, Working Example; Hazards: Structural, Data Hazards, Control Hazards
126 i = 0;
do {
    n += 2;
    i++;
} while (i < max);
i = 7;
n--;
Assume: i in r1, n in r2, max in r3.
x10        addiu r1, r0, 0     # i = 0
x14 Loop:  addiu r2, r2, 2     # n += 2
x18        addiu r1, r1, 1     # i++
x1c        blt r1, r3, Loop    # i < max?
x20        addiu r1, r0, 7     # i = 7
x24        subi r2, r2, 1      # n--
127 Control Hazards: instructions are fetched in stage 1 (IF); branch and jump decisions occur in stage 3 (EX), so the next PC is not known until 2 cycles after the branch/jump.
x1c blt r1, r3, Loop
x20 addiu r1, r0, 7
x24 subi r2, r2, 1
Branch not taken? No problem! Branch taken? We just fetched 2 addi's: zap & flush.
128 If branch taken, zap: prevent the PC update, clear the IF/ID latch, and the branch continues. [Figure: branch calc and decide branch in EX; New PC = 14]
1C blt r1, r3, L: IF ID Ex M W
20 addiu r1, r0, 7: IF ID NOP NOP NOP
24 subi r2, r2, 1: IF NOP NOP NOP NOP
14 L: addi r2, r2, 2: IF ID Ex M W
129 If branch taken, zap: prevent the PC update, clear the IF/ID latch, and the branch continues. [Figure: branch calc and decide branch in EX; New PC = 1C]
1C blt r1, r3, L: IF ID Ex M W
20 addiu r1, r0, 7: IF ID NOP NOP NOP
24 subi r2, r2, 1: IF NOP NOP NOP NOP
14 L: addi r2, r2, 2: IF ID Ex M W
For every taken branch? OUCH!!!
130 1. Delay slot (you MUST do this): in the MIPS ISA, the 1 insn after a control insn is always executed, whether the branch is taken or not. 2. Resolve branch at decode (some groups do this for Project 3; your choice): move the branch calc from EX to ID. Alternative: just zap the 2nd instruction when the branch is taken. 3. Branch prediction: not in 3410, but every processor worth anything does this (no offense!).
131 [Figure: branch calc and decide branch in EX; New PC = 1C] 1C blt r1, r3, Loop: F D X; 20 addiu r1, r0, 7: F D; 24 subi r2, r2, 1: F. If branch taken: zap.
132 i = 0;
do {
    n += 2;
    i++;
} while (i < max);
i = 7;
n--;
Assume: i in r1, n in r2, max in r3.
x10        addiu r1, r0, 0     # i = 0
x14 Loop:  addiu r2, r2, 2     # n += 2
x18        addiu r1, r1, 1     # i++
x1c        blt r1, r3, Loop    # i < max?
x20        nop
x24        addiu r1, r0, 7     # i = 7
x28        subi r2, r2, 1      # n--
133 [Figure: branch calc and decide branch in EX; New PC = 1C] 1C blt r1, r3, Loop: F D X; 20 nop: F D; 24 addiu r1, r0, 7: F. If branch taken: zap.
134 [Figure: branch calc and decide branch in EX; New PC = 1C] 1C blt r1, r3, Loop: F D X; 20 nop: F D; 14 Loop: addiu r2, r2, 2: F. If branch taken: no zapping.
135 Before:
x10        addiu r1, r0, 0     # i = 0
x14 Loop:  addiu r2, r2, 2     # n += 2
x18        addiu r1, r1, 1     # i++
x1c        blt r1, r3, Loop    # i < max?
x20        nop
Compiler transforms code:
x10        addiu r1, r0, 0     # i = 0
x14 Loop:  addiu r1, r1, 1     # i++
x18        blt r1, r3, Loop    # i < max?
x1c        addiu r2, r2, 2     # n += 2
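A small interpreter can check that the compiler's transformation preserves behavior, by executing both versions under delay-slot semantics. It is an illustrative sketch, not a MIPS simulator. It models only the three operations used here and numbers instructions by index rather than by address.

```python
def run(program, regs):
    """Interpret a tiny MIPS-like subset with a one-instruction branch delay slot.

    Instructions: ("addiu", rd, rs, imm), ("blt", rs, rt, target_index), ("nop",).
    regs: list of register values (r0 stays 0). Returns the final registers.
    """
    regs = list(regs)
    pc = 0
    pending_target = None            # set by a taken branch; applied after the delay slot
    while pc < len(program):
        insn = program[pc]
        next_pc = pc + 1
        if pending_target is not None:   # this instruction sits in a delay slot
            next_pc, pending_target = pending_target, None
        op = insn[0]
        if op == "addiu":
            _, rd, rs, imm = insn
            if rd != 0:
                regs[rd] = regs[rs] + imm
        elif op == "blt":
            _, rs, rt, target = insn
            if regs[rs] < regs[rt]:
                pending_target = target
        elif op != "nop":
            raise ValueError(f"unsupported op {op}")
        pc = next_pc
    return regs


original = [("addiu", 1, 0, 0),      # i = 0
            ("addiu", 2, 2, 2),      # Loop: n += 2
            ("addiu", 1, 1, 1),      # i++
            ("blt", 1, 3, 1),        # i < max? -> Loop
            ("nop",)]                # delay slot
transformed = [("addiu", 1, 0, 0),   # i = 0
               ("addiu", 1, 1, 1),   # Loop: i++
               ("blt", 1, 3, 1),     # i < max? -> Loop
               ("addiu", 2, 2, 2)]   # delay slot: n += 2
regs = [0] * 8
regs[3] = 4                          # max = 4
print(run(original, regs)[1:3], run(transformed, regs)[1:3])
```

Both versions leave i = 4 and n = 8. The transformed loop is one instruction shorter per iteration because the delay slot now does useful work.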
136 [Figure: branch calc and decide branch in EX; New PC = 1C] 1C blt r1, r3, Loop: F D X; 20 addi r2, r2, 2: F D; 14 Loop: addi r1, r1, 1: F. Note: the insn in the delay slot will always be executed, whether the branch is taken or not.
137 Most processors support speculative execution: guess the direction of the branch, allow instructions to move through the pipeline, and zap them later if the guess turns out to be wrong. A must for long pipelines.
138 Pipeline so far: guess (predict) that the branch will not be taken. We can do better! Make a prediction based on the last branch: predict taken if the last branch was taken, or predict not taken if the last branch was not taken. Need one bit to keep track of the last branch.
139 What is the accuracy of this branch predictor? Wrong twice per loop: once on loop entry and once on loop exit. We can do better with 2 bits.
while (r3 != 0) { ... r3--; }
Top: BEQZ r3, End
  ...
  J Top
End:
while (r3 != 0) { ... r3--; }
Top2: BEQZ r3, End2
  ...
End2: J Top
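The "wrong twice per loop" claim can be checked by replaying the outcomes of the loop-top BEQZ through a one-bit predictor that predicts the last outcome. The example loop is an illustrative assumption: it runs 4 iterations and is entered 3 times.

```python
def mispredictions_1bit(outcomes, initial=False):
    """Count mispredictions of a one-bit predictor (predict = last outcome).

    outcomes: sequence of booleans, True for branch taken.
    """
    last = initial
    misses = 0
    for taken in outcomes:
        if taken != last:
            misses += 1
        last = taken
    return misses


# BEQZ at the top of a loop: not taken 4 times (stay in the loop),
# then taken once (exit). The loop is entered 3 times.
one_loop = [False] * 4 + [True]
print(mispredictions_1bit(one_loop * 3))  # 5: once on the first exit, then twice per loop
```

After the first pass, every later entry and every exit flips the bit at the wrong moment.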
140 [Figure: 2-bit branch predictor state machine with four states, Predict Taken 2 (PT2), Predict Taken 1 (PT1), Predict Not Taken 1 (PNT1), and Predict Not Taken 2 (PNT2), with transitions on Branch Taken (T) and Branch Not Taken (NT)]
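The diagram is a four-state (2-bit) predictor, but its exact transitions did not survive transcription. This sketch therefore assumes the common saturating-counter form. It replays the same illustrative loop as the one-bit case: 4 iterations, entered 3 times.

```python
def mispredictions_2bit(outcomes, state=0):
    """Count mispredictions of a 2-bit saturating counter.

    States 0-1 predict not taken, 2-3 predict taken; a taken branch moves
    the counter up, a not-taken branch moves it down, saturating at 0 and 3.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


one_loop = [False] * 4 + [True]
print(mispredictions_2bit(one_loop * 3))  # 3: only the exit is mispredicted
```

The single taken outcome at loop exit is not enough to flip the prediction, so re-entering the loop is predicted correctly.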
141 Control hazards: is the branch taken or not? Performance penalty: stall and flush. Reducing the cost of control hazards:
- Move the branch decision from Ex to ID: 2 nops become 1 nop.
- Delay slot: the compiler puts useful work in the delay slot (ISA level).
- Branch prediction: correct? Great! Wrong? Flush the pipeline (performance penalty).
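Moving the decision from EX to ID halves the flush cost per taken branch. A rough calculation makes this concrete. It uses a hypothetical instruction mix (20% branches, 60% of them taken) and assumes every other instruction completes at CPI 1.

```python
def effective_cpi(branch_fraction, taken_fraction, flush_penalty, base_cpi=1.0):
    """CPI when every taken branch flushes `flush_penalty` instructions."""
    return base_cpi + branch_fraction * taken_fraction * flush_penalty


# Hypothetical mix: 20% of instructions are branches, 60% of those are taken.
print(effective_cpi(0.20, 0.60, flush_penalty=2))  # decision in EX: 2 nops
print(effective_cpi(0.20, 0.60, flush_penalty=1))  # decision in ID: 1 nop
```

Under these assumed numbers, CPI drops from about 1.24 to about 1.12. A filled delay slot or a correct prediction would remove the remaining penalty for those branches.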
142 Data hazards. Control hazards. Structural hazards: resource contention; so far impossible because of the ISA and pipeline design.
143 Data hazards: register file reads occur in stage 2 (ID); register file writes occur in stage 5 (WB); the next instructions may read values soon to be written. Control hazards: a branch instruction may change the PC in stage 3 (EX), but the next instructions have already started executing. Structural hazards: resource contention; so far impossible because of the ISA and pipeline design.
144 Data hazards occur when an operand (register) depends on the result of a previous instruction that may not be computed yet. Pipelined processors need to detect data hazards. Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards. Stalling introduces NOPs ("bubbles") into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing the IF/ID registers from changing, and (3) preventing writes to memory and the register file. Nops significantly decrease performance. Forwarding bypasses some pipeline stages, forwarding a result to a dependent instruction's operand (register). Better performance than stalling.
145 Control hazards occur because the PC following a control instruction is not known until the control instruction is executed. If the branch is taken, we need to zap instructions: a performance penalty of 2 cycles when the branch is resolved in EX, or 1 cycle when it is resolved in ID. Delay slots can potentially recover performance lost to control hazards. The instruction in the delay slot will always be executed. This requires software (the compiler) to make use of the delay slot; put a nop in the delay slot if no useful instruction can be placed there. We can reduce the cost of a control hazard by moving the branch decision and target calculation from the Ex stage to the ID stage. With a delay slot, this removes the need to flush instructions on taken branches.
More informationOut-of-order Pipeline. Register Read. OOO execution (2-wide) OOO execution (2-wide) OOO execution (2-wide) OOO execution (2-wide)
Out-of-order Pipeline Register Read When do instructions read the register file? Fetch Decode Rename Dispatch Buffer of instructions Issue Reg-read Execute Writeback Commit Option #: after select, right
More informationPIPELINING: BRANCH AND MULTICYCLE INSTRUCTIONS
PIPELINING: BRANCH AND MULTICYCLE INSTRUCTIONS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 1 submission
More informationComputer Architecture 计算机体系结构. Lecture 3. Instruction-Level Parallelism I 第三讲 指令级并行 I. Chao Li, PhD. 李超博士
Computer Architecture 计算机体系结构 Lecture 3. Instruction-Level Parallelism I 第三讲 指令级并行 I Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2018 Review ISA, micro-architecture, physical design Evolution of ISA CISC vs
More informationPipelined MIPS Datapath with Control Signals
uction ess uction Rs [:26] (Opcode[5:]) [5:] ranch luor. Decoder Pipelined MIPS path with Signals luor Raddr at Five instruction sequence to be processed by pipeline: op [:26] rs [25:2] rt [2:6] rd [5:]
More informationM2 Instruction Set Architecture
M2 Instruction Set Architecture Module Outline Addressing modes. Instruction classes. MIPS-I ISA. High level languages, Assembly languages and object code. Translating and starting a program. Subroutine
More informationCIS 662: Sample midterm w solutions
CIS 662: Sample midterm w solutions 1. (40 points) A processor has the following stages in its pipeline: IF ID ALU1 MEM1 MEM2 ALU2 WB. ALU1 stage is used for effective address calculation for loads, stores
More informationAnnouncements. Programming assignment #2 due Monday 9/24. Talk: Architectural Acceleration of Real Time Physics Glenn Reinman, UCLA CS
Lipasti, artin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar GAS STATION Pipelining II Fall 2007 Prof. Thomas Wenisch http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Austin,
More information6.823 Computer System Architecture Prerequisite Self-Assessment Test Assigned Feb. 6, 2019 Due Feb 11, 2019
6.823 Computer System Architecture Prerequisite Self-Assessment Test Assigned Feb. 6, 2019 Due Feb 11, 2019 http://csg.csail.mit.edu/6.823/ This self-assessment test is intended to help you determine your
More informationCIS 371 Computer Organization and Design
CIS 371 Computer Organization and Design Unit 10: Static & Dynamic Scheduling Slides developed by Milo Martin & Amir Roth at the University of Pennsylvania with sources that included University of Wisconsin
More informationComputer Architecture ELE 475 / COS 475 Slide Deck 6: Superscalar 3. David Wentzlaff Department of Electrical Engineering Princeton University
Computer Architecture ELE 475 / COS 475 Slide Deck 6: Superscalar 3 David Wentzlaff Department of Electrical Engineering Princeton University 1 Agenda SpeculaJon and Branches Register Renaming Memory DisambiguaJon
More informationUnit 9: Static & Dynamic Scheduling
CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed by Drew Hilton, Amir Roth and Milo Mar;n at University of Pennsylvania CIS 501: Comp. Arch. Prof. Milo Martin
More informationCOSC 6385 Computer Architecture. - Tomasulos Algorithm
COSC 6385 Computer Architecture - Tomasulos Algorithm Fall 2008 Analyzing a short code-sequence DIV.D F0, F2, F4 ADD.D F6, F0, F8 S.D F6, 0(R1) SUB.D F8, F10, F14 MUL.D F6, F10, F8 1 Analyzing a short
More informationCode Scheduling & Limitations
This Unit: Static & Dynamic Scheduling CIS 371 Computer Organization and Design Unit 11: Static and Dynamic Scheduling App App App System software Mem CPU I/O Code scheduling To reduce pipeline stalls
More informationCIS 371 Computer Organization and Design
CIS 371 Computer Organization and Design Unit 10: Static & Dynamic Scheduling Slides developed by M. Martin, A.Roth, C.J. Taylor and Benedict Brown at the University of Pennsylvania with sources that included
More informationCMU Introduction to Computer Architecture, Spring 2013 HW 3 Solutions: Microprogramming Wrap-up and Pipelining
CMU 18-447 Introduction to Computer Architecture, Spring 2013 HW 3 Solutions: Microprogramming Wrap-up and Pipelining Instructor: Prof. Onur Mutlu TAs: Justin Meza, Yoongu Kim, Jason Lin 1 Adding the REP
More informationChapter 3: Computer Organization Fundamentals. Oregon State University School of Electrical Engineering and Computer Science.
Chapter 3: Computer Organization Fundamentals Prof. Ben Lee Oregon State University School of Electrical Engineering and Computer Science Chapter Goals Understand the organization of a computer system
More informationParallelism I: Inside the Core
Parallelism I: Inside the Core 1 The final Comprehensive Same general format as the Midterm. Review the homeworks, the slides, and the quizzes. 2 Key Points What is wide issue mean? How does does it affect
More informationDAT105: Computer Architecture Study Period 2, 2009 Exercise 2 Chapter 2: Instruction-Level Parallelism and Its Exploitation
Study Period 2, 29 Exercise 2 Chapter 2: Instruction-Level Parallelism and Its Exploitation Mafijul Islam Department of Computer Science and Engineering November 12, 29 Study Period 2, 29 Goals: To understand
More informationComputer Architecture: Out-of-Order Execution. Prof. Onur Mutlu (editted by Seth) Carnegie Mellon University
Computer Architecture: Out-of-Order Execution Prof. Onur Mutlu (editted by Seth) Carnegie Mellon University Reading for Today Smith and Sohi, The Microarchitecture of Superscalar Processors, Proceedings
More informationTomasulo-Style Register Renaming
Tomasulo-Style Register Renaming ldf f0,x(r1) allocate RS#4 map f0 to RS#4 mulf f4,f0, allocate RS#6 ready, copy value f0 not ready, copy tag Map Table f0 f4 RS#4 RS T V1 V2 T1 T2 4 REG[r1] 6 REG[] RS#4
More informationAdvanced Superscalar Architectures
Advanced Suerscalar Architectures Krste Asanovic Laboratory for Comuter Science Massachusetts Institute of Technology Physical Register Renaming (single hysical register file: MIPS R10K, Alha 21264, Pentium-4)
More informationCS 152 Computer Architecture and Engineering
CS 152 Computer Architecture and Engineering Lecture 23 Synchronization 2006-11-16 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/ 1 Last Time:
More informationFabComp: Hardware specication
Sol Boucher and Evan Klei CSCI-453-01 04/28/14 FabComp: Hardware specication 1 Hardware The computer is composed of a largely isolated data unit and control unit, which are only connected by a couple of
More informationUC Berkeley CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 20 Synchronous Digital Systems Blu-ray vs HD-DVD war over? As you know, there are two different, competing formats for the next
More informationCS 152 Computer Architecture and Engineering. Lecture 15 - Advanced Superscalars
CS 152 Comuter Architecture and Engineering Lecture 15 - Advanced Suerscalars Krste Asanovic Electrical Engineering and Comuter Sciences University of California at Berkeley htt://www.eecs.berkeley.edu/~krste
More informationTo read more. CS 6354: Tomasulo. Intel Skylake. Scheduling. How can we reorder instructions? Without changing the answer.
To read more CS 6354: Tomasulo 21 September 2016 This day s paper: Tomasulo, An Efficient Algorithm for Exploiting Multiple Arithmetic Units Supplementary readings: Hennessy and Patterson, Computer Architecture:
More informationCS 6354: Tomasulo. 21 September 2016
1 CS 6354: Tomasulo 21 September 2016 To read more 1 This day s paper: Tomasulo, An Efficient Algorithm for Exploiting Multiple Arithmetic Units Supplementary readings: Hennessy and Patterson, Computer
More informationLecture 20: Parallelism ILP to Multicores. James C. Hoe Department of ECE Carnegie Mellon University
18 447 Lecture 20: Parallelism ILP to Multicores James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L20 S1, James C. Hoe, CMU/ECE/CALCM, 2018 18 447 S18 L20 S2, James C. Hoe, CMU/ECE/CALCM,
More informationCS 152 Computer Architecture and Engineering. Lecture 14 - Advanced Superscalars
CS 152 Comuter Architecture and Engineering Lecture 14 - Advanced Suerscalars Krste Asanovic Electrical Engineering and Comuter Sciences University of California at Berkeley htt://www.eecs.berkeley.edu/~krste
More informationVHDL (and verilog) allow complex hardware to be described in either single-segment style to two-segment style
FFs and Registers In this lecture, we show how the process block is used to create FFs and registers Flip-flops (FFs) and registers are both derived using our standard data types, std_logic, std_logic_vector,
More informationSDRAM AS4SD8M Mb: 8 Meg x 16 SDRAM Synchronous DRAM Memory. PIN ASSIGNMENT (Top View)
128 Mb: 8 Meg x 16 SDRAM Synchronous DRAM Memory FEATURES Full Military temp (-55 C to 125 C) processing available Configuration: 8 Meg x 16 (2 Meg x 16 x 4 banks) Fully synchronous; all signals registered
More informationDecoupling Loads for Nano-Instruction Set Computers
Decoupling Loads for Nano-Instruction Set Computers Ziqiang (Patrick) Huang, Andrew Hilton, Benjamin Lee Duke University {ziqiang.huang, andrew.hilton, benjamin.c.lee}@duke.edu ISCA-43, June 21, 2016 1
More informationDirect-Mapped Cache Terminology. Caching Terminology. TIO Dan s great cache mnemonic. UCB CS61C : Machine Structures
Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 31 Caches II 2008-04-12 HP has begun testing research prototypes of a novel non-volatile memory element, the
More informationECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017
ECE 550D Fundamentals of Computer Systems and Engineering Fall 2017 Digital Arithmetic Prof. John Board Duke University Slides are derived from work by Profs. Tyler Bletch and Andrew Hilton (Duke) Last
More informationIS42S32200C1. 512K Bits x 32 Bits x 4 Banks (64-MBIT) SYNCHRONOUS DYNAMIC RAM
512K Bits x 32 Bits x 4 Banks (64-MBIT) SYNCHRONOUS DYNAMIC RAM JANUARY 2007 FEATURES Clock frequency: 183, 166, 143 MHz Fully synchronous; all signals referenced to a positive clock edge Internal bank
More informationCSCI 510: Computer Architecture Written Assignment 2 Solutions
CSCI 510: Computer Architecture Written Assignment 2 Solutions The following code does compution over two vectors. Consider different execution scenarios and provide the average number of cycles per iterion
More informationBimotion Advanced Port & Pipe Case study A step by step guide about how to calculate a 2-stroke engine.
Bimotion Advanced Port & Pipe Case study A step by step guide about how to calculate a 2-stroke engine. 2009/aug/21. Bimotion. This paper is free for distribution and may be revised, for further references
More informationIS42S32200L IS45S32200L
IS42S32200L IS45S32200L 512K Bits x 32 Bits x 4 Banks (64-MBIT) SYNCHRONOUS DYNAMIC RAM OCTOBER 2012 FEATURES Clock frequency: 200, 166, 143, 133 MHz Fully synchronous; all signals referenced to a positive
More informationECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 10 Instruction-Level Parallelism Part 3
ECE 552 / CPS 550 Advanced Comuter Architecture I Lecture 10 Instruction-Level Parallelism Part 3 Benjamin Lee Electrical and Comuter Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html
More informationChapter 10 And, Finally... The Stack
Chapter 10 And, Finally... The Stack Stacks: An Abstract Data Type A LIFO (last-in first-out) storage structure. The first thing you put in is the last thing you take out. The last thing you put in is
More informationSYNCHRONOUS DRAM. 128Mb: x32 SDRAM. MT48LC4M32B2-1 Meg x 32 x 4 banks
SYNCHRONOUS DRAM 128Mb: x32 MT48LC4M32B2-1 Meg x 32 x 4 banks For the latest data sheet, please refer to the Micron Web site: www.micron.com/sdramds FEATURES PC100 functionality Fully synchronous; all
More information128Mb Synchronous DRAM. Features High Performance: Description. REV 1.0 May, 2001 NT5SV32M4CT NT5SV16M8CT NT5SV8M16CT
Features High Performance: f Clock Frequency -7K 3 CL=2-75B, CL=3-8B, CL=2 Single Pulsed RAS Interface Fully Synchronous to Positive Clock Edge Four Banks controlled by BS0/BS1 (Bank Select) Units 133
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 02
More informationRAM-Type Interface for Embedded User Flash Memory
June 2012 Introduction Reference Design RD1126 MachXO2-640/U and higher density devices provide a User Flash Memory (UFM) block, which can be used for a variety of applications including PROM data storage,
More informationCS 250! VLSI System Design
CS 250! VLSI System Design Lecture 3 Timing 2014-9-4! Professor Jonathan Bachrach! slides by John Lazzaro TA: Colin Schmidt www-insteecsberkeleyedu/~cs250/ UC Regents Fall 2013/1014 UCB everything doesn
More informationOptimality of Tomasulo s Algorithm Luna, Dong Gang, Zhao
Optimality of Tomasulo s Algorithm Luna, Dong Gang, Zhao Feb 28th, 2002 Our Questions about Tomasulo Questions about Tomasulo s Algorithm Is it optimal (can always produce the wisest instruction execution
More informationHYB25D256400/800AT 256-MBit Double Data Rata SDRAM
256-MBit Double Data Rata SDRAM Features CAS Latency and Frequency Maximum Operating Frequency (MHz) CAS Latency DDR266A -7 DDR200-8 2 133 100 2.5 143 125 Double data rate architecture: two data transfers
More information- - DQ0 NC DQ1 DQ0 DQ2 - NC DQ1 DQ3 NC - NC
SYNCHRONOUS DRAM 64Mb: x4, x8, x16 MT48LC16M4A2 4 Meg x 4 x 4 banks MT48LC8M8A2 2 Meg x 8 x 4 banks MT48LC4M16A2 1 Meg x 16 x 4 banks For the latest data sheet, please refer to the Micron Web site: www.micron.com/mti/msp/html/datasheet.html
More informationCMPEN 411 VLSI Digital Circuits Spring Lecture 24: Peripheral Memory Circuits
CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 24: Peripheral Memory Circuits [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp12
More information- DQ0 - NC DQ1 - NC - NC DQ0 - NC DQ2 DQ1 DQ CONFIGURATION. None SPEED GRADE
SYNCHRONOUS DRAM 52Mb: x4, x8, x6 MT48LC28M4A2 32 MEG x 4 x 4 S MT48LC64M8A2 6 MEG x 8 x 4 S MT48LC32M6A2 8 MEG x 6 x 4 S For the latest data sheet, please refer to the Micron Web site: www.micron.com/dramds
More information128Mb DDR SDRAM. Features. Description. REV 1.1 Oct, 2006
Features Double data rate architecture: two data transfers per clock cycle Bidirectional data strobe () is transmitted and received with data, to be used in capturing data at the receiver is edge-aligned
More information- DQ0 - NC DQ1 - NC - NC DQ0 - NC DQ2 DQ1 DQ
SYNCHRONOUS DRAM ADVANCE MT48LC28M4A2 32 Meg x 4 x 4 banks MT48LC64M8A2 6 Meg x 8 x 4 banks MT48LC32M6A2 8 Meg x 6 x 4 banks For the latest data sheet, please refer to the Micron Web site: www.micron.com/dramds
More informationHYB25D256[400/800/160]B[T/C](L) 256-Mbit Double Data Rate SDRAM, Die Rev. B Data Sheet Jan. 2003, V1.1. Features. Description
Data Sheet Jan. 2003, V1.1 Features CAS Latency and Frequency Maximum Operating Frequency (MHz) CAS Latency DDR200-8 DDR266A -7 DDR266-7F DDR333-6 2 100 133 133 133 2.5 125 143 143 166 Double data rate
More informationCprE 281: Digital Logic
CprE 28: Digital Logic Instructor: Alexander Stoytchev http://www.ece.iastate.edu/~alexs/classes/ Registers and Counters CprE 28: Digital Logic Iowa State University, Ames, IA Copyright Alexander Stoytchev
More informationV 2.0. Version 9 PC. Setup Guide. Revised:
V 2.0 Version 9 PC Setup Guide Revised: 06-12-00 Digital 328 v2 and Cakewalk Version 9 PC Contents 1 Introduction 2 2 Configuring Cakewalk 4 3 328 Instrument Definition 6 4 328 Automation Setup 8 5 Automation
More informationDeveloping PMs for Hydraulic System
Developing PMs for Hydraulic System Focus on failure prevention rather than troubleshooting. Here are some best practices you can use to upgrade your preventive maintenance procedures for hydraulic systems.
More information- - DQ0 NC DQ1 DQ0 DQ2 - NC DQ1 DQ3 NC - NC
SYHRONOUS DRAM 128Mb: x4, x8, x16 MT48LC32M4A2 8 Meg x 4 x 4 banks MT48LC16M8A2 4 Meg x 8 x 4 banks MT48LC8M16A2 2 Meg x 16 x 4 banks For the latest data sheet, please refer to the Micron Web site: www.micron.com/dramds
More informationProposed Solution to Mitigate Concerns Regarding AC Power Flow under Convergence Bidding. September 25, 2009
Proposed Solution to Mitigate Concerns Regarding AC Power Flow under Convergence Bidding September 25, 2009 Proposed Solution to Mitigate Concerns Regarding AC Power Flow under Convergence Bidding Background
More informationChapter 5 Vehicle Operation Basics
Chapter 5 Vehicle Operation Basics 5-1 STARTING THE ENGINE AND ENGAGING THE TRANSMISSION A. In the spaces provided, identify each of the following gears. AUTOMATIC TRANSMISSION B. Indicate the word or
More information(FPGA) based design for minimizing petrol spill from the pipe lines during sabotage
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 05, Issue 01 (January. 2015), V3 PP 26-30 www.iosrjen.org (FPGA) based design for minimizing petrol spill from the pipe
More informationLecture 31 Caches II TIO Dan s great cache mnemonic. Issues with Direct-Mapped
CS61C L31 Caches II (1) inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 31 Caches II 26-11-13 Lecturer SOE Dan Garcia www.cs.berkeley.edu/~ddgarcia GPUs >> CPUs? Many are using
More informationHYB25D256400B[T/C](L) HYB25D256800B[T/C](L) HYB25D256160B[T/C](L)
Data Sheet, Rev. 1.21, Jul. 2004 HYB25D256400B[T/C](L) HYB25D256800B[T/C](L) HYB25D256160B[T/C](L) 256 Mbit Double Data Rate SDRAM DDR SDRAM Memory Products N e v e r s t o p t h i n k i n g. Edition 2004-07
More informationInstruction of connection and programming of the VECTOR controller
Instruction of connection and programming of the VECTOR controller 1. Connection of wiring 1.1.VECTOR Connection diagram Fig. 1 VECTOR Diagram of connection to the vehicle wiring. 1.2.Connection of wiring
More informationNickel Cadmium and Nickel Hydride Battery Charging Applications Using the HT48R062
ickel Cadmium and ickel Hydride Battery Charging Applications Using the HT48R062 ickel Cadmium and ickel Hydride Battery Charging Applications Using the HT48R062 D/: HA0126E Introduction This application
More informationABB June 19, Slide 1
Dr Simon Round, Head of Technology Management, MATLAB Conference 2015, Bern Switzerland, 9 June 2015 A Decade of Efficiency Gains Leveraging modern development methods and the rising computational performance-price
More informationCMPEN 411 VLSI Digital Circuits Spring Lecture 22: Memery, ROM
CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 22: Memery, ROM [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp12 CMPEN 411 L22 S.1
More informationA48P4616B. 16M X 16 Bit DDR DRAM. Document Title 16M X 16 Bit DDR DRAM. Revision History. AMIC Technology, Corp. Rev. No. History Issue Date Remark
16M X 16 Bit DDR DRAM Document Title 16M X 16 Bit DDR DRAM Revision History Rev. No. History Issue Date Remark 1.0 Initial issue January 9, 2014 Final (January, 2014, Version 1.0) AMIC Technology, Corp.
More informationIntroducing. chip and PIN
Introducing chip and PIN PIN not pen The way that we pay for things with credit and debit cards is changing. By 2005, most of us will be using a smart, new system in the UK called chip and PIN which will
More informationRR Concepts. The StationMaster can control DC trains or DCC equipped trains set to linear mode.
Jan, 0 S RR Concepts M tation aster - 5 Train Controller - V software This manual contains detailed hookup and programming instructions for the StationMaster train controller available in a AMP or 0AMP
More informationThe RCS-6V kit. Page of Contents. 1. This Book 1.1. Warning & safety What can I do with the RCS-kit? Tips 3
The RCS-6V kit Page of Contents Page 1. This Book 1.1. Warning & safety 3 1.2. What can I do with the RCS-kit? 3 1.3. Tips 3 2. The principle of the system 2.1. How the load measurement system works 5
More informationArduino-based OBD-II Interface and Data Logger. CS 497 Independent Study Ryan Miller Advisor: Prof. Douglas Comer April 26, 2011
Arduino-based OBD-II Interface and Data Logger CS 497 Independent Study Ryan Miller Advisor: Prof. Douglas Comer April 26, 2011 Arduino Hardware Automotive OBD ISO Interface Software Arduino Italy 2005
More informationSeries circuits. The ammeter
Series circuits D o you remember how the parts of the torch on pages 272 3 were connected together? The circuit contained several components, connected one after the other. Conductors, like the metal strip
More informationIn-Place Associative Computing:
In-Place Associative Computing: A New Concept in Processor Design 1 Page Abstract 3 What s Wrong with Existing Processors? 3 Introducing the Associative Processing Unit 5 The APU Edge 5 Overview of APU
More informationEECS 583 Class 9 Classic Optimization
EECS 583 Class 9 Classic Optimization University of Michigan September 28, 2016 Generalizing Dataflow Analysis Transfer function» How information is changed by something (BB)» OUT = GEN + (IN KILL) /*
More informationFast Orbit Feedback (FOFB) at Diamond
Fast Orbit Feedback (FOFB) at Diamond Guenther Rehm, Head of Diagnostics Group 29/06/2007 FOFB at Diamond 1 Ground, Girder and Beam Motion 29/06/2007 FOFB at Diamond 2 Fast Feedback Design Philosophy Low
More informationAVS64( )L
AVS640416.1604.0808L 64 Mb Synchronous DRAM 16 Mb x 4 0416 8 Mb x 8 0808 4 Mb x 161604 Features PC100/PC133/PC143/PC166compliant Fully synchronous; all signals registered on positive edge of system clock
More informationWarped-Compression: Enabling Power Efficient GPUs through Register Compression
WarpedCompression: Enabling Power Efficient GPUs through Register Compression Sangpil Lee, Keunsoo Kim, Won Woo Ro (Yonsei University*) Gunjae Koo, Hyeran Jeon, Murali Annavaram (USC) (*Work done while
More informationindex changing a variable s value, Chime My Block, clearing the screen. See Display block CoastBack program, 54 44
index A absolute value, 103, 159 adding labels to a displayed value, 108 109 adding a Sequence Beam to a Loop of Switch block, 223 228 algorithm, defined, 86 ambient light, measuring, 63 analyzing data,
More informationPMD709408C/PMD709416C. Document Title. Revision History. 512Mb (64M x 8 / 32M x 16) DDR SDRAM C die Datasheet
Document Title 512Mb (64M x 8 / 32M x 16) DDR SDRAM C die Datasheet Revision History Revision Date Page Notes 0.1 October, 2013 Preliminary 1.0 March, 2014 Official release 1.1 April, 2014 500Mbps speed
More informationOPERATING MANUAL Digital Diesel Control Remote control panel for WhisperPower generator sets
Art. nr. 40200261 OPERATING MANUAL Digital Diesel Control Remote control panel for WhisperPower generator sets WHISPERPOWER BV Kelvinlaan 82 9207 JB Drachten Netherlands Tel.: +31-512-571550 Fax.: +31-512-571599
More informationIS42S Meg Bits x 16 Bits x 4 Banks (64-MBIT) SYNCHRONOUS DYNAMIC RAM FEATURES OVERVIEW. PIN CONFIGURATIONS 54-Pin TSOP (Type II)
1 Meg Bits x 16 Bits x 4 Banks (64-MBIT) SYNCHRONOUS DYNAMIC RAM JANUARY 2008 FEATURES Clock frequency: 166, 143 MHz Fully synchronous; all signals referenced to a positive clock edge Internal bank for
More informationMulti Core Processing in VisionLab
Multi Core Processing in Multi Core CPU Processing in 25 August 2014 Copyright 2001 2014 by Van de Loosdrecht Machine Vision BV All rights reserved jaap@vdlmv.nl Overview Introduction Demonstration Automatic
More informationELM327 OBD to RS232 Interpreter
OBD to RS232 Interpreter Description Almost all new automobiles produced today are required, by law, to provide an interface from which test equipment can obtain diagnostic information. The data transfer
More informationSTPA based Method to Identify and Control Software Feature Interactions. John Thomas Dajiang Suo
STPA based Method to Identify and Control Software Feature Interactions John Thomas Dajiang Suo Quote The hardest single part of building a software system is deciding precisely what to build. -- Fred
More informationScheduling. Purpose of scheduling. Scheduling. Scheduling. Concurrent & Distributed Systems Purpose of scheduling.
427 Concurrent & Distributed Systems 2017 6 Uwe R. Zimmer - The Australian National University 429 Motivation and definition of terms Purpose of scheduling 2017 Uwe R. Zimmer, The Australian National University
More informationFrequently Asked Questions New Tagging Requirements
Frequently Asked Questions New Tagging Requirements Q: Are there new E-tagging requirements related to the new fifteen minute market FERC Order No. 764 fifteen minute scheduling implemented on May 1, 2014?
More informationIntroducing Formal Methods (with an example)
Introducing Formal Methods (with an example) J-R. Abrial September 2004 Formal Methods: a Great Confusion - What are they used for? - When are they to be used? - Is UML a formal method? - Are they needed
More informationPMD706416A. Document Title. 64Mb (4M x 16) DDR SDRAM (A die) Datasheet
Document Title 64Mb (4M x 16) DDR SDRAM (A die) Datasheet This document is a general product description and subject to change without notice. 64MBIT DDR DRAM Features JEDEC DDR Compliant Differential
More informationFlexJet Carriage Circuit Board (PCB) Replacement
P/N: 111484 R0 14140 NE 200th St. Woodinville, WA. 98072 PH: (425) 398-8282 FX: (425) 398-8383 ioline.com FlexJet Carriage Circuit Board (PCB) Replacement Notices: Warning! Ensure that all AC power cables
More information3200NT Timer Service Manual
Service Manual Valve Serial Number Valve Position 1-LEAd 2-LAg 3-LAg 4-LAg IMPORTANT: Fill in pertinent information on page 3 for future reference. Table of Contents Job Specifications Sheet.....................................................................
More informationSDRAM DEVICE OPERATION
POWER UP SEQUENCE SDRAM must be initialized with the proper power-up sequence to the following (JEDEC Standard 21C 3.11.5.4): 1. Apply power and start clock. Attempt to maintain a NOP condition at the
More information