ENGN1640: Design of Computing Systems Topic 05: Pipeline Processor Design


Slide 1: ENGN1640: Design of Computing Systems, Topic 05: Pipeline Processor Design
Professor Sherief Reda
Electrical Sciences and Computer Engineering, School of Engineering, Brown University
Spring 2016
[Material from Patterson & Hennessy and Harris]

Slide 2: Pipelining analogy
Pipelined laundry: overlapping execution. Parallelism improves performance.
Four loads: Speedup = 16/7 ≈ 2.3
Non-stop (n loads): Ideal speedup = 4n/(n + 3) ≈ 4 = number of stages
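Measuring time in units of one stage (assuming all four laundry stages take the same time), the speedup figures above follow from a short derivation:

$$
\begin{aligned}
T_{\text{non-pipelined}} &= 4n \quad \text{(each load passes through all 4 stages before the next starts)}\\
T_{\text{pipelined}} &= 4 + (n-1) = n + 3 \quad \text{(fill the pipeline, then one load finishes per stage time)}\\
\text{Speedup} &= \frac{4n}{n+3}, \qquad n = 4:\ \frac{16}{7} \approx 2.3, \qquad n \to \infty:\ \text{Speedup} \to 4
\end{aligned}
$$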

Slide 3: Pipelined ARM processor
Temporal parallelism.
Divide the single-cycle processor into 5 stages: Fetch, Decode, Execute, Memory, Writeback.
Add pipeline registers between the stages.

Slide 4: Single-cycle vs. pipelined
[Figure: timing diagrams (time in ps) of successive instructions passing through Fetch Instruction, Decode/Read Reg, Execute ALU, Memory Read/Write and Write Reg; (a) single-cycle, (b) pipelined.]

Slide 5: Pipeline datapath abstraction
[Figure: abstract view over time (cycles) of six instructions (LDR, ADD, SUB, AND, STR, ORR) overlapping in the pipeline; each passes through instruction fetch, register file read, ALU, data memory (DM) and register file write.]
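The overlap in this abstraction can be sketched in a few lines of Python. This is an illustration under the assumption of an ideal pipeline (no stalls or flushes); the function name and the stage letters F, D, E, M, W are choices made here, not part of the slides.

```python
# Stage letters: Fetch, Decode, Execute, Memory, Writeback
STAGES = ["F", "D", "E", "M", "W"]

def pipeline_diagram(instrs):
    """Return (name, row) pairs; row[c] is the stage instruction i occupies in
    cycle c, or '.' if it is not in the pipeline (ideal: no stalls or flushes)."""
    s = len(STAGES)
    total_cycles = len(instrs) + s - 1  # last instruction enters at len-1 and needs s cycles
    rows = []
    for i, name in enumerate(instrs):
        row = ""
        for c in range(total_cycles):
            k = c - i  # instruction i is fetched in cycle i
            row += STAGES[k] if 0 <= k < s else "."
        rows.append((name, row))
    return rows

for name, row in pipeline_diagram(["LDR", "ADD", "SUB", "AND", "STR", "ORR"]):
    print(f"{name:4} {row}")
```

In the fifth cycle (column index 4) all five stages are busy at once: LDR is in Writeback, ADD in Memory, SUB in Execute, AND in Decode and STR in Fetch.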

Slide 6: Single-cycle and pipelined datapath
[Figure: the single-cycle datapath and the pipelined datapath, in which pipeline registers divide the datapath into Fetch, Decode, Execute, Memory and Writeback stages; signals are suffixed F, D, E, M or W by stage (e.g., PCF, InstrD, SrcAE, ALUOutM, ResultW).]
WA3 must arrive at the register file at the same time as Result, so it is carried down the pipeline (WA3E, WA3M, WA3W).
The register file is written on the falling edge of CLK.

Slide 7: Optimized pipeline datapath
[Figure: pipelined datapath without the PC+8 adder; PCPlus4F is fed to the register file's R15 input as PCPlus8D.]
Remove the adder by using PCPlus4F after the PC has been updated to PC+4: by the time an instruction reaches Decode, PCPlus4F equals that instruction's address + 8.
This assumes that register writes happen (e.g., in the first half of the clock cycle) before reads.

Slide 8: Tracing LDR on its journey: 1st cycle
[Figure: pipelined datapath with LDR in the Fetch stage.]

Slide 9: Tracing LDR on its journey: 2nd cycle
[Figure: pipelined datapath with LDR in the Decode stage.]

Slide 10: Tracing LDR on its journey: 3rd cycle
[Figure: pipelined datapath with LDR in the Execute stage.]

Slide 11: Tracing LDR on its journey: 4th cycle
[Figure: pipelined datapath with LDR in the Memory stage.]

Slide 12: Tracing LDR on its journey: 5th cycle
[Figure: pipelined datapath with LDR in the Writeback stage.]

Slide 13: Pipeline performance
Assume a stage time of 100 ps for register read or write and 200 ps for the other stages. Compare the pipelined datapath with the single-cycle datapath.

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
LDR       200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
STR       200 ps       100 ps         200 ps  200 ps         -               700 ps
Data op   200 ps       100 ps         200 ps  -              100 ps          600 ps
Branch    200 ps       100 ps         200 ps  -              -               500 ps
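A minimal sketch of how the two clock periods follow from this table. It assumes, as the slide does, that pipeline-register overhead is ignored and that the pipeline has 5 stages; the function names are illustrative only.

```python
# Stage latencies in ps from the table:
# [instr fetch, register read, ALU op, memory access, register write];
# None marks a stage the instruction class does not use.
STAGE_PS = {
    "LDR":     [200, 100, 200, 200, 100],
    "STR":     [200, 100, 200, 200, None],
    "Data op": [200, 100, 200, None, 100],
    "Branch":  [200, 100, 200, None, None],
}

def single_cycle_tc(table):
    """The single-cycle clock must fit the slowest instruction end to end."""
    return max(sum(t for t in row if t is not None) for row in table.values())

def pipelined_tc(table):
    """The pipelined clock must fit the slowest single stage
    (pipeline-register overhead ignored)."""
    return max(t for row in table.values() for t in row if t is not None)

def single_cycle_time(n, tc):
    return n * tc  # one instruction per (long) cycle

def pipelined_time(n, tc, stages=5):
    return (stages + n - 1) * tc  # fill the pipeline, then one instruction per cycle

tc_sc, tc_p = single_cycle_tc(STAGE_PS), pipelined_tc(STAGE_PS)
print(tc_sc, tc_p)                                           # 800 200
print(single_cycle_time(3, tc_sc), pipelined_time(3, tc_p))  # 2400 1400
```

For three back-to-back loads, the single-cycle datapath needs 3 × 800 ps, while the pipeline needs (5 + 3 - 1) × 200 ps.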

Slide 14: Single-cycle versus pipeline performance
[Figure: timing of three loads (LDR R1, [R5]; LDR R2, [R6]; LDR R3, [R7]): single-cycle with Tc = 800 ps, pipelined with Tc = 200 ps.]

Slide 15: Pipeline speedup
If all stages are balanced (i.e., all take the same time):
Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of stages
Ideal speedup (n instructions, s stages) = s·n / (s + n - 1), which approaches s as n grows.
If the stages are not balanced, the speedup is less.
The speedup comes from increased throughput; the latency (time for each instruction) does not decrease.
Branches also reduce the speedup.
The added pipeline registers reduce the speedup as well, since their overhead adds to every stage.
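The effect of unbalanced stages and register overhead can be sketched numerically. This sketch assumes every instruction uses all stages and there are no stalls; `reg_overhead` is a hypothetical per-stage cost standing in for pipeline-register setup and clock-to-Q delay.

```python
def ideal_speedup(n, s):
    """Balanced s-stage pipeline, n instructions, no stalls or register overhead."""
    return (n * s) / (s + n - 1)

def speedup(stage_delays, n, reg_overhead=0.0):
    """Single-cycle time / pipelined time for n instructions that use every stage.
    The pipelined clock is the slowest stage plus the (hypothetical) register overhead."""
    single_cycle = n * sum(stage_delays)
    tc = max(stage_delays) + reg_overhead
    pipelined = (len(stage_delays) + n - 1) * tc
    return single_cycle / pipelined

print(round(ideal_speedup(4, 4), 2))                                 # 2.29 (laundry: 16/7)
print(round(speedup([200, 100, 200, 200, 100], 1_000_000), 2))      # 4.0, not 5: unbalanced
print(round(speedup([200, 100, 200, 200, 100], 1_000_000, 20), 2))  # 3.64 with overhead
```

With the stage times from the performance table, the long-run speedup is 800/200 = 4 rather than 5, and any register overhead lowers it further.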

Slide 16: Reminder of single-cycle control
[Figure: single-cycle datapath with its Control Unit, which takes Cond, Op, Funct, Rd and ALUFlags and produces PCSrc, RegWrite, MemtoReg, MemWrite, ALUControl, ALUSrc, ImmSrc and RegSrc.]

Slide 17: Modifications to pipeline control
Control signals are derived from the instruction, the same as in the single-cycle implementation.
Each control signal is delayed to the pipeline stage that uses it.

Slide 18: Pipelined datapath + control
[Figure: pipelined datapath with control; signals generated in Decode (RegSrcD, ImmSrcD, PCSrcD, RegWriteD, MemtoRegD, MemWriteD, ALUControlD, BranchD, ALUSrcD, FlagWriteD) are carried through the E, M and W pipeline registers; a Cond Unit in Execute checks CondE against FlagsE.]
Same control unit as the single-cycle processor; control is delayed to the proper pipeline stage.

Slide 19: Pipelining hazards
Hazards are situations that prevent starting the next instruction in the next cycle.
1. Structural hazards: a required resource is busy.
2. Data hazards: an instruction must wait for a previous instruction to complete its data read/write.
3. Control hazards: deciding on the control action depends on a previous instruction.

Slide 20: (1) Structural hazards
A structural hazard is a conflict for the use of a resource.
In an ARM pipeline with a single memory, a load/store requires a data access, so instruction fetch would have to stall for that cycle, causing a pipeline bubble.
Hence, pipelined datapaths require separate instruction and data memories, or separate instruction and data caches.

Slide 21: (2) Data hazards: compute-use
[Figure: pipeline diagram (time in cycles) of ADD R1, R4, R5 followed by AND R8, R1, R3; ORR R9, R6, R1; SUB R10, R1, R7. The AND and ORR read R1 before the ADD has written it back.]

Slide 22: Data hazard: load-use
LDR R1, [R4, #40]
AND R8, R1, R3
ORR R9, R6, R1
SUB R10, R1, R7

Slide 23: Handling data hazards
A. Compile-time techniques
B. Forward data at run time
C. Stall the processor at run time

Slide 24: A. Data hazard elimination using compile-time techniques (NOPs)
Insert enough NOPs until the result is ready (wastes cycles).
[Figure: pipeline diagram (time in cycles) with two NOPs inserted between ADD R1, R4, R5 and the dependent AND R8, R1, R3; ORR R9, R6, R1; SUB R10, R1, R7.]
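The rule behind "enough NOPs" can be sketched as a small scheduler. It assumes no forwarding and a register file written in the first half of a cycle and read in the second, so a consumer's Decode may not come before its producer's Writeback, i.e., the two must be at least 3 slots apart. The parser and function names are hypothetical, and only "OP Rd, ..." instructions whose first register is the destination are handled (not STR or branches).

```python
import re

# Producer in slot p, consumer in slot c: without forwarding, need c - p >= 3.
MIN_DISTANCE = 3

def dest_and_sources(instr):
    """Split 'OP Rd, Rn, Rm' (or 'LDR Rd, [Rn, #imm]') into (Rd, {Rn, Rm}).
    Assumes the first register named is the only destination."""
    regs = re.findall(r"R\d+", instr)
    return regs[0], set(regs[1:])

def insert_nops(program):
    """Return a copy of `program` (which must contain no NOPs) with NOPs inserted
    before each instruction that would read a register too soon after it is written."""
    scheduled = []
    for instr in program:
        _, sources = dest_and_sources(instr)
        while True:
            recent = scheduled[-(MIN_DISTANCE - 1):]  # the previous 2 slots
            hazard = any(
                prev != "NOP" and dest_and_sources(prev)[0] in sources
                for prev in recent
            )
            if not hazard:
                break
            scheduled.append("NOP")
        scheduled.append(instr)
    return scheduled

for line in insert_nops(["ADD R1, R4, R5", "AND R8, R1, R3",
                         "ORR R9, R6, R1", "SUB R10, R1, R7"]):
    print(line)
```

Only the AND needs padding: by the time the ORR and SUB reach Decode, the ADD has reached Writeback.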

Slide 25: A. Data hazard elimination using compile-time techniques (code rescheduling)
Reorder code to increase the distance between an instruction that produces a result and the instructions that use it.
The compiler must be aware of the pipeline structure.
Original (requires two NOPs after the first ADD): ADD R1, R4, R5; ADD R8, R1, R3; AND R9, R6, R1; SUB R10, R2, R7
Rescheduled: ADD R1, R4, R5; SUB R10, R2, R7; NOP; ADD R8, R1, R3; AND R9, R6, R1
Rescheduling saved one cycle!

Slide 26: B. Data hazard elimination using forwarding (bypassing) at run time
Don't wait for the result to be stored in a register; forward the result as soon as it is available. This requires extra connections in the datapath.
[Figure: pipeline diagram (time in cycles) of ADD R1, R4, R5; AND R8, R1, R3; ORR R9, R6, R1; SUB R10, R1, R7 with R1 forwarded from the ADD to the dependent instructions.]
Check whether a source register read in the Execute stage matches the destination register written in the Memory or Writeback stage. If so, forward the result.

Slide 27: Circuitry for forwarding
[Figure: pipelined datapath with a Hazard Unit that compares the Execute-stage source registers with WA3M and WA3W (qualified by RegWriteM and RegWriteW) and drives the ForwardAE and ForwardBE multiplexer selects for SrcAE and SrcBE.]

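Slide 28: To forward or not to forward!
Does the Execute-stage source register match the Memory-stage destination register?
Match_1E_M = (RA1E == WA3M)
Match_2E_M = (RA2E == WA3M)
Does the Execute-stage source register match the Writeback-stage destination register?
Match_1E_W = (RA1E == WA3W)
Match_2E_W = (RA2E == WA3W)
If it matches, forward the result:
if (Match_1E_M • RegWriteM) ForwardAE = 10;   (forward ALUOutM)
else if (Match_1E_W • RegWriteW) ForwardAE = 01;   (forward ResultW)
else ForwardAE = 00;   (use the register file output)
ForwardBE is computed the same way from Match_2E_M and Match_2E_W.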
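The forwarding decision for one ALU source can be sketched as a pure function. This is a behavioral model, not HDL. Register numbers are plain integers, and the enum and function names are hypothetical. Checking the Memory stage first is what makes the most recent result win.

```python
from enum import IntEnum

class Forward(IntEnum):
    """Select values for the ForwardAE / ForwardBE multiplexers."""
    REGFILE = 0b00   # no forwarding: use the value read from the register file
    RESULT_W = 0b01  # forward ResultW from the Writeback stage
    ALUOUT_M = 0b10  # forward ALUOutM from the Memory stage

def forward_select(ra_e, wa3_m, reg_write_m, wa3_w, reg_write_w):
    """ForwardAE when ra_e is RA1E, ForwardBE when ra_e is RA2E.
    The Memory stage is checked first because it holds the more recent result."""
    if reg_write_m and ra_e == wa3_m:
        return Forward.ALUOUT_M
    if reg_write_w and ra_e == wa3_w:
        return Forward.RESULT_W
    return Forward.REGFILE

# Third ADD of ADD R1, R1, R2 / ADD R1, R1, R3 / ADD R1, R1, R4 in Execute:
# R1 is being written in both the Memory and Writeback stages.
print(forward_select(ra_e=1, wa3_m=1, reg_write_m=True, wa3_w=1, reg_write_w=True).name)
```

In this double-hazard case the function selects ALUOutM, the result of the second ADD, rather than the stale value in Writeback.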

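Slide 29: Double data hazard
Consider the sequence: ADD R1, R1, R2; ADD R1, R1, R3; ADD R1, R1, R4.
When the third ADD is in Execute, both hazards occur: R1 is being written in both the Memory and the Writeback stages.
We want to use the most recent result, so revise the Writeback (P&H "MEM hazard") forwarding condition: give priority to the Memory-stage (P&H "EX hazard") result. That is, forward from Writeback only if the Memory-stage match is not true.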

Slide 30: Pipelining hazards
1. Structural hazards
2. Data hazards
   A. Compile-time techniques
   B. Forward data at run time
   C. Stall the processor at run time
3. Control hazards

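Slide 31: Stalling
[Figure: pipeline diagram (time in cycles) of LDR R1, [R4, #40] followed by AND R8, R1, R3; ORR R9, R6, R1; SUB R10, R1, R7. Trouble! The loaded R1 is not available until the end of the LDR's Memory stage, too late to forward to the AND in Execute in the same cycle.]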

Slide 32: Forwarding is not going to eliminate all hazards
[Figure: the same sequence with a one-cycle stall: the AND is held in Decode and the ORR in Fetch for an extra cycle, after which R1 is forwarded to the AND in Execute.]

Slide 33: C. Data hazard elimination by stalling
[Figure: pipeline diagram with a stall inserted after the LDR; one clock cycle is wasted, but the stall is necessary for correctness.]

Slide 34: Stalling hardware
[Figure: pipelined datapath whose Hazard Unit, using MemtoRegE and the register matches, drives StallF and StallD (enables on the Fetch and Decode pipeline registers) and FlushD and FlushE (clears on the Decode and Execute pipeline registers), in addition to ForwardAE and ForwardBE.]

Slide 35: To stall or not to stall!
Is either source register in the Decode stage the same as the one being written in the Execute stage?
Match_12D_E = (RA1D == WA3E) + (RA2D == WA3E)
Is an LDR in the Execute stage AND Match_12D_E true?
ldrstall = Match_12D_E • MemtoRegE
StallF = StallD = FlushE = ldrstall
(+ denotes OR, • denotes AND.)
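The load-use stall equations translate directly into Boolean Python. This is a behavioral sketch with hypothetical function names and integer register numbers. As on the slide, MemtoRegE is taken to indicate a load in Execute.

```python
def ldr_stall(ra1_d, ra2_d, wa3_e, mem_to_reg_e):
    """ldrstall = Match_12D_E AND MemtoRegE (MemtoRegE marks an LDR in Execute)."""
    match_12d_e = (ra1_d == wa3_e) or (ra2_d == wa3_e)
    return match_12d_e and mem_to_reg_e

def stall_controls(ra1_d, ra2_d, wa3_e, mem_to_reg_e):
    """StallF = StallD = FlushE = ldrstall."""
    stall = ldr_stall(ra1_d, ra2_d, wa3_e, mem_to_reg_e)
    return {"StallF": stall, "StallD": stall, "FlushE": stall}

# LDR R1, [R4, #40] in Execute and AND R8, R1, R3 in Decode: stall one cycle.
print(stall_controls(ra1_d=1, ra2_d=3, wa3_e=1, mem_to_reg_e=True))
```

Freezing Fetch and Decode while clearing Execute inserts exactly one bubble. After that, the loaded value can be forwarded from Writeback.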

Slide 36: Data hazard summary
The compiler can arrange code to avoid hazards and stalls; this requires knowledge of the pipeline structure.
Forwarding can sometimes avoid stalls, at the expense of extra hardware complexity.
Stalls reduce performance by increasing the average cycles per instruction (CPI), but they are sometimes absolutely necessary to get correct results.

Slide 37: (3) Control hazards
B: the branch is not determined until the Writeback stage of the pipeline.
Instructions after the branch are fetched before the branch occurs; these 4 instructions must be flushed if the branch is taken.
Writes to the PC (R15) are similar.

Slide 38: Control hazards
[Figure: pipeline diagram (time in cycles) of the branch B 3C at address 20; the four instructions fetched after it (AND at 24, ORR at 28, SUB at 2C, SUB at 30) must be flushed when the branch resolves, and fetching resumes at the branch target (an ADD).]
Branch misprediction penalty: the number of instructions flushed when the branch is taken (4).
It may be reduced by determining the BTA (branch target address) earlier.

Slide 39: Early branch resolution
Determine the BTA in the Execute stage.
Branch misprediction penalty = 2 cycles.
Hardware changes:
Add a branch multiplexer before the PC register to select the BTA from ALUResultE.
Add a BranchTakenE select signal for this multiplexer (asserted only if the branch condition is satisfied).
PCSrcW is now asserted only for writes to the PC.
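A rough way to see what halving the penalty buys is an average-CPI estimate. The branch and taken fractions below are hypothetical numbers chosen for illustration, not measurements. The model assumes the pipeline keeps fetching the fall-through path and that there are no other stalls.

```python
def cpi_with_taken_branches(branch_frac, taken_frac, penalty, base_cpi=1.0):
    """Average CPI when every taken branch flushes `penalty` instructions
    fetched down the fall-through path; no other stalls assumed."""
    return base_cpi + branch_frac * taken_frac * penalty

# Hypothetical mix: 20% of instructions are branches, 60% of those are taken.
for penalty in (4, 2):  # branch resolved in Writeback vs. in Execute
    print(penalty, round(cpi_with_taken_branches(0.20, 0.60, penalty), 2))
```

Under these assumed numbers, CPI drops from 1.48 to 1.24 when the BTA is determined in Execute.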

Slide 40: Pipelined processor with early BTA
[Figure: pipelined datapath with a branch multiplexer before the PC that selects ALUResultE when BranchTakenE is asserted, plus a Hazard Unit driving StallF, StallD, FlushD, FlushE, ForwardAE and ForwardBE.]

Slide 41: Control hazards with early BTA
[Figure: pipeline diagram (time in cycles): with the BTA determined in Execute, only the two instructions fetched after B 3C (AND at 24 and ORR at 28) are flushed before fetching resumes at the branch target (an ADD).]

Slide 42: Control stalling logic
PCWrPendingF = 1 if a write to the PC is pending in Decode, Execute or Memory:
PCWrPendingF = PCSrcD + PCSrcE + PCSrcM
Stall Fetch on a load stall or if PCWrPendingF:
StallF = ldrstall + PCWrPendingF
Flush Decode if PCWrPendingF, OR the PC is written in Writeback, OR the branch is taken:
FlushD = PCWrPendingF + PCSrcW + BranchTakenE
Flush Execute on a load stall or if the branch is taken:
FlushE = ldrstall + BranchTakenE
Stall Decode on a load stall (as before):
StallD = ldrstall
(+ denotes OR.)
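These hazard-unit equations can be sketched as Boolean Python. It is a behavioral model with a hypothetical function name. The inputs are the signals named in the equations, with `ldrstall` computed as in the load-stall logic.

```python
def control_hazard_signals(ldrstall, pcsrc_d, pcsrc_e, pcsrc_m, pcsrc_w, branch_taken_e):
    """Boolean form of the control stalling equations ('+' on the slide = OR)."""
    pc_wr_pending_f = pcsrc_d or pcsrc_e or pcsrc_m
    return {
        "PCWrPendingF": pc_wr_pending_f,
        "StallF": ldrstall or pc_wr_pending_f,
        "StallD": ldrstall,
        "FlushD": pc_wr_pending_f or pcsrc_w or branch_taken_e,
        "FlushE": ldrstall or branch_taken_e,
    }

# A taken branch in Execute: flush the two wrongly fetched instructions
# (now in Decode and Fetch->Decode) without stalling Fetch.
print(control_hazard_signals(ldrstall=False, pcsrc_d=False, pcsrc_e=False,
                             pcsrc_m=False, pcsrc_w=False, branch_taken_e=True))
```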

Slide 43: ARM pipelined processor with hazard unit
[Figure: complete pipelined ARM processor: datapath, control, early branch resolution (BranchTakenE) and Hazard Unit (StallF, StallD, FlushD, FlushE, ForwardAE, ForwardBE).]

Slide 44: Branch prediction
Ideal pipelined processor: CPI = 1. Branch mispredictions increase CPI.
Static branch prediction:
  - Always not taken
  - Always taken
  - Check the direction of the branch (forward or backward): if backward, predict taken; otherwise, predict not taken.
Dynamic branch prediction: make a prediction based on the history of branches.
In all cases, the branch must be executed to check whether the prediction was correct; if not, flush the instructions and resume from the correct direction!
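The three static policies can be compared on a branch trace. The trace below is hypothetical: a backward loop-closing branch taken 9 times and then falling through, followed by one forward branch that is not taken. The function names are illustrative.

```python
# Each trace entry: (branch PC, target address, actually taken?)
def always_not_taken(pc, target):
    return False

def always_taken(pc, target):
    return True

def backward_taken(pc, target):
    """Backward branches (e.g. loop closers) predicted taken; forward ones not taken."""
    return target < pc

def mispredictions(policy, trace):
    return sum(policy(pc, target) != taken for pc, target, taken in trace)

trace = [(0x20, 0x08, True)] * 9 + [(0x20, 0x08, False)] + [(0x30, 0x40, False)]
for policy in (always_not_taken, always_taken, backward_taken):
    print(policy.__name__, mispredictions(policy, trace))
```

On this trace, the direction-based policy mispredicts only the loop exit, because loops close with backward branches.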

Slide 45: Eliminating the 2-cycle stall for a predict-taken policy with a branch target buffer
Even with a predictor, the target address still has to be calculated, which gives a 2-cycle penalty for a taken branch.
Branch target buffer (BTB): a cache of target addresses, indexed by the PC when the instruction is fetched.
If it hits and the instruction is a branch predicted taken, the target can be fetched immediately: no 2-cycle penalty.

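Slide 46: Branch target buffer
[Figure: the PC indexes a table of (branch PC, branch target address) entries; a comparator (=) checks the fetched PC against the stored branch PC and a MUX selects the next PC sent to the pipeline register and the rest of the pipeline.]
No match: it is not a branch, so Next PC = PC + 4.
Match: it is a branch, and Next PC = branch target address.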
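A minimal direct-mapped BTB can be sketched in software. The class name, the 16-entry size and the indexing by PC bits [5:2] are assumptions made here. So is the rule that storing an entry means "branch predicted taken". The full-PC comparison plays the role of the "=" box in the figure.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: each entry stores (branch PC, target address)."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        self.entries = [None] * num_entries

    def _index(self, pc):
        return (pc >> 2) % self.num_entries  # instructions are word-aligned

    def next_pc(self, pc):
        """Fetch-stage lookup: the target on a hit, PC + 4 otherwise."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:  # compare the stored branch PC
            return entry[1]
        return pc + 4

    def update(self, branch_pc, target):
        """Record a taken branch once it has resolved."""
        self.entries[self._index(branch_pc)] = (branch_pc, target)

btb = BranchTargetBuffer()
print(hex(btb.next_pc(0x20)))  # miss: 0x24
btb.update(0x20, 0x3C)
print(hex(btb.next_pc(0x20)))  # hit: 0x3c
print(hex(btb.next_pc(0x60)))  # same index, different PC: 0x64
```

The last lookup shows why the stored branch PC must be compared. Addresses 0x20 and 0x60 map to the same entry, but only 0x20 is the recorded branch.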

Slide 47: (2) Dynamic branch prediction
[Figure: floorplan of a Pentium processor]
In deeper and superscalar pipelines, the branch penalty is more significant.
Use dynamic prediction: a branch prediction buffer (aka branch history table) is indexed by recent branch instruction addresses and stores the outcome (taken/not taken).
To execute a branch: check the table and expect the same outcome; start fetching from the fall-through or the target; if wrong, flush the pipeline and flip the prediction.

Slide 48: 1-bit branch predictor
        MOV R1, #0        ; R1 = sum
        MOV R0, #0        ; R0 = i
FOR     CMP R0, #10       ; for (i=0; i<10; i=i+1)
        BGE DONE
        ADD R1, R1, R0    ; sum = sum + i
        ADD R0, R0, #1
        B   FOR
DONE
The predictor remembers whether the branch was taken the last time and does the same thing.
It mispredicts the first and last branch of the loop.
Prediction bits are added to the BTB entries.

Slide 49: Problem with the 1-bit predictor
Inner-loop branches are mispredicted twice!
outer: …
inner: …
       beq …, …, inner
       …
       beq …, …, outer
The branch is mispredicted as taken on the last iteration of the inner loop.
It is then mispredicted as not taken on the first iteration of the inner loop the next time around.

Slide 50: 2-bit predictor
Only change the prediction after two successive mispredictions.
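The inner-loop problem and its 2-bit fix can be sketched by simulating one predictor entry. This uses the saturating-counter form of the 2-bit predictor, one common realization whose state diagram may differ slightly from the one on the slide. The branch trace is hypothetical: an inner loop whose branch is taken 3 times and then not taken, for 5 outer iterations.

```python
def count_mispredictions(outcomes, bits, initial_state):
    """Simulate one predictor entry as a `bits`-bit saturating counter.
    bits=1 is the 1-bit 'same as last time' predictor; bits=2 is the
    saturating-counter 2-bit predictor. Predict taken in the upper half."""
    max_state = (1 << bits) - 1
    taken_threshold = 1 << (bits - 1)
    state, misses = initial_state, 0
    for taken in outcomes:
        if (state >= taken_threshold) != taken:
            misses += 1
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return misses

inner = [True, True, True, False] * 5  # hypothetical inner-loop branch outcomes
print(count_mispredictions(inner, bits=1, initial_state=1))  # 9
print(count_mispredictions(inner, bits=2, initial_state=3))  # 5
```

After the first outer iteration, the 1-bit predictor misses twice per inner loop (exit and re-entry). The 2-bit predictor misses only at the exit, because a single not-taken outcome does not flip its prediction.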

Slide 51: Summary
Pipelining is used for speedup. The ideal speedup equals the number of stages, but the actual speedup depends on the delay balance between stages (clock frequency), the delays introduced by pipeline registers, and the number of stalls (CPI).
Hazards (structural, data, and control) can increase CPI.
Hazards can be eliminated or mitigated using code reorganization, stalling, flushing, and forwarding/bypassing.
Branch prediction, branch prediction buffers, and branch target buffers can reduce stalls arising from control hazards.


More information

Direct-Mapped Cache Terminology. Caching Terminology. TIO Dan s great cache mnemonic. UCB CS61C : Machine Structures

Direct-Mapped Cache Terminology. Caching Terminology. TIO Dan s great cache mnemonic. UCB CS61C : Machine Structures Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 31 Caches II 2008-04-12 HP has begun testing research prototypes of a novel non-volatile memory element, the

More information

RAM-Type Interface for Embedded User Flash Memory

RAM-Type Interface for Embedded User Flash Memory June 2012 Introduction Reference Design RD1126 MachXO2-640/U and higher density devices provide a User Flash Memory (UFM) block, which can be used for a variety of applications including PROM data storage,

More information

128Mb Synchronous DRAM. Features High Performance: Description. REV 1.0 May, 2001 NT5SV32M4CT NT5SV16M8CT NT5SV8M16CT

128Mb Synchronous DRAM. Features High Performance: Description. REV 1.0 May, 2001 NT5SV32M4CT NT5SV16M8CT NT5SV8M16CT Features High Performance: f Clock Frequency -7K 3 CL=2-75B, CL=3-8B, CL=2 Single Pulsed RAS Interface Fully Synchronous to Positive Clock Edge Four Banks controlled by BS0/BS1 (Bank Select) Units 133

More information

A Predictive Delay Fault Avoidance Scheme for Coarse Grained Reconfigurable Architecture

A Predictive Delay Fault Avoidance Scheme for Coarse Grained Reconfigurable Architecture A Predictive Fault Avoidance Scheme for Coarse Grained Reconfigurable Architecture Toshihiro Kameda 1 Hiroaki Konoura 1 Dawood Alnajjar 1 Yukio Mitsuyama 2 Masanori Hashimoto 1 Takao Onoye 1 hasimoto@ist.osaka

More information

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 10 Instruction-Level Parallelism Part 3

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 10 Instruction-Level Parallelism Part 3 ECE 552 / CPS 550 Advanced Comuter Architecture I Lecture 10 Instruction-Level Parallelism Part 3 Benjamin Lee Electrical and Comuter Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html

More information

Warped-Compression: Enabling Power Efficient GPUs through Register Compression

Warped-Compression: Enabling Power Efficient GPUs through Register Compression WarpedCompression: Enabling Power Efficient GPUs through Register Compression Sangpil Lee, Keunsoo Kim, Won Woo Ro (Yonsei University*) Gunjae Koo, Hyeran Jeon, Murali Annavaram (USC) (*Work done while

More information

Scheduling. Purpose of scheduling. Scheduling. Scheduling. Concurrent & Distributed Systems Purpose of scheduling.

Scheduling. Purpose of scheduling. Scheduling. Scheduling. Concurrent & Distributed Systems Purpose of scheduling. 427 Concurrent & Distributed Systems 2017 6 Uwe R. Zimmer - The Australian National University 429 Motivation and definition of terms Purpose of scheduling 2017 Uwe R. Zimmer, The Australian National University

More information

128Mb DDR SDRAM. Features. Description. REV 1.1 Oct, 2006

128Mb DDR SDRAM. Features. Description. REV 1.1 Oct, 2006 Features Double data rate architecture: two data transfers per clock cycle Bidirectional data strobe () is transmitted and received with data, to be used in capturing data at the receiver is edge-aligned

More information

IS42S32200C1. 512K Bits x 32 Bits x 4 Banks (64-MBIT) SYNCHRONOUS DYNAMIC RAM

IS42S32200C1. 512K Bits x 32 Bits x 4 Banks (64-MBIT) SYNCHRONOUS DYNAMIC RAM 512K Bits x 32 Bits x 4 Banks (64-MBIT) SYNCHRONOUS DYNAMIC RAM JANUARY 2007 FEATURES Clock frequency: 183, 166, 143 MHz Fully synchronous; all signals referenced to a positive clock edge Internal bank

More information

M2 Instruction Set Architecture

M2 Instruction Set Architecture M2 Instruction Set Architecture Module Outline Addressing modes. Instruction classes. MIPS-I ISA. High level languages, Assembly languages and object code. Translating and starting a program. Subroutine

More information

HYB25D256400/800AT 256-MBit Double Data Rata SDRAM

HYB25D256400/800AT 256-MBit Double Data Rata SDRAM 256-MBit Double Data Rata SDRAM Features CAS Latency and Frequency Maximum Operating Frequency (MHz) CAS Latency DDR266A -7 DDR200-8 2 133 100 2.5 143 125 Double data rate architecture: two data transfers

More information

SYNCHRONOUS DRAM. 128Mb: x32 SDRAM. MT48LC4M32B2-1 Meg x 32 x 4 banks

SYNCHRONOUS DRAM. 128Mb: x32 SDRAM. MT48LC4M32B2-1 Meg x 32 x 4 banks SYNCHRONOUS DRAM 128Mb: x32 MT48LC4M32B2-1 Meg x 32 x 4 banks For the latest data sheet, please refer to the Micron Web site: www.micron.com/sdramds FEATURES PC100 functionality Fully synchronous; all

More information

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017 ECE 550D Fundamentals of Computer Systems and Engineering Fall 2017 Digital Arithmetic Prof. John Board Duke University Slides are derived from work by Profs. Tyler Bletch and Andrew Hilton (Duke) Last

More information

IS42S32200L IS45S32200L

IS42S32200L IS45S32200L IS42S32200L IS45S32200L 512K Bits x 32 Bits x 4 Banks (64-MBIT) SYNCHRONOUS DYNAMIC RAM OCTOBER 2012 FEATURES Clock frequency: 200, 166, 143, 133 MHz Fully synchronous; all signals referenced to a positive

More information

VHDL (and verilog) allow complex hardware to be described in either single-segment style to two-segment style

VHDL (and verilog) allow complex hardware to be described in either single-segment style to two-segment style FFs and Registers In this lecture, we show how the process block is used to create FFs and registers Flip-flops (FFs) and registers are both derived using our standard data types, std_logic, std_logic_vector,

More information

Techniques, October , Boston, USA. Personal use of this material is permitted. However, permission to

Techniques, October , Boston, USA. Personal use of this material is permitted. However, permission to Copyright 1996 IEEE. Published in the Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques, October 21-23 1996, Boston, USA. Personal use of this material is permitted.

More information

SDRAM DEVICE OPERATION

SDRAM DEVICE OPERATION POWER UP SEQUENCE SDRAM must be initialized with the proper power-up sequence to the following (JEDEC Standard 21C 3.11.5.4): 1. Apply power and start clock. Attempt to maintain a NOP condition at the

More information

HYB25D256[400/800/160]B[T/C](L) 256-Mbit Double Data Rate SDRAM, Die Rev. B Data Sheet Jan. 2003, V1.1. Features. Description

HYB25D256[400/800/160]B[T/C](L) 256-Mbit Double Data Rate SDRAM, Die Rev. B Data Sheet Jan. 2003, V1.1. Features. Description Data Sheet Jan. 2003, V1.1 Features CAS Latency and Frequency Maximum Operating Frequency (MHz) CAS Latency DDR200-8 DDR266A -7 DDR266-7F DDR333-6 2 100 133 133 133 2.5 125 143 143 166 Double data rate

More information

Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches

Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches Se-Hyun Yang and Babak Falsafi Computer Architecture Laboratory (CALCM) Carnegie Mellon University {sehyun, babak}@cmu.edu http://www.ece.cmu.edu/~powertap

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 02

More information

Fixing the Hyperdrive: Maximizing Rendering Performance on NVIDIA GPUs

Fixing the Hyperdrive: Maximizing Rendering Performance on NVIDIA GPUs Fixing the Hyperdrive: Maximizing Rendering Performance on NVIDIA GPUs Louis Bavoil, Principal Engineer Booth #223 - South Hall www.nvidia.com/gdc Full-Screen Pixel Shader SM TEX L2 DRAM CROP SM = Streaming

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 24: Peripheral Memory Circuits

CMPEN 411 VLSI Digital Circuits Spring Lecture 24: Peripheral Memory Circuits CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 24: Peripheral Memory Circuits [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp12

More information

ARC-H: Adaptive replacement cache management for heterogeneous storage devices

ARC-H: Adaptive replacement cache management for heterogeneous storage devices Journal of Systems Architecture 58 (2012) ARC-H: Adaptive replacement cache management for heterogeneous storage devices Young-Jin Kim, Division of Electrical and Computer Engineering, Ajou University,

More information

mith College Computer Science CSC231 Assembly Fall 2017 Week #4 Dominique Thiébaut

mith College Computer Science CSC231 Assembly Fall 2017 Week #4 Dominique Thiébaut mith College Computer Science CSC231 Assembly Fall 2017 Week #4 Dominique Thiébaut dthiebaut@smith.edu How are Integers Stored in Memory? 120 11F 11E 11D 11C 11B 11A 119 118 117 116 115 114 113 112 111

More information

Storage and Memory Hierarchy CS165

Storage and Memory Hierarchy CS165 Storage and Memory Hierarchy CS165 What is the memory hierarchy? L1

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 22: Memery, ROM

CMPEN 411 VLSI Digital Circuits Spring Lecture 22: Memery, ROM CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 22: Memery, ROM [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp12 CMPEN 411 L22 S.1

More information

EECS 583 Class 9 Classic Optimization

EECS 583 Class 9 Classic Optimization EECS 583 Class 9 Classic Optimization University of Michigan September 28, 2016 Generalizing Dataflow Analysis Transfer function» How information is changed by something (BB)» OUT = GEN + (IN KILL) /*

More information

Alloyed Branch History: Combining Global and Local Branch History for Robust Performance

Alloyed Branch History: Combining Global and Local Branch History for Robust Performance Alloyed Branch History: Combining Global and Local Branch History for Robust Performance UNIV. OF VIRGINIA DEPT. OF COMPUTER SCIENCE TECH. REPORT CS-22-21 Zhijian Lu, John Lach, Mircea R. Stan, Kevin Skadron

More information

$DA ECM DEFINITION FILE

$DA ECM DEFINITION FILE $DA ECM DEFINITION FILE OVERVIEW This document is intended to familiarize you with the features of C.A.T.S. Tuner Program. We do not attempt to provide instruction on engine tuning. The features provided

More information

1. Historical background of I2C I2C from a hardware perspective Bus Architecture The Basic I2C Protocol...

1. Historical background of I2C I2C from a hardware perspective Bus Architecture The Basic I2C Protocol... Table of contents CONTENTS 1. Historical background of I2C... 16 2. I2C from a hardware perspective... 18 3. Bus Architecture... 22 3.1. Basic Terminology... 23 4. The Basic I2C Protocol... 24 4.1. Flowchart...

More information

Practical Resource Management in Power-Constrained, High Performance Computing

Practical Resource Management in Power-Constrained, High Performance Computing Practical Resource Management in Power-Constrained, High Performance Computing Tapasya Patki*, David Lowenthal, Anjana Sasidharan, Matthias Maiterth, Barry Rountree, Martin Schulz, Bronis R. de Supinski

More information

UC Berkeley CS61C : Machine Structures

UC Berkeley CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 20 Synchronous Digital Systems Blu-ray vs HD-DVD war over? As you know, there are two different, competing formats for the next

More information

Discovery of Design Methodologies. Integration. Multi-disciplinary Design Problems

Discovery of Design Methodologies. Integration. Multi-disciplinary Design Problems Discovery of Design Methodologies for the Integration of Multi-disciplinary Design Problems Cirrus Shakeri Worcester Polytechnic Institute November 4, 1998 Worcester Polytechnic Institute Contents The

More information

- - DQ0 NC DQ1 DQ0 DQ2 - NC DQ1 DQ3 NC - NC

- - DQ0 NC DQ1 DQ0 DQ2 - NC DQ1 DQ3 NC - NC SYNCHRONOUS DRAM 64Mb: x4, x8, x16 MT48LC16M4A2 4 Meg x 4 x 4 banks MT48LC8M8A2 2 Meg x 8 x 4 banks MT48LC4M16A2 1 Meg x 16 x 4 banks For the latest data sheet, please refer to the Micron Web site: www.micron.com/mti/msp/html/datasheet.html

More information

Energy Source Lifetime Optimization for a Digital System through Power Management. Manish Kulkarni

Energy Source Lifetime Optimization for a Digital System through Power Management. Manish Kulkarni Energy Source Lifetime Optimization for a Digital System through Power Management by Manish Kulkarni A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements

More information

HYB25D256400B[T/C](L) HYB25D256800B[T/C](L) HYB25D256160B[T/C](L)

HYB25D256400B[T/C](L) HYB25D256800B[T/C](L) HYB25D256160B[T/C](L) Data Sheet, Rev. 1.21, Jul. 2004 HYB25D256400B[T/C](L) HYB25D256800B[T/C](L) HYB25D256160B[T/C](L) 256 Mbit Double Data Rate SDRAM DDR SDRAM Memory Products N e v e r s t o p t h i n k i n g. Edition 2004-07

More information

A48P4616B. 16M X 16 Bit DDR DRAM. Document Title 16M X 16 Bit DDR DRAM. Revision History. AMIC Technology, Corp. Rev. No. History Issue Date Remark

A48P4616B. 16M X 16 Bit DDR DRAM. Document Title 16M X 16 Bit DDR DRAM. Revision History. AMIC Technology, Corp. Rev. No. History Issue Date Remark 16M X 16 Bit DDR DRAM Document Title 16M X 16 Bit DDR DRAM Revision History Rev. No. History Issue Date Remark 1.0 Initial issue January 9, 2014 Final (January, 2014, Version 1.0) AMIC Technology, Corp.

More information

Drowsy Caches Simple Techniques for Reducing Leakage Power Krisztián Flautner Nam Sung Kim Steve Martin David Blaauw Trevor Mudge

Drowsy Caches Simple Techniques for Reducing Leakage Power Krisztián Flautner Nam Sung Kim Steve Martin David Blaauw Trevor Mudge Drowsy Caches Simple Techniques for Reducing Leakage Power Krisztián Flautner Nam Sung Kim Steve Martin David Blaauw Trevor Mudge krisztian.flautner@arm.com kimns@eecs.umich.edu stevenmm@eecs.umich.edu

More information

FabComp: Hardware specication

FabComp: Hardware specication Sol Boucher and Evan Klei CSCI-453-01 04/28/14 FabComp: Hardware specication 1 Hardware The computer is composed of a largely isolated data unit and control unit, which are only connected by a couple of

More information

Developing PMs for Hydraulic System

Developing PMs for Hydraulic System Developing PMs for Hydraulic System Focus on failure prevention rather than troubleshooting. Here are some best practices you can use to upgrade your preventive maintenance procedures for hydraulic systems.

More information

Lecture 31 Caches II TIO Dan s great cache mnemonic. Issues with Direct-Mapped

Lecture 31 Caches II TIO Dan s great cache mnemonic. Issues with Direct-Mapped CS61C L31 Caches II (1) inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 31 Caches II 26-11-13 Lecturer SOE Dan Garcia www.cs.berkeley.edu/~ddgarcia GPUs >> CPUs? Many are using

More information

INTERMEDIATE PROGRAMMING LESSON

INTERMEDIATE PROGRAMMING LESSON INTERMEDIATE PROGRAMMING LESSON DIFFERENT WAYS OF MOVING: SYNCHRONIZATION, REGULATED POWER, RAMP UP & DOWN By Sanjay and Arvind Seshan Objectives 1) Learn about different blocks for moving the robot and

More information

Improving Memory System Performance with Energy-Efficient Value Speculation

Improving Memory System Performance with Energy-Efficient Value Speculation Improving Memory System Performance with Energy-Efficient Value Speculation Nana B. Sam and Min Burtscher Computer Systems Laboratory Cornell University Ithaca, NY 14853 {besema, burtscher}@csl.cornell.edu

More information

- DQ0 - NC DQ1 - NC - NC DQ0 - NC DQ2 DQ1 DQ CONFIGURATION. None SPEED GRADE

- DQ0 - NC DQ1 - NC - NC DQ0 - NC DQ2 DQ1 DQ CONFIGURATION. None SPEED GRADE SYNCHRONOUS DRAM 52Mb: x4, x8, x6 MT48LC28M4A2 32 MEG x 4 x 4 S MT48LC64M8A2 6 MEG x 8 x 4 S MT48LC32M6A2 8 MEG x 6 x 4 S For the latest data sheet, please refer to the Micron Web site: www.micron.com/dramds

More information

JNC, JC, and JNZ Instructions for the WIMP51

JNC, JC, and JNZ Instructions for the WIMP51 JNC, JC, and JNZ Instructions for the WIMP51 EE 213 For the beginning of the project I looked up the Hex code for the JNC, JC, JNZ, as well as JZ so that I could compare with how it was created with the

More information

- DQ0 - NC DQ1 - NC - NC DQ0 - NC DQ2 DQ1 DQ

- DQ0 - NC DQ1 - NC - NC DQ0 - NC DQ2 DQ1 DQ SYNCHRONOUS DRAM ADVANCE MT48LC28M4A2 32 Meg x 4 x 4 banks MT48LC64M8A2 6 Meg x 8 x 4 banks MT48LC32M6A2 8 Meg x 6 x 4 banks For the latest data sheet, please refer to the Micron Web site: www.micron.com/dramds

More information

Introduction to Digital Techniques

Introduction to Digital Techniques to Digital Techniques Dan I. Porat, Ph.D. Stanford Linear Accelerator Center Stanford University, California Arpad Barna, Ph.D. Hewlett-Packard Laboratories Palo Alto, California John Wiley and Sons New

More information

Using Tridium s Sedona 1.2 Components with Workbench

Using Tridium s Sedona 1.2 Components with Workbench Using Tridium s Sedona 1.2 Components with Workbench This tutorial assists in the understanding of the Sedona components provided in Tridium s Sedona-1.2.28 release. New with the 1.2 release is that the

More information

In-Place Associative Computing:

In-Place Associative Computing: In-Place Associative Computing: A New Concept in Processor Design 1 Page Abstract 3 What s Wrong with Existing Processors? 3 Introducing the Associative Processing Unit 5 The APU Edge 5 Overview of APU

More information

(FPGA) based design for minimizing petrol spill from the pipe lines during sabotage

(FPGA) based design for minimizing petrol spill from the pipe lines during sabotage IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 05, Issue 01 (January. 2015), V3 PP 26-30 www.iosrjen.org (FPGA) based design for minimizing petrol spill from the pipe

More information

PPEP: ONLINE PERFORMANCE, POWER, AND ENERGY PREDICTION FRAMEWORK

PPEP: ONLINE PERFORMANCE, POWER, AND ENERGY PREDICTION FRAMEWORK PPEP: ONLINE PERFORMANCE, POWER, AND ENERGY PREDICTION FRAMEWORK BO SU JUNLI GU LI SHEN WEI HUANG JOSEPH L. GREATHOUSE ZHIYING WANG NUDT AMD RESEARCH DECEMBER 17, 2014 BACKGROUND Dynamic Voltage and Frequency

More information

Bimotion Advanced Port & Pipe Case study A step by step guide about how to calculate a 2-stroke engine.

Bimotion Advanced Port & Pipe Case study A step by step guide about how to calculate a 2-stroke engine. Bimotion Advanced Port & Pipe Case study A step by step guide about how to calculate a 2-stroke engine. 2009/aug/21. Bimotion. This paper is free for distribution and may be revised, for further references

More information

OTS Technical Advisory Committee Meeting

OTS Technical Advisory Committee Meeting OTS Technical Advisory Committee Meeting March 21st 2012 For Audio Dial 416-343-2285 or 1-877-969-8433 PIN# 4467765 Agenda 1) Diversion Rates 2) Tire Collection Update 3) Tire Transportation and Delivery

More information

DS1250W 3.3V 4096k Nonvolatile SRAM

DS1250W 3.3V 4096k Nonvolatile SRAM 19-5648; Rev 12/10 3.3V 4096k Nonvolatile SRAM www.maxim-ic.com FEATURES 10 years minimum data retention in the absence of external power Data is automatically protected during power loss Replaces 512k

More information

ABB June 19, Slide 1

ABB June 19, Slide 1 Dr Simon Round, Head of Technology Management, MATLAB Conference 2015, Bern Switzerland, 9 June 2015 A Decade of Efficiency Gains Leveraging modern development methods and the rising computational performance-price

More information

Flexible Ramping Product Technical Workshop

Flexible Ramping Product Technical Workshop Flexible Ramping Product Technical Workshop September 18, 2012 Lin Xu, Ph.D. Senior Market Development Engineer Don Tretheway Senior Market Design and Policy Specialist Agenda Time Topic Presenter 10:00

More information