COSC 6385 Computer Architecture - Tomasulo's Algorithm
- Dwight Montgomery
- 6 years ago
Transcription
COSC 6385 Computer Architecture - Tomasulo's Algorithm
Fall 2008

Analyzing a short code-sequence

    DIV.D F0, F2, F4
    ADD.D F6, F0, F8
    S.D   F6, 0(R1)
    SUB.D F8, F10, F14
    MUL.D F6, F10, F8
Analyzing a short code-sequence: 3 true data dependencies

    DIV.D F0, F2, F4
    ADD.D F6, F0, F8
    S.D   F6, 0(R1)
    SUB.D F8, F10, F14
    MUL.D F6, F10, F8

- ADD.D reads F0, which DIV.D writes
- S.D reads F6, which ADD.D writes
- MUL.D reads F8, which SUB.D writes
Analyzing a short code-sequence: anti-dependencies (WAR hazards)

    DIV.D F0, F2, F4
    ADD.D F6, F0, F8
    S.D   F6, 0(R1)
    SUB.D F8, F10, F14
    MUL.D F6, F10, F8

- SUB.D writes F8, which the earlier ADD.D reads
- MUL.D writes F6, which the earlier S.D reads
Analyzing a short code-sequence: output dependency (WAW hazard)

    DIV.D F0, F2, F4
    ADD.D F6, F0, F8
    S.D   F6, 0(R1)
    SUB.D F8, F10, F14
    MUL.D F6, F10, F8

- ADD.D and MUL.D both write F6

Analyzing a short code-sequence

    DIV.D F0, F2, F4
    ADD.D S, F0, F8
    S.D   S, 0(R1)
    SUB.D T, F10, F14
    MUL.D F6, F10, T

- Renaming some registers can remove the WAR and WAW hazards
- Any subsequent use of F8 must be replaced by T
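The renaming shown above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout `(op, dest, src1, src2)`, the function name `rename`, and the temporary names `T0`, `T1`, ... are hypothetical, and unlike the slide (which renames only F6 to S and F8 to T) the sketch gives every destination a fresh name.

```python
from itertools import count

def rename(instrs):
    """Give every destination a fresh name; sources read the latest name.

    instrs: list of (op, dest, src1, src2); dest is None for a store.
    """
    fresh = (f"T{i}" for i in count())   # hypothetical pool of temporaries
    current = {}                         # architectural register -> latest name
    renamed = []
    for op, dest, src1, src2 in instrs:
        # Sources are looked up first, so they see earlier writers only.
        src1 = current.get(src1, src1)
        src2 = current.get(src2, src2)
        if dest is not None:
            new_name = next(fresh)
            current[dest] = new_name
            dest = new_name
        renamed.append((op, dest, src1, src2))
    return renamed

code = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F6", "F0", "F8"),
    ("S.D",   None, "F6", "R1"),
    ("SUB.D", "F8", "F10", "F14"),
    ("MUL.D", "F6", "F10", "F8"),
]
renamed = rename(code)
for instr in renamed:
    print(instr)
```

After renaming, no two instructions write the same name and no instruction writes a name that an earlier instruction reads, so only the true (RAW) dependencies remain.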
Tomasulo's Algorithm
- Register renaming is provided by reservation stations
  - Buffer the operands of instructions waiting to issue
  - Fetch an operand as soon as it is available
  - Eliminate the need to get an operand from a register
  - Pending instructions designate the reservation station that will provide their input
  - For overlapping successive writes to a register: only the last one actually updates the register

Tomasulo's Algorithm
- Typically more reservation stations than registers
- Hazard detection is distributed (instead of centralized as in the scoreboard)
- Results are passed directly from reservation stations to functional units using a common data bus (CDB)
- Each reservation station holds the opcode for the pending instruction and either operand values or names of reservation stations that will provide them
- Load and store buffers hold data and addresses for memory access
[Figure: structure of the FP unit: instruction queue (fed from the instruction unit), FP registers, address unit, load buffers, store buffers, reservation stations, FP adders, FP multipliers, memory unit, and the common data bus connecting them]

Tomasulo's Algorithm
- Load/store buffers:
  - Hold components of the effective address
  - Hold the destination memory address (= effective address)
  - Hold the value
Tomasulo's Algorithm
- Only three steps per instruction; each step can take an arbitrary number of cycles
- Issue: get next instruction from FIFO instruction queue
  - Search matching empty reservation station
  - If found: issue instruction with operand values
  - If not found: structural hazard -> instruction stalls
  - If operands not in register: keep track of functional units producing operands

Tomasulo's Algorithm
- Execute:
  - If operands not available: monitor common data bus
  - When all operands available: execute
- Write result:
  - Write data on CDB and from there into registers
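The three steps above can be sketched as a small Python model. It is a minimal illustration, not a cycle-accurate simulator: the names `Station`, `issue`, `ready`, and `write_result` are hypothetical, all stations form one pool (the slides match a station to the operation type), and latencies are ignored.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Station:
    name: str
    busy: bool = False
    op: str = ""
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # stations still producing an operand
    qk: Optional[str] = None

OPS = {"ADD.D": lambda a, b: a + b, "SUB.D": lambda a, b: a - b,
       "MUL.D": lambda a, b: a * b, "DIV.D": lambda a, b: a / b}

def issue(op, dest, src1, src2, stations, regs, reg_status):
    """Issue: claim a free station, or return None on a structural hazard."""
    rs = next((s for s in stations if not s.busy), None)
    if rs is None:
        return None                       # no free station: instruction stalls
    rs.busy, rs.op = True, op
    rs.vj = rs.vk = rs.qj = rs.qk = None
    for src, v_field, q_field in ((src1, "vj", "qj"), (src2, "vk", "qk")):
        if src in reg_status:             # value is still being produced
            setattr(rs, q_field, reg_status[src])
        else:                             # value is available in the register file
            setattr(rs, v_field, regs[src])
    reg_status[dest] = rs.name            # dest now refers to this station
    return rs

def ready(rs):
    """Execute may start once no operand is pending."""
    return rs.busy and rs.qj is None and rs.qk is None

def write_result(rs, stations, regs, reg_status):
    """Write result: broadcast (station name, value) on the CDB."""
    value = OPS[rs.op](rs.vj, rs.vk)
    for s in stations:                    # waiting stations capture the value
        if s.qj == rs.name:
            s.vj, s.qj = value, None
        if s.qk == rs.name:
            s.vk, s.qk = value, None
    for reg, tag in list(reg_status.items()):
        if tag == rs.name:                # only the latest writer updates the register
            regs[reg] = value
            del reg_status[reg]
    rs.busy = False
    return value

stations = [Station("RS1"), Station("RS2")]
regs = {"F2": 2.0, "F6": 5.0}
reg_status = {}
sub = issue("SUB.D", "F8", "F6", "F2", stations, regs, reg_status)
mul = issue("MUL.D", "F0", "F8", "F2", stations, regs, reg_status)
stalled = issue("ADD.D", "F10", "F2", "F2", stations, regs, reg_status)
mul_ready_before = ready(mul)             # MUL.D waits for RS1's result
write_result(sub, stations, regs, reg_status)
mul_ready_after = ready(mul)
write_result(mul, stations, regs, reg_status)
print(regs)
```

In this run the third instruction cannot issue because both stations are busy, and MUL.D becomes ready only after SUB.D broadcasts its result.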
Data fields for reservation stations
- Op: operation to perform on source operands S1 and S2
- Qj, Qk: reservation stations producing the operands
- Vj, Vk: value for each operand
- A: holds information for memory address calculation (immediate field, effective address)
- Busy: indicates occupied functional units/reservation stations
- Qi (register status): number of the reservation station that will produce the data to be stored in this register

The same example as for scoreboarding

    L.D   F6, 34(R2)
    L.D   F2, 45(R3)
    MUL.D F0, F2, F4
    SUB.D F8, F6, F2
    DIV.D F10, F0, F6
    ADD.D F6, F8, F2

Following slides are based on a lecture by Jelena Mirkovic, University of Delaware

Assumptions:
- ADD and SUB take 2 clock cycles
- MULT takes 10 clock cycles
- DIV takes 40 clock cycles
- 2 Load/Store, 3 ADD and 2 Mult functional units/reservation stations
Time=1: Issue first load
    Busy: Load (Regs[R2], 34)

Time=2: First load calculates address; second load issued
    Busy: Load (Regs[R2]+34)
    Busy: Load (Regs[R3], 45)
Time=3: First load reads from memory; second load calculates address; Mult is issued
    Busy: Load (Regs[R2]+34)
    Busy: Load (Regs[R3]+45)
    Busy: Mult (Regs[F4]; waiting for the second load)

Time=4: First load writes result; second load reads memory; Mult stalled; Sub issued
    Busy: Load (Regs[R3]+45)
    Busy: Sub (Mem[34+Regs[R2]]; waiting for the second load)
    Busy: Mult (Regs[F4]; waiting for the second load)
Time=5: Second load writes result; Mult stalled; Sub stalled; Div issued
    Busy: Sub (Mem[34+Regs[R2]], Mem[45+Regs[R3]])
    Busy: Mult (Mem[45+Regs[R3]], Regs[F4])
    Busy: Div (Mem[34+Regs[R2]]; waiting for Mult1)

Time=6: Mult executes (1/10); Sub executes (1/2); Div stalled; Add issued
    Busy: Sub (Mem[34+Regs[R2]], Mem[45+Regs[R3]])
    Busy: Add (Mem[45+Regs[R3]]; waiting for Sub)
    Busy: Mult (Mem[45+Regs[R3]], Regs[F4])
    Busy: Div (Mem[34+Regs[R2]]; waiting for Mult1)
Time=7: Mult executes (2/10); Sub executes (2/2); Div stalled; Add stalled
    Busy: Sub (Mem[34+Regs[R2]], Mem[45+Regs[R3]])
    Busy: Add (Mem[45+Regs[R3]]; waiting for Sub)
    Busy: Mult (Mem[45+Regs[R3]], Regs[F4])
    Busy: Div (Mem[34+Regs[R2]]; waiting for Mult1)

Time=8: Mult executes (3/10); Sub writes result; Div stalled; Add stalled
    Busy: Add (Mem[34+Regs[R2]] - Mem[45+Regs[R3]], Mem[45+Regs[R3]])
    Busy: Mult (Mem[45+Regs[R3]], Regs[F4])
    Busy: Div (Mem[34+Regs[R2]]; waiting for Mult1)
Time=9: Mult executes (4/10); Div stalled; Add executes (1/2)
    Busy: Add (Mem[34+Regs[R2]] - Mem[45+Regs[R3]], Mem[45+Regs[R3]])
    Busy: Mult (Mem[45+Regs[R3]], Regs[F4])
    Busy: Div (Mem[34+Regs[R2]]; waiting for Mult1)

Time=10: Mult executes (5/10); Div stalled; Add executes (2/2)
    Busy: Add (Mem[34+Regs[R2]] - Mem[45+Regs[R3]], Mem[45+Regs[R3]])
    Busy: Mult (Mem[45+Regs[R3]], Regs[F4])
    Busy: Div (Mem[34+Regs[R2]]; waiting for Mult1)
Time=11: Mult executes (6/10); Div stalled; Add writes result
    Busy: Mult (Mem[45+Regs[R3]], Regs[F4])
    Busy: Div (Mem[34+Regs[R2]]; waiting for Mult1)

Time=16: Mult writes result; Div stalled
    Busy: Div (Mem[45+Regs[R3]] * Regs[F4], Mem[34+Regs[R2]])
Time=17: Div executes (1/40)
    Busy: Div (Mem[45+Regs[R3]] * Regs[F4], Mem[34+Regs[R2]])

Time=57: Div writes result
    No busy reservation stations remain
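The write-result cycles in the trace above follow from simple arithmetic, assuming the convention the trace appears to use: an operation executes for its full latency and writes its result in the following cycle. The helper name `write_cycle` and the dictionary layout are hypothetical; the start cycles and latencies are read off the trace and the stated assumptions.

```python
def write_cycle(first_exec_cycle, latency):
    """Cycle in which the result is written, one cycle after execution ends."""
    last_exec_cycle = first_exec_cycle + latency - 1
    return last_exec_cycle + 1

# (first execute cycle, latency in cycles), taken from the trace
trace = {
    "SUB.D": (6, 2),     # operands complete after the second load writes at 5
    "MUL.D": (6, 10),
    "ADD.D": (9, 2),     # waits for SUB.D, which writes at 8
    "DIV.D": (17, 40),   # waits for MUL.D, which writes at 16
}
writes = {op: write_cycle(start, lat) for op, (start, lat) in trace.items()}
print(writes)
```

The long DIV.D latency dominates: it cannot start before MUL.D broadcasts at cycle 16, so the sequence finishes at cycle 57 even though ADD.D, which was issued after DIV.D, completed at cycle 11.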
Some remarks
- To preserve exception behavior, no instruction is allowed to initiate execution until all branches preceding the instruction have completed
- Loads and stores can be executed in a different order if they access different addresses
  - Not easy to verify, since 100(R3) can point to the same effective address as 0(R5)!
  - -> A load must wait for any uncompleted stores to the same effective memory address
  - -> A store must wait until there are no unexecuted loads/stores to the same memory address

Some remarks (II)
- Effective memory address calculation has to be executed in order
- For a load operation:
  - Calculate effective memory address
  - Check for conflicts with all active (= pending) store buffers
  - If conflict: load stalls
    - Bypassing memory and taking data from the store buffer directly to the load buffer is often done
  - Else: execute load
- For a store operation:
  - Similarly, check for conflicts with both active load and store buffers
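The load-side check described above can be sketched as follows. This is an illustrative model only: `StoreEntry`, `effective_address`, and `resolve_load` are hypothetical names, the store buffer is a plain list ordered oldest to youngest, and forwarding is shown as the optional bypass the slide mentions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StoreEntry:
    address: int              # effective address, already computed in order
    value: Optional[float]    # None until the data to store is available

def effective_address(offset, base_reg, regs):
    return offset + regs[base_reg]

def resolve_load(address, pending_stores, memory):
    """Return ("stall", None), ("forward", v) or ("memory", v) for a load."""
    # Scan youngest first: the load must see the latest store to this address.
    for store in reversed(pending_stores):
        if store.address == address:
            if store.value is None:
                return ("stall", None)        # conflict: wait for the store
            return ("forward", store.value)   # bypass memory via the store buffer
    return ("memory", memory[address])        # no conflict: execute the load

regs = {"R3": 0, "R5": 100}
memory = {100: 1.5, 200: 2.5}

# 100(R3) and 0(R5) look different but name the same effective address.
same = effective_address(100, "R3", regs) == effective_address(0, "R5", regs)

pending = [StoreEntry(address=100, value=None)]
blocked = resolve_load(effective_address(0, "R5", regs), pending, memory)
pending[0].value = 9.0
forwarded = resolve_load(effective_address(0, "R5", regs), pending, memory)
independent = resolve_load(200, pending, memory)
print(same, blocked, forwarded, independent)
```

A store would run a similar scan against both the pending loads and the pending stores before it is allowed to update memory.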
A loop based example

    Loop: LD    F0, 0(R1)
          MULTD F4, F0, F2
          SD    F4, 0(R1)
          SUBI  R1, R1, #8
          BNEZ  R1, Loop

- This time assume Multiply takes 4 clocks
- Assume 1st load takes 8 clocks total (1 effective address + 7 memory access) (L1 cache miss), 2nd load takes 1 clock (hit)
- To be clear, will show clocks for SUBI, BNEZ
  - Reality: integer instructions run ahead of floating-point instructions
- Show 2 iterations

Slide based on a lecture by David A. Patterson, University of California, Berkeley

Time=1: Issue first load
    Busy: Load (Regs[R1], 0)
Time=2: First load effective address calculation; issue Mult
    Busy: Load (Regs[R1]+0)
    Busy: Mult1 (Regs[F2]; waiting for the first load)

Time=3: First load memory access (1/7); Mult stalled; issue store
    Busy: Load (Regs[R1]+0)
    Busy: Store1 (Regs[R1], 0; value from Mult1)
    Busy: Mult1 (Regs[F2]; waiting for the first load)
Time=4: First load executes (2/7); Mult stalled; store effective address calculation; SUBI calculated (not shown)
    Busy: Load (Regs[R1]+0)
    Busy: Store1 (Regs[R1]+0; value from Mult1)
    Busy: Mult1 (Regs[F2]; waiting for the first load)

Time=5: First load executes (3/7); Mult stalled; store stalled; BNEZ (not shown)
    Busy: Load (Regs[R1]+0)
    Busy: Store1 (Regs[R1]+0; value from Mult1)
    Busy: Mult1 (Regs[F2]; waiting for the first load)
Time=6: First load executes (4/7); Mult stalled; store stalled; issue second load
    Busy: Load (Regs[R1]+0)
    Busy: Load (Regs[R1], 0)
    Busy: Store1 (Regs[R1]+0; value from Mult1)
    Busy: Mult1 (Regs[F2]; waiting for the first load)

Time=7: First load executes (5/7); Mult stalled; store stalled; second load effective address calculation; issue Mult2
    Busy: Load (Regs[R1]+0)
    Busy: Load (Regs[R1]+0)
    Busy: Store1 (Regs[R1]+0; value from Mult1)
    Busy: Mult1 (Regs[F2])
    Busy: Mult2 (Regs[F2])
Time=8: First load executes (6/7); Mult, store, Mult2 stalled; second load executes; issue store2
    Busy: Load (Regs[R1]+0)
    Busy: Load (Regs[R1]+0)
    Busy: Store1 (Regs[R1]+0; value from Mult1)
    Busy: Store2 (Regs[R1], 0)
    Busy: Mult1 (Regs[F2])
    Busy: Mult2 (Regs[F2])

Time=9: First load executes (7/7); Mult, store, Mult2 stalled; second load executes; store2 effective address calculation
    Busy: Load (Regs[R1]+0)
    Busy: Load (Regs[R1]+0)
    Busy: Store1 (Regs[R1]+0; value from Mult1)
    Busy: Store2 (Regs[R1]+0)
    Busy: Mult1 (Regs[F2])
    Busy: Mult2 (Regs[F2])
Time=10: First load writes result; Mult, store, Mult2 stalled; second load finishes; store2 stalled
    Busy: Load (Regs[R1]+0)
    Busy: Store1 (Regs[R1]+0; value from Mult1)
    Busy: Store2 (Regs[R1]+0)
    Busy: Mult1 (Mem[], Regs[F2])
    Busy: Mult2 (Regs[F2])

Time=11: Second load writes result; Mult1 executes (1/4); Mult2, store1, store2 stalled
    Busy: Store1 (Regs[R1]+0; value from Mult1)
    Busy: Store2 (Regs[R1]+0)
    Busy: Mult1 (Mem[], Regs[F2])
    Busy: Mult2 (Mem[], Regs[F2])
Time=14: Mult1 executes (4/4); Mult2 executes (3/4); store1, store2 stalled
    Busy: Store1 (Regs[R1]+0; value from Mult1)
    Busy: Store2 (Regs[R1]+0)
    Busy: Mult1 (Mem[], Regs[F2])
    Busy: Mult2 (Mem[], Regs[F2])

Time=15: Mult1 writes result; Mult2 executes (4/4); store1 executes; store2 stalled
    Busy: Store1 (Regs[R1]+0)
    Busy: Store2 (Regs[R1]+0)
    Busy: Mult2 (Mem[], Regs[F2])
Time=16: store1, store2 exec
    Busy: Store1 (Regs[R1]+0)
    Busy: Store2 (Regs[R1]+0)

Tomasulo's Algorithm
- Please note:
  - F0 never sees data from the first load
  - Register file completely detached from computation
  - First and second iteration overlap completely
- Assuming two Mult units, we could not have issued a third Mult operation for the next iteration of the loop -> no third store instruction could be issued
- In-order issue, out-of-order execution, out-of-order completion

Slide based on a lecture by David A. Patterson, University of California, Berkeley
Why can Tomasulo overlap iterations of loops?
- Register renaming
  - Multiple iterations use different physical destinations for registers (dynamic loop unrolling)
- Reservation stations
  - Permit instruction issue to advance past integer control flow operations
  - Also buffer old values of registers, totally avoiding the WAR stall that we saw in the scoreboard
- Other perspective: Tomasulo builds the data flow dependency graph on the fly

Slide based on a lecture by David A. Patterson, University of California, Berkeley
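The "data flow graph on the fly" view can be sketched by tracking the most recent writer of each register as instructions issue. This is a simplified illustration: the function name `dataflow_edges` and the tuple layout are hypothetical, and SUBI/BNEZ are left out (the trace above does not show them either), so the dependence of each iteration's address on R1 is not modelled.

```python
def dataflow_edges(instrs):
    """Return (producer_index, consumer_index) pairs for true dependencies.

    instrs: list of (op, dest, sources); dest is None for a store.
    """
    last_writer = {}   # register -> index of the latest instruction writing it
    edges = []
    for i, (op, dest, sources) in enumerate(instrs):
        for src in sources:
            if src in last_writer:
                edges.append((last_writer[src], i))
        if dest is not None:
            last_writer[dest] = i   # later readers depend on this writer only
    return edges

one_iteration = [
    ("L.D",   "F0", ["R1"]),
    ("MUL.D", "F4", ["F0", "F2"]),
    ("S.D",   None, ["F4", "R1"]),
]
two_iterations = one_iteration + one_iteration
edges = dataflow_edges(two_iterations)
print(edges)
```

Each iteration forms its own load, multiply, store chain, and no edge connects the two chains: reusing F0 and F4 creates only name dependencies, which renaming removes, so the iterations can overlap.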
IROS 2014: Robots in Clutter Workshop Automated Driving - Object Perception at 120 KPH Chris Mansley 1 Road safety influence of driver assistance 100% Installation rates / road fatalities in Germany 80%
More informationCMPEN 411 VLSI Digital Circuits Spring Lecture 22: Memery, ROM
CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 22: Memery, ROM [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp12 CMPEN 411 L22 S.1
More informationWritten Exam Public Transport + Answers
Faculty of Engineering Technology Written Exam Public Transport + Written Exam Public Transport (195421200-1A) Teacher van Zuilekom Course code 195421200 Date and time 7-11-2011, 8:45-12:15 Location OH116
More informationThe TIMMO Methodology
ITEA 2 06005: TIMMO Timing Model The TIMMO Methodology Guest Lecture at Chalmers University February 9 th, 2010 Stefan Kuntz, Continental Automotive GmbH 2010-02-09 Chalmers University, Göteborg Slide
More informationThe Rollover Request customer accepts a reservation term for the rollover at least as long as that offered by any competing customer.
XX.X xx.xx ROLLOVER RIGHTS FOR LONG TERM FIRM SERVICE Overview Pursuant to Section 2.2 of the Tariff, long term firm transmission service customers (Network Integration Transmission Service ( NITS ) customers
More informationControl Design of an Automated Highway System (Roberto Horowitz and Pravin Varaiya) Presentation: Erik Wernholt
Control Design of an Automated Highway System (Roberto Horowitz and Pravin Varaiya) Presentation: Erik Wernholt 2001-05-11 1 Contents Introduction What is an AHS? Why use an AHS? System architecture Layers
More informationindex changing a variable s value, Chime My Block, clearing the screen. See Display block CoastBack program, 54 44
index A absolute value, 103, 159 adding labels to a displayed value, 108 109 adding a Sequence Beam to a Loop of Switch block, 223 228 algorithm, defined, 86 ambient light, measuring, 63 analyzing data,
More informationWhite Paper Nest Learning Thermostat Efficiency Simulation for the U.K. Nest Labs April 2014
White Paper Nest Learning Thermostat Efficiency Simulation for the U.K. Nest Labs April 2014 Introduction This white paper gives an overview of potential energy savings using the Nest Learning Thermostat
More informationScheduling for Wireless Energy Sharing Among Electric Vehicles
Scheduling for Wireless Energy Sharing Among Electric Vehicles Zhichuan Huang Computer Science and Electrical Engineering University of Maryland, Baltimore County Ting Zhu Computer Science and Electrical
More informationAutonomous taxicabs in Berlin a spatiotemporal analysis of service performance. Joschka Bischoff, M.Sc. Dr.-Ing. Michal Maciejewski
Autonomous taxicabs in Berlin a spatiotemporal analysis of service performance Joschka Bischoff, M.Sc. Dr.-Ing. Michal Maciejewski Mobil.TUM 2016, 7 June 2016 Contents Motivation Methodology Results Conclusion
More informationFLYING CAR NANODEGREE SYLLABUS
FLYING CAR NANODEGREE SYLLABUS Term 1: Aerial Robotics 2 Course 1: Introduction 2 Course 2: Planning 2 Course 3: Control 3 Course 4: Estimation 3 Term 2: Intelligent Air Systems 4 Course 5: Flying Cars
More informationTurku Raitiotie plan Superbus charging system simulations
VTT TECHNICAL RESEARCH CENTRE OF FINLAND LTD Turku Raitiotie plan Superbus charging system simulations Joel Anttila VTT joel.anttila@vtt.fi Overview Charging system requirements for battery electric doublearticulated
More informationA Chemical Batch Reactor Schedule Optimizer
A Chemical Batch Reactor Schedule Optimizer By Steve Morrison, Ph.D. 1997 Info@MethodicalMiracles.com 214-769-9081 Many chemical plants have a very similar configuration to pulp batch digesters; two examples
More informationVHDL (and verilog) allow complex hardware to be described in either single-segment style to two-segment style
FFs and Registers In this lecture, we show how the process block is used to create FFs and registers Flip-flops (FFs) and registers are both derived using our standard data types, std_logic, std_logic_vector,
More informationINSTALLATION INSTRUCTIONS FOR SYMCOM'S MODEL 777-HVR-SP ELECTRONIC OVERLOAD RELAY
CONNECTIONS INSTALLATION INSTRUCTIONS FOR SYMCOM'S MODEL 777-HVR-SP ELECTRONIC OVERLOAD RELAY BE SURE POWER IS DISCONNECTED PRIOR TO INSTALLATION!! FOLLOW NATIONAL, STATE AND LOCAL CODES! READ THESE INSTRUCTIONS
More informationComputer Architecture and Parallel Computing 并行结构与计算. Lecture 5 SuperScalar and Multithreading. Peng Liu
Comuter Architecture and Parallel Comuting 并行结构与计算 Lecture 5 SuerScalar and Multithreading Peng Liu College of Info. Sci. & Elec. Eng. Zhejiang University liueng@zju.edu.cn Last time in Lecture 04 Register
More informationCPW Current Programmed Winder for the 890. Application Handbook. Copyright 2005 by Parker SSD Drives, Inc.
CPW Current Programmed Winder for the 890. Application Handbook Copyright 2005 by Parker SSD Drives, Inc. All rights strictly reserved. No part of this document may be stored in a retrieval system, or
More informationManaging Projects Teaching materials to accompany:
Managing Projects Teaching materials to accompany: Product Design and Development Chapter 14 Karl T. Ulrich and Steven D. Eppinger 2nd Edition, Irwin McGraw-Hill, 2000. Product Development Process Planning
More informationтел.: +375(1771) e mail: Fuel level sensors eurosens Dominator
тел.: +375(1771)7 13 00 e mail: office@mechatronics.by Fuel level sensors eurosens Dominator Application The devices are used in vehicle tanks and stationary capacities for fuel level measurement. The
More informationQUICK INSTALLATION GUIDE
MANUAL/AUTOMATIC T R A N S M I S S I O N 2 - B U T T O N R E M O T E S T A R T E R W I T H V I R T U A L T A C H S Y S T E M ( A S P R G - 1 0 0 0 C O M P A T I B L E ) QUICK INSTALLATION GUIDE Manual
More informationAC drive has detected too high a Check loading
Fault code Fault Name Fault type Default Possible Cause Remedy 1 Over Current Fault AC drive has detected too high a Check loading current (>4*IH) in the motor cable: Check motor Sudden heavy load increase
More information1.2 Flipping Ferraris
1.2 Flipping Ferraris A Solidify Understanding Task When people first learn to drive, they are often told that the faster they are driving, the longer it will take to stop. So, when you re driving on the
More informationIntegrated System Models Graph Trace Analysis Distributed Engineering Workstation
Integrated System Models Graph Trace Analysis Distributed Engineering Workstation Robert Broadwater dew@edd-us.com 1 Model Based Intelligence 2 Integrated System Models Merge many existing, models together,
More information:34 1/15 Hub-4 / grid parallel - manual
2016-02-24 11:34 1/15 Hub-4 / grid parallel - manual Hub-4 / grid parallel - manual Note: make sure to always update all components to the latest software when making a new installation. Introduction Hub-4
More informationCode Generation Part III
1 Code Generation Part III Chapters 8 and 9.1 (1 st ed. Ch.9) COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2013 2 Classic Examples of Local and Global Code
More information2 MEETING THE CHALLENGE OF A DIFFICULT JOB SPECIALTY CONTRACTOR
Category: 2 MEETING THE CHALLENGE OF A DIFFICULT JOB SPECIALTY CONTRACTOR Specialty Contractor: RK STEEL Project Name: MILE HIGH HARLEY-DAVIDSON OF PARKER CANOPY ICONIC QUALITY: CUSTOMIZE YOUR MOTORCYCLE
More informationRegisters Shift Registers Accumulators Register Files Register Transfer Language. Chapter 8 Registers. SKEE2263 Digital Systems
Chapter 8 Registers SKEE2263 igital Systems Mun im Zabidi {munim@utm.my} Ismahani Ismail {ismahani@fke.utm.my} Izam Kamisian {e-izam@utm.my} Faculty of Electrical Engineering, Universiti Teknologi Malaysia
More informationIntroduction to Digital Techniques
to Digital Techniques Dan I. Porat, Ph.D. Stanford Linear Accelerator Center Stanford University, California Arpad Barna, Ph.D. Hewlett-Packard Laboratories Palo Alto, California John Wiley and Sons New
More informationBattery Technology for Data Centers and Network Rooms: Site Planning
Battery Technology for Data Centers and Network Rooms: Site Planning White Paper # 33 Executive Summary The site requirements and costs for protecting information technology and network environments are
More information