Slide 1: Pipelining II. Fall 2007, Prof. Thomas Wenisch. [Slide graphic: GAS STATION] Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.
Slide 2: Announcements. HW #2 due Wednesday 9/26. Programming assignment #2 due Monday 9/24. Talk: "Architectural Acceleration of Real Time Physics," Glenn Reinman, UCLA CS; Mon., CSE 3725. Office hour on Wed. 9/19 moved to Tue. 9/18.
Slide 3: Readings. For Wednesday: H & P A.7-A.8, 2.1.
Slide 4: Outline: Understanding the Execution Core
1. 370's 5-stage pipeline (review)
2. Implementing pipeline interlocks (review)
3. Scoreboard scheduling (CDC 6600)
4. Tomasulo's OoO scheduling algorithm (IBM 360)
5. Precise interrupts with a Reorder Buffer (P6)
6. Modern OoO (MIPS R10K, Netburst)
Slide 5: Handling Data Hazards. Avoidance (static): make sure there are no hazards in the code. Detect and stall (dynamic): stall until earlier instructions finish. Detect and forward (dynamic): get the correct value from elsewhere in the pipeline.
Slide 6: Handling Data Hazards: Detect & Forward. Detection: same as detect and stall, but each possible hazard requires different forwarding paths. Forward: add data paths for all possible sources, and add a mux in front of the ALU to select the source. Bypassing logic is often a critical path in wide-issue machines, because the number of paths grows quadratically with machine width.
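Below is a minimal sketch of the forwarding mux just described. It assumes a five-stage pipeline in which the EX/MEM and MEM/WB latches each hold one (destination register, value) pair. The function names and the latch representation are hypothetical.

The second function counts bypass paths under a deliberately simple model: in a W-wide machine, every source operand of every instruction can take a result from any of the W producers in each forwarding stage. This is one way to see the quadratic growth the slide mentions.

```python
from typing import Dict, Optional, Tuple

# A pipeline latch holding (destination register, result value), or None if empty.
Latch = Optional[Tuple[int, int]]


def select_operand(src: int, regfile: Dict[int, int], ex_mem: Latch, mem_wb: Latch) -> int:
    """Pick the value the ALU input mux should use for source register `src`.

    The youngest producer wins: EX/MEM (one instruction ahead) is checked before
    MEM/WB (two instructions ahead); otherwise the register file value is used.
    A load sitting in EX/MEM has no data yet, so a real pipeline must stall
    instead; that case is outside this sketch.
    """
    for latch in (ex_mem, mem_wb):
        if latch is not None and latch[0] == src:
            return latch[1]
    return regfile[src]


def bypass_path_count(width: int, stages: int, operands: int = 2) -> int:
    """Forwarding paths in a `width`-wide machine with `stages` bypass sources.

    Each of the width * operands ALU inputs can select among `width` results per
    forwarding stage, so the count grows with the square of the width.
    """
    return operands * width * width * stages


if __name__ == "__main__":
    regs = {3: 0, 4: 5}
    print(select_operand(3, regs, ex_mem=(3, 99), mem_wb=(3, 7)))  # 99: EX/MEM wins
    print(bypass_path_count(1, 2), bypass_path_count(4, 2))        # 4 64
```

Under this model, doubling the issue width multiplies the number of paths by four.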
Slide 7: Example instruction sequence.
add   // r3 = r1 + r2
nand  // r5 = r3 NAND r4
add   // r7 = r3 + r6
lw    // r6 = MEM[r3+10]
sw    // MEM[r6+12] = r2
Slide 8: First half of cycle 3. [Pipeline diagram: nand is in Decode and add is in Execute; a hazard on r3 is detected, with forwarding paths (fwd) feeding the ALU.]
Slide 9: End of cycle 3. [Pipeline diagram: the second add has been fetched; nand and add advance, carrying a hazard marker (H).]
Slide 10: First half of cycle 4. [Pipeline diagram: the second add is in Decode; new hazard.]
Slide 11: End of cycle 4. [Pipeline diagram: lw has been fetched; hazard markers H1 and H2 are in flight.]
Slide 12: First half of cycle 5. [Pipeline diagram: lw is in Decode; no hazard.]
Slide 13: End of cycle 5. [Pipeline diagram: sw has been fetched.]
Slide 14: First half of cycle 6. [Pipeline diagram: sw is in Decode; hazard on r6 against lw; enable (en) signals shown.]
Slide 15: End of cycle 6. [Pipeline diagram: sw is held in Decode and a noop is inserted behind lw.]
Slide 16: First half of cycle 7. [Pipeline diagram: sw is still in Decode; hazard.]
Slide 17: End of cycle 7. [Pipeline diagram: sw, noop, and lw in flight; hazard marker H3.]
Slide 18: First half of cycle 8. [Pipeline diagram.]
Slide 19: End of cycle 8. [Pipeline diagram: sw and noop in flight; hazard marker H3.]
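The walk-through above can be summarized with a small sketch that classifies each read-after-write hazard in the Slide 7 sequence. The register fields come from the comments on Slide 7. The `Inst` class and `classify_hazards` function are hypothetical, and the forwarding rules are simplifying assumptions (listed in the docstring). They may not match every detail of the pipeline drawn in the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Inst:
    op: str
    dest: Optional[int]      # register written (None for sw)
    srcs: Tuple[int, ...]    # registers read


def classify_hazards(prog: List[Inst]) -> List[Tuple[int, int, str]]:
    """Return (consumer index, register, action) for each RAW hazard.

    Assumes a five-stage pipeline with EX/MEM and MEM/WB forwarding and a
    register file that is written before it is read in the same cycle, so a
    producer three or more instructions back needs no help. Stall cycles are
    not fed back into the distances of later instructions.
    """
    events = []
    for i, inst in enumerate(prog):
        for src in inst.srcs:
            for dist in (1, 2):
                j = i - dist
                if j < 0:
                    break
                producer = prog[j]
                if producer.dest != src:
                    continue
                if dist == 1 and producer.op == "lw":
                    action = "stall 1 cycle, then forward from MEM/WB"
                elif dist == 1:
                    action = "forward from EX/MEM"
                else:
                    action = "forward from MEM/WB"
                events.append((i, src, action))
                break  # the nearest producer supplies the value
    return events


# The sequence from Slide 7.
program = [
    Inst("add", 3, (1, 2)),    # r3 = r1 + r2
    Inst("nand", 5, (3, 4)),   # r5 = r3 NAND r4
    Inst("add", 7, (3, 6)),    # r7 = r3 + r6
    Inst("lw", 6, (3,)),       # r6 = MEM[r3+10]
    Inst("sw", None, (6, 2)),  # MEM[r6+12] = r2
]

for idx, reg, action in classify_hazards(program):
    print(f"{program[idx].op} (#{idx}) reads r{reg}: {action}")
```

Under these assumptions, the only stall is the load-use hazard between lw and sw. This is consistent with the single noop that appears behind lw in the cycle 6 diagrams.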
Slide 20: Load Delay Slot (MIPS R2000). [Timing diagram, cycles t0 to t5: i, j, and k each pass through F D E W, one cycle apart.]
h: Rk ← --
i: Rk ← MEM[--]
j: -- ← Rk
k: -- ← Rk
The effect of a delayed load is not visible to the instructions in its delay slots. Which Rk do we really mean? With one delay slot, j still reads the Rk written by h, while k reads the loaded value.
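To make "which Rk do we mean" concrete, here is a toy interpreter with one load delay slot. The pseudo-ops `li` and `use` are hypothetical and exist only for this illustration. The program mirrors h, i, j, k from the slide.

```python
from typing import Dict, List, Tuple


def run_with_load_delay(prog: List[tuple], mem: Dict[int, int]) -> Tuple[List[int], Dict[str, int]]:
    """Execute a toy program in which a load's result is not visible to the
    single instruction that follows it (one load delay slot).

    Supported pseudo-ops (hypothetical, for illustration only):
      ("li", rd, imm)   rd = imm, visible immediately
      ("lw", rd, addr)  rd = mem[addr], visible after the next instruction
      ("use", rs)       record the value of rs that this instruction observes
    """
    regs: Dict[str, int] = {}
    pending = None            # (rd, value) of the load issued by the previous instruction
    observed: List[int] = []
    for op, *args in prog:
        new_pending = None
        if op == "li":
            rd, imm = args
            regs[rd] = imm
        elif op == "lw":
            rd, addr = args
            new_pending = (rd, mem[addr])
        elif op == "use":
            (rs,) = args
            observed.append(regs.get(rs, 0))
        # If the previous instruction was a load, this one was its delay slot and
        # has already read its operands, so the load may now write its register.
        if pending is not None:
            regs[pending[0]] = pending[1]
        pending = new_pending
    if pending is not None:   # a load as the last instruction still completes
        regs[pending[0]] = pending[1]
    return observed, regs


# h, i, j, k from the slide: j sits in the delay slot, k does not.
prog = [("li", "k", 1), ("lw", "k", 0), ("use", "k"), ("use", "k")]
print(run_with_load_delay(prog, {0: 42}))  # ([1, 42], {'k': 42})
```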
Slide 21: Control Hazards. [Timing diagram, cycles t0 to t5: beq, then sub fetched behind it; beq proceeds through F D E W, and sub is squashed.]
Slide 22: Handling Control Hazards. Avoidance (static): no branches? Convert branches to predication, so that control dependence becomes data dependence. Detect and stall (dynamic): stop fetch until the branch resolves. Speculate and squash (dynamic): keep going past the branch, and throw away the instructions if the guess was wrong.
Slide 23: Avoidance: if-conversion.
Source: if (a == b) { x++; y = n / d; }
With a branch: sub t ← a, b; jnz t, PC2; add x ← x, #1; div y ← n, d
Predicated: sub t ← a, b; add(t) x ← x, #1; div(t) y ← n, d
With conditional moves: sub t ← a, b; add t2 ← x, #1; div t3 ← n, d; cmov(t) x ← t2; cmov(t) y ← t3
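The following sketch contrasts the branching form with the conditional-move form of if-conversion. In the second function, the branch becomes a predicate that feeds an arithmetic select, so the guarded work is pure data dependence.

Both function names are hypothetical. Python's `//` floors rather than truncating, so these results match C division only for non-negative operands.

```python
def with_branch(a: int, b: int, x: int, y: int, n: int, d: int):
    """if (a == b) { x++; y = n / d; } written with control flow."""
    if a == b:
        x = x + 1
        y = n // d
    return x, y


def if_converted(a: int, b: int, x: int, y: int, n: int, d: int):
    """Same computation with the branch turned into data dependences.

    Mirrors the cmov version: both results are computed unconditionally and a
    predicate selects which values are written back. Because the division always
    executes, this sketch requires d != 0 even when a != b.
    """
    t = a - b                    # sub t <- a, b
    p = int(t == 0)              # predicate: 1 when the guarded code takes effect
    t2 = x + 1                   # add t2 <- x, #1
    t3 = n // d                  # div t3 <- n, d
    x = p * t2 + (1 - p) * x     # cmov x <- t2, written as an arithmetic select
    y = p * t3 + (1 - p) * y     # cmov y <- t3
    return x, y


for args in [(1, 1, 5, 0, 10, 3), (1, 2, 5, 0, 10, 3)]:
    print(with_branch(*args), if_converted(*args))  # (6, 3) (6, 3), then (5, 0) (5, 0)
```

The unconditional divide shows the usual cost of if-conversion: work from both paths is executed, and any side effects of that work (such as a divide fault) must be safe.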
Slide 24: Handling Control Hazards: Detect & Stall. Detection: in Decode, check whether the opcode is a branch or jump. Stall: hold the next instruction in Fetch and pass a noop to Decode.
Slide 25: Problems with Detect & Stall. CPI increases on every branch. Are these stalls necessary? Not always! A branch is taken only about half the time. So assume the branch is NOT taken: keep fetching, treat the branch as a noop, and, if the guess is wrong, make sure the bad instructions don't complete.
Slide 26: Handling Control Hazards: Speculate & Squash. Speculate: assume the branch is not taken. Squash: if the branch turns out to be taken, overwrite the opcodes in Fetch, Decode, and Execute with noops, and pass the branch target to Fetch.
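A rough CPI comparison of the two policies is sketched below, assuming that branches are the only source of stalls. The 3-cycle penalty is an assumption based on the three stages (Fetch, Decode, Execute) that are squashed. The "half taken" figure comes from Slide 25, and the 20% branch frequency is purely hypothetical.

```python
def cpi(branch_frac: float, penalty: int, taken_frac: float, policy: str) -> float:
    """Average CPI of a scalar pipeline whose only stalls come from branches.

    "stall":     every branch waits `penalty` cycles until it resolves.
    "speculate": predict not taken; only taken branches pay `penalty`.
    """
    if policy == "stall":
        return 1.0 + branch_frac * penalty
    if policy == "speculate":
        return 1.0 + branch_frac * taken_frac * penalty
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical workload: 20% branches, half of them taken, 3-cycle penalty.
print(cpi(0.2, 3, 0.5, "stall"))      # about 1.6
print(cpi(0.2, 3, 0.5, "speculate"))  # about 1.3
```

Speculation removes the penalty for every not-taken branch, which is why it wins whenever fewer than all branches are taken.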
Slide 27: [Pipeline diagram: squashing behind a taken beq; the sub, add, and nand fetched after the branch are replaced by noops once the ALU's equal output resolves the branch.]
Slide 28: Problems with Speculate & Squash. It always assumes the branch is not taken. Can we do better? Yes: predict the branch direction and target! Why is this possible? Program behavior repeats. More on branch prediction to come...
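A tiny illustration of "program behavior repeats" is sketched below. It uses a one-bit, last-outcome predictor (one simple scheme, not necessarily the one covered later in the course) and compares it with always assuming not taken. The branch address and the trace are hypothetical.

```python
from typing import Dict, List, Tuple


def last_outcome_accuracy(trace: List[Tuple[int, bool]]) -> float:
    """Fraction of branches predicted correctly by a one-bit predictor that
    guesses each branch will do what it did last time (not taken if unseen)."""
    history: Dict[int, bool] = {}
    correct = 0
    for pc, taken in trace:
        prediction = history.get(pc, False)
        correct += prediction == taken
        history[pc] = taken
    return correct / len(trace)


def not_taken_accuracy(trace: List[Tuple[int, bool]]) -> float:
    """Accuracy of always assuming "not taken", as speculate & squash does."""
    return sum(not taken for _, taken in trace) / len(trace)


# A loop-closing branch at a hypothetical address 0x40: taken 9 times, then falls through; 3 runs.
trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 3
print(last_outcome_accuracy(trace), not_taken_accuracy(trace))  # 0.8 0.1
```

The predictor errs only on each loop's first and last iterations, while "not taken" is right only when the loop exits.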
Slide 29: Branch Delay Slot (MIPS, SPARC). [Timing diagram, t0 to t5, without a delay slot: branch F D E W; next is fetched and then squashed; target F D E W.] With a delay slot, the instruction in the delay slot executes even on a taken branch: [branch, delay, and target each pass through F D E W, one cycle apart.]
i: beq 1, 2, tgt
j: add 3, 4, 5
What can we put here, in the delay slot occupied by j?
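The delay-slot rule can be sketched as a toy interpreter in which a taken branch redirects fetch only after the next instruction has executed. The tuple encoding and operand order are this sketch's own (hypothetical); they are not MIPS or SPARC syntax.

```python
from typing import Dict, List, Tuple


def run_delayed_branch(prog: List[tuple], regs: Dict[int, int]) -> Tuple[List[int], Dict[int, int]]:
    """Interpret a toy program whose branches have one delay slot.

    Ops (hypothetical encoding): ("beq", ra, rb, target_index),
    ("add", ra, rb, rd) meaning rd = ra + rb, and ("halt",).
    Returns the indices of executed instructions and the final registers.
    """
    regs = dict(regs)
    pc, redirect, executed = 0, None, []
    while pc < len(prog):
        inst = prog[pc]
        executed.append(pc)
        # A branch taken on the previous step redirects fetch only after this
        # (delay-slot) instruction, so compute next_pc before handling inst.
        next_pc = pc + 1 if redirect is None else redirect
        redirect = None
        if inst[0] == "beq":
            _, ra, rb, target = inst
            if regs[ra] == regs[rb]:
                redirect = target
        elif inst[0] == "add":
            _, ra, rb, rd = inst
            regs[rd] = regs[ra] + regs[rb]
        elif inst[0] == "halt":
            break
        pc = next_pc
    return executed, regs


prog = [
    ("beq", 1, 2, 3),   # i: branch to index 3 if r1 == r2
    ("add", 3, 4, 5),   # j, the delay slot: r5 = r3 + r4 in this sketch's encoding
    ("add", 3, 3, 5),   # skipped when the branch is taken
    ("halt",),          # tgt
]
print(run_delayed_branch(prog, {1: 7, 2: 7, 3: 10, 4: 20, 5: 0}))
# ([0, 1, 3], {1: 7, 2: 7, 3: 10, 4: 20, 5: 30})
```

The delay-slot add runs on both paths, so the compiler must fill the slot with an instruction that is useful, or at least harmless, either way.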
Slide 30: Pipeline Hazard Checklist. Memory data dependences: output dependence (WAW), anti dependence (WAR), true data dependence (RAW). Register data dependences: output dependence (WAW), anti dependence (WAR), true data dependence (RAW). Control dependences.
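The three data-dependence types on the checklist can be checked mechanically from what each instruction writes and reads. Below is a minimal sketch over register names, using instructions from Slide 7. The same set logic would apply to memory addresses if they were known.

```python
from typing import Set, Tuple

# An instruction summarized by the registers (or memory locations) it writes and reads.
Access = Tuple[Set[str], Set[str]]


def classify(earlier: Access, later: Access) -> Set[str]:
    """Dependences from `earlier` to `later` (in program order)."""
    w1, r1 = earlier
    w2, r2 = later
    deps = set()
    if w1 & r2:
        deps.add("RAW")   # true data dependence: later reads what earlier writes
    if r1 & w2:
        deps.add("WAR")   # anti dependence: later overwrites what earlier reads
    if w1 & w2:
        deps.add("WAW")   # output dependence: both write the same location
    return deps


add1 = ({"r3"}, {"r1", "r2"})   # add:  r3 = r1 + r2
nand = ({"r5"}, {"r3", "r4"})   # nand: r5 = r3 NAND r4
add2 = ({"r7"}, {"r3", "r6"})   # add:  r7 = r3 + r6
lw = ({"r6"}, {"r3"})           # lw:   r6 = MEM[r3+10] (register effects only)
print(classify(add1, nand), classify(add2, lw))  # {'RAW'} {'WAR'}
```

The WAR dependence between the second add and lw is harmless in an in-order 5-stage pipeline, but it matters once instructions can execute out of order.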
Slide 31: Sequential Code Semantics: Instruction Dependences and Pipeline Hazards (A.7-A.8). [Figure: instructions i1, i2, i3 in program order, each broken into substeps.] A true dependence between two instructions may involve only one substep of each instruction. The implied sequential precedences are overspecifications: they are sufficient, but not necessary, to ensure program correctness.
Slide 32: Limitations of Scalar Pipelines. Upper bound on scalar pipeline throughput: limited by IPC = 1 (the Flynn bottleneck). Inefficient unification into a single pipeline: long latency for each instruction. Performance lost due to a rigid in-order pipeline: unnecessary stalls.
Slide 33: Architectures for Instruction-Level Parallelism. [Figure]
Slide 34: Superscalar Machine. [Figure]
Slide 35: What is the real problem? The CPI of in-order pipelines degrades very sharply if machine parallelism is increased beyond a certain point, i.e., when N×M approaches the average distance between dependent instructions. Forwarding is then no longer effective, and the pipeline may never be full due to frequent dependency stalls!
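One way to make "average distance between dependent instructions" concrete is to measure, for each source operand, how many instructions back its nearest producer is. Below is a minimal sketch (RAW only; the function name is hypothetical), applied to the Slide 7 sequence. Roughly speaking, once a machine keeps more instructions in flight than this distance, consumers tend to reach the operand stage before their producers have finished.

```python
from typing import List, Optional, Tuple

Inst = Tuple[Optional[str], Tuple[str, ...]]   # (destination, sources)


def dependence_distances(code: List[Inst]) -> List[int]:
    """For each source operand with an earlier producer, the number of
    instructions back to the nearest instruction that writes it (RAW only)."""
    last_writer = {}
    distances = []
    for i, (dest, srcs) in enumerate(code):
        for s in srcs:
            if s in last_writer:
                distances.append(i - last_writer[s])
        if dest is not None:           # record the write after reading sources
            last_writer[dest] = i
    return distances


seq = [
    ("r3", ("r1", "r2")),   # add
    ("r5", ("r3", "r4")),   # nand
    ("r7", ("r3", "r6")),   # add
    ("r6", ("r3",)),        # lw
    (None, ("r6", "r2")),   # sw
]
d = dependence_distances(seq)
print(d, sum(d) / len(d))  # [1, 2, 3, 1] 1.75
```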
Slide 36: ILP: Instruction-Level Parallelism. ILP is a measure of how many instructions can execute in parallel, which is limited by the inter-dependencies between them. Average ILP = no. of instructions / no. of cycles required.
code1: r1 ← r2 + 1; r3 ← r1 / 17; r4 ← r0 - r3. ILP = 1, i.e., must execute serially.
code2: r1 ← r2 + 1; r3 ← r9 / 17; r4 ← r0 - r10. ILP = 3, i.e., can execute at the same time.
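The ILP formula can be computed by scheduling each instruction as soon as its operands are ready. Below is a minimal sketch assuming unit latency, unlimited functional units, and that only true (RAW) dependences matter. This amounts to assuming WAR/WAW conflicts are removed, for example by renaming. Immediate constants are omitted because they create no dependences.

```python
from typing import List, Tuple

Inst = Tuple[str, Tuple[str, ...]]   # (destination register, source registers)


def average_ilp(code: List[Inst]) -> float:
    """Instructions divided by the length of the dataflow critical path.

    Assumes every instruction takes one cycle, unlimited functional units, and
    that only true (RAW) dependences constrain ordering.
    """
    ready = {}          # register -> cycle at which its value is available
    last_cycle = 0
    for dest, srcs in code:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + 1
        last_cycle = max(last_cycle, start + 1)
    return len(code) / last_cycle


code1 = [("r1", ("r2",)), ("r3", ("r1",)), ("r4", ("r0", "r3"))]
code2 = [("r1", ("r2",)), ("r3", ("r9",)), ("r4", ("r0", "r10"))]
window = code1 + [("r11", ("r12",)), ("r13", ("r19",)), ("r14", ("r0", "r20"))]
print(average_ilp(code1), average_ilp(code2), average_ilp(window))  # 1.0 3.0 2.0
```

The third value corresponds to the six-instruction window analyzed on Slide 38. Six instructions complete in a three-cycle critical path, giving ILP = 2.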
Slide 37: Purported Limits on ILP
Weiss and Smith [1984]: 1.58
Sohi and Vajapeyam [1987]: 1.81
Tjaden and Flynn [1970]: (value not legible)
Tjaden and Flynn [1973]: 1.96
Uht [1986]: 2.00
Smith et al. [1989]: 2.00
Jouppi and Wall [1988]: 2.40
Johnson [1991]: 2.50
Acosta et al. [1986]: 2.79
Wedig [1982]: 3.00
Butler et al. [1991]: 5.8
Melvin and Patt [1991]: 6
Wall [1991]: 7
Kuck et al. [1972]: 8
Riseman and Foster [1972]: 51
Nicolau and Fisher [1984]: 90
Slide 38: Scope of ILP Analysis.
r1 ← r2 + 1; r3 ← r1 / 17; r4 ← r0 - r3 (ILP = 1)
r11 ← r12 + 1; r13 ← r19 / 17; r14 ← r0 - r20
ILP = 2 when the analysis window covers all six instructions. Out-of-order execution permits more ILP to be exploited.
Slide 39: How Large Must the Window Be? [Figure]
ESI Installation and Instruction Manual Eminox Electronic Service Indicator I n s t a l l a t i o n a n d I n s t r u c t i o n M a n u a l LITM004 ESI Installation and Instruction Manual Section Title
More informationBASIC MECHATRONICS ENGINEERING
MBEYA UNIVERSITY OF SCIENCE AND TECHNOLOGY Lecture Summary on BASIC MECHATRONICS ENGINEERING NTA - 4 Mechatronics Engineering 2016 Page 1 INTRODUCTION TO MECHATRONICS Mechatronics is the field of study
More informationLecture Secure, Trusted and Trustworthy Computing Trusted Execution Environments Intel SGX
1 Lecture Secure, and Trustworthy Computing Execution Environments Intel Prof. Dr.-Ing. Ahmad-Reza Sadeghi System Security Lab Technische Universität Darmstadt (CASED) Germany Winter Term 2015/2016 Intel
More informationMAX PLATFORM FOR AUTONOMOUS BEHAVIORS
MAX PLATFORM FOR AUTONOMOUS BEHAVIORS DAVE HOFERT : PRI Copyright 2018 Perrone Robotics, Inc. All rights reserved. MAX is patented in the U.S. (9,195,233). MAX is patent pending internationally. AVTS is
More informationDecoupling Loads for Nano-Instruction Set Computers
Decoupling Loads for Nano-Instruction Set Computers Ziqiang (Patrick) Huang, Andrew Hilton, Benjamin Lee Duke University {ziqiang.huang, andrew.hilton, benjamin.c.lee}@duke.edu ISCA-43, June 21, 2016 1
More information128Mb Synchronous DRAM. Features High Performance: Description. REV 1.0 May, 2001 NT5SV32M4CT NT5SV16M8CT NT5SV8M16CT
Features High Performance: f Clock Frequency -7K 3 CL=2-75B, CL=3-8B, CL=2 Single Pulsed RAS Interface Fully Synchronous to Positive Clock Edge Four Banks controlled by BS0/BS1 (Bank Select) Units 133
More informationEnhancing Energy Efficiency of Database Applications Using SSDs
Seminar Energy-Efficient Databases 29.06.2011 Enhancing Energy Efficiency of Database Applications Using SSDs Felix Martin Schuhknecht Motivation vs. Energy-Efficiency Seminar 29.06.2011 Felix Martin Schuhknecht
More informationAmpl2m. Kamil Herman Author of Ampl2m conversion tool. Who are you looking at
Who are you looking at Kamil Herman Author of conversion tool Senior automation engineer Working in Automation with ABB control systems since 1995 6 years in ABB Slovakia 2 year working for ABB Mannheim,
More information- - DQ0 NC DQ1 DQ0 DQ2 - NC DQ1 DQ3 NC - NC
SYNCHRONOUS DRAM 64Mb: x4, x8, x16 MT48LC16M4A2 4 Meg x 4 x 4 banks MT48LC8M8A2 2 Meg x 8 x 4 banks MT48LC4M16A2 1 Meg x 16 x 4 banks For the latest data sheet, please refer to the Micron Web site: www.micron.com/mti/msp/html/datasheet.html
More informationME 455 Lecture Ideas, Fall 2010
ME 455 Lecture Ideas, Fall 2010 COURSE INTRODUCTION Course goal, design a vehicle (SAE Baja and Formula) Half lecture half project work Group and individual work, integrated Design - optimal solution subject
More information3200NT Timer Service Manual
Service Manual Valve Serial Number Valve Position 1-LEAd 2-LAg 3-LAg 4-LAg IMPORTANT: Fill in pertinent information on page 3 for future reference. Table of Contents Job Specifications Sheet.....................................................................
More informationSouthern California Edison Rule 21 Storage Charging Interconnection Load Process Guide. Version 1.1
Southern California Edison Rule 21 Storage Charging Interconnection Load Process Guide Version 1.1 October 21, 2016 1 Table of Contents: A. Application Processing Pages 3-4 B. Operational Modes Associated
More informationCMPEN 411 VLSI Digital Circuits Spring Lecture 22: Memery, ROM
CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 22: Memery, ROM [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp12 CMPEN 411 L22 S.1
More informationA new perspective. The Kalmar RTG range.
A new perspective. The range. All new s enable you to take advantage of automating some or all of your processes; from remote control to fully automated moves. Securing the future of your business. s extensive
More informationFeatures and Benefits
Symmetra PX 250/500 Scalable from 25 kw to 500 kw, Parallel-capable up to 2,000 kw Modular, Scalable, Ultra-high Efficiency Power Protection for Data Centers High-performance, right-sized, modular, hot-scalable,
More informationThe ADS-IDAC Dynamic PSA Platform with Dynamically Linked System Fault Trees
The ADS-IDAC Dynamic PSA Platform with Dynamically Linked System Fault Trees Mihai Diaconeasa Center for Reliability and Resilience Engineering The B. John Garrick Institute for the Risk Sciences University
More informationRAM-Type Interface for Embedded User Flash Memory
June 2012 Introduction Reference Design RD1126 MachXO2-640/U and higher density devices provide a User Flash Memory (UFM) block, which can be used for a variety of applications including PROM data storage,
More informationIncremental Joint Extraction of Entity Mentions and Relations
Incremental Joint Extraction of Entity Mentions and Relations Qi Li and Heng Ji {liq7,jih}@rpi.edu Rensselaer Polytechnic Institute End to End Relation Extraction Baltimore is the largest city in the U.S.
More informationDQ0 NC DQ1 DQ0 DQ2 DQ3 DQ Speed Grade
Features SDRAM MT48LC32M4A2 8 Meg x 4 x 4 banks MT48LC16M8A2 4 Meg x 8 x 4 banks MT48LC8M16A2 2 Meg x 16 x 4 banks For the latest data sheet, refer to Micron s Web site: www.micron.com Features PC100 and
More informationSAFETY RELATED MANDATORY OUTRIGGER CYLINDER INSPECTION AND REPAIR
An ISO 9001 Registered Company June 10, 1999 SERVICE BULLETIN SL-158 This Service Bulletin should be read and understood by all Service Technicians who are trained to service the units affected by this
More informationMulti Core Processing in VisionLab
Multi Core Processing in Multi Core CPU Processing in 25 August 2014 Copyright 2001 2014 by Van de Loosdrecht Machine Vision BV All rights reserved jaap@vdlmv.nl Overview Introduction Demonstration Automatic
More informationProcedure for assessing the performance of Autonomous Emergency Braking (AEB) systems in front-to-rear collisions
Procedure for assessing the performance of Autonomous Emergency Braking (AEB) systems in front-to-rear collisions Version 1.3 October 2014 CONTENTS 1 AIM... 3 2 SCOPE... 3 3 BACKGROUND AND RATIONALE...
More informationModule-Integrated Power Electronics for Solar Photovoltaics. Robert Pilawa-Podgurski Power Affiliates Program 33rd Annual Review Friday, May 4th 2012
Module-Integrated Power Electronics for Solar Photovoltaics Robert Pilawa-Podgurski Power Affiliates Program 33rd Annual Review Friday, May 4th 2012 Solar Photovoltaic System Challenges Solar Photovoltaic
More information