DAT105: Computer Architecture, Study Period 2, 2009. Exercise 2, Chapter 2: Instruction-Level Parallelism and Its Exploitation
Mafijul Islam, Department of Computer Science and Engineering, November 12, 2009
Goals: to understand
- the notion of instruction-level parallelism (ILP)
- the notion of dependences and hazards, as well as their impact on exploiting ILP
- the techniques for exploiting ILP

Case studies/assignments:
- Case Study 1: 2.1, 2.2, 2.3, 2.5
- Assignment 2 of the Exam on …
Exam: Assignment 2(A)
The following MIPS program operates on an array of 64-bit elements. Register R1 initially points to the beginning of the array, and register R2 points to its end. The array always contains a fixed number of elements.

      ANDI  R3, R3, #0      ; R3 = 0
LOOP: LD    R4, 0(R1)
      DMUL  R5, R4, R4
      DADD  R5, R3, R5
      SD    R5, 0(R1)       ; new[i] = old[i-1] + old[i]*old[i]
      DADDI R3, R4, #0
      DADDI R1, R1, #8
      BNE   R1, R2, LOOP

Find at least one example of each type of dependence.
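To make the loop's effect concrete, here is a minimal Python model of it, one statement per MIPS instruction. It is a sketch under stated assumptions: the function name `run_loop` is hypothetical, Python integers stand in for 64-bit registers (so 64-bit overflow is not modelled), and the array is updated in place, as the SD does.

```python
def run_loop(a):
    """Hypothetical Python model of the Assignment 2(A) loop (64-bit overflow ignored)."""
    r3 = 0                    # ANDI  R3, R3, #0   -> no predecessor term for a[0]
    for i in range(len(a)):   # R1 walks from the start of the array towards R2
        r4 = a[i]             # LD    R4, 0(R1)    -> old[i]
        r5 = r4 * r4          # DMUL  R5, R4, R4
        r5 = r3 + r5          # DADD  R5, R3, R5   -> old[i-1] + old[i]*old[i]
        a[i] = r5             # SD    R5, 0(R1)    -> new[i]
        r3 = r4               # DADDI R3, R4, #0   -> keep old[i], not new[i]
    return a


print(run_loop([1, 2, 3]))    # [1, 5, 11]
```

Because R3 is copied from R4 (the value loaded before the store), each element's predecessor term is the old value, which is why the comment on the SD reads old[i-1].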
Exam: Assignment 2(A)
Finding the various categories of dependences in the program above:
- Data (true) dependence: DMUL reads R4, which is written by the preceding LD.
- Name dependence: both DMUL and DADD write R5 (an output dependence), but the second R5 can be renamed.
- Control dependence: the loop body depends on BNE in every iteration except the first.

How can these dependences be resolved?
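The register dependences can also be found mechanically. The following is a minimal Python sketch, assuming that only register operands matter (memory dependences through 0(R1) and control dependences are not modelled) and looking at a single iteration (loop-carried dependences, such as R3 and R1 flowing into the next iteration, are ignored). The function name `register_dependences` is hypothetical.

```python
from itertools import combinations

# One iteration of the Assignment 2(A) loop as (mnemonic, destination, sources).
# Only register operands are modelled; SD and BNE write no register.
LOOP_BODY = [
    ("LD",    "R4", ("R1",)),
    ("DMUL",  "R5", ("R4", "R4")),
    ("DADD",  "R5", ("R3", "R5")),
    ("SD",    None, ("R5", "R1")),
    ("DADDI", "R3", ("R4",)),
    ("DADDI", "R1", ("R1",)),
    ("BNE",   None, ("R1", "R2")),
]


def register_dependences(body):
    """List (kind, i, j, reg) for instruction pairs i < j within one iteration."""
    def written_between(i, j, reg):
        # A write in between would make that instruction the real partner instead.
        return any(body[k][1] == reg for k in range(i + 1, j))

    deps = []
    for i, j in combinations(range(len(body)), 2):
        _, dst_i, src_i = body[i]
        _, dst_j, src_j = body[j]
        if dst_i is not None and dst_i in src_j and not written_between(i, j, dst_i):
            deps.append(("RAW", i, j, dst_i))        # true data dependence
        if dst_j is not None and not written_between(i, j, dst_j):
            if dst_j in src_i:
                deps.append(("WAR", i, j, dst_j))    # antidependence (name)
            if dst_j == dst_i:
                deps.append(("WAW", i, j, dst_j))    # output dependence (name)
    return deps


for kind, i, j, reg in register_dependences(LOOP_BODY):
    print(f"{kind}: {LOOP_BODY[i][0]} -> {LOOP_BODY[j][0]} on {reg}")
```

The pairs named on the slides (RAW on R4 from LD to DMUL, WAW on R5 between DMUL and DADD, WAR on R1 between SD and the second DADDI) all appear in the output, alongside a few others such as the RAW on R5 from DADD to SD.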
Exam: Assignment 2(B)
Finding the various categories of hazards. Dependences are properties of programs; hazards are properties of the pipeline organization. For the program above:
- RAW: DMUL and LD on register R4
- WAW: DMUL and DADD on register R5
- WAR: SD and DADDI on register R1
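Register renaming is one standard way to resolve the name dependences (WAR and WAW): if every write gets a fresh register, only the true (RAW) dependences remain. Below is a minimal Python sketch, assuming an unlimited pool of physical registers with hypothetical names P1, P2, …; the helpers `rename` and `name_dependences` are also hypothetical, and only one iteration of straight-line code is considered.

```python
from itertools import count

# One iteration of the Assignment 2(A) loop as (mnemonic, destination, sources).
LOOP_BODY = [
    ("LD",    "R4", ("R1",)),
    ("DMUL",  "R5", ("R4", "R4")),
    ("DADD",  "R5", ("R3", "R5")),
    ("SD",    None, ("R5", "R1")),
    ("DADDI", "R3", ("R4",)),
    ("DADDI", "R1", ("R1",)),
    ("BNE",   None, ("R1", "R2")),
]


def rename(body):
    """Map every destination to a fresh physical register (hypothetical names P1, P2, ...)."""
    table = {}                              # architectural -> latest physical register
    fresh = (f"P{n}" for n in count(1))     # unlimited supply: an assumption
    out = []
    for op, dst, srcs in body:
        srcs = tuple(table.get(r, r) for r in srcs)   # read mappings before redefining dst
        if dst is not None:
            phys = next(fresh)
            table[dst] = phys
            dst = phys
        out.append((op, dst, srcs))
    return out


def name_dependences(body):
    """Count WAR and WAW pairs (i < j) in straight-line code."""
    n = 0
    for i, (_, dst_i, src_i) in enumerate(body):
        for _, dst_j, _ in body[i + 1:]:
            if dst_j is not None and (dst_j == dst_i or dst_j in src_i):
                n += 1
    return n


renamed = rename(LOOP_BODY)
print(name_dependences(LOOP_BODY), name_dependences(renamed))  # 4 0
```

After renaming, DMUL still reads the LD result (now P1), so the true dependence is preserved, while the WAW on R5 and the WAR on R1 disappear.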
Case Study 1: 2.1
Assume that no new instruction can begin execution until the previous instruction has completed. Ignore front-end fetch and decode: execution never stalls for lack of the next instruction, but only one instruction can be issued per cycle. Assume the branch is taken and that there is a one-cycle branch delay slot.

Code sequence:
Loop: LD    F2, 0(Rx)
I0:   MULTD F2, F0, F2
I1:   DIVD  F8, F2, F0
I2:   LD    F4, 0(Ry)
I3:   ADDD  F4, F0, F4

Latencies beyond a single cycle:
Memory LD          +3
Memory SD          +1
Integer ADD, SUB   +0
Branches           +1
ADDD               +2
MULTD              +4
DIVD               +10

Find the baseline performance (in cycles per loop iteration).
Baseline performance: 37 cycles per loop iteration. Can we improve the performance?
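With no overlap at all, the baseline is simply the sum of (1 + extra latency) over the instructions of one iteration. The slide lists only Loop through I3; the opcodes for I4 through I9 below (ADDD, SD, ADDI, ADDI, SUB, BNZ) are an assumption, chosen to match the I5 (SD), I6 (ADDI) and I9 (BNZ) labels used on later slides. Under that assumption, this short Python sketch reproduces the 37 cycles.

```python
# Extra cycles beyond one, from the latency table of Case Study 1.
EXTRA = {"LD": 3, "SD": 1, "ADDI": 0, "SUB": 0, "BNZ": 1,
         "ADDD": 2, "MULTD": 4, "DIVD": 10}

# Loop..I3 are listed on the slide; the opcodes for I4..I9 are an ASSUMPTION.
ITERATION = ["LD", "MULTD", "DIVD", "LD", "ADDD",          # Loop, I0..I3
             "ADDD", "SD", "ADDI", "ADDI", "SUB", "BNZ"]    # I4..I9 (assumed)


def baseline_cycles(ops):
    """No overlap: each instruction takes 1 + extra cycles before the next may start."""
    return sum(1 + EXTRA[op] for op in ops)


print(baseline_cycles(ITERATION))  # 37
```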
Case Study 1: 2.2
The pipeline now stalls only on true data dependences (same code sequence and latencies as in 2.1). Stall cycles removed relative to the baseline:
- 1: I2 (LD) issued
- 3: stalls of I2 (LD)
- 1: I3 (ADDD) issued
- 2: stalls of I3 overlap with I1
- 1: I5 (SD) issued
- 1: stall of I4 overlaps with I5
- 1: I6 (ADDI) issued

Total: 10 stalls removed, giving 27 cycles per loop iteration. Can we improve the performance?
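The 27-cycle figure can be checked with a tiny in-order, single-issue scheduler that stalls an instruction only until its source registers are ready. This is a sketch under several assumptions: a result becomes usable 1 + (extra latency) cycles after issue, the branch's +1 adds one cycle at the end of the iteration, and the full operands of I4 through I9 (ADDD F10, F8, F2; SD F4, 0(Ry); ADDI Rx, Rx, #8; ADDI Ry, Ry, #8; SUB R20, R4, Rx; BNZ R20, Loop) are assumed, since the slides do not list them.

```python
EXTRA = {"LD": 3, "SD": 1, "ADDI": 0, "SUB": 0, "BNZ": 1,
         "ADDD": 2, "MULTD": 4, "DIVD": 10}

# (label, opcode, destination, sources). Loop..I3 follow the slide;
# I4..I9 (operands included) are an ASSUMPTION.
CODE = [
    ("Loop", "LD",    "F2",  ("Rx",)),
    ("I0",   "MULTD", "F2",  ("F0", "F2")),
    ("I1",   "DIVD",  "F8",  ("F2", "F0")),
    ("I2",   "LD",    "F4",  ("Ry",)),
    ("I3",   "ADDD",  "F4",  ("F0", "F4")),
    ("I4",   "ADDD",  "F10", ("F8", "F2")),
    ("I5",   "SD",    None,  ("F4", "Ry")),
    ("I6",   "ADDI",  "Rx",  ("Rx",)),
    ("I7",   "ADDI",  "Ry",  ("Ry",)),
    ("I8",   "SUB",   "R20", ("R4", "Rx")),
    ("I9",   "BNZ",   None,  ("R20",)),
]


def single_issue_cycles(code):
    """In-order, one issue per cycle, stalling only until source operands are ready."""
    ready = {}   # register -> first cycle a dependent instruction may issue
    cycle = 0    # issue cycle of the previous instruction
    for _, op, dst, srcs in code:
        cycle = max([cycle + 1] + [ready.get(r, 0) for r in srcs])
        if dst is not None:
            ready[dst] = cycle + 1 + EXTRA[op]
    # The final instruction is the branch; its +1 latency ends the iteration.
    return cycle + EXTRA[code[-1][1]]


print(single_issue_cycles(CODE))  # 27
```

The difference from the 37-cycle baseline is the 10 stall cycles tallied on the slide.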
Case Study 1: 2.3
Multiple-issue design (two execution pipelines, in-order issue):

Execution pipeline 1      | Execution pipeline 2
Loop: LD    F2, 0(Rx)     | nop
<3 stalls: LD>            | nops
I0:   MULTD F2, F0, F2    | nop
<4 stalls: MULTD (I0)>    | nops
I1:   DIVD  F8, F2, F0    | I2: LD F4, 0(Ry)
<3 stalls: LD (I2)>       | nops
I3:   ADDD  F4, F0, F4    | nop
<6 stalls: DIVD (I1)>     | nops
nop                       | <1 stall: BNZ (I9)>

Cycles per loop iteration: 24. Can we improve the performance?
Case Study 1: 2.5
Out-of-order issue and execution (two execution pipelines):

Cycle | Execution pipeline 1     | Execution pipeline 2
1     | Loop: LD F2, 0(Rx)       | I2: LD F4, 0(Ry)
      | <3 stalls: LD>           | <3 stalls: LD (I2)>
5     | I0: MULTD F2, F0, F2     | I3: ADDD F4, F0, F4
      | <4 stalls: MULTD (I0)>   | <2 stalls: ADDD (I3)>
10    | I1: DIVD F8, F2, F0      | <1 stall: SD (I5)>
      | <8 stalls: DIVD (I1)>    |
21    |                          | <1 stall: BNZ (I9)>

Cycles per loop iteration: 22. Can we improve the performance?
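The 22-cycle result is dominated by one chain of true dependences: LD, then MULTD (I0), then DIVD (I1), then the ADDD that consumes F8 (I4). The Python sketch below computes each instruction's earliest issue cycle when only true dependences matter, assuming unlimited issue width, renaming of all name dependences, and no structural limits. It uses the same latency model as the slides, and the operands of I4 through I9 are again an assumption (ADDD F10, F8, F2; SD F4, 0(Ry); ADDI Rx, Rx, #8; ADDI Ry, Ry, #8; SUB R20, R4, Rx; BNZ R20, Loop).

```python
EXTRA = {"LD": 3, "SD": 1, "ADDI": 0, "SUB": 0, "BNZ": 1,
         "ADDD": 2, "MULTD": 4, "DIVD": 10}

# (label, opcode, destination, sources); I4..I9 operands are an ASSUMPTION.
CODE = [
    ("Loop", "LD",    "F2",  ("Rx",)),
    ("I0",   "MULTD", "F2",  ("F0", "F2")),
    ("I1",   "DIVD",  "F8",  ("F2", "F0")),
    ("I2",   "LD",    "F4",  ("Ry",)),
    ("I3",   "ADDD",  "F4",  ("F0", "F4")),
    ("I4",   "ADDD",  "F10", ("F8", "F2")),
    ("I5",   "SD",    None,  ("F4", "Ry")),
    ("I6",   "ADDI",  "Rx",  ("Rx",)),
    ("I7",   "ADDI",  "Ry",  ("Ry",)),
    ("I8",   "SUB",   "R20", ("R4", "Rx")),
    ("I9",   "BNZ",   None,  ("R20",)),
]


def dataflow_issue_cycles(code):
    """Earliest issue cycle per instruction, limited only by true (RAW) dependences."""
    ready, issue = {}, {}
    for label, op, dst, srcs in code:
        # Program order means ready[] always holds the latest producer (renaming assumed).
        issue[label] = max([1] + [ready.get(r, 1) for r in srcs])
        if dst is not None:
            ready[dst] = issue[label] + 1 + EXTRA[op]
    return issue


issue = dataflow_issue_cycles(CODE)
print(issue["I4"])  # 21: LD (1) -> MULTD (5) -> DIVD (10) -> ADDD I4 (21)
```

Under these assumptions I4 cannot issue before cycle 21 no matter how wide the machine is, which accounts for almost all of the 22 cycles per iteration on the slide; further gains would have to come from overlapping iterations (for example by loop unrolling) rather than from extra issue slots.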
Case Study 1: Summary (performance in cycles per loop iteration)
- 2.1 Single-issue, respects all dependences, no execution until the previous instruction has completed: 37 cycles
- 2.2 Single-issue, respects only true data dependences: 27 cycles
- 2.3 Multiple-issue, in-order issue: 24 cycles
- 2.5 Multiple-issue, out-of-order issue: 22 cycles
DATA SHEET 512M bits SDRAM EDS5104ABTA (128M words 4 bits) EDS5108ABTA (64M words 8 bits) EDS5116ABTA (32M words 16 bits) Description The EDS5104AB is a 512M bits SDRAM organized as 33,554,432 words 4
More informationUsing Tridium s Sedona 1.2 Components with Workbench
Using Tridium s Sedona 1.2 Components with Workbench This tutorial assists in the understanding of the Sedona components provided in Tridium s Sedona-1.2.28 release. New with the 1.2 release is that the
More informationUC Berkeley CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 20 Synchronous Digital Systems Blu-ray vs HD-DVD war over? As you know, there are two different, competing formats for the next
More informationNear-Optimal Precharging in High-Performance Nanoscale CMOS Caches
Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches Se-Hyun Yang and Babak Falsafi Computer Architecture Laboratory (CALCM) Carnegie Mellon University {sehyun, babak}@cmu.edu http://www.ece.cmu.edu/~powertap
More informationt WR = 2 CLK A2 Notes:
SDR SDRAM MT48LC16M4A2 4 Meg x 4 x 4 Banks MT48LC8M8A2 2 Meg x 8 x 4 Banks MT48LC4M16A2 1 Meg x 16 x 4 Banks 64Mb: x4, x8, x16 SDRAM Features Features PC100- and PC133-compliant Fully synchronous; all
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 02
More informationGround Penetrating Radar Survey of a Cemetery: Interpretation
GPR Cemetery Survey Ground Penetrating Radar Survey of a Cemetery: Interpretation Dr. Andy R. Bobyarchick Department of Geography and Earth Sciences UNC Charlotte Overview This investigation is appropriate
More information18 October, 2014 Page 1
19 October, 2014 -- There s an annoying deficiency in the stock fuel quantity indicator. It s driven by a capacitive probe in the lower/left tank, so the indicator reads full until the fuel is completely
More informationApplication of Sequence Alignment to Location Tracking Data
Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support, Fall 2005 Instructors: Professor Lucila Ohno-Machado and Professor Staal Vinterbo Application of Sequence Alignment
More informationBevel differential. 1 Description
Bevel differential 1 Description Bevel gear differential in KISSsys 1.1 Task Bevel gear differentials can be modelled in KISSsys, but the procedure is bit difficult and may need some time to be able to
More informationTECHNICAL REPORTS from the ELECTRONICS GROUP at the UNIVERSITY of OTAGO. Table of Multiple Feedback Shift Registers
ISSN 1172-496X ISSN 1172-4234 (Print) (Online) TECHNICAL REPORTS from the ELECTRONICS GROUP at the UNIVERSITY of OTAGO Table of Multiple Feedback Shift Registers by R. W. Ward, T.C.A. Molteno ELECTRONICS
More informationCHAPTER 4 APPLIED OPERATION CHAPTER44 APPLIED OPERATION
CHAPTER 4 APPLIED OPERATION CHAPTER44 APPLIED OPERATION 4 1 4.1 Shifting Input Values Shifting input 1-point shift Temperature Upper-limit value Lower-limit value 0 After shift Before shift Input shift
More informationFault Attacks Made Easy: Differential Fault Analysis Automation on Assembly Code
Fault Attacks Made Easy: Differential Fault Analysis Automation on Assembly Code Jakub Breier, Xiaolu Hou and Yang Liu 10 September 2018 1 / 25 Table of Contents 1 Background and Motivation 2 Overview
More informationDOUBLE DATA RATE (DDR) SDRAM
UBLE DATA RATE Features VDD = +2.5V ±.2V, VD = +2.5V ±.2V Bidirectional data strobe transmitted/ received with data, i.e., source-synchronous data capture x6 has two one per byte Internal, pipelined double-data-rate
More informationIn-Place Associative Computing:
In-Place Associative Computing: A New Concept in Processor Design 1 Page Abstract 3 What s Wrong with Existing Processors? 3 Introducing the Associative Processing Unit 5 The APU Edge 5 Overview of APU
More information