Systems I
Pipelining I

Topics
  - Pipelining principles
  - Pipeline overheads
  - Pipeline registers and stages

Overview

What's wrong with the sequential (SEQ) Y86?
  - It's slow!
  - Each piece of hardware is used only a small fraction of the time
  - We would like to find a way to get more performance with only a little more hardware

General Principles of Pipelining
  - Goal
  - Difficulties

Creating a Pipelined Y86 Processor
  - Rearranging SEQ
  - Inserting pipeline registers
  - Problems with data and control hazards

Real-world Pipelines: Car Washes

[Figure: three car wash layouts: Sequential, Parallel, Pipelined]

Idea
  - Divide process into independent stages
  - Move objects through stages in sequence
  - At any given time, multiple objects are being processed

Laundry Example

Ann, Brian, Cathy, Dave (loads A, B, C, D) each have one load of clothes to wash, dry, and fold
  - Washer takes 30 minutes
  - Dryer takes 30 minutes
  - "Folder" takes 30 minutes
  - "Stasher" takes 30 minutes to put clothes into drawers

Slide courtesy of D. Patterson

Sequential Laundry

[Figure: timeline from 6 PM to 2 AM; time on the horizontal axis, task order (A, B, C, D) on the vertical axis; each load runs its four 30-minute steps before the next load starts]

  - Sequential laundry takes 8 hours for 4 loads
  - If they learned pipelining, how long would laundry take?

Slide courtesy of D. Patterson

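Pipelined Laundry: Start ASAP

[Figure: timeline starting at 6 PM; time on the horizontal axis, task order (A, B, C, D) on the vertical axis; a new load starts every 30 minutes, for seven 30-minute slots in total]

  - Pipelined laundry takes 3.5 hours for 4 loads!

Slide courtesy of D. Patterson
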
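The 8-hour and 3.5-hour figures can be checked with a little arithmetic. The sketch below is only an illustration: it assumes every stage takes the same time and each machine handles one load at a time, and the function names are invented for this example.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` slots; every later load finishes one slot after."""
    if loads == 0:
        return 0
    return (stages + loads - 1) * stage_minutes


if __name__ == "__main__":
    # Four loads (A, B, C, D), four 30-minute stages: wash, dry, fold, stash.
    print(sequential_minutes(4, 4, 30) / 60, "hours sequential")
    print(pipelined_minutes(4, 4, 30) / 60, "hours pipelined")
```

With many loads the pipelined time approaches one slot per load, which is where the "speedup = number of stages" limit comes from.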
Pipelining Lessons

[Figure: pipelined laundry timeline from 6 PM to 9:30 PM; loads A, B, C, D overlapped in seven 30-minute slots]

  - Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
  - Multiple tasks operate simultaneously using different resources
  - Potential speedup = number of pipe stages
  - Pipeline rate is limited by the slowest pipeline stage
  - Unbalanced lengths of pipe stages reduce speedup
  - Time to "fill" the pipeline and time to "drain" it reduce speedup
  - Stall for dependences

Slide courtesy of D. Patterson

Computational Example

[Figure: combinational logic (300 ps) feeding a register (20 ps), driven by a clock. Delay = 320 ps, Throughput = 3.12 GOPS]

System
  - Computation requires a total of 300 picoseconds
  - Additional 20 picoseconds to save the result in the register
  - Must have a clock cycle of at least 320 ps

3-Way Pipelined Version

[Figure: combinational logic A, B, C (100 ps each), each followed by a register (20 ps), all driven by a common clock. Delay = 360 ps, Throughput = 8.33 GOPS]

System
  - Divide combinational logic into 3 blocks of 100 ps each
  - Can begin a new operation as soon as the previous one passes through stage A
      - Begin a new operation every 120 ps
  - Overall latency increases
      - 360 ps from start to finish

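The delay and throughput numbers for the unpipelined and 3-way pipelined systems follow from the stage delays. A minimal sketch, assuming one operation completes per clock cycle and that the clock period is the slowest stage's logic delay plus one register load (the helper names are hypothetical):

```python
def clock_period_ps(stage_logic_ps, reg_ps):
    """The clock must cover the slowest stage's logic plus one register load."""
    return max(stage_logic_ps) + reg_ps


def latency_ps(stage_logic_ps, reg_ps):
    """One operation passes through every stage, one clock cycle per stage."""
    return len(stage_logic_ps) * clock_period_ps(stage_logic_ps, reg_ps)


def throughput_gops(stage_logic_ps, reg_ps):
    """One operation completes per cycle; 1 op per picosecond equals 1000 GOPS."""
    return 1000.0 / clock_period_ps(stage_logic_ps, reg_ps)


if __name__ == "__main__":
    for name, stages in [("unpipelined", [300]), ("3-way pipelined", [100, 100, 100])]:
        print(name, latency_ps(stages, 20), "ps,",
              round(throughput_gops(stages, 20), 3), "GOPS")
```

The unpipelined case evaluates to 3.125 GOPS, which the slide shows truncated as 3.12.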
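Pipeline Diagrams

Unpipelined
  [Figure: OP1, OP2, OP3 laid end to end along the time axis]
  - Cannot start a new operation until the previous one completes

3-Way Pipelined
  [Figure: OP1, OP2, OP3 each passing through stages A, B, C, each operation offset by one stage along the time axis]
  - Up to 3 operations in process simultaneously
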
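A pipeline diagram of this kind can be generated mechanically. The sketch below is an illustration under the assumption that operation i enters the first stage in cycle i and advances one stage per cycle; the function names and the "." filler for idle slots are choices made for this example.

```python
def pipeline_grid(num_ops, stages):
    """Row i is operation i; column t is the stage it occupies in cycle t ('.' if none)."""
    total_cycles = num_ops + len(stages) - 1
    grid = []
    for op in range(num_ops):
        row = ["."] * total_cycles
        for s, name in enumerate(stages):
            row[op + s] = name  # operation `op` is in stage s during cycle op + s
        grid.append(row)
    return grid


def in_flight(grid, cycle):
    """Number of operations occupying some stage during the given cycle."""
    return sum(1 for row in grid if row[cycle] != ".")


def render(grid):
    """Format the grid as text, one operation per line."""
    return "\n".join(f"OP{i + 1} " + " ".join(row) for i, row in enumerate(grid))


if __name__ == "__main__":
    grid = pipeline_grid(3, "ABC")
    print(render(grid))
    print("operations in flight per cycle:",
          [in_flight(grid, t) for t in range(len(grid[0]))])
```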
Operating a Pipeline

[Figure: 3-way pipeline (logic A, B, C at 100 ps each, each followed by a 20 ps register, common clock) processing OP1, OP2, OP3; time axis marked 0, 120, 240, 360, 480, 640; snapshots of the circuit at times 239, 241, 300, and 359]

Limitations: Nonuniform Delays

[Figure: logic A (50 ps), logic B (150 ps), logic C (100 ps), each followed by a 20 ps register, common clock; pipeline diagram for OP1, OP2, OP3. Delay = 510 ps, Throughput = 5.88 GOPS]

  - Throughput is limited by the slowest stage
  - Other stages sit idle for much of the time
  - Challenging to partition the system into balanced stages

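How much of each cycle the faster stages sit idle can be worked out from the stage delays. A rough sketch, assuming a stage counts as busy for its logic delay plus the register load, and that the clock period is set by the slowest stage (names are hypothetical):

```python
def clock_period_ps(stage_logic_ps, reg_ps):
    """Clock period is set by the slowest stage plus one register load."""
    return max(stage_logic_ps) + reg_ps


def stage_utilization(stage_logic_ps, reg_ps):
    """Fraction of every clock cycle during which each stage is doing work."""
    period = clock_period_ps(stage_logic_ps, reg_ps)
    return [(logic + reg_ps) / period for logic in stage_logic_ps]


if __name__ == "__main__":
    stages = [50, 150, 100]  # logic A, B, C from the slide, in picoseconds
    period = clock_period_ps(stages, 20)
    print("clock period:", period, "ps")
    print("total delay:", len(stages) * period, "ps")
    print("throughput:", round(1000.0 / period, 2), "GOPS")
    for name, u in zip("ABC", stage_utilization(stages, 20)):
        print(f"stage {name} busy {u:.0%} of each cycle")
```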
Limitations: Register Overhead

[Figure: six blocks of combinational logic (50 ps each), each followed by a 20 ps register, common clock. Delay = 420 ps, Throughput = 14.29 GOPS]

  - As we try to deepen the pipeline, the overhead of loading registers becomes more significant
  - Percentage of clock cycle spent loading the register:
      - 1-stage pipeline: 6.25%
      - 3-stage pipeline: 16.67%
      - 6-stage pipeline: 28.57%
  - High speeds of modern processor designs are obtained through very deep pipelining

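The register-overhead percentages can be reproduced directly. A minimal sketch, assuming the 300 ps of logic from the earlier example is split evenly across the stages and each stage adds one 20 ps register (the function name is made up for illustration):

```python
def register_overhead_percent(total_logic_ps, reg_ps, stages):
    """Share of each clock cycle spent loading the pipeline register."""
    period = total_logic_ps / stages + reg_ps
    return 100.0 * reg_ps / period


if __name__ == "__main__":
    for stages in (1, 3, 6):
        pct = register_overhead_percent(300, 20, stages)
        print(f"{stages}-stage pipeline: {pct:.2f}% of the cycle is register load")
```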
Revisiting the Performance Eqn

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

Instruction Count
  - No change

Clock Cycle Time
  - Improves by a factor of almost N for an N-deep pipeline
  - Not quite a factor of N due to pipeline overheads

Cycles Per Instruction
  - In an ideal world, CPI would stay the same
  - An individual instruction takes N cycles
  - But we have N instructions in flight at a time
  - So average CPI_pipe = CPI_no_pipe * N/N = CPI_no_pipe

Thus performance can improve by up to a factor of N

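Plugging numbers into the performance equation shows why the gain is "almost N" rather than N. The sketch below is an assumption-laden illustration: it reuses the 300 ps logic and 20 ps register figures from the earlier example, holds instruction count and CPI fixed (the ideal case), and uses an arbitrary instruction count of 1000; all names are hypothetical.

```python
def cpu_time_ps(instructions, cpi, cycle_ps):
    """CPU time = instructions x cycles per instruction x time per cycle."""
    return instructions * cpi * cycle_ps


def cycle_ps(total_logic_ps, reg_ps, depth):
    """Cycle time when the logic is split evenly into `depth` stages."""
    return total_logic_ps / depth + reg_ps


def speedup(total_logic_ps, reg_ps, depth, instructions=1000, cpi=1.0):
    """Ratio of unpipelined CPU time to N-deep pipelined CPU time (ideal CPI)."""
    base = cpu_time_ps(instructions, cpi, cycle_ps(total_logic_ps, reg_ps, 1))
    piped = cpu_time_ps(instructions, cpi, cycle_ps(total_logic_ps, reg_ps, depth))
    return base / piped


if __name__ == "__main__":
    for depth in (3, 6):
        print(f"N = {depth}: speedup {speedup(300, 20, depth):.2f}")
```

With the register delay set to zero the speedup is exactly N; the 20 ps register load is what keeps it below N.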
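Data Dependencies

[Figure: combinational logic and register with the register output fed back to the logic input, driven by a clock; OP1, OP2, OP3 run one after another along the time axis]

System
  - Each operation depends on the result from the preceding one
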
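Data Hazards

[Figure: 3-stage pipeline (logic A, B, C, each followed by a register) with the final result fed back to the input; pipeline diagram for OP1 through OP4, each passing through stages A, B, C]

  - Result does not feed back around in time for the next operation
  - Pipelining has changed the behavior of the system
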
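The timing problem can be quantified with a simple model. This is a sketch under stated assumptions, not a description of any particular hardware: an operation enters stage A in cycle 0, occupies one stage per cycle, and its result is available at the start of the cycle after its last stage, while the next operation enters stage A in cycle 1. The function name is invented for this example.

```python
def cycles_result_is_late(num_stages):
    """How many cycles after the next operation starts does the previous result arrive?"""
    next_op_start = 1          # next operation enters the first stage one cycle later
    result_ready = num_stages  # result is latched after passing through every stage
    return max(0, result_ready - next_op_start)


if __name__ == "__main__":
    print("unpipelined (1 stage):", cycles_result_is_late(1), "cycles late")
    print("3-way pipelined:", cycles_result_is_late(3), "cycles late")
```

In the unpipelined system the result is ready exactly when the next operation starts; with three stages it arrives two cycles too late, so the next operation would read a stale value.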
Data Dependencies in Processors

    irmovl $50, %eax
    addl %eax, %ebx
    mrmovl 100(%ebx), %edx

  - Result from one instruction is used as an operand for another
      - Read-after-write (RAW) dependency
  - Very common in actual programs
  - Must make sure our pipeline handles these properly
      - Get correct results
      - Minimize performance impact

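The RAW dependencies in the three-instruction sequence can be found by tracking which instruction last wrote each register. The sketch below is illustrative only: the read and write sets are annotated by hand from the Y86 meaning of each instruction (for example, addl reads both registers and writes the second), and the `Instr` type and function name are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    reads: frozenset   # registers read as operands (hand-annotated)
    writes: frozenset  # registers written as results (hand-annotated)


PROGRAM = [
    Instr("irmovl $50, %eax", frozenset(), frozenset({"%eax"})),
    Instr("addl %eax, %ebx", frozenset({"%eax", "%ebx"}), frozenset({"%ebx"})),
    Instr("mrmovl 100(%ebx), %edx", frozenset({"%ebx"}), frozenset({"%edx"})),
]


def raw_dependencies(program):
    """Return (writer_index, reader_index, register) for each read-after-write pair."""
    deps = []
    last_writer = {}  # register -> index of the most recent instruction that wrote it
    for i, ins in enumerate(program):
        for reg in sorted(ins.reads):
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        for reg in ins.writes:
            last_writer[reg] = i
    return deps


if __name__ == "__main__":
    for writer, reader, reg in raw_dependencies(PROGRAM):
        print(f"'{PROGRAM[reader].text}' reads {reg} written by '{PROGRAM[writer].text}'")
```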
SEQ Hardware

  - Stages occur in sequence
  - One operation in process at a time
  - One stage for each logical pipeline operation
      - Fetch (get next instruction from memory)
      - Decode (figure out what the instruction does and get values from the regfile)
      - Execute (compute)
      - Memory (access data memory if necessary)
      - Write back (write any instruction result to the regfile)

[Figure: SEQ hardware datapath, bottom to top: Fetch (PC, instruction memory, PC increment; icode, ifun, rA, rB, valC, valP), Decode (register file with ports A, B, M, E; srcA, srcB, dstE, dstM; valA, valB), Execute (ALU with inputs ALU A, ALU B and ALU fun.; CC; Bch; valE), Memory (data memory with Mem. control read/write; Addr, Data; valM), Write back, and New PC (newPC)]

SEQ+ Hardware

  - Still a sequential implementation
  - Reorder the PC stage to put it at the beginning

PC Stage
  - Task is to select the PC for the current instruction
  - Based on results computed by the previous instruction

Processor State
  - PC is no longer stored in a register
  - But, can determine the PC based on other stored information

[Figure: SEQ+ hardware datapath, bottom to top: PC (computed from stored state pIcode, pBch, pValM, pValC, pValP), Fetch (instruction memory, PC increment; icode, ifun, rA, rB, valC, valP), Decode (register file with ports A, B, M, E; srcA, srcB, dstE, dstM; valA, valB), Execute (ALU with inputs ALU A, ALU B and ALU fun.; CC; Bch; valE), Memory (data memory with Mem. control read/write; Addr, Data; valM), Write back]

Adding Pipeline Registers

[Figure: the SEQ+ datapath (Fetch, Decode, Execute, Memory, Write back) shown beside the pipelined datapath, in which pipeline registers F, D, E, M, and W are inserted before the corresponding stages. Labeled signals include predPC and f_PC at Fetch; icode, ifun, rA, rB, valC, valP in register D; d_srcA, d_srcB, valA, valB at Decode; aluA, aluB, valE, Bch, CC at Execute; M_icode, M_Bch, M_valA and Addr, Data at Memory; W_icode, W_valE, W_valM, W_dstE, W_dstM at Write back]

Pipeline Stages

Fetch
  - Select current PC
  - Read instruction
  - Compute incremented PC

Decode
  - Read program registers

Execute
  - Operate ALU

Memory
  - Read or write data memory

Write Back
  - Update register file

[Figure: pipelined datapath with pipeline registers F, D, E, M, W placed before the Fetch, Decode, Execute, Memory, and Write back stages; signal labels as in the previous figure]

Summary

Today
  - Pipelining principles (assembly line)
  - Overheads due to imperfect pipelining
  - Breaking instruction execution into a sequence of stages

Next Time
  - Pipelining hardware: registers and feedback paths
  - Difficulties with pipelines: hazards
  - Method of mitigating hazards