Out-of-order Pipeline

Fetch -> Decode -> Rename -> Dispatch -> [buffer of instructions] -> Issue -> Reg-read -> Execute -> Writeback -> Commit
- In-order front end
- Out-of-order execution

Register Read

When do instructions read the register file?
- Option #1: after select, right before execute
  - (Not done at decode)
  - Read physical register (renamed)
  - Or get value via bypassing (based on physical register name)
  - This is Pentium 4, MIPS R10K, Alpha 21264 style
  - Physical register file may be large
    - Multi-cycle read
- Option #2: as part of issue, keep values in Issue Queue
  - Pentium Pro, Core 2, Core i7

OOO execution (2-wide)

[Figure sequence: OOO execution (2-wide) example, showing issue-queue entries with their ready (RDY) bits cycle by cycle]
[Figure sequence, continued: OOO execution (2-wide) example]
- Note similarity to in-order

Multi-cycle operations

- Multi-cycle ops (load, fp, multiply, etc.)
  - Wakeup deferred a few cycles
  - Structural hazard?
- Cache misses?
  - Speculative wake-up (assume hit)
  - Cancel exec of dependents
  - Re-issue later
  - Details: complicated, not important

Re-order Buffer (ROB)

- All instructions in order
- Two purposes
  - Misprediction recovery
  - In-order commit
    - Maintain appearance of in-order execution
    - Freeing of physical registers

RENAMING REVISITED
Renaming revisited

- Overwritten register (the previous mapping of an instruction's destination)
  - Freed at commit
  - Restored in map table on recovery
    - Branch mis-prediction recovery
  - Also must be read at rename

[Figure sequence: renaming example, showing the original insns, their renamed forms, and the map table (r -> p) as each instruction is renamed]
ROB

- ROB entry holds all info for recover/commit
  - Logical register names
  - Physical register names
  - Instruction types
- Dispatch: insert at tail
  - Full? Stall
- Commit: remove from head
  - Not completed? Stall

Recovery

- Completely remove wrong path instructions
  - Flush from IQ
  - Remove from ROB
  - Restore map table to before misprediction
  - Free destination registers
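The recovery steps above can be sketched in a few lines of Python. This is only an illustrative model, not the slides' own design: the `Renamer` class, its method names, the register counts, and the `xor`/`add` instructions in the demo are all assumptions. It shows why each ROB entry records the overwritten mapping at rename: walking the ROB backwards from the tail is then enough to restore the map table and return destination registers to the free list.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    op: str                      # instruction type
    dest_logical: Optional[str]  # logical destination (None for branches)
    dest_phys: Optional[str]     # physical register allocated at rename
    prev_phys: Optional[str]     # overwritten mapping, read at rename


class Renamer:
    """Toy map table + free list + ROB (hypothetical helper for illustration)."""

    def __init__(self, logical_regs, num_phys):
        phys = [f"p{i}" for i in range(1, num_phys + 1)]
        self.map_table = dict(zip(logical_regs, phys))
        self.free_list = deque(phys[len(logical_regs):])
        self.rob = deque()  # head on the left, tail on the right

    def dispatch(self, op, dest, srcs):
        # Sources are renamed through the current map table.
        renamed_srcs = [self.map_table[s] for s in srcs]
        dest_phys = prev_phys = None
        if dest is not None:
            if not self.free_list:
                raise RuntimeError("no free physical register: stall")
            dest_phys = self.free_list.popleft()
            prev_phys = self.map_table[dest]  # remember the overwritten register
            self.map_table[dest] = dest_phys
        entry = RobEntry(op, dest, dest_phys, prev_phys)
        self.rob.append(entry)  # insert at tail
        return entry, renamed_srcs

    def recover(self, branch):
        # Remove wrong-path instructions youngest-first until the branch is the tail.
        while self.rob and self.rob[-1] is not branch:
            entry = self.rob.pop()
            if entry.dest_logical is not None:
                self.map_table[entry.dest_logical] = entry.prev_phys  # restore mapping
                self.free_list.appendleft(entry.dest_phys)            # free destination


r = Renamer(["r1", "r2", "r3", "r4", "r5"], num_phys=9)
snapshot = dict(r.map_table)
branch, _ = r.dispatch("bnz", None, ["r1"])
r.dispatch("xor", "r3", ["r1", "r2"])   # wrong-path instruction (assumed)
r.dispatch("add", "r4", ["r3", "r4"])   # wrong-path instruction (assumed)
r.recover(branch)
print(r.map_table == snapshot)  # True
```

A real design would also flush the squashed instructions from the issue queue; that step is omitted here to keep the sketch short.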
[Figure sequence: recovery example. A mispredicted bnz (branch to loop) is followed by wrong-path instructions; they are removed one at a time, youngest first, restoring the map table at each step, until only the bnz remains]
What about stores

- Stores: Write D$, not registers
  - Can we rename memory?
  - Recover in the cache?
- No (at least not easily)
  - Cache writes unrecoverable
  - Stores: only when certain
    - Commit

Commit

- At commit: instruction becomes architected state
  - In order
  - Only when instructions are finished
  - Free overwritten register (why?)

Freeing over-written register

- Consider an instruction that writes r3
  - Before it: r3 -> p3
  - After it: r3 -> its newly allocated physical register
  - Insns older than it read p3
  - Insns younger than it read the new register (until next r3-writing instruction)
- At commit of that instruction, no older instructions exist
  - No one else needs p3: free it!

Commit Example

[Figure sequence: commit example. Instructions commit in order from the head of the ROB, and each commit frees the physical register that the instruction overwrote]
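The commit rules can be illustrated with a small Python sketch. The `RobEntry` fields, the `commit` function, the `memory` dictionary standing in for the data cache, and the instruction names `i1` to `i3` are all assumptions made for the example. It shows in-order commit from the head, stalling at the first unfinished instruction, freeing the overwritten register, and performing a store's write only at commit.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RobEntry:
    name: str
    prev_phys: Optional[str] = None           # overwritten physical register, if any
    store: Optional[Tuple[int, int]] = None   # (address, value) for stores
    done: bool = False                        # finished executing?


def commit(rob, free_list, memory, width=2):
    """Commit up to `width` finished instructions from the head, in order."""
    committed = []
    while rob and len(committed) < width and rob[0].done:
        entry = rob.popleft()
        if entry.prev_phys is not None:
            # No older instruction remains, so nobody can still read this register.
            free_list.append(entry.prev_phys)
        if entry.store is not None:
            addr, value = entry.store
            memory[addr] = value  # cache write happens only once the store is certain
        committed.append(entry.name)
    return committed


rob = deque([
    RobEntry("i1", prev_phys="p3", done=True),
    RobEntry("i2", store=(0x40, 7), done=False),
    RobEntry("i3", prev_phys="p4", done=True),
])
free_list, memory = deque(), {}

print(commit(rob, free_list, memory))  # ['i1']  (i3 is finished but must wait for i2)
rob[0].done = True
print(commit(rob, free_list, memory))  # ['i2', 'i3']
print(list(free_list), memory)         # ['p3', 'p4'] {64: 7}
```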
Out-of-order pipeline diagrams

- Standard pipeline-diagram style: large and cumbersome
- Change layout slightly
  - Columns = stages (dispatch, issue, etc.)
  - Rows = instructions
  - Content of boxes = cycles
- For our purposes: issue/exec = 1 cycle
  - Ignore preg read latency, etc.
  - Load-use, mul, div, and FP longer

Example

- Four renamed instructions in program order: a load, two other instructions, and a second load
- 2-wide
- Infinite ROB, IQ, Pregs
- Loads: 3 cycles

Cycle 1:
- Dispatch the 1st load and the 2nd instruction
Cycle 2:
- Dispatch the 3rd instruction and the 2nd load
- 1st load issues; also note its WB cycle while you do this
  - (Note: don't issue if WB ports full)

Cycle 3:
- 2nd and 3rd instructions are not ready
- 2nd load is ready: issue it

Cycle 4:
- Nothing

Cycle 5:
- 2nd instruction can issue

Cycle 6:
- 1st load can commit (oldest instruction & finished)
- 3rd instruction can issue

Cycle 7:
- 2nd instruction can commit (oldest instruction & finished)

Cycle 8:
- 3rd instruction and 2nd load can commit (2-wide: can do both at once)
Completed diagram (box contents = cycle numbers)

Instruction        Dispatch  Issue  WB  Commit
ld (1st load)         1        2     5    6
2nd instruction       1        5     6    7
3rd instruction       2        6     7    8
ld (2nd load)         2        3     6    8
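The walkthrough can be checked mechanically with a small cycle-by-cycle model in Python. Everything below is a sketch under stated assumptions, not the slides' own tool: the `Insn` type and `schedule` function are hypothetical, the `add` and `xor` mnemonics and all register numbers are stand-ins for the two middle instructions, and the timing rules are the ones used in the example (2-wide dispatch, issue, and commit; infinite ROB, IQ, and pregs; loads take 3 cycles, everything else 1; a dependent may issue in its producer's WB cycle; commit is in order and at least one cycle after WB).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Insn:
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]
    latency: int = 1  # cycles from issue to writeback


def schedule(insns, width=2):
    """Return (dispatch, issue, wb, commit) cycle numbers for each instruction."""
    n = len(insns)
    disp, issue, wb, commit = ([None] * n for _ in range(4))

    # For each instruction, find the older in-window producers of its sources.
    producers, last_writer = [], {}
    for i, ins in enumerate(insns):
        producers.append([last_writer[s] for s in ins.srcs if s in last_writer])
        if ins.dest is not None:
            last_writer[ins.dest] = i

    next_dispatch = next_commit = cycle = 0
    while next_commit < n:
        cycle += 1
        # Commit: oldest first, only if written back in an earlier cycle.
        done = 0
        while (next_commit < n and done < width
               and wb[next_commit] is not None and wb[next_commit] < cycle):
            commit[next_commit] = cycle
            next_commit += 1
            done += 1
        # Issue: oldest ready first; sources must be written back by this cycle.
        issued = 0
        for i in range(next_dispatch):
            if issued == width:
                break
            ready = all(wb[p] is not None and wb[p] <= cycle for p in producers[i])
            if issue[i] is None and disp[i] < cycle and ready:
                issue[i] = cycle
                wb[i] = cycle + insns[i].latency
                issued += 1
        # Dispatch: in program order, up to `width` per cycle.
        for _ in range(width):
            if next_dispatch < n:
                disp[next_dispatch] = cycle
                next_dispatch += 1
    return list(zip(disp, issue, wb, commit))


# Stand-in instruction sequence (mnemonics and register numbers are assumed).
program = [
    Insn("ld  [p1] -> p2", "p2", ("p1",), latency=3),
    Insn("add p2, p3 -> p4", "p4", ("p2", "p3")),
    Insn("xor p4, p5 -> p6", "p6", ("p4", "p5")),
    Insn("ld  [p7] -> p8", "p8", ("p7",), latency=3),
]
for ins, row in zip(program, schedule(program)):
    print(f"{ins.name:18} {row}")
```

With these assumptions the model gives the first load cycles (1, 2, 5, 6) and lets the last two instructions commit together in cycle 8, consistent with the cycle-by-cycle walkthrough.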