3. Branch Prediction Unit¶
Branch prediction unit (BPU) is configurable and can run a simple bimodal predictor or complex 2-level adaptive predictors like GShare, GSelect, GAg, GAp, PAg, or PAp. This section describes the major structures used in BPU and their interaction with the CPU pipeline.
3.1. Branch Target Buffer¶
Branch Target Buffer (BTB) is modeled as a set-associative data structure. Supported eviction policies are random eviction and bit PLRU.
Note
For a simple bi-modal predictor, we keep the prediction bits in a separate Branch History Table (BHT).
3.2. Return Address Stack¶
BPU consists of an optional return address stack (RAS), which keeps track of the last N
function return addresses, where N
is the number of entries in RAS. The return address is pushed onto RAS from the decode pipeline stage if the decoded instruction becomes a function call. Similarly, if the decoded instruction turns out to be a function return, the address is popped from RAS and forwarded to the fetch stage.
3.3. Two-Level Adaptive Predictor¶
Level 1 of the adaptive predictor consists of a Global History Table (GHT), and level 2 consists of a Pattern History Table (PHT). Each GHT entry consists of a PC and History Register (HR) of N
bits. Each PHT entry consists of a PC and an array of \(2^N\) 2-bit saturating up-down counters. Based on the number of GHT and PHT entries, four different prediction schemes are possible, shown in the following table.
GHT entries |
PHT Entries |
Predictor |
---|---|---|
= 1 |
= 1 |
GAg |
= 1 |
> 1 |
GAp |
> 1 |
= 1 |
PAg |
> 1 |
> 1 |
PAp |
3.4. Interaction with the CPU Pipeline¶
CPU pipeline interacts with the Branch Prediction Unit in mainly four ways as follows:
Probing the branch predictor
In the Fetch stage, the branch predictor is probed using the virtual address of the instruction
If the address is present in the BTB, then probe returns hit
In case of conditional branches, for two-level adaptive predictor, address must also be present in GHT (For PAg and PAp based schemes) and PHT (For GAp and PAp based schemes)
Acquiring the target address
On branch prediction unit hit, fetch stage requests the target address from the BPU
For unconditional branches, the target is immediately returned from the BTB entry
For conditional branches, only if the prediction is taken, the target is returned from the BTB entry
Adding entries to branch predictor structures
A BTB entry is created for a branch instruction missed in the BTB when the branch is resolved.
In case of two-level adaptive predictor, required entries are also created in GHT and PHT based on the prediction scheme, for conditional branches
Updating branch predictor entries
Branch Prediction unit is probed again in memory stage (or
Int-ALU
in the out of order pipeline) after the branch resolvesIf probe misses, then the updates are skipped
If probe results in a hit, BTB entry of the branch is updated with the current target address
For the conditional branches, prediction counters are updated based on the latest branch outcome
3.5. Timings¶
Probe and target address retrieval: one cycle (overlaps with the TLB and cache probes)
Add and update entry: one cycle each (overlaps with relevant stage processing)