3. Branch Prediction Unit

Branch prediction unit (BPU) is configurable and can run a simple bimodal predictor or complex 2-level adaptive predictors like GShare, GSelect, GAg, GAp, PAg, or PAp. This section describes the major structures used in BPU and their interaction with the CPU pipeline.

../_images/bpu.svg

BPU High-level Overview

3.1. Branch Target Buffer

Branch Target Buffer (BTB) is modeled as a set-associative data structure. Supported eviction policies are random eviction and bit PLRU.

Note

For a simple bi-modal predictor, we keep the prediction bits in a separate Branch History Table (BHT).

3.2. Return Address Stack

BPU consists of an optional return address stack (RAS), which keeps track of the last N function return addresses, where N is the number of entries in RAS. The return address is pushed onto RAS from the decode pipeline stage if the decoded instruction becomes a function call. Similarly, if the decoded instruction turns out to be a function return, the address is popped from RAS and forwarded to the fetch stage.

3.3. Two-Level Adaptive Predictor

Level 1 of the adaptive predictor consists of a Global History Table (GHT), and level 2 consists of a Pattern History Table (PHT). Each GHT entry consists of a PC and History Register (HR) of N bits. Each PHT entry consists of a PC and an array of \(2^N\) 2-bit saturating up-down counters. Based on the number of GHT and PHT entries, four different prediction schemes are possible, shown in the following table.

GHT entries

PHT Entries

Predictor

= 1

= 1

GAg

= 1

> 1

GAp

> 1

= 1

PAg

> 1

> 1

PAp

3.4. Interaction with the CPU Pipeline

CPU pipeline interacts with the Branch Prediction Unit in mainly four ways as follows:

  • Probing the branch predictor

    • In the Fetch stage, the branch predictor is probed using the virtual address of the instruction

    • If the address is present in the BTB, then probe returns hit

    • In case of conditional branches, for two-level adaptive predictor, address must also be present in GHT (For PAg and PAp based schemes) and PHT (For GAp and PAp based schemes)

  • Acquiring the target address

    • On branch prediction unit hit, fetch stage requests the target address from the BPU

    • For unconditional branches, the target is immediately returned from the BTB entry

    • For conditional branches, only if the prediction is taken, the target is returned from the BTB entry

  • Adding entries to branch predictor structures

    • A BTB entry is created for a branch instruction missed in the BTB when the branch is resolved.

    • In case of two-level adaptive predictor, required entries are also created in GHT and PHT based on the prediction scheme, for conditional branches

  • Updating branch predictor entries

    • Branch Prediction unit is probed again in memory stage (or Int-ALU in the out of order pipeline) after the branch resolves

    • If probe misses, then the updates are skipped

    • If probe results in a hit, BTB entry of the branch is updated with the current target address

    • For the conditional branches, prediction counters are updated based on the latest branch outcome

3.5. Timings

  • Probe and target address retrieval: one cycle (overlaps with the TLB and cache probes)

  • Add and update entry: one cycle each (overlaps with relevant stage processing)