Align LLBP-X provider and enable in kmhv3#919
Conversation
Change-Id: I08c3756fcc972e6d4170eb7fff4dee654a2320a4
Change-Id: Ided6257af49a9bdbddff49ba29a8a1d2d53e8942
📝 WalkthroughWalkthroughIntroduces ChangesBTBLLBPX BTB Direction-Override Predictor
Sequence Diagram(s)sequenceDiagram
participant DecoupledBPU as DecoupledBPUWithBTB
participant BTBLLBPX
participant ContextStore as SetAssociativeStore[Context]
participant PatternStore as SetAssociativeStore[Pattern]
participant PatternBuffer
DecoupledBPU->>BTBLLBPX: putPCHistory(startAddr, history, stagePreds)
loop each BTB entry in stagePreds
BTBLLBPX->>BTBLLBPX: compute contextKey / patternKey
BTBLLBPX->>ContextStore: find(contextKey, contextTag)
ContextStore-->>BTBLLBPX: ContextEntry* (hit) or nullptr
BTBLLBPX->>PatternStore: find(patternKey, patternTag)
PatternStore-->>BTBLLBPX: PatternEntry* (hit) or nullptr
alt enableTiming
BTBLLBPX->>PatternBuffer: findPatternBuffer(key, tag)
PatternBuffer-->>BTBLLBPX: readyTick check result
end
BTBLLBPX->>BTBLLBPX: apply depth/timing eligibility gates
BTBLLBPX->>DecoupledBPU: override direction in stagePreds (if eligible)
end
BTBLLBPX->>BTBLLBPX: store LLBPXMeta in ThreadState
DecoupledBPU->>BTBLLBPX: update(FetchTarget)
BTBLLBPX->>BTBLLBPX: locate BranchMeta per BTB entry
alt provider was used
BTBLLBPX->>PatternStore: updatePattern (saturating counter)
BTBLLBPX->>PatternBuffer: rememberPattern (dirty/readyTick)
else misprediction or missing context/pattern
BTBLLBPX->>ContextStore: allocate(contextKey, contextTag)
BTBLLBPX->>PatternStore: allocate(patternKey, patternTag)
BTBLLBPX->>PatternBuffer: rememberPattern
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new BTB-side direction override component (LLBP-X) into the decoupled BTB branch predictor stack, extends the predictor metadata capacity to accommodate the added component, and enables LLBP-X in the XiangShan kmhv3 configuration (with an optional timing gate).
Changes:
- Add the
BTBLLBPXpredictor component (C++ implementation + SimObject params) and integrate it intoDecoupledBPUWithBTB. - Expand per-stream predictor metadata capacity (and add a component-count guard) to support additional BTB predictor components.
- Enable LLBP-X in
configs/example/kmhv3.pyand add CLI options related to LLBP-X timing.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/cpu/pred/SConscript |
Registers BTBLLBPX SimObject, builds btb_llbpx.cc, and adds LLBPX debug flag. |
src/cpu/pred/btb/llbpx_cache.hh |
Adds a small set-associative store utility used by LLBP-X. |
src/cpu/pred/btb/decoupled_bpred.hh |
Wires BTBLLBPX into the decoupled BTB predictor composition. |
src/cpu/pred/btb/decoupled_bpred.cc |
Adds LLBP-X to component list, introduces component-count guard, and stores per-component metadata. |
src/cpu/pred/btb/common.hh |
Extends TageInfoForMGSC provider fields; increases FetchTarget::predMetas capacity via MaxBTBPredComponents. |
src/cpu/pred/btb/btb_tage.cc |
Populates new provider-related fields for downstream consumers (e.g., LLBP-X). |
src/cpu/pred/btb/btb_llbpx.hh |
Declares the new BTBLLBPX predictor component. |
src/cpu/pred/btb/btb_llbpx.cc |
Implements LLBP-X lookup/override, timing gate, recovery, and update/allocation logic. |
src/cpu/pred/BranchPredictor.py |
Adds the BTBLLBPX SimObject and plugs it into DecoupledBPUWithBTB params. |
configs/example/kmhv3.py |
Enables LLBP-X by default in kmhv3 and optionally enables its timing gate via CLI. |
configs/common/xiangshan.py |
Adds CLI flags related to LLBP-X enable/timing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| parser.add_argument( | ||
| "--enable-llbpx", | ||
| "--enable-llbp", | ||
| action="store_true", | ||
| default=False, | ||
| help="Enable the experimental BTB-side LLBP-X direction override component", | ||
| ) | ||
| parser.add_argument( | ||
| "--enable-llbpx-timing", | ||
| "--enable-llbp-timing", | ||
| action="store_true", | ||
| default=False, | ||
| help="Enable the LLBP-X Pattern Buffer timing gate", | ||
| ) |
| BTBLLBPX::putPCHistory(Addr startAddr, | ||
| const boost::dynamic_bitset<> &history, | ||
| std::vector<FullBTBPrediction> &stagePreds) | ||
| { | ||
| if (stagePreds.empty()) { |
| panic_if(numComponents > MaxBTBPredComponents, | ||
| "BTB predictor component count %u exceeds predMetas capacity %u", | ||
| numComponents, MaxBTBPredComponents); |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@configs/example/kmhv3.py`:
- Around line 113-115: The kmhv3 branch predictor setup currently hard-codes
llbpx on, so the new CLI toggle is ignored. Update the predictor initialization
in the kmhv3 config so `cpu.branchPred.llbpx.enabled` is driven by the parsed
args flag (the same one used for `enable_llbpx_timing`), and keep the timing
option tied to that same `args` value in the `cpu.branchPred.llbpx` block.
In `@src/cpu/pred/btb/btb_llbpx.cc`:
- Around line 423-434: Snapshot the old prediction in BTBLLBPX::updatePattern
before calling updateCounter, because pattern->taken() is currently read after
the counter is modified and can misclassify just-corrected weak states as
correct. Store the pre-update taken/not-taken result in a local variable, use
that saved value for the confidence adjustment, and keep the rest of the logic
in updatePattern and replacementScore unchanged.
- Around line 141-145: The LLBPX gating in lookupHelper() is using the wrong
TAGE depth field, so it may reject valid overrides when TAGE falls back from the
main provider to alt/base. Update the base provider depth lookup in btb_llbpx.cc
to read tage_final_provider_table from stagePred.tageInfoForMgscs instead of
tage_provider_table, keeping the existing branchPC lookup and
providerDepthEligible comparison intact.
- Around line 347-350: The recoverPHist() override in BTBLLBPX currently does
not restore threadState[tid].rcr, leaving the recent-control-record state
speculative after a squash when path-history recovery is used. Update
recoverPHist() to mirror the RCR restoration done in recoverHist(), using the
provided history/entry/update data so threadState[tid].rcr is brought back to
the correct committed state before the next contextKey() call.
- Around line 82-86: The BTBLLBPX initialization currently validates tagBits,
keyBits, and patternBufferSize but never checks counterBits, so invalid values
can break updateCounter() with an illegal shift or overflow the short saturation
range. Add a panic_if guard near the existing constructor validation in BTBLLBPX
to require counterBits be within a safe positive bound before any call sites
that rely on updateCounter(), and make sure the check is placed alongside the
other parameter validations in btb_llbpx.cc.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e323617-b9b2-4ceb-95bd-dd48e57b2ae1
📒 Files selected for processing (11)
configs/common/xiangshan.pyconfigs/example/kmhv3.pysrc/cpu/pred/BranchPredictor.pysrc/cpu/pred/SConscriptsrc/cpu/pred/btb/btb_llbpx.ccsrc/cpu/pred/btb/btb_llbpx.hhsrc/cpu/pred/btb/btb_tage.ccsrc/cpu/pred/btb/common.hhsrc/cpu/pred/btb/decoupled_bpred.ccsrc/cpu/pred/btb/decoupled_bpred.hhsrc/cpu/pred/btb/llbpx_cache.hh
| cpu.branchPred.llbpx.enabled = True | ||
| cpu.branchPred.llbpx.enableTiming = bool( | ||
| getattr(args, 'enable_llbpx_timing', False)) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Wire --enable-llbpx into the kmhv3 predictor toggle.
Line 113 hard-codes cpu.branchPred.llbpx.enabled = True, so the new --enable-llbpx/--enable-llbp flag has no effect.
Suggested fix
- cpu.branchPred.llbpx.enabled = True
+ cpu.branchPred.llbpx.enabled = bool(
+ getattr(args, 'enable_llbpx', False))
cpu.branchPred.llbpx.enableTiming = bool(
getattr(args, 'enable_llbpx_timing', False))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cpu.branchPred.llbpx.enabled = True | |
| cpu.branchPred.llbpx.enableTiming = bool( | |
| getattr(args, 'enable_llbpx_timing', False)) | |
| cpu.branchPred.llbpx.enabled = bool( | |
| getattr(args, 'enable_llbpx', False)) | |
| cpu.branchPred.llbpx.enableTiming = bool( | |
| getattr(args, 'enable_llbpx_timing', False)) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@configs/example/kmhv3.py` around lines 113 - 115, The kmhv3 branch predictor
setup currently hard-codes llbpx on, so the new CLI toggle is ignored. Update
the predictor initialization in the kmhv3 config so
`cpu.branchPred.llbpx.enabled` is driven by the parsed args flag (the same one
used for `enable_llbpx_timing`), and keep the timing option tied to that same
`args` value in the `cpu.branchPred.llbpx` block.
| panic_if(tagBits == 0 || tagBits >= 64, "BTBLLBPX tagBits must be in [1, 63]"); | ||
| panic_if(keyBits == 0 || keyBits >= 64, "BTBLLBPX keyBits must be in [1, 63]"); | ||
| panic_if(enableTiming && patternBufferSize == 0, | ||
| "BTBLLBPX timing mode requires a non-zero patternBufferSize"); | ||
| warn_if(adaptCtxDepth, "BTBLLBPX Phase 1 ignores adaptive context depth"); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Inspect BTBLLBPX parameter definitions and uses:"
fd -a 'BranchPredictor.py|btb_llbpx.hh|btb_llbpx.cc' .
echo
echo "Search for counterBits declarations/defaults/validation:"
rg -n -C3 '\bcounterBits\b|BTBLLBPX' src configsRepository: OpenXiangShan/GEM5
Length of output: 19375
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "BTBLLBPX header (parameter/member definitions):"
sed -n '180,230p' src/cpu/pred/btb/btb_llbpx.hh
echo
echo "BTBLLBPX constructor and counter update implementation:"
sed -n '65,90p' src/cpu/pred/btb/btb_llbpx.cc
echo
sed -n '549,565p' src/cpu/pred/btb/btb_llbpx.cc
echo
echo "BTBLLBPX Python params:"
sed -n '1122,1145p' src/cpu/pred/BranchPredictor.pyRepository: OpenXiangShan/GEM5
Length of output: 4879
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
# Read-only probe of the shift expression's edge cases.
# This mirrors the source expression logically: max = (1 << (counterBits - 1)) - 1
# and min = -(1 << (counterBits - 1))
for bits in [0, 1, 2, 3, 31, 32, 63, 64]:
try:
v = 1 << (bits - 1)
print(bits, v.bit_length())
except Exception as e:
print(bits, type(e).__name__, e)
PYRepository: OpenXiangShan/GEM5
Length of output: 226
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Search for any validation or assumptions on BTBLLBPX counterBits:"
rg -n -C2 'counterBits|Pattern counter width|panic_if|fatal_if|warn_if' src/cpu/pred/btb src/cpu/pred/BranchPredictor.py
echo
echo "Inspect the generated param type declaration, if present:"
fd -a 'BTBLLBPX.hh|BTBLLBPX.py|BTBLLBPX.*' build src params 2>/dev/null || trueRepository: OpenXiangShan/GEM5
Length of output: 5541
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
# Probe the exact problematic ranges for the source expression.
# We keep this read-only and do not import or execute repository code.
def expr(bits):
return (1 << (bits - 1)) - 1, -(1 << (bits - 1))
for bits in [0, 1, 2, 3, 15, 16, 17, 31, 32, 33]:
try:
print(bits, expr(bits))
except Exception as e:
print(bits, type(e).__name__, e)
PYRepository: OpenXiangShan/GEM5
Length of output: 364
Guard counterBits before updateCounter()
counterBits has no validation. 0 makes (1 << (counterBits - 1)) a negative shift, and larger values overflow the short saturation bounds.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cpu/pred/btb/btb_llbpx.cc` around lines 82 - 86, The BTBLLBPX
initialization currently validates tagBits, keyBits, and patternBufferSize but
never checks counterBits, so invalid values can break updateCounter() with an
illegal shift or overflow the short saturation range. Add a panic_if guard near
the existing constructor validation in BTBLLBPX to require counterBits be within
a safe positive bound before any call sites that rely on updateCounter(), and
make sure the check is placed alongside the other parameter validations in
btb_llbpx.cc.
Source: Linters/SAST tools
| int baseProviderHistIdx = -2; | ||
| auto tageInfoIt = stagePred.tageInfoForMgscs.find(branchPC); | ||
| if (tageInfoIt != stagePred.tageInfoForMgscs.end()) { | ||
| baseProviderHistIdx = tageInfoIt->second.tage_provider_table; | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use TAGE’s final provider depth here, not the main provider depth.
lookupHelper() now publishes both tage_provider_table and tage_final_provider_table, but this path still gates LLBPX against tage_provider_table. When TAGE falls back to alt/base, Line 144 makes providerDepthEligible compare against the wrong source and can reject overrides that should be legal.
Suggested fix
- int baseProviderHistIdx = -2;
+ int baseProviderHistIdx = -2;
auto tageInfoIt = stagePred.tageInfoForMgscs.find(branchPC);
if (tageInfoIt != stagePred.tageInfoForMgscs.end()) {
- baseProviderHistIdx = tageInfoIt->second.tage_provider_table;
+ baseProviderHistIdx =
+ tageInfoIt->second.tage_final_provider_table;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| int baseProviderHistIdx = -2; | |
| auto tageInfoIt = stagePred.tageInfoForMgscs.find(branchPC); | |
| if (tageInfoIt != stagePred.tageInfoForMgscs.end()) { | |
| baseProviderHistIdx = tageInfoIt->second.tage_provider_table; | |
| } | |
| int baseProviderHistIdx = -2; | |
| auto tageInfoIt = stagePred.tageInfoForMgscs.find(branchPC); | |
| if (tageInfoIt != stagePred.tageInfoForMgscs.end()) { | |
| baseProviderHistIdx = | |
| tageInfoIt->second.tage_final_provider_table; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cpu/pred/btb/btb_llbpx.cc` around lines 141 - 145, The LLBPX gating in
lookupHelper() is using the wrong TAGE depth field, so it may reject valid
overrides when TAGE falls back from the main provider to alt/base. Update the
base provider depth lookup in btb_llbpx.cc to read tage_final_provider_table
from stagePred.tageInfoForMgscs instead of tage_provider_table, keeping the
existing branchPC lookup and providerDepthEligible comparison intact.
| BTBLLBPX::recoverPHist(const boost::dynamic_bitset<> &history, | ||
| const FetchTarget &entry, | ||
| const PathHistoryUpdate &update) | ||
| { |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Recover the RCR in path-history mode too.
This empty override leaves threadState[tid].rcr on speculative state after a squash when recovery flows through recoverPHist() instead of recoverHist(). The next contextKey() then hashes the wrong recent-control record sequence.
Suggested fix
void
BTBLLBPX::recoverPHist(const boost::dynamic_bitset<> &history,
const FetchTarget &entry,
const PathHistoryUpdate &update)
{
+ auto meta = std::static_pointer_cast<LLBPXMeta>(
+ entry.predMetas[getComponentIdx()]);
+ if (!meta || entry.tid >= threadState.size()) {
+ return;
+ }
+
+ restoreRCR(entry.tid, *meta);
+ if (entry.exeTaken) {
+ pushRCR(entry.tid, entry.exeBranchInfo, true);
+ }
+#ifndef UNIT_TEST
+ llbpxStats.rcrRecover++;
+#endif
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| BTBLLBPX::recoverPHist(const boost::dynamic_bitset<> &history, | |
| const FetchTarget &entry, | |
| const PathHistoryUpdate &update) | |
| { | |
| BTBLLBPX::recoverPHist(const boost::dynamic_bitset<> &history, | |
| const FetchTarget &entry, | |
| const PathHistoryUpdate &update) | |
| { | |
| auto meta = std::static_pointer_cast<LLBPXMeta>( | |
| entry.predMetas[getComponentIdx()]); | |
| if (!meta || entry.tid >= threadState.size()) { | |
| return; | |
| } | |
| restoreRCR(entry.tid, *meta); | |
| if (entry.exeTaken) { | |
| pushRCR(entry.tid, entry.exeBranchInfo, true); | |
| } | |
| `#ifndef` UNIT_TEST | |
| llbpxStats.rcrRecover++; | |
| `#endif` | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cpu/pred/btb/btb_llbpx.cc` around lines 347 - 350, The recoverPHist()
override in BTBLLBPX currently does not restore threadState[tid].rcr, leaving
the recent-control-record state speculative after a squash when path-history
recovery is used. Update recoverPHist() to mirror the RCR restoration done in
recoverHist(), using the provided history/entry/update data so
threadState[tid].rcr is brought back to the correct committed state before the
next contextKey() call.
| BTBLLBPX::updatePattern(const BranchMeta &meta, bool actualTaken) | ||
| { | ||
| auto *pattern = patterns.find(meta.key, meta.patternTag); | ||
| if (!pattern) { | ||
| return; | ||
| } | ||
| updateCounter(actualTaken, pattern->counter); | ||
| if (pattern->taken() == actualTaken && pattern->confidence < 15) { | ||
| pattern->confidence++; | ||
| } else if (pattern->confidence > 0) { | ||
| pattern->confidence--; | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Snapshot the old prediction before updating the counter.
Line 430 checks pattern->taken() after updateCounter(). A weak counter that was wrong and just crossed zero will be treated as “correct” and gain confidence, which skews replacementScore() toward recently corrected mistakes.
Suggested fix
void
BTBLLBPX::updatePattern(const BranchMeta &meta, bool actualTaken)
{
auto *pattern = patterns.find(meta.key, meta.patternTag);
if (!pattern) {
return;
}
+ const bool predictedTaken = pattern->taken();
updateCounter(actualTaken, pattern->counter);
- if (pattern->taken() == actualTaken && pattern->confidence < 15) {
+ if (predictedTaken == actualTaken && pattern->confidence < 15) {
pattern->confidence++;
} else if (pattern->confidence > 0) {
pattern->confidence--;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| BTBLLBPX::updatePattern(const BranchMeta &meta, bool actualTaken) | |
| { | |
| auto *pattern = patterns.find(meta.key, meta.patternTag); | |
| if (!pattern) { | |
| return; | |
| } | |
| updateCounter(actualTaken, pattern->counter); | |
| if (pattern->taken() == actualTaken && pattern->confidence < 15) { | |
| pattern->confidence++; | |
| } else if (pattern->confidence > 0) { | |
| pattern->confidence--; | |
| } | |
| BTBLLBPX::updatePattern(const BranchMeta &meta, bool actualTaken) | |
| { | |
| auto *pattern = patterns.find(meta.key, meta.patternTag); | |
| if (!pattern) { | |
| return; | |
| } | |
| const bool predictedTaken = pattern->taken(); | |
| updateCounter(actualTaken, pattern->counter); | |
| if (predictedTaken == actualTaken && pattern->confidence < 15) { | |
| pattern->confidence++; | |
| } else if (pattern->confidence > 0) { | |
| pattern->confidence--; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cpu/pred/btb/btb_llbpx.cc` around lines 423 - 434, Snapshot the old
prediction in BTBLLBPX::updatePattern before calling updateCounter, because
pattern->taken() is currently read after the counter is modified and can
misclassify just-corrected weak states as correct. Store the pre-update
taken/not-taken result in a local variable, use that saved value for the
confidence adjustment, and keep the rest of the logic in updatePattern and
replacementScore unchanged.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
|
This branch adds an experimental BTB-side LLBP-X direction override component to the XiangShan kmhv3 branch predictor flow, and enables it by default in
kmhv3.py.The implementation is inspired by the upstream LLBP-X project:
This is not a direct patch import. The upstream gem5 integration targets a different predictor structure, while this repository uses the
DecoupledBPUWithBTBcomponent chain. This branch ports the main LLBP-X ideas into the local BTB predictor framework.Summary by CodeRabbit
New Features
Bug Fixes