Validate capacity planner accuracy and compile extensive report with next steps #205
Open
jgchn wants to merge 25 commits into
Signed-off-by: Jing Chen <jing.chen2@ibm.com>
namasl
reviewed
Apr 24, 2026
### The headline: accurate where it counts most
**Weight memory: 0.89% mean absolute error** across 53 of the 57 runs. (The remaining 4 runs used parameters the planner doesn't yet model, namely float32 dtype and runtime fp8 quantization; they are discussed below.) Weights are the single largest memory component: for a model like Llama-3.1-8B at TP=1, they consume about 15 GiB of the 79 GiB available. They are also the hardest component to get right across a diverse model set.
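The ~15 GiB figure follows from parameter count × bytes per parameter ÷ TP. The sketch below is an idealized estimate, not the planner's implementation: it assumes every parameter is stored in one dtype and sharded evenly across TP ranks (ignoring replicated tensors and padding). The function name `weight_gib` and the ~8.03B parameter count for Llama-3.1-8B are illustrative assumptions.

```python
# Approximate bytes per parameter for common weight dtypes.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
    "fp8": 1,
}

GIB = 2**30  # bytes per GiB


def weight_gib(num_params: int, dtype: str, tp: int = 1) -> float:
    """Hypothetical helper: estimate per-GPU weight memory in GiB.

    Idealized: assumes all parameters use `dtype` and are split
    evenly across `tp` GPUs (ignores replicated tensors and padding).
    """
    if tp < 1:
        raise ValueError("tp must be >= 1")
    return num_params * BYTES_PER_PARAM[dtype] / tp / GIB


if __name__ == "__main__":
    # Approximate parameter count for Llama-3.1-8B (assumption, ~8.03B).
    llama_8b = 8_030_261_248
    per_gpu = weight_gib(llama_8b, "bfloat16", tp=1)
    print(f"{per_gpu:.2f} GiB of 79 GiB ({per_gpu / 79:.1%})")
```

In bfloat16 this comes to just under 15 GiB, roughly a fifth of the 79 GiB budget; switching the dtype to float32 doubles it, which is one reason unmodeled dtypes can produce large errors.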
Collaborator
When using the planner, will I know if I enter a setup or model that the planner doesn't fully support? Will it give me some sort of error message, or will it proceed and return results with large errors without warning?
Description
Addresses #194
How Has This Been Tested?
Does not affect the current UI or API workflow. No effect on llm-d-planner functionality.
Merge criteria:
Next steps
- `--dtype`, `--kv-cache-dtype`, and `--quantization` override in UI and API
- `find_possible_tps()`: vocab size must also be divisible by TP. `vocab_size` is usually a power of 2 times a multiplier.
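A minimal sketch of the proposed vocab-size constraint, under stated assumptions: `candidate_tps` and `pow2_factor` are hypothetical names, not the planner's `find_possible_tps()`, and the attention-head divisibility check is assumed to be one of the existing constraints. Writing `vocab_size` as 2^k × m (m odd) shows why the check rarely bites: a power-of-2 TP divides the vocab exactly when TP ≤ 2^k.

```python
def pow2_factor(n: int) -> tuple[int, int]:
    """Split n into (2**k, m) with m odd, so that n == 2**k * m."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return 2**k, n


def candidate_tps(num_attention_heads: int, vocab_size: int, max_gpus: int) -> list[int]:
    """Hypothetical sketch: TP sizes that evenly divide both the
    attention head count (assumed existing check) and the vocab size
    (the proposed additional check)."""
    return [
        tp
        for tp in range(1, max_gpus + 1)
        if num_attention_heads % tp == 0 and vocab_size % tp == 0
    ]


if __name__ == "__main__":
    # Llama-3.1-8B config values: 32 attention heads, vocab 128256.
    print(pow2_factor(128256))            # 2**8 * 501
    print(candidate_tps(32, 128256, 8))   # vocab does not restrict TP here
    print(candidate_tps(32, 32001, 8))    # an odd vocab forces TP=1
```

With a vocab of 128256 = 2^8 × 501, any power-of-2 TP up to 256 passes the vocab check; an odd vocab size such as 32001 leaves only TP=1.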