fix(smoketest): assert against served model in kustomize mode #1461
Draft
xiaojun-zhang wants to merge 1 commit into
In kustomize standup mode the smoketest read the expected model from the benchmark scenario's model.name. But kustomize mode deploys the model defined by the guide manifests in the llm-d repo and explicitly ignores -m/--models and model.*.

The scenario is shared across all accelerators, while guides serve different models per accelerator (Intel XPU serves Qwen/Qwen3-0.6B, HPU serves Qwen/Qwen3-8B), so asserting against the scenario's Qwen/Qwen3-32B produced false health-check failures.

BaseSmoketest now auto-detects the served model from the running decode Deployment's modelserver container args (vLLM and SGLang arg forms) and asserts against that in kustomize mode. Standalone/modelservice/fma are unchanged -- there the scenario remains the source of truth.

Signed-off-by: Xiaojun Zhang <robin.zhang@intel.com>
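The detection described above could look roughly like the sketch below. It is not the code from this PR: the names detect_served_model and expected_model are hypothetical, the input is assumed to be the container's command and args already joined into one token list, and falling back to the scenario model when detection fails is an assumption rather than documented behavior. The flags it recognizes are the usual vLLM forms (vllm serve <model>, --model, --served-model-name) and SGLang forms (--model-path, --served-model-name).

```python
from typing import Optional, Sequence

# Flags that can name the model, in priority order. --served-model-name comes
# first because it is the name the server reports through its API.
_MODEL_FLAGS = ("--served-model-name", "--model", "--model-path")


def detect_served_model(tokens: Sequence[str]) -> Optional[str]:
    """Return the model named in a modelserver command line, or None.

    `tokens` is assumed to be the container's command + args as one list.
    """
    found = {}
    for i, tok in enumerate(tokens):
        for flag in _MODEL_FLAGS:
            if tok == flag and i + 1 < len(tokens):
                # "--flag value" form
                found.setdefault(flag, tokens[i + 1])
            elif tok.startswith(flag + "="):
                # "--flag=value" form
                found.setdefault(flag, tok.split("=", 1)[1])
    for flag in _MODEL_FLAGS:
        if flag in found:
            return found[flag]
    # Positional form: "vllm serve <model>"
    for i, tok in enumerate(tokens[:-1]):
        if tok == "serve" and not tokens[i + 1].startswith("-"):
            return tokens[i + 1]
    return None


def expected_model(mode: str, scenario_model: str, tokens: Sequence[str]) -> str:
    """Pick the model name the smoketest should assert against."""
    if mode == "kustomize":
        # Assumption: fall back to the scenario if nothing could be detected.
        return detect_served_model(tokens) or scenario_model
    # standalone / modelservice / fma: the scenario stays the source of truth.
    return scenario_model
```

With this sketch, a kustomize deployment whose container runs vllm serve Qwen/Qwen3-0.6B would be checked against Qwen/Qwen3-0.6B, while the same tokens in standalone mode would still be checked against the scenario's model.name.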
Unsigned commits detected! Please sign your commits. For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.
huaxig pushed a commit to huaxig/llm-d-benchmark that referenced this pull request Jun 22, 2026
* feat: modularize shared GKE NCCL tuner patch into Kustomize Component
  Signed-off-by: Cong Liu <conliu@google.com>
* refactor: standardize GKE NCCL tuner patch overlay across all nvidia gpu model servers
  Signed-off-by: Cong Liu <conliu@google.com>
* refactor: standardize optional GKE NCCL tuner overlay to gke/ and update READMEs to INFRA_PROVIDER pattern
  Signed-off-by: Cong Liu <conliu@google.com>
* docs: remove GKE Tuning Patch Component README references from user guides
  Signed-off-by: Cong Liu <conliu@google.com>
* refactor: consolidate lustre overlays into primary GKE connector patches
  Signed-off-by: Cong Liu <conliu@google.com>
* ci: update E2E workflows to reflect new overlay paths
  Signed-off-by: Cong Liu <conliu@google.com>
* ci: align all guide E2E workflows with restructured base and gke overlay paths
  Signed-off-by: Cong Liu <conliu@google.com>
* Apply suggestions from code review
  Minor edits.
  Co-authored-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>
  Signed-off-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>
---------
Signed-off-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>
Co-authored-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>
llmdbenchmark tests with Qwen3-32B in all environments (GKE, OpenShift, etc.), but XPU is currently deployed with a smaller model.
This PR makes the benchmark read its model from llm-d/guides so that the benchmark can run with the smaller model on XPU.
No need to review or merge this PR yet.
The other option is to deploy Qwen3-32B on XPU as well. If we go that way, we won't need this PR.
Will make a decision soon.