fix(smoketest): assert against served model in kustomize mode#1461

Draft
xiaojun-zhang wants to merge 1 commit into
llm-d:mainfrom
xiaojun-zhang:fix-kustomize-smoketest-served-model

xiaojun-zhang commented Jun 5, 2026

Contributor

llmdbenchmark tests with Qwen3-32B in all environments (GKE, OpenShift, etc.), but XPU is currently deployed with a smaller model.

This PR makes the benchmark read its model from llm-d/guides so that the benchmark can run with the smaller model on XPU.

No need to review or merge this PR yet.

The other option is to deploy Qwen3-32B on XPU as well, in which case this PR is not needed.

I will make a decision soon.

In kustomize standup mode the smoketest read the expected model from the
benchmark scenario's model.name. But kustomize mode deploys the model
defined by the guide manifests in the llm-d repo and explicitly ignores
-m/--models and model.*. The scenario is shared across all accelerators,
while guides serve different models per accelerator (Intel XPU serves
Qwen/Qwen3-0.6B, HPU serves Qwen/Qwen3-8B), so asserting against the
scenario's Qwen/Qwen3-32B produced false health-check failures.

BaseSmoketest now auto-detects the served model from the running decode
Deployment's modelserver container args (vLLM and SGLang arg forms) and
asserts against that in kustomize mode. Standalone/modelservice/fma are
unchanged -- there the scenario remains the source of truth.

Signed-off-by: Xiaojun Zhang <robin.zhang@intel.com>
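
For readers following the description above, here is a minimal sketch of the kind of detection the commit message describes. It is not the code in this PR: the function names `detect_served_model` and `expected_model`, the flag precedence, and the assumption that the container args are already available as a list of strings are all hypothetical. It assumes the commonly used vLLM forms (`vllm serve <model>`, `--model`, `--served-model-name`) and the SGLang form (`--model-path`).

```python
import shlex
from typing import Optional, Sequence

# Flags whose value names the model, in assumed order of precedence.
# --model and --served-model-name are vLLM forms; --model-path is the SGLang form.
MODEL_FLAGS = ("--served-model-name", "--model", "--model-path")


def detect_served_model(container_args: Sequence[str]) -> Optional[str]:
    """Best-effort guess of the served model from modelserver container args."""
    # Args may arrive as separate items or as one shell-style string; normalize both.
    tokens = []
    for arg in container_args:
        tokens.extend(shlex.split(arg))

    found = {}
    for i, tok in enumerate(tokens):
        for flag in MODEL_FLAGS:
            if tok == flag and i + 1 < len(tokens):
                found.setdefault(flag, tokens[i + 1])      # "--flag value"
            elif tok.startswith(flag + "="):
                found.setdefault(flag, tok[len(flag) + 1:])  # "--flag=value"

    for flag in MODEL_FLAGS:
        if flag in found:
            return found[flag]

    # Positional form: "vllm serve <model>".
    for i, tok in enumerate(tokens):
        if tok == "serve" and i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            return tokens[i + 1]
    return None


def expected_model(standup_mode: str, scenario_model: str,
                   container_args: Sequence[str]) -> str:
    """Pick the model the smoketest should assert against (hypothetical helper)."""
    if standup_mode == "kustomize":
        # In kustomize mode the guide manifests decide the model, so prefer
        # what the Deployment actually serves; fall back to the scenario.
        return detect_served_model(container_args) or scenario_model
    # standalone/modelservice/fma: the scenario remains the source of truth.
    return scenario_model
```

With XPU-style args such as `["vllm", "serve", "Qwen/Qwen3-0.6B"]`, `expected_model("kustomize", "Qwen/Qwen3-32B", ...)` would yield `Qwen/Qwen3-0.6B`, while any non-kustomize mode would still yield the scenario's `Qwen/Qwen3-32B`.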

github-actions Bot commented Jun 5, 2026

Contributor

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

huaxig pushed a commit to huaxig/llm-d-benchmark that referenced this pull request Jun 22, 2026
* feat: modularize shared GKE NCCL tuner patch into Kustomize Component
Signed-off-by: Cong Liu <conliu@google.com>

* refactor: standardize GKE NCCL tuner patch overlay across all nvidia gpu model servers
Signed-off-by: Cong Liu <conliu@google.com>

* refactor: standardize optional GKE NCCL tuner overlay to gke/ and update READMEs to INFRA_PROVIDER pattern
Signed-off-by: Cong Liu <conliu@google.com>

* docs: remove GKE Tuning Patch Component README references from user guides
Signed-off-by: Cong Liu <conliu@google.com>

* refactor: consolidate lustre overlays into primary GKE connector patches
Signed-off-by: Cong Liu <conliu@google.com>

* ci: update E2E workflows to reflect new overlay paths
Signed-off-by: Cong Liu <conliu@google.com>

* ci: align all guide E2E workflows with restructured base and gke overlay paths
Signed-off-by: Cong Liu <conliu@google.com>

* Apply suggestions from code review

Minor edits.

Co-authored-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>
Signed-off-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>

---------

Signed-off-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>
Co-authored-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>