fix(smoketest): assert against served model in kustomize mode by xiaojun-zhang · Pull Request #1461 · llm-d/llm-d-benchmark

xiaojun-zhang · 2026-06-05T02:37:56Z

llmdbenchmark tests with Qwen3-32B in all environments (GKE, OpenShift, etc.), but XPU is currently deployed with a smaller model.

This PR makes the benchmark read its model from llm-d/guides so that the benchmark can run with the smaller model on XPU.

No need to review/merge the PR yet.

The other option is to deploy Qwen3-32B on XPU as well. If so, we won't need this PR.

Will make a decision soon.

In kustomize standup mode the smoketest read the expected model from the benchmark scenario's model.name. But kustomize mode deploys the model defined by the guide manifests in the llm-d repo and explicitly ignores -m/--models and model.*.

The scenario is shared across all accelerators, while guides serve different models per accelerator (Intel XPU serves Qwen/Qwen3-0.6B, HPU serves Qwen/Qwen3-8B), so asserting against the scenario's Qwen/Qwen3-32B produced false health-check failures.

BaseSmoketest now auto-detects the served model from the running decode Deployment's modelserver container args (vLLM and SGLang arg forms) and asserts against that in kustomize mode. Standalone/modelservice/fma are unchanged -- there the scenario remains the source of truth.

Signed-off-by: Xiaojun Zhang <robin.zhang@intel.com>
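As a rough illustration of the detection described in this commit message, here is a minimal sketch, not the PR's actual code. The function names `detect_served_model` and `expected_model` are hypothetical, as is the fallback to the scenario model when nothing is detected. It assumes the container args are already available as a list of strings, and that the model appears in one of three common forms: `vllm serve <model>`, `--model <model>` / `--model=<model>` (vLLM), or `--model-path <model>` / `--model-path=<model>` (SGLang).

```python
from typing import Optional, Sequence

# Flags assumed to carry the model: --model (vLLM), --model-path (SGLang).
MODEL_FLAGS = ("--model", "--model-path")


def detect_served_model(args: Sequence[str]) -> Optional[str]:
    """Return the model found in modelserver container args, or None."""
    # Some manifests pass the whole command as one shell string, so
    # split every arg on whitespace to get a flat token list.
    tokens = [tok for arg in args for tok in arg.split()]
    for i, tok in enumerate(tokens):
        has_next = i + 1 < len(tokens)
        for flag in MODEL_FLAGS:
            if tok == flag and has_next:
                return tokens[i + 1]          # "--model X" form
            if tok.startswith(flag + "="):
                return tok.split("=", 1)[1]   # "--model=X" form
        # Positional form: "vllm serve X"
        if (tok == "serve" and i > 0 and tokens[i - 1].endswith("vllm")
                and has_next and not tokens[i + 1].startswith("-")):
            return tokens[i + 1]
    return None


def expected_model(standup_mode: str, scenario_model: str,
                   container_args: Sequence[str]) -> str:
    """Pick the model the smoketest should assert against."""
    if standup_mode == "kustomize":
        # Hypothetical fallback: use the scenario model if detection fails.
        return detect_served_model(container_args) or scenario_model
    # Other standup modes: the scenario remains the source of truth.
    return scenario_model
```

With these assumptions, `expected_model("kustomize", "Qwen/Qwen3-32B", ["vllm", "serve", "Qwen/Qwen3-0.6B"])` selects the served model, while any other standup mode returns the scenario's model unchanged.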

github-actions · 2026-06-05T02:38:06Z

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

* feat: modularize shared GKE NCCL tuner patch into Kustomize Component
  Signed-off-by: Cong Liu <conliu@google.com>
* refactor: standardize GKE NCCL tuner patch overlay across all nvidia gpu model servers
  Signed-off-by: Cong Liu <conliu@google.com>
* refactor: standardize optional GKE NCCL tuner overlay to gke/ and update READMEs to INFRA_PROVIDER pattern
  Signed-off-by: Cong Liu <conliu@google.com>
* docs: remove GKE Tuning Patch Component README references from user guides
  Signed-off-by: Cong Liu <conliu@google.com>
* refactor: consolidate lustre overlays into primary GKE connector patches
  Signed-off-by: Cong Liu <conliu@google.com>
* ci: update E2E workflows to reflect new overlay paths
  Signed-off-by: Cong Liu <conliu@google.com>
* ci: align all guide E2E workflows with restructured base and gke overlay paths
  Signed-off-by: Cong Liu <conliu@google.com>
* Apply suggestions from code review
  Minor edits.
  Co-authored-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>
  Signed-off-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>

---------

Signed-off-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>
Co-authored-by: Abdullah Gharaibeh <40361897+ahg-g@users.noreply.github.com>

xiaojun-zhang wants to merge 1 commit into llm-d:main from xiaojun-zhang:fix-kustomize-smoketest-served-model
