Add minimal support for llm-d manifest generation by jgchn · Pull Request #274 · llm-d-incubation/llm-d-planner

jgchn · 2026-06-24T01:36:21Z

Adds the ability to generate llm-d stack deployment configs as an alternative to standalone vLLM (KServe InferenceService) configs.

When users select "llm-d" as the deployment stack, the planner generates:

Kustomize overlay (kustomization.yaml + patch-vllm.yaml) for model server deployment, referencing the llm-d base manifests at guides/recipes/modelserver/base/single-host/default/
Helm values (values.yaml) for the EPP (Endpoint Picker Pod) + InferencePool, using the standalone chart from oci://registry.k8s.io/gateway-api-inference-extension/charts/standalone

The default router config uses the standard EPP plugins: prefix-cache-scorer, decode-filter, max-score-picker, single-profile-handler.

Key design decisions:

Not a new DeploymentMode — vLLM vs llm-d is an output format choice, orthogonal to simulator/production mode
Separate generator class (LlmdDeploymentGenerator) since the output shape is structurally different from KServe InferenceService
No routing intelligence yet — uses default EPP config; routing profile selection is future work
Choice at generation time via ?stack=llm-d query param on /api/v1/deploy
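The separation between stack choice and generator class can be sketched roughly as below. This is a minimal illustration, not the PR's code: the two stub classes and generate_for_stack are hypothetical stand-ins, and the "inferenceservice" key is an assumption. Only the stack values ("vllm", "llm-d") and the three llm-d output names (kustomization, patch_vllm, helm_values) come from this PR.

```python
from typing import Literal, get_args

# The two stack values accepted by the ?stack= query parameter.
Stack = Literal["vllm", "llm-d"]


class VllmGeneratorStub:
    """Hypothetical stand-in for the existing KServe InferenceService generator."""

    def generate_all(self, model_id: str) -> dict[str, str]:
        # Assumed output key; the real generator's keys are not shown in the PR.
        return {"inferenceservice": f"# InferenceService for {model_id}"}


class LlmdGeneratorStub:
    """Hypothetical stand-in for LlmdDeploymentGenerator."""

    def generate_all(self, model_id: str) -> dict[str, str]:
        # The three output names match the files listed in the PR description.
        return {
            "kustomization": "# kustomization.yaml",
            "patch_vllm": f"# patch-vllm.yaml for {model_id}",
            "helm_values": "# values.yaml",
        }


def generate_for_stack(model_id: str, stack: str = "vllm") -> dict[str, str]:
    """Pick a generator by stack; deployment mode is deliberately not involved."""
    if stack not in get_args(Stack):
        raise ValueError(f"Unknown stack: {stack!r}")
    generator = LlmdGeneratorStub() if stack == "llm-d" else VllmGeneratorStub()
    return generator.generate_all(model_id)
```

Because the output shapes differ (one manifest versus three files), callers such as the UI need to key their display logic on the selected stack rather than on a fixed set of file names.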

How Has This Been Tested?

17 new unit tests covering generator output structure, kustomization content, patch content, helm values, API endpoint routing, and YAML injection prevention
Full unit test suite passes (361 tests)
Manual UI testing: toggling between vLLM and llm-d stacks correctly generates and displays the appropriate files
Verified model_id input validation prevents YAML injection attacks

cd src && uv run pytest ../tests/unit/test_llmd_generator.py -v  # 17 pass
cd src && uv run pytest ../tests/unit/ -v                        # 361 pass

Merge criteria:

The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work.

Remove old disaggregated templates (scheduler.values, patch-decode, patch-prefill, objectives, deploy.sh) and replace with unified kustomization + patch + helm values that generate a single deployment topology with decode-only mode.

New templates:
- kustomization.yaml.j2: references llm-d base recipe, applies namePrefix and labels
- patch-vllm.yaml.j2: patches decode deployment with model, replicas, tensor_parallel, and GPU resources
- values.yaml.j2: helm values for llm-d-inference-scheduler with inferencePool selector

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Jing Chen <jing.chen2@ibm.com>

Simplified LlmdDeploymentGenerator to render the three new llm-d templates:
- kustomization.yaml.j2 (kustomize base reference + patches)
- patch-vllm.yaml.j2 (vLLM deployment with model/GPU/replica config)
- values.yaml.j2 (EPP + InferencePool Helm values)

Removed complex routing topology logic and BLIS-specific features. Now focused on core model serving parameters: model_id, gpu_count, tensor_parallel, replicas.

Tests rewritten to validate:
- 3 output files (kustomization, patch_vllm, helm_values)
- Valid YAML rendering for all files
- Template variable substitution (model_id, tensor_parallel, replicas, etc.)
- EPP config structure in Helm values

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Jing Chen <jing.chen2@ibm.com>

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Jing Chen <jing.chen2@ibm.com>

Add regex validation for model_id in LlmdDeploymentGenerator._prepare_context() to prevent YAML injection attacks.

Change stack parameter in /api/v1/deploy endpoint from str to Literal["vllm", "llm-d"] for automatic validation.

Add test_invalid_model_id_raises to verify ValueError is raised for malicious model_id formats.

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Jing Chen <jing.chen2@ibm.com>
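For readers unfamiliar with this kind of check, the model_id validation might look something like the sketch below. The actual pattern used in _prepare_context() is not shown in this PR, so the regex here (a Hugging Face style "name" or "org/name" made of letters, digits, dots, underscores, and hyphens) and the helper name validate_model_id are assumptions for illustration only.

```python
import re

# Assumed allow-list pattern; the PR's real regex is not shown.
# Accepts "name" or "org/name" built from letters, digits, ".", "_" and "-".
_MODEL_ID_RE = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._-]*(/[A-Za-z0-9][A-Za-z0-9._-]*)?"
)


def validate_model_id(model_id: str) -> str:
    """Return model_id unchanged if it is safe to template into YAML.

    fullmatch is used instead of match with "$" so that a trailing
    newline cannot slip through and start a new YAML line.
    """
    if not _MODEL_ID_RE.fullmatch(model_id):
        raise ValueError(f"Invalid model_id: {model_id!r}")
    return model_id
```

An allow-list rejects newlines, colons, quotes, and spaces by construction, which is generally easier to reason about than trying to escape every character that is significant in YAML.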

Add radio button in deployment tab to choose between vLLM (standalone) and llm-d (router + pool) deployment stacks. The selected stack is passed as a query parameter to the backend API and determines which YAML files are generated and displayed.

Changes:
- Add stack radio selector in deployment.py before YAML generation
- Update deploy_and_generate_yaml() in api_client.py to accept stack param
- Pass stack from session state when selecting recommendations
- Display correct YAML file labels based on selected stack

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Jing Chen <jing.chen2@ibm.com>

When the user switches between vLLM and llm-d, clear the previously generated YAML so it regenerates with the correct stack. Without this, switching to llm-d after selecting a recommendation would show no files because the stored YAML had vllm keys but the display expected llm-d keys.

Signed-off-by: Jing Chen <jing.chen2@ibm.com>
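The stale-state fix described in this commit can be illustrated with a plain mapping standing in for the UI session state. This is a sketch only: the key names "stack" and "generated_yaml" and the helper on_stack_change are hypothetical, not taken from deployment.py.

```python
from typing import Any, MutableMapping


def on_stack_change(state: MutableMapping[str, Any], new_stack: str) -> bool:
    """Record the selected stack and drop YAML generated for the old one.

    Returns True when the stack actually changed, meaning the caller
    should regenerate the YAML for the new stack.
    """
    if state.get("stack") == new_stack:
        return False
    state["stack"] = new_stack
    # Stored YAML is keyed by the old stack's file names, so discard it.
    state.pop("generated_yaml", None)
    return True
```

A usage example with an ordinary dict:

```python
state = {"stack": "vllm", "generated_yaml": {"inferenceservice": "..."}}
if on_stack_change(state, "llm-d"):
    pass  # regenerate here, e.g. by calling the deploy API with stack="llm-d"
```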

amito

Hi Jing,

Thank you for this important work.
Please see my comments.

Amit

amito · 2026-06-24T11:08:28Z

-                detail=f"Generated YAML validation failed: {str(e)}",
-            ) from e
+        if stack == "llm-d":
+            result = llmd_generator.generate_all(


YAML validation is skipped for llm-d.

amito · 2026-06-24T19:58:49Z

+  image:
+    registry: ghcr.io
+    repository: llm-d/llm-d-inference-scheduler
+    tag: v0.8.0


Will we pin this version per release of llm-d-planner?

Good point. For now I'm just using the latest available versions of the core components. We probably want to store the pinned versions we're generating somewhere.

amito · 2026-06-24T20:00:36Z

+    )
+
+
+class TestLlmdGeneratorOutput:


Tests are missing marks (@pytest.mark.unit etc.).

amito

Hi Jing,
Great work, thanks for addressing my comments.
Please address these few minor comments - mostly around testing.
Thanks,
Amit

amito · 2026-06-28T07:31:50Z

+                gpu_count=2,
+                tensor_parallel=2,


Setting different values for gpu_count and tensor_parallel here would help catch regressions.
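To illustrate why distinct values matter, here is a small self-contained sketch. The two render functions are hypothetical stand-ins for the patch template (the rendered lines are assumed, not copied from patch-vllm.yaml.j2); the buggy one mistakenly uses gpu_count for the tensor-parallel flag.

```python
from typing import Callable


def render_patch(gpu_count: int, tensor_parallel: int) -> str:
    """Correct stand-in renderer: each value lands in its own field."""
    return "\n".join([
        f"--tensor-parallel-size={tensor_parallel}",
        f'nvidia.com/gpu: "{gpu_count}"',
    ])


def render_patch_buggy(gpu_count: int, tensor_parallel: int) -> str:
    """Buggy stand-in renderer: gpu_count leaks into the tensor-parallel flag."""
    return "\n".join([
        f"--tensor-parallel-size={gpu_count}",
        f'nvidia.com/gpu: "{gpu_count}"',
    ])


def output_is_correct(
    render: Callable[[int, int], str], gpu_count: int, tensor_parallel: int
) -> bool:
    """Check that both values appear on their expected lines."""
    lines = render(gpu_count, tensor_parallel).splitlines()
    return (
        f"--tensor-parallel-size={tensor_parallel}" in lines
        and f'nvidia.com/gpu: "{gpu_count}"' in lines
    )


# With equal inputs the bug is invisible; with distinct inputs it is caught.
hidden = output_is_correct(render_patch_buggy, gpu_count=2, tensor_parallel=2)
caught = not output_is_correct(render_patch_buggy, gpu_count=4, tensor_parallel=2)
```

Here both hidden and caught evaluate to True: the swapped variable passes the check when the inputs are equal and fails it once they differ.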

amito · 2026-06-28T07:32:35Z

          resources:
            requests:
-              nvidia.com/gpu: "{{ gpu_count }}"
+              nvidia.com/gpu: "{{ gpus_per_replica }}"


Do we still need to populate gpu_count now that it's not used in the templates?

amito · 2026-06-28T07:38:00Z

    """Create a test client with mocked app state (no DB required)."""
    app = FastAPI()
-    # Mock app state without requiring DB connection
    app.state.deployment_generator = DeploymentGenerator(simulator_mode=False)


This still creates a number of objects that are not mocked and that touch real files on disk; e.g., L#54 in src/planner/configuration/generator.py does self._catalog = ModelCatalog(), and some others create output dirs, etc.
Maybe this whole thing can be mocked?
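One way to mock the whole generator, sketched with the standard library only. The DeploymentGenerator class below is a stand-in whose constructor raises to simulate the disk access described above; make_mock_state and the use of SimpleNamespace in place of app.state are assumptions for illustration, not the fixture from this PR.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


class DeploymentGenerator:
    """Stand-in for the real generator, whose constructor has side effects."""

    def __init__(self, simulator_mode: bool = False) -> None:
        # Simulates the real constructor loading a catalog and creating dirs.
        raise RuntimeError("real constructor would touch the disk")

    def generate_all(self, model_id: str) -> dict[str, str]:
        raise NotImplementedError


def make_mock_state(files: dict[str, str]) -> SimpleNamespace:
    """Build an app-state-like object holding a fully mocked generator.

    MagicMock(spec=...) never calls __init__, so nothing touches the disk,
    and attribute access is limited to names the real class defines.
    """
    generator = MagicMock(spec=DeploymentGenerator)
    generator.generate_all.return_value = files
    return SimpleNamespace(deployment_generator=generator)
```

In a FastAPI test the returned generator would be assigned to app.state.deployment_generator instead of a SimpleNamespace; the spec argument keeps the mock honest if a method is renamed on the real class.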

amito · 2026-06-28T07:43:44Z

+kind: Kustomization
+
+resources:
+  # TODO: pin to release tag (e.g. ?ref=v0.1.0) per llm-d-planner release


We need to consider automating this in the release workflow.

Agreed. As a follow-up item, when we cut a release, the CI/release workflow should inject a pinned ref tag for both the kustomize base resource and the EPP image tag in values.yaml.j2. For now it tracks latest.
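As a rough sketch of what that follow-up could look like, the pinned values could live in one small structure that the release workflow fills in and the templates read from. Everything here is hypothetical (the PinnedVersions dataclass, its field names, the kustomize_resource helper, and the example.com base URL); only the ?ref= suffix and the v0.8.0 EPP tag appear in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PinnedVersions:
    """Hypothetical single source of truth for versions written into output."""

    kustomize_ref: Optional[str] = None  # None means "track latest"
    epp_image_tag: str = "v0.8.0"        # tag currently used in values.yaml.j2


def kustomize_resource(base: str, versions: PinnedVersions) -> str:
    """Append ?ref=<tag> to a remote kustomize base when a ref is pinned."""
    if versions.kustomize_ref is None:
        return base
    return f"{base}?ref={versions.kustomize_ref}"


# Placeholder URL; the real repository location is not given in this PR.
BASE = "https://example.com/org/repo//guides/recipes/modelserver/base/single-host/default"

unpinned = kustomize_resource(BASE, PinnedVersions())
pinned = kustomize_resource(BASE, PinnedVersions(kustomize_ref="v0.1.0"))
```

Keeping both the kustomize ref and the image tag in one place would let a release job update a single file rather than patching each template separately.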

amito

Hi Jing,
Thanks for addressing all of my comments, this looks great.
Please squash the commits (either to one or into multiple by topic - feature / tests ... with a short commit message) when merging.
Thanks :)
Amit

amito · 2026-06-30T06:25:13Z

I see that only some commits have verified signatures; it would be better to squash locally and sign the new set of commits.

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

jgchn added 6 commits June 23, 2026 14:27

feat: wire stack=llm-d param into /api/v1/deploy endpoint

c578531

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Jing Chen <jing.chen2@ibm.com>

jgchn requested review from amito and anfredette and removed request for amito June 24, 2026 01:45

amito requested changes Jun 24, 2026

jgchn requested a review from amito June 26, 2026 13:36

amito approved these changes Jun 28, 2026

amito reviewed Jun 28, 2026

amito approved these changes Jun 30, 2026

feat: add minimal support for llm-d manifest generation

e550453

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

jgchn force-pushed the feat/llm-d-config-generation branch from 0a4c969 to e550453 Compare June 30, 2026 13:21

2 participants
