Add EPP scale limits experiment and profiles #1468

Open
madhugoutham wants to merge 5 commits into llm-d:main from madhugoutham:benchmarks/epp-scale-limits

@madhugoutham

Add an experiment definition and inference-perf profiles for evaluating EPP throughput across replica counts, CPU allocations, and routing strategies. Includes a standalone script for EPP scale testing.

Addresses llm-d/llm-d-router#1290.

  Add experiment definition and inference-perf profile for evaluating EPP
  throughput across replica counts (1-4), CPU allocations (1/4/8 cores),
  and routing strategies (default, optimized-baseline, active-request).
  14 setup treatments with QPS ramp from 200 to 5000.

  Validated with dry-run: 14/14 treatments succeeded.
  Addresses llm-d/llm-d-router#1290.

Signed-off-by: Madhu Goutham Reddy Ambati <mambati@redhat.com>
  Add run_epp_scale.sh for standalone EPP scale testing and three
  inference-perf profiles (baseline, stress, GPU ramp).

Signed-off-by: Madhu Goutham Reddy Ambati <mambati@redhat.com>
streaming: true
server:
  type: vllm
  model_name: meta-llama/Llama-3.1-8B-Instruct

Collaborator

model_name: REPLACE_ENV_LLMDBENCH_DEPLOY_CURRENT_MODEL

Collaborator

Ah I see... these are rendered files.

Contributor Author

Added .yaml.in templates for the baseline and GPU profiles to match repo convention, removed the rendered .yaml files, and wired template rendering into the standalone script.

Comment thread workload/profiles/inference-perf/epp_scale_gpu.yaml Outdated
    Convert baseline and GPU profiles to .yaml.in with REPLACE_ENV_*
    placeholders. Remove rendered .yaml files. Add sed rendering to
    run_epp_scale.sh so --model flag flows into the config.

Signed-off-by: Madhu Goutham Reddy Ambati <mambati@redhat.com>
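
For readers unfamiliar with the .yaml.in convention, the rendering step described in this commit could look roughly like the sketch below. It is not the code from run_epp_scale.sh: the render_profile function, its arguments, and the temporary file layout are hypothetical, and only the placeholder name and the use of sed come from this thread.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: render a .yaml.in template by substituting a single
# REPLACE_ENV_* placeholder. Arguments: template path, output path, model name.
render_profile() {
  # Escape characters that are special in a sed replacement (\ and &)
  # as well as the | delimiter used in the substitution below.
  escaped=$(printf '%s' "$3" | sed -e 's/[\\&|]/\\&/g')
  sed -e "s|REPLACE_ENV_LLMDBENCH_DEPLOY_CURRENT_MODEL|${escaped}|g" "$1" > "$2"
}

# Illustrative template; the real profile has more fields than shown here.
workdir=$(mktemp -d)
cat > "$workdir/epp_scale_gpu.yaml.in" <<'EOF'
server:
  type: vllm
  model_name: REPLACE_ENV_LLMDBENCH_DEPLOY_CURRENT_MODEL
EOF

render_profile "$workdir/epp_scale_gpu.yaml.in" "$workdir/epp_scale_gpu.yaml" \
  "meta-llama/Llama-3.1-8B-Instruct"
cat "$workdir/epp_scale_gpu.yaml"
```

Using | as the sed delimiter avoids having to escape the / in model names such as meta-llama/Llama-3.1-8B-Instruct.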
  Add session-affinity-filter as a routing strategy option. Use --set-file
  instead of --set for plugins config to avoid multiline YAML mangling.
  Make EPP image repo configurable via EPP_IMAGE_REPO env var.

Signed-off-by: Madhu Goutham Reddy Ambati <mambati@redhat.com>
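
As a rough illustration of the --set-file change, the sketch below assembles a Helm command that passes a multiline plugins config from a file and reads the image repo from EPP_IMAGE_REPO. The release name, chart path, value keys (image.repository, pluginsConfig), default registry, and the plugins file contents are all assumptions made for the example; only the --set-file flag, the EPP_IMAGE_REPO variable, and the session-affinity-filter name come from the commit message. The command is printed rather than executed so the sketch runs without a cluster.

```shell
#!/bin/sh
set -eu

# EPP_IMAGE_REPO is named in the commit message; this default is a placeholder.
EPP_IMAGE_REPO="${EPP_IMAGE_REPO:-registry.example.com/epp}"

# Placeholder multiline config; the real plugins schema is not shown in this PR.
workdir=$(mktemp -d)
plugins_file="$workdir/plugins.yaml"
cat > "$plugins_file" <<'EOF'
plugins:
  - type: session-affinity-filter
EOF

# Hypothetical release name, chart path, and value keys.
# --set-file reads the value from the file as-is, so commas and newlines in
# the YAML are not interpreted by the --set parser.
set -- upgrade --install epp-scale ./chart \
  --set "image.repository=${EPP_IMAGE_REPO}" \
  --set-file "pluginsConfig=${plugins_file}"

echo "helm $*"
```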