[recipe] feat: add TRTLLM FP8-E2E recipe for Qwen3-30B-A3B DAPO on GB200 #112

Open
Superjomn wants to merge 1 commit into verl-project:main from Superjomn:chunweiy/low-precision-trtllm-fp8e2e

@Superjomn Superjomn commented Jun 16, 2026

[image]

The results should roughly align with the [vLLM recipe's experiment](https://verl.readthedocs.io/en/latest/low_precision/fp8.html#experiments-and-results).

@Superjomn Superjomn marked this pull request as draft June 16, 2026 04:46

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new end-to-end FP8 training recipe for Qwen3-30B-A3B using Megatron and TRT-LLM, along with updated documentation in the README. Feedback on the new shell script includes wrapping the logger list in double quotes to prevent bash globbing issues and addressing an unused variable (`rollout_token_veto_threshold`) that is defined but not mapped to the algorithm configuration.
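
To illustrate the globbing concern, here is a minimal sketch. It assumes the script passes the logger list as a Hydra-style override of the form `trainer.logger=['console','wandb']`; the recipe's actual line is not shown in this thread, so that form is an assumption. `show_args` is a hypothetical stand-in for the trainer entry point, and the file name is contrived to force a match. Unquoted, the shell treats `[...]` as a bracket pattern and replaces the argument with any matching file name in the current directory.

```shell
# Hypothetical stand-in for the trainer entry point: prints each argument it receives.
show_args() {
    printf '%s\n' "$@"
}

# Scratch directory holding a file whose name happens to match the bracket pattern.
demo_dir=$(mktemp -d)
touch "$demo_dir/trainer.logger=c"

# Unquoted: ['console','wandb'] is read as a bracket expression (any one of those
# characters), so the word is replaced by the matching file name.
unquoted=$(cd "$demo_dir" && show_args trainer.logger=['console','wandb'])

# Double-quoted: the override reaches the program verbatim, inner quotes included.
quoted=$(cd "$demo_dir" && show_args "trainer.logger=['console','wandb']")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

rm -rf "$demo_dir"
```

When no file matches, bash by default passes the unquoted word through with the inner single quotes stripped, so the bug can stay hidden until a matching file appears or `nullglob`/`failglob` is enabled. Quoting the whole override removes that dependence on directory contents.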

Comment thread low_precision/run_dapo_qwen3_moe_30b_megatron_trtllm_fp8e2e.sh Outdated
Comment thread low_precision/run_dapo_qwen3_moe_30b_megatron_trtllm_fp8e2e.sh Outdated
@Superjomn Superjomn changed the title [recipe] feat: add TRT-LLM FP8-E2E recipe for Qwen3-30B-A3B DAPO on B… [recipe] feat: add TRTLLM FP8-E2E recipe for Qwen3-30B-A3B DAPO on GB200 Jun 16, 2026
…lackwell

Signed-off-by: Chunwei Yan <chunweiy@nvidia.com>
@Superjomn Superjomn force-pushed the chunweiy/low-precision-trtllm-fp8e2e branch from 6a8c0b1 to 1022973 Compare June 16, 2026 06:15
@Superjomn Superjomn marked this pull request as ready for review June 22, 2026 03:23

sophiayyya commented Jun 23, 2026

Hi @Superjomn, it seems that the entropy here is higher than the entropy reported in https://verl.readthedocs.io/en/latest/low_precision/fp8.html#qwen3-30b-a3b-moe-model. Have you applied TIS? It would also be better to include a comparison between FP8 and BF16.
