feat(retool): add RL training script for Qwen3.5 on NPU (Ascend). by luoshijiang · Pull Request #106 · verl-project/verl-recipe

luoshijiang · 2026-05-28T11:04:23Z

Summary

Adds an example script and usage instructions for ReTool DAPO RL training of the Qwen3.5 model on NPU (Ascend).

Usage

Environment Requirements

Ensure the environment meets the requirements for Qwen3.5 inference and reinforcement learning (RL) training.

Reference versions:

verl : 0.8.0.dev0
vllm : 0.18.0+empty
vllm_ascend : 0.17.0rc2.dev109+g54879467c
transformers : 5.3.0.dev0
torch : 2.9.0+cpu
torch_npu : 2.9.0
ray : 2.48.0

Reference official Docker image:
quay.io/ascend/verl:verl-8.5.2-910b-ubuntu22.04-py3.11-qwen3-5
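
To compare an existing environment against the reference versions above, the installed versions can be listed with pip. This is a minimal sketch, not part of the PR: the helper name pkg_version is hypothetical, and it assumes the packages were installed with pip under the distribution names listed above.

```shell
# Print the installed version of one pip distribution, or "not installed".
# pkg_version is a hypothetical helper, not something verl or the recipe provides.
pkg_version() {
  version="$(python3 -m pip show "$1" 2>/dev/null | sed -n 's/^Version: //p')"
  echo "$1: ${version:-not installed}"
}

# Package names taken from the reference version list above.
for pkg in verl vllm vllm_ascend transformers torch torch_npu ray; do
  pkg_version "$pkg"
done
```

Development builds and local version suffixes (such as +cpu) appear in the output exactly as pip reports them, so they can be compared to the list by eye.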

Clone verl-recipe

git clone https://github.com/verl-project/verl-recipe.git

Deploy Required Tools

For example, to set up the sandbox, refer to SandboxFusion: https://github.com/bytedance/SandboxFusion.git

Data Preparation

Preprocess SFT dataset (confirm or modify the save path)

python3 verl-recipe/retool/retool_sft_preprocess.py

Download the RL training and validation datasets (e.g. DAPO-Math-17k, AIME_2024)

hf download BytedTsinghua-SIA/DAPO-Math-17k --local-dir ./dataset/BytedTsinghua-SIA/DAPO-Math-17k --repo-type dataset
hf download Maxwell-Jia/AIME_2024 --local-dir ./dataset/Maxwell-Jia/AIME_2024 --repo-type dataset

Training

SFT

bash verl-recipe/retool/run_qwen35-9b_sft_npu.sh

Convert weights

python3 -m verl.model_merger merge --backend fsdp --local_dir /checkpoint/multiturn-sft-qwen-3.5-9b/global_step_372 --target_dir /checkpoint/multiturn-sft-qwen-3.5-9b/global_step_372/huggingface
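
The checkpoint path in the command above depends on the SFT run (the global_step_372 directory is specific to the example), so it can help to keep it in one variable. The following is a hedged sketch: the variable names CKPT_DIR and TARGET_DIR are assumptions, and the merger invocation itself is copied unchanged from the command above.

```shell
# Hypothetical variables; override CKPT_DIR to point at your own SFT checkpoint.
CKPT_DIR="${CKPT_DIR:-/checkpoint/multiturn-sft-qwen-3.5-9b/global_step_372}"
TARGET_DIR="${CKPT_DIR}/huggingface"

if [ -d "${CKPT_DIR}" ]; then
  # Same merge command as above, with the two paths factored out and quoted.
  python3 -m verl.model_merger merge --backend fsdp \
    --local_dir "${CKPT_DIR}" \
    --target_dir "${TARGET_DIR}"
else
  # Report the missing checkpoint instead of letting the merger fail later.
  echo "checkpoint directory not found: ${CKPT_DIR}" >&2
fi
```

Deriving TARGET_DIR from CKPT_DIR keeps the two paths in step when the step number changes.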

RL

bash verl-recipe/retool/run_qwen35-9b_dapo_npu.sh

gemini-code-assist

Code Review

This pull request introduces a new bash script retool/run_qwen35-9b_dapo_npu.sh to configure and run PPO training for Qwen 3.5 9B on NPU devices. The reviewer provided several constructive suggestions to improve the script's robustness, including using set -e to exit immediately on errors, dynamically resolving the script's directory (SCRIPT_DIR) to make local path references more robust, and quoting variable expansions to prevent word splitting.
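
The three suggestions in the review summary can be sketched as follows. This is an illustration only, not the contents of retool/run_qwen35-9b_dapo_npu.sh: MODEL_PATH, TOOL_CONFIG, and tool_config.yaml are hypothetical names chosen for the example.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the actual training script from this PR.
set -e  # stop at the first command that fails

# Resolve the directory holding this script so local references do not depend
# on the caller's working directory. Bash scripts often use "${BASH_SOURCE[0]}"
# here; "$0" is the portable form for a script that is executed, not sourced.
SCRIPT_DIR="$(cd "$(dirname -- "$0")" && pwd)"

# Hypothetical settings: the names and defaults below are assumptions.
MODEL_PATH="${MODEL_PATH:-/checkpoint/multiturn-sft-qwen-3.5-9b/global_step_372/huggingface}"
TOOL_CONFIG="${SCRIPT_DIR}/tool_config.yaml"

# Quote every expansion so a path containing spaces stays a single argument.
printf 'script dir:  %s\n' "${SCRIPT_DIR}"
printf 'model path:  %s\n' "${MODEL_PATH}"
printf 'tool config: %s\n' "${TOOL_CONFIG}"
```

With set -e in place, a failed step aborts the script instead of letting later commands run against a half-prepared state, and the quoted expansions prevent word splitting on paths that contain whitespace.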

feat(retool): add retool DAPO training script for Qwen3.5 model on NPU (Ascend).

c94033f

gemini-code-assist Bot reviewed May 28, 2026

luoshijiang and others added 7 commits May 30, 2026 16:16

Update retool/run_qwen35-9b_dapo_npu.sh

111dfe3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update retool/run_qwen35-9b_dapo_npu.sh

d310d7a

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update retool/run_qwen35-9b_dapo_npu.sh

b9fb13f

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update retool/run_qwen35-9b_dapo_npu.sh

2feb24e

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update retool/run_qwen35-9b_dapo_npu.sh

88d673d

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update retool/run_qwen35-9b_dapo_npu.sh

7b71761

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update retool/run_qwen35-9b_dapo_npu.sh

a0d0d92

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

luoshijiang wants to merge 8 commits into verl-project:main from luoshijiang:add-qwen35-retool-dapo-npu-script
