feat(retool): add RL training script for Qwen3.5 on NPU (Ascend).#106

Open
luoshijiang wants to merge 8 commits into verl-project:main from luoshijiang:add-qwen35-retool-dapo-npu-script

Conversation

@luoshijiang

Summary

This PR adds an example script and usage instructions for ReTool DAPO RL training of the Qwen3.5 model on Ascend NPUs.

Usage

Environment Requirements

Ensure the environment meets the requirements for Qwen3.5 inference and reinforcement learning (RL) training.

Reference versions:

  • verl : 0.8.0.dev0
  • vllm : 0.18.0+empty
  • vllm_ascend : 0.17.0rc2.dev109+g54879467c
  • transformers : 5.3.0.dev0
  • torch : 2.9.0+cpu
  • torch_npu : 2.9.0
  • ray : 2.48.0
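
To compare a local environment against the reference versions, something like the following sketch may help. It assumes `python3` is on the PATH and that each package is installed under the distribution name shown in the list; packages that cannot be found are reported rather than treated as errors.

```bash
#!/usr/bin/env bash
# Print the installed version of each package from the reference list so the
# output can be compared against it by eye.

report_version() {
    # $1: distribution name as known to pip
    ver=$(python3 -c 'import sys, importlib.metadata as m; print(m.version(sys.argv[1]))' "$1" 2>/dev/null) \
        || ver="not installed"
    printf '%s : %s\n' "$1" "$ver"
}

for pkg in verl vllm vllm_ascend transformers torch torch_npu ray; do
    report_version "$pkg"
done
```

The function only reads package metadata, so it does not import the packages themselves and should be safe to run on a machine without NPUs.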

Reference official Docker image:
quay.io/ascend/verl:verl-8.5.2-910b-ubuntu22.04-py3.11-qwen3-5
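
A rough sketch of how the reference image might be started with NPUs exposed. This is a dry run that only prints a command. The device node names (`/dev/davinciN`, `/dev/davinci_manager`, `/dev/devmm_svm`, `/dev/hisi_hdc`) are assumptions based on typical Ascend container setups; driver and tool bind mounts are usually needed as well and are deliberately left out here, so check the Ascend documentation for your host before running anything.

```bash
# Dry run: print a `docker run` command exposing N Ascend NPUs to the
# reference image. Nothing is executed.
IMAGE="quay.io/ascend/verl:verl-8.5.2-910b-ubuntu22.04-py3.11-qwen3-5"

build_docker_cmd() {
    # $1: number of NPUs to expose (/dev/davinci0 .. /dev/davinci{N-1})
    cmd="docker run -it --rm"
    i=0
    while [ "$i" -lt "$1" ]; do
        cmd="$cmd --device /dev/davinci$i"
        i=$((i + 1))
    done
    # Management device nodes assumed to be required alongside the NPUs.
    cmd="$cmd --device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc"
    printf '%s %s bash\n' "$cmd" "$IMAGE"
}

build_docker_cmd 8
```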

Clone verl-recipe

git clone https://github.com/verl-project/verl-recipe.git

Deploy Required Tools

For example, to set up the code-execution sandbox, refer to SandboxFusion: https://github.com/bytedance/SandboxFusion.git
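
Once a sandbox is deployed, a quick smoke test can confirm it is reachable before training starts. The sketch below assumes a local deployment listening at `http://localhost:8080/run_code` and a JSON request with `code` and `language` fields; both are assumptions about the deployment, so adjust `SANDBOX_URL` and the payload to match yours.

```bash
# Assumed endpoint of a locally deployed sandbox; override via SANDBOX_URL.
SANDBOX_URL="${SANDBOX_URL:-http://localhost:8080/run_code}"

build_payload() {
    # Emit a JSON request asking the sandbox to run a one-line Python program.
    printf '{"code": "print(1 + 1)", "language": "python"}'
}

smoke_test() {
    # Returns non-zero if the sandbox is unreachable or answers with an HTTP error.
    curl --silent --show-error --fail --max-time 30 \
        -H 'Content-Type: application/json' \
        --data "$(build_payload)" \
        "$SANDBOX_URL"
}

# Uncomment once the sandbox is running:
# smoke_test
```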

Data Preparation

Preprocess the SFT dataset (confirm or modify the save path in the script before running it)

python3 verl-recipe/retool/retool_sft_preprocess.py

Download the RL training and validation datasets (e.g. DAPO-Math-17k, AIME_2024)

hf download BytedTsinghua-SIA/DAPO-Math-17k --local-dir ./dataset/BytedTsinghua-SIA/DAPO-Math-17k --repo-type dataset
hf download Maxwell-Jia/AIME_2024 --local-dir ./dataset/Maxwell-Jia/AIME_2024 --repo-type dataset
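
After the downloads finish, a small check like the one below can catch a missing or empty target directory early. It is only a sketch: it uses the `--local-dir` paths from the commands above and does not validate the dataset contents.

```bash
check_dataset_dir() {
    # $1: directory that `hf download --local-dir` was pointed at
    if [ ! -d "$1" ]; then
        echo "missing: $1"
        return 1
    fi
    # A directory with no entries at all means nothing was downloaded.
    if [ -z "$(ls -A "$1")" ]; then
        echo "empty: $1"
        return 1
    fi
    echo "ok: $1"
}

for d in ./dataset/BytedTsinghua-SIA/DAPO-Math-17k ./dataset/Maxwell-Jia/AIME_2024; do
    check_dataset_dir "$d" || true   # report every directory, do not stop at the first problem
done
```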

Training

SFT

bash verl-recipe/retool/run_qwen35-9b_sft_npu.sh

Convert weights (merge the FSDP checkpoint into Hugging Face format)

python3 -m verl.model_merger merge --backend fsdp --local_dir /checkpoint/multiturn-sft-qwen-3.5-9b/global_step_372 --target_dir /checkpoint/multiturn-sft-qwen-3.5-9b/global_step_372/huggingface
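
The checkpoint path and step number above are specific to one SFT run. A sketch that parameterizes them is shown below; `CKPT_ROOT` and `STEP` are hypothetical variable names introduced for illustration, and the function only prints the merge command so it can be inspected before being run.

```bash
CKPT_ROOT="${CKPT_ROOT:-/checkpoint/multiturn-sft-qwen-3.5-9b}"
STEP="${STEP:-372}"

print_merge_cmd() {
    # $1: checkpoint root, $2: global step number
    local_dir="$1/global_step_$2"
    printf 'python3 -m verl.model_merger merge --backend fsdp --local_dir %s --target_dir %s/huggingface\n' \
        "$local_dir" "$local_dir"
}

print_merge_cmd "$CKPT_ROOT" "$STEP"
```

With the defaults this prints the same command as above; setting `STEP` to the last step of your own SFT run changes both paths consistently.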

RL

bash verl-recipe/retool/run_qwen35-9b_dapo_npu.sh
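
Before launching RL, it may be worth confirming that the converted SFT weights exist. The sketch below assumes the RL script is configured to load the model from the merged `huggingface` directory produced by the conversion step (not confirmed here), and uses a hypothetical `MODEL_DIR` variable. It relies only on the fact that a Hugging Face model directory contains a `config.json`.

```bash
MODEL_DIR="${MODEL_DIR:-/checkpoint/multiturn-sft-qwen-3.5-9b/global_step_372/huggingface}"

model_dir_ready() {
    # A Hugging Face model directory carries a config.json at its top level.
    [ -f "$1/config.json" ]
}

if model_dir_ready "$MODEL_DIR"; then
    echo "model found under $MODEL_DIR"
    # bash verl-recipe/retool/run_qwen35-9b_dapo_npu.sh
else
    echo "no config.json under $MODEL_DIR; run the weight conversion first"
fi
```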

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a new bash script retool/run_qwen35-9b_dapo_npu.sh to configure and run PPO training for Qwen 3.5 9B on NPU devices. The reviewer provided several constructive suggestions to improve the script's robustness, including using set -e to exit immediately on errors, dynamically resolving the script's directory (SCRIPT_DIR) to make local path references more robust, and quoting variable expansions to prevent word splitting.
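
The three suggestions summarized above can be illustrated with a short sketch. This is not the code from the PR; `resolve_script_dir` and the config path are hypothetical names used only to show the pattern.

```bash
#!/usr/bin/env bash
set -e  # stop at the first failing command

resolve_script_dir() {
    # Print the absolute directory containing the script at path $1.
    # The subshell keeps the caller's working directory unchanged.
    (cd -- "$(dirname -- "$1")" && pwd)
}

# "$0" is the path the script was invoked with; bash scripts often use
# "${BASH_SOURCE[0]}" here instead so that sourcing also works.
SCRIPT_DIR="$(resolve_script_dir "$0")"

# Hypothetical config file next to the script; the quotes keep a path
# containing spaces as a single argument.
CONFIG_PATH="$SCRIPT_DIR/config/example.yaml"
echo "would read config from: $CONFIG_PATH"
```

Resolving paths relative to the script rather than the current directory lets the script be launched from anywhere, which matters when it is started from a job scheduler or from the repository root.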

Comment threads on retool/run_qwen35-9b_dapo_npu.sh: 7 (6 marked Outdated)
luoshijiang and others added 7 commits May 30, 2026 16:16
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>