Training is very slow with zigzag-ring enabled #68

@xuchanganan

Description

Reminder

  • I have read the README and searched the existing issues.

System Info

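I am using zigzag-ring with Qwen3 and sp=8 on a dataset of 2,700 samples. After enabling sequence parallelism, the dataset size becomes 2700 * 8, and training also seems to be more than 8x slower: a run that used to finish in about 2 hours now takes over 20 hours. Is there any way to speed this up? From the README, it looks like Qwen3 cannot use Ulysses either?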
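For reference, here is a rough back-of-the-envelope sketch of why the step count could grow by about 8x. It is only an estimate under assumptions, not the library's actual accounting: the GPU count is not stated above, so `WORLD_SIZE = 8` is a guess, and `steps_per_epoch` is a hypothetical helper, not a function from the training code. The idea is that with sp=8 all 8 GPUs cooperate on a single sequence, so the data-parallel degree drops from 8 to 1.

```python
import math


def steps_per_epoch(num_samples, world_size, sp_size,
                    per_device_batch=1, grad_accum=1):
    """Estimate optimizer steps per epoch when sp_size GPUs share one sequence.

    Hypothetical helper for illustration only.
    """
    if world_size % sp_size != 0:
        raise ValueError("world_size must be divisible by sp_size")
    # GPUs inside one sequence-parallel group process the same sample,
    # so only world_size // sp_size groups consume distinct samples.
    dp_size = world_size // sp_size
    samples_per_step = dp_size * per_device_batch * grad_accum
    return math.ceil(num_samples / samples_per_step)


WORLD_SIZE = 8  # assumption: the GPU count is not given in this issue

without_sp = steps_per_epoch(2700, WORLD_SIZE, sp_size=1)
with_sp = steps_per_epoch(2700, WORLD_SIZE, sp_size=8)
print(without_sp, with_sp, round(with_sp / without_sp, 2))
```

If that assumption holds, the 2700 * 8 rows would simply be each sample split into 8 shards, one per rank, and the extra wall-clock time would come mostly from having 8x more steps rather than from each step being slower.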

Reproduction

deepspeed --master_port=48765 \
    src/train.py \
    --stage dpo \
    --do_train \
    --model_name_or_path "LLMs/Qwen3-32B" \
    --dataset dpo_zh_demo \
    --template qwen3 \
    --finetuning_type full \
    --pref_beta 0.1 \
    --pref_loss sigmoid \
    --output_dir output/debug \
    --cache_dir .cache \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 32768 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --save_steps 2000 \
    --save_strategy steps \
    --learning_rate 1e-6 \
    --num_train_epochs 3 \
    --plot_loss \
    --save_only_model True \
    --deepspeed examples/deepspeed/ds_z3_offload_config.json \
    --flash_attn fa2 \
    --gradient_checkpointing True \
    --bf16 True \
    --ddp_timeout 180000000 \
    --seed 42 \
    --sequence_parallel_size 8

Expected behavior

No response

Others

No response
