Setting sequence_parallel_size raises an AssertionError. #63

@guotong1988

Description

Reminder

  • I have read the README and searched the existing issues.

System Info

Traceback (most recent call last):
  File "/home/ai/anaconda3/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/cli.py", line 112, in main
    run_exp()
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/train/tuner.py", line 56, in run_exp
    run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/train/dpo/workflow.py", line 47, in run_dpo
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train, full_determinism=training_args.full_determinism)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/model/loader.py", line 147, in load_model
    sequence_parallel_group = apply_sequence_parallel(model_args, full_determinism)  # monkey patching, similar to liger_kernel
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/model/model_utils/sequence_parallel.py", line 62, in apply_sequence_parallel
    group_this = init_sp_group(model_args.sequence_parallel_size)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/model/model_utils/sequence_parallel.py", line 43, in init_sp_group
    assert dist.is_initialized()
           ^^^^^^^^^^^^^^^^^^^^^
AssertionError

Reproduction

yaml:

stage: dpo
do_train: true
finetuning_type: lora
sequence_parallel_size: 4
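
The assertion at the bottom of the traceback, `dist.is_initialized()`, fails when no process group has been initialized (assuming `dist` is `torch.distributed`). One common way to end up there is starting the run as a single plain process instead of through a distributed launcher such as `torchrun`, which exports `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` to each worker. The sketch below is a hypothetical pre-flight check, not part of 360-LLaMA-Factory: `check_distributed_env` is a name made up for illustration, and the rule that `sequence_parallel_size` must divide the world size is an assumption rather than something stated in this report.

```python
import os

# Environment variables that torchrun exports to every worker process.
TORCHRUN_VARS = ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


def check_distributed_env(sequence_parallel_size, env=None):
    """Hypothetical helper: return a list of problems, empty if none are found."""
    env = os.environ if env is None else env
    problems = []

    # Without these variables the process was most likely not started by a
    # distributed launcher, so no process group can be set up from the env.
    missing = [name for name in TORCHRUN_VARS if name not in env]
    if missing:
        problems.append("missing launcher variables: " + ", ".join(missing))
        return problems

    world_size = int(env["WORLD_SIZE"])
    # Assumption: each sequence-parallel group holds exactly
    # sequence_parallel_size ranks, so the world size must be a multiple of it.
    if world_size % sequence_parallel_size != 0:
        problems.append(
            f"WORLD_SIZE={world_size} is not divisible by "
            f"sequence_parallel_size={sequence_parallel_size}"
        )
    return problems


if __name__ == "__main__":
    # 4 matches the sequence_parallel_size in the YAML above.
    for problem in check_distributed_env(4):
        print("problem:", problem)
```

Running this in the same shell and with the same launcher as the failing command would show whether the launcher variables are present at all, which helps separate a launch problem from a configuration problem.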
