Could you spare a moment to look at this issue? TypeError: output tensor must have the same type as input tensor #308

Description

@ZephryLiang

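Hello, and sorry to bother you. Could you spare some time to help look at this issue?
Problem: running the train.sh script fails with an error.
I tried printing the dtypes: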

Print statements added in def training_step(
/transformers/trainer.py", line 2548
Input tensor input_ids dtype: torch.int64
Input tensor attention_mask dtype: torch.int64
Input tensor labels dtype: torch.int64
Loss tensor dtype: torch.float32

TypeError: output tensor must have the same type as input tensor
Package versions:
torch 2.6.0 python 3.11.11
GPUs: 4 × NVIDIA 4090
Script:
--quantization_bit 4 \ zeros (removed because it raised an error)
deepspeed --num_gpus 4 dbgpt_hub_sql/train/sft_train.py
--deepspeed dbgpt_hub_sql/configs/ds_config_stage3.json
--model_name_or_path codellama/CodeLlama-7B-Instruct-hf
--do_train
--dataset example_text2sql_train
--max_source_length 1024
--max_target_length 512
--template llama2
--finetuning_type lora
--lora_rank 64
--lora_alpha 32
--lora_target q_proj,v_proj
--output_dir dbgpt_hub_sql/output/adapter/llama2-13b-qlora_1024_epoch1_debug1008_withDeepseed_mulitCard
--overwrite_cache
--overwrite_output_dir
--per_device_train_batch_size 1
--gradient_accumulation_steps 16
--lr_scheduler_type cosine_with_restarts
--logging_steps 25
--save_steps 20
--learning_rate 2e-4
--num_train_epochs 0.1
--plot_loss
--bf16 2>&1 | tee ${train_log}
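
The script passes `--bf16` together with `--deepspeed dbgpt_hub_sql/configs/ds_config_stage3.json`, but the contents of that JSON file are not shown in this report. One thing worth ruling out is a disagreement between the precision flags on the command line and the `bf16` / `fp16` sections of the DeepSpeed config. The following is a minimal sketch of such a check, not a diagnosis: the helper name `precision_conflicts` and the example config are hypothetical, and the only assumption about the config format is the `{"bf16": {"enabled": ...}}` / `{"fp16": {"enabled": ...}}` layout, where `"auto"` defers to the Trainer arguments.

```python
import json


def precision_conflicts(ds_config, bf16_flag, fp16_flag=False):
    """Return a list of mismatches between CLI precision flags and a DeepSpeed config dict."""
    conflicts = []
    for key, flag in (("bf16", bf16_flag), ("fp16", fp16_flag)):
        # A missing section is treated as disabled.
        enabled = ds_config.get(key, {}).get("enabled", False)
        if enabled == "auto":
            # "auto" is filled in from the Trainer arguments, so it cannot conflict.
            continue
        if bool(enabled) != flag:
            conflicts.append(f"{key}: config enabled={enabled!r}, CLI flag={flag}")
    return conflicts


# Hypothetical config for illustration only; NOT the reporter's ds_config_stage3.json.
example_config = json.loads(
    '{"fp16": {"enabled": true}, "zero_optimization": {"stage": 3}}'
)

# The script above passes --bf16 and no --fp16.
issues = precision_conflicts(example_config, bf16_flag=True)
for line in issues:
    print(line)
```

To check the real file, load it with `json.load(open(path))` and pass the result in place of `example_config`. An empty list means the flags and the config agree, which would point the investigation elsewhere.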
Traceback:

File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/transformers/trainer.py", line 2241, in train
return **inner_training_loop**(----->

File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/transformers/trainer.py", line 2548, in _inner_training_loop
----> tr_loss_step = **self.training_step**(model, inputs, num_items_in_batch)

 File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/transformers/trainer.py", line 3745, in training_step
 **self.accelerator.backward**(loss, **kwargs)
File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/accelerate/accelerator.py", line 2321, in backward
**self.deepspeed_engine_wrapped.backward**(loss, **kwargs)
 File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/accelerate/utils/deepspeed.py", line 275, in backward
**self.engine.step()**
 File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 2249, in step
 **self._take_model_step(lr_kwargs)**
 
**(middle frames omitted)**

[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1530, in _all_gather
[rank0]:     **self._allgather_params_coalesced(all_gather_nonquantize_list, hierarchy, quantize=False)**
[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1840, in _allgather_params_coalesced
[rank0]:     h = dist.all_gather_into_tensor(allgather_params[param_idx],

[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/comm/comm.py", line 117, in log_wrapper
[rank0]:     return func(*args, **kwargs)

[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/comm/comm.py", line 305, in all_gather_into_tensor
[rank0]:     return **cdb.all_gather_into_tensor(output_tensor=output_tensor, input_tensor=tensor, group=group, async_op=async_op)**

[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/comm/torch.py", line 220, in all_gather_into_tensor
[rank0]:     return **self.all_gather_function(output_tensor=output_tensor,**

[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
[rank0]:     **return func(*args, **kwargs)**

[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 3798, in all_gather_into_tensor
[rank0]:     work = **group._allgather_base(output_tensor, input_tensor, opts)**

[rank0]: **TypeError: output tensor must have the same type as input tensor**
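
The traceback ends in `_allgather_params_coalesced` in `partition_parameters.py`, which gathers partitioned model parameters, whereas the dtypes printed earlier belong to the batch inputs and the loss. A possibly more useful diagnostic is therefore to list the dtypes of the model parameters themselves and see whether more than one dtype is present. The sketch below is an assumption about how one might do that, not code from the repository: `group_params_by_dtype` is a hypothetical helper, and the stand-in parameter names and dtypes are invented so the block runs without torch or a GPU.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_params_by_dtype(named_params):
    """Group parameter names by the string form of each parameter's dtype."""
    groups = defaultdict(list)
    for name, param in named_params:
        groups[str(param.dtype)].append(name)
    return dict(groups)


# Stand-in objects with invented names and dtypes (not taken from the failing run).
# In a real run, pass model.named_parameters() instead of this list.
fake_params = [
    ("base.q_proj.weight", SimpleNamespace(dtype="torch.bfloat16")),
    ("base.q_proj.lora_A.weight", SimpleNamespace(dtype="torch.float32")),
    ("base.v_proj.weight", SimpleNamespace(dtype="torch.bfloat16")),
]

summary = group_params_by_dtype(fake_params)
for dtype, names in summary.items():
    # Print the dtype, how many parameters share it, and a few example names.
    print(dtype, len(names), names[:3])
```

In the training code this could be called once before `trainer.train()` with `model.named_parameters()`. If the summary contains more than one dtype, the names in the smaller group are the first place to look.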
