Having issues after converting Qwen2.5-Coder #9

@frapell

Hey, I am trying to use Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF, since the Continue documentation lists it as one of the suggested models for autocomplete. I have a Ryzen AI 7 350, so as far as I understand, I should be able to run autocomplete locally.

I downloaded the Q4_0 model (qwen2.5-coder-1.5b-instruct-q4_0.gguf) and followed the documentation instructions, after commenting out the hardcoded sys.argv in the code (refs #6). The conversion looked like it worked fine:

$ python convert.py -i qwen2.5-coder-1.5b-instruct-q4_0.gguf -o qwen2.5-coder-1.5b
[INFO] Using Qwen35_2B converter
[INFO] Using Qwen35_08B converter
[INFO] Using Qwen35_9B converter
[INFO] Converting ../qwen2.5-coder-1.5b-instruct-q4_0.gguf to qwen2.5-coder-1.5b...
[INFO] Loading Q4NX config from configs/qwen2.json
[INFO] Creating name maps...
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_norm.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_q.bias'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_output.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_gate.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_down.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_k.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_v.bias'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_k.bias'
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_up.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_norm.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_q.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_v.weight'
        Converted token_embd.weight to model.embed_tokens.weight
        Converted blk.0.attn_q.weight to model.layers.0.self_attn.q_proj.weight
        Converted blk.0.attn_q.bias to model.layers.0.self_attn.q_proj.bias
        Converted blk.0.attn_k.weight to model.layers.0.self_attn.k_proj.weight
        Converted blk.0.attn_k.bias to model.layers.0.self_attn.k_proj.bias
        Converted blk.0.attn_v.weight to model.layers.0.self_attn.v_proj.weight
        Converted blk.0.attn_v.bias to model.layers.0.self_attn.v_proj.bias
        Converted blk.0.attn_output.weight to model.layers.0.self_attn.o_proj.weight
        Converted blk.0.ffn_up.weight to model.layers.0.mlp.up_proj.weight
        Converted blk.0.ffn_gate.weight to model.layers.0.mlp.gate_proj.weight
        Converted blk.0.ffn_down.weight to model.layers.0.mlp.down_proj.weight
        Converted blk.0.attn_norm.weight to model.layers.0.input_layernorm.weight
        Converted blk.0.ffn_norm.weight to model.layers.0.post_attention_layernorm.weight
        Converted output_norm.weight to model.norm.weight
        Converted output.weight to lm_head.weight
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
[INFO] Saving Q4NX tensors to qwen2.5-coder-1.5b/model.q4nx...
[INFO] Creating directory qwen2.5-coder-1.5b...
[INFO] Extracting tokenizer JSON...
[INFO] EOS token ID: 151645
[INFO] Padding token ID: 151643
[INFO] BOSS token ID: 151643
[INFO] Tokenizer saved to qwen2.5-coder-1.5b/tokenizer.json
[INFO] Conversion complete! Output saved to qwen2.5-coder-1.5b

I ended up with only two files: model.q4nx and tokenizer.json.

Next, I followed the instructions on how to add it to the list of models. I pulled the closest model I could find (qwen2.5-it:3b) and copied ~/.config/flm/models/Qwen2.5-3B-Instruct-NPU2 to ~/.config/flm/models/Qwen2.5-Coder-1.5b. Here comes the first question:

  • I only replaced the files I had, model.q4nx and tokenizer.json. The directory also contains chat_template.jinja, config.json and tokenizer_config.json, which I didn't touch. Should I change them, delete them, or leave them as they are?
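One way to narrow down the question about the untouched files is to compare the copied config.json against what the conversion log reported (28 layers, FFN width 8960 before padding). The sketch below assumes config.json uses Hugging Face style key names (`num_hidden_layers`, `intermediate_size`); that is an assumption about the file's layout, not something confirmed by the FLM documentation, and the helper names are made up for illustration.

```python
import json
from pathlib import Path

# Values read off the conversion log above (before padding).
# The key names are an assumption: Hugging Face style config keys.
EXPECTED = {"num_hidden_layers": 28, "intermediate_size": 8960}


def config_mismatches(config: dict, expected: dict = EXPECTED) -> dict:
    """Return {key: (found, expected)} for every key whose value differs."""
    return {
        key: (config.get(key), want)
        for key, want in expected.items()
        if config.get(key) != want
    }


def check_model_dir(model_dir: str) -> dict:
    """Load config.json from model_dir and compare it with EXPECTED."""
    config_path = Path(model_dir).expanduser() / "config.json"
    config = json.loads(config_path.read_text())
    return config_mismatches(config)
```

Calling `check_model_dir("~/.config/flm/models/Qwen2.5-Coder-1.5b")` would return an empty dict if the copied config already agrees with the log, and otherwise list the fields that still describe the 3B model.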

Lastly, I edited /opt/fastflowlm/share/flm/model_list.json and added my model right next to qwen2.5-it:

        "qwen2.5-coder": {
            "1.5b": {
                "name": "Qwen2.5-Coder-1.5b",
                "url": "",
                "file_url": "",
                "modified_at": "2026-06-11T00:00:00Z",
                "size": 3000000000,
                "flm_min_version": "0.9.32",
                "files": [
                    "config.json",
                    "model.q4nx",
                    "tokenizer.json",
                    "tokenizer_config.json"
                ],
                "default_context_length": 32768,
                "max_prefill_len": 4096,
                "details": {
                    "family": "qwen2",
                    "think": false,
                    "think_toggleable": false,
                    "parameter_size": "1.5B",
                    "quantization_level": "Q4_0"
                },
                "footprint": 2.5
            }
        },
        "qwen2.5-it": {
            "3b": {
                "name": "Qwen2.5-3B-Instruct-NPU2",
                "url": "https://huggingface.co/FastFlowLM/Qwen2.5-3B-Instruct-NPU2/resolve/main",
                "file_url": "https://huggingface.co/api/models/FastFlowLM/Qwen2.5-3B-Instruct-NPU2/tree/main",
                "modified_at": "2025-05-30T00:00:00Z",
                "size": 3000000000,
                "flm_min_version": "0.9.32",
                "files": [
                    "config.json",
                    "model.q4nx",
                    "tokenizer.json",
                    "tokenizer_config.json"
                ],
                "default_context_length": 32768,
                "max_prefill_len": 4096,
                "details": {
                    "family": "qwen2",
                    "think": true,
                    "think_toggleable": false,
                    "parameter_size": "3B",
                    "quantization_level": "Q4_0"
                },
                "footprint": 2.5
            }
        },
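As a quick sanity check on an entry like the one above, the following sketch verifies that every name in its `files` list actually exists in the model directory. It assumes the directory is `<models root>/<name>`, which matches the path printed by `flm run` in this report but is otherwise a guess about how flm resolves models; `missing_files` is a hypothetical helper, not part of flm.

```python
from pathlib import Path


def missing_files(entry: dict, models_root: str) -> list:
    """List the names in entry["files"] that are absent from the model directory.

    Assumption: the model directory is models_root / entry["name"].
    """
    model_dir = Path(models_root).expanduser() / entry["name"]
    return [name for name in entry["files"] if not (model_dir / name).is_file()]


# The entry added above, reduced to the two fields this check needs.
coder_entry = {
    "name": "Qwen2.5-Coder-1.5b",
    "files": ["config.json", "model.q4nx", "tokenizer.json", "tokenizer_config.json"],
}
```

For example, `missing_files(coder_entry, "~/.config/flm/models")` returns an empty list when all four files are present.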

Now, flm list does show the new model, but trying to run it results in an error:

$ flm list
Models:
...
...
  - qwen2.5-coder:1.5b ✅
  - qwen2.5-it:3b ✅
...
...

$ flm run qwen2.5-coder:1.5b
[FLM]  Loading model: /home/frapell/.config/flm/models/Qwen2.5-Coder-1.5b
model.embed_tokens.weight size mismatch: 466747392 != 622329856

I have no idea where model.embed_tokens.weight comes from or how to change it. I looked for it in the various files but couldn't find it, so I suspect it is related to the conversion: the conversion log does show "Converted token_embd.weight to model.embed_tokens.weight", but I don't know whether I need to do anything about it.
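A rough reading of the two numbers in the error, under two assumptions that are not confirmed by flm itself: the embedding table is stored at 2 bytes per element, and the vocabulary has 151936 entries (the published Qwen2.5 value). Dividing each byte count by vocabulary size times element size gives the hidden size it implies.

```python
VOCAB_SIZE = 151936       # assumption: Qwen2.5 vocabulary size
BYTES_PER_ELEMENT = 2     # assumption: 16-bit storage for the embedding table


def implied_hidden_size(nbytes: int) -> float:
    """Hidden size implied by an embedding table of nbytes bytes."""
    return nbytes / (VOCAB_SIZE * BYTES_PER_ELEMENT)


# The two sizes from the "size mismatch" error message.
for nbytes in (466747392, 622329856):
    print(nbytes, "->", implied_hidden_size(nbytes))
# prints:
# 466747392 -> 1536.0
# 622329856 -> 2048.0
```

1536 is the hidden size of Qwen2.5 1.5B and 2048 that of Qwen2.5 3B, so the mismatch would be consistent with the converted 1.5B weights being checked against the config.json left over from the copied 3B directory. This is an inference from the arithmetic only.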

Any advice?
