Having issues after converting Qwen2.5-Coder

Hey, I am trying to use [Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF), since according to the [Continue documentation](https://docs.continue.dev/customize/deep-dives/autocomplete) it is one of the suggested models for autocomplete. I have a Ryzen AI 7 350, so as far as I understand, I should be able to run autocomplete locally without problems.

I downloaded the Q4_0 model (`qwen2.5-coder-1.5b-instruct-q4_0.gguf`) and followed the documentation instructions (after commenting out the hardcoded `sys.argv` in the code, see #6). It looks like everything worked fine:
```bash
$ python convert.py -i qwen2.5-coder-1.5b-instruct-q4_0.gguf -o qwen2.5-coder-1.5b
[INFO] Using Qwen35_2B converter
[INFO] Using Qwen35_08B converter
[INFO] Using Qwen35_9B converter
[INFO] Converting ../qwen2.5-coder-1.5b-instruct-q4_0.gguf to qwen2.5-coder-1.5b...
[INFO] Loading Q4NX config from configs/qwen2.json
[INFO] Creating name maps...
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_norm.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_q.bias'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_output.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_gate.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_down.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_k.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_v.bias'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_k.bias'
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_up.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_norm.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_q.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_v.weight'
        Converted token_embd.weight to model.embed_tokens.weight
        Converted blk.0.attn_q.weight to model.layers.0.self_attn.q_proj.weight
        Converted blk.0.attn_q.bias to model.layers.0.self_attn.q_proj.bias
        Converted blk.0.attn_k.weight to model.layers.0.self_attn.k_proj.weight
        Converted blk.0.attn_k.bias to model.layers.0.self_attn.k_proj.bias
        Converted blk.0.attn_v.weight to model.layers.0.self_attn.v_proj.weight
        Converted blk.0.attn_v.bias to model.layers.0.self_attn.v_proj.bias
        Converted blk.0.attn_output.weight to model.layers.0.self_attn.o_proj.weight
        Converted blk.0.ffn_up.weight to model.layers.0.mlp.up_proj.weight
        Converted blk.0.ffn_gate.weight to model.layers.0.mlp.gate_proj.weight
        Converted blk.0.ffn_down.weight to model.layers.0.mlp.down_proj.weight
        Converted blk.0.attn_norm.weight to model.layers.0.input_layernorm.weight
        Converted blk.0.ffn_norm.weight to model.layers.0.post_attention_layernorm.weight
        Converted output_norm.weight to model.norm.weight
        Converted output.weight to lm_head.weight
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
[INFO] Saving Q4NX tensors to qwen2.5-coder-1.5b/model.q4nx...
[INFO] Creating directory qwen2.5-coder-1.5b...
[INFO] Extracting tokenizer JSON...
[INFO] EOS token ID: 151645
[INFO] Padding token ID: 151643
[INFO] BOSS token ID: 151643
[INFO] Tokenizer saved to qwen2.5-coder-1.5b/tokenizer.json
[INFO] Conversion complete! Output saved to qwen2.5-coder-1.5b
```

I ended up with only 2 files, `model.q4nx` and `tokenizer.json`.

So, I followed the instructions on how to add it to the list of models. I pulled the closest model I could find (`qwen2.5-it:3b`), copied `~/.config/flm/models/Qwen2.5-3B-Instruct-NPU2` to `~/.config/flm/models/Qwen2.5-Coder-1.5b`, and here comes the first question:

- I only replaced the files I had, `model.q4nx` and `tokenizer.json`. There are other files in there that I didn't touch: `chat_template.jinja`, `config.json` and `tokenizer_config.json`. I'm not sure whether I should change them, delete them, or leave them as they are.
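
One way to check `config.json` would be to compare it against what the conversion log reports. This is only a rough sketch: it assumes the file uses Hugging Face style keys (`num_hidden_layers`, `intermediate_size`), which is not confirmed for FastFlowLM, and `check_config` is a made-up helper name. The expected values (28 layers, FFN width 8960 before padding) are taken from the log above.

```python
import json
from pathlib import Path

# Values read off the conversion log: 28 layers, FFN width 8960 (before padding).
EXPECTED = {"num_hidden_layers": 28, "intermediate_size": 8960}


def check_config(config_path, expected=EXPECTED):
    """Return {key: (found, expected)} for every key that does not match.

    Assumes Hugging Face style key names; a missing key is reported as None.
    """
    config = json.loads(Path(config_path).read_text())
    return {
        key: (config.get(key), want)
        for key, want in expected.items()
        if config.get(key) != want
    }


if __name__ == "__main__":
    # Path of the copied model directory from this issue.
    path = Path.home() / ".config/flm/models/Qwen2.5-Coder-1.5b/config.json"
    if path.exists():
        print(check_config(path) or "config.json matches the conversion log")
```

If the copied `config.json` still describes the 3B model, the returned dictionary should point at the keys that disagree with the converted weights.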

Lastly, I edited `/opt/fastflowlm/share/flm/model_list.json` and added my model right next to `qwen2.5-it`:
```json

        "qwen2.5-coder": {
            "1.5b": {
                "name": "Qwen2.5-Coder-1.5b",
                "url": "",
                "file_url": "",
                "modified_at": "2026-06-11T00:00:00Z",
                "size": 3000000000,
                "flm_min_version": "0.9.32",
                "files": [
                    "config.json",
                    "model.q4nx",
                    "tokenizer.json",
                    "tokenizer_config.json"
                ],
                "default_context_length": 32768,
                "max_prefill_len": 4096,
                "details": {
                    "family": "qwen2",
                    "think": false,
                    "think_toggleable": false,
                    "parameter_size": "1.5B",
                    "quantization_level": "Q4_0"
                },
                "footprint": 2.5
            }
        },
        "qwen2.5-it": {
            "3b": {
                "name": "Qwen2.5-3B-Instruct-NPU2",
                "url": "https://huggingface.co/FastFlowLM/Qwen2.5-3B-Instruct-NPU2/resolve/main",
                "file_url": "https://huggingface.co/api/models/FastFlowLM/Qwen2.5-3B-Instruct-NPU2/tree/main",
                "modified_at": "2025-05-30T00:00:00Z",
                "size": 3000000000,
                "flm_min_version": "0.9.32",
                "files": [
                    "config.json",
                    "model.q4nx",
                    "tokenizer.json",
                    "tokenizer_config.json"
                ],
                "default_context_length": 32768,
                "max_prefill_len": 4096,
                "details": {
                    "family": "qwen2",
                    "think": true,
                    "think_toggleable": false,
                    "parameter_size": "3B",
                    "quantization_level": "Q4_0"
                },
                "footprint": 2.5
            }
        },
```
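
To rule out a missing file, the entry's `files` list can be compared with the model directory. A minimal sketch, assuming `files` names what has to be present in the directory (a guess, not something the documentation confirms); `missing_files` is a hypothetical helper, not part of `flm`.

```python
from pathlib import Path


def missing_files(entry, model_dir):
    """List the names in entry["files"] that are not regular files in model_dir."""
    model_dir = Path(model_dir)
    return [
        name
        for name in entry.get("files", [])
        if not (model_dir / name).is_file()
    ]


if __name__ == "__main__":
    # Trimmed copy of the qwen2.5-coder entry shown above.
    entry = {
        "name": "Qwen2.5-Coder-1.5b",
        "files": [
            "config.json",
            "model.q4nx",
            "tokenizer.json",
            "tokenizer_config.json",
        ],
    }
    model_dir = Path.home() / ".config/flm/models" / entry["name"]
    print(missing_files(entry, model_dir))
```

Note that `chat_template.jinja` exists in the copied directory but is not in the `files` list, so this check says nothing about it.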

Now, `flm list` does list the new model, but trying to run it results in an error:
```
$ flm list
Models:
...
...
  - qwen2.5-coder:1.5b ✅
  - qwen2.5-it:3b ✅
...
...

$ flm run qwen2.5-coder:1.5b
[FLM]  Loading model: /home/frapell/.config/flm/models/Qwen2.5-Coder-1.5b
model.embed_tokens.weight size mismatch: 466747392 != 622329856
```

I have no idea where this `model.embed_tokens.weight` comes from or how to change it. I looked for it in the other files but couldn't find it, so I suspect it is related to the conversion (I do see `Converted token_embd.weight to model.embed_tokens.weight` in the conversion logs, but I don't know whether I need to do anything about it).
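
The two numbers in the error can at least be factored. The sketch below assumes the embedding is stored as 16-bit values (2 bytes each) and uses the published Qwen2.5 vocabulary size of 151936; `implied_hidden_size` is a made-up helper name.

```python
VOCAB_SIZE = 151936   # published Qwen2.5 vocabulary size (1.5B and 3B share it)
BYTES_PER_VALUE = 2   # assumption: embeddings stored as 16-bit values


def implied_hidden_size(num_bytes, vocab_size=VOCAB_SIZE,
                        bytes_per_value=BYTES_PER_VALUE):
    """Hidden size for which vocab_size * hidden * bytes_per_value == num_bytes."""
    hidden, remainder = divmod(num_bytes, vocab_size * bytes_per_value)
    if remainder:
        raise ValueError(f"{num_bytes} is not a whole number of embedding rows")
    return hidden


print(implied_hidden_size(466747392))  # 1536
print(implied_hidden_size(622329856))  # 2048
```

Under those assumptions the sizes correspond to hidden sizes of 1536 and 2048, which match the published hidden sizes of Qwen2.5 1.5B and 3B. That would suggest one side of the comparison comes from the converted `model.q4nx` and the other from something left over from the 3B directory (possibly `config.json`), although which number is which is a guess.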

Any advice?
