Hey, I am trying to use Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF, since the Continue documentation lists it as one of the suggested models for autocomplete. I currently have a Ryzen AI 7 350, so as far as I understand, I should be able to get autocomplete working locally.
I downloaded the Q4_0 model (qwen2.5-coder-1.5b-instruct-q4_0.gguf) and followed the documentation instructions (after commenting out the hardcoded sys.argv in the code, refs #6). It looks like everything worked fine:
$ python convert.py -i qwen2.5-coder-1.5b-instruct-q4_0.gguf -o qwen2.5-coder-1.5b
[INFO] Using Qwen35_2B converter
[INFO] Using Qwen35_08B converter
[INFO] Using Qwen35_9B converter
[INFO] Converting ../qwen2.5-coder-1.5b-instruct-q4_0.gguf to qwen2.5-coder-1.5b...
[INFO] Loading Q4NX config from configs/qwen2.json
[INFO] Creating name maps...
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_norm.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_q.bias'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_output.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_gate.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_down.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_k.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_v.bias'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_k.bias'
[INFO] Detected 28 layers for pattern 'blk.{bid}.ffn_up.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_norm.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_q.weight'
[INFO] Detected 28 layers for pattern 'blk.{bid}.attn_v.weight'
Converted token_embd.weight to model.embed_tokens.weight
Converted blk.0.attn_q.weight to model.layers.0.self_attn.q_proj.weight
Converted blk.0.attn_q.bias to model.layers.0.self_attn.q_proj.bias
Converted blk.0.attn_k.weight to model.layers.0.self_attn.k_proj.weight
Converted blk.0.attn_k.bias to model.layers.0.self_attn.k_proj.bias
Converted blk.0.attn_v.weight to model.layers.0.self_attn.v_proj.weight
Converted blk.0.attn_v.bias to model.layers.0.self_attn.v_proj.bias
Converted blk.0.attn_output.weight to model.layers.0.self_attn.o_proj.weight
Converted blk.0.ffn_up.weight to model.layers.0.mlp.up_proj.weight
Converted blk.0.ffn_gate.weight to model.layers.0.mlp.gate_proj.weight
Converted blk.0.ffn_down.weight to model.layers.0.mlp.down_proj.weight
Converted blk.0.attn_norm.weight to model.layers.0.input_layernorm.weight
Converted blk.0.ffn_norm.weight to model.layers.0.post_attention_layernorm.weight
Converted output_norm.weight to model.norm.weight
Converted output.weight to lm_head.weight
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_down from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
Padding ffn_up/gate from 8960 to 9216
[INFO] Saving Q4NX tensors to qwen2.5-coder-1.5b/model.q4nx...
[INFO] Creating directory qwen2.5-coder-1.5b...
[INFO] Extracting tokenizer JSON...
[INFO] EOS token ID: 151645
[INFO] Padding token ID: 151643
[INFO] BOSS token ID: 151643
[INFO] Tokenizer saved to qwen2.5-coder-1.5b/tokenizer.json
[INFO] Conversion complete! Output saved to qwen2.5-coder-1.5b
I ended up with only 2 files, model.q4nx and tokenizer.json.
Next, I followed the instructions on how to add it to the list of models. I pulled the closest model I could find (qwen2.5-it:3b) and copied ~/.config/flm/models/Qwen2.5-3B-Instruct-NPU2 to ~/.config/flm/models/Qwen2.5-Coder-1.5b. Here goes the first question:
- I only replaced the files I had, model.q4nx and tokenizer.json. There are other files in there that I didn't touch: chat_template.jinja, config.json and tokenizer_config.json. I'm not sure whether I should change them, delete them, etc.
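To make the question about the untouched files concrete, here is a minimal sketch of how the copied config.json could be compared against the 1.5B model. It assumes config.json uses the Hugging Face style keys `num_hidden_layers` and `hidden_size` (I have not confirmed that FLM reads these names). The layer count of 28 comes from the conversion log above; the hidden size of 1536 is taken from the published Qwen2.5-1.5B config and is an assumption here. `config_mismatches` and `check_config_file` are hypothetical helper names.

```python
import json

# Expected values for the 1.5B model.
# num_hidden_layers: from the "Detected 28 layers" lines in the conversion log.
# hidden_size: ASSUMPTION, taken from the published Qwen2.5-1.5B config.
EXPECTED = {"num_hidden_layers": 28, "hidden_size": 1536}


def config_mismatches(config, expected=EXPECTED):
    """Return {key: (found, expected)} for every key that differs or is missing."""
    return {
        key: (config.get(key), want)
        for key, want in expected.items()
        if config.get(key) != want
    }


def check_config_file(path):
    """Load a config.json and report which expected values do not match."""
    with open(path, encoding="utf-8") as fh:
        return config_mismatches(json.load(fh))


# Example call (path from this post):
# check_config_file("/home/frapell/.config/flm/models/Qwen2.5-Coder-1.5b/config.json")
```

An empty result would mean the copied config.json already agrees with the 1.5B values; anything else would point at fields still describing the 3B model.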
Lastly, I edited /opt/fastflowlm/share/flm/model_list.json, and added my model right next to qwen2.5-it:
"qwen2.5-coder": {
    "1.5b": {
        "name": "Qwen2.5-Coder-1.5b",
        "url": "",
        "file_url": "",
        "modified_at": "2026-06-11T00:00:00Z",
        "size": 3000000000,
        "flm_min_version": "0.9.32",
        "files": [
            "config.json",
            "model.q4nx",
            "tokenizer.json",
            "tokenizer_config.json"
        ],
        "default_context_length": 32768,
        "max_prefill_len": 4096,
        "details": {
            "family": "qwen2",
            "think": false,
            "think_toggleable": false,
            "parameter_size": "1.5B",
            "quantization_level": "Q4_0"
        },
        "footprint": 2.5
    }
},
"qwen2.5-it": {
    "3b": {
        "name": "Qwen2.5-3B-Instruct-NPU2",
        "url": "https://huggingface.co/FastFlowLM/Qwen2.5-3B-Instruct-NPU2/resolve/main",
        "file_url": "https://huggingface.co/api/models/FastFlowLM/Qwen2.5-3B-Instruct-NPU2/tree/main",
        "modified_at": "2025-05-30T00:00:00Z",
        "size": 3000000000,
        "flm_min_version": "0.9.32",
        "files": [
            "config.json",
            "model.q4nx",
            "tokenizer.json",
            "tokenizer_config.json"
        ],
        "default_context_length": 32768,
        "max_prefill_len": 4096,
        "details": {
            "family": "qwen2",
            "think": true,
            "think_toggleable": false,
            "parameter_size": "3B",
            "quantization_level": "Q4_0"
        },
        "footprint": 2.5
    }
},
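As a sanity check on the entry above, a small sketch like the following could confirm that every name in "files" actually exists in the model directory. It assumes the directory under ~/.config/flm/models is named after the entry's "name" field, which matches the path in the flm run output in this post but is not something I have verified in the FLM docs. `missing_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_files(entry, models_root):
    """Return the names listed in entry["files"] that are absent on disk.

    ASSUMPTION: the model directory is models_root / entry["name"].
    """
    model_dir = Path(models_root) / entry["name"]
    return [name for name in entry["files"] if not (model_dir / name).is_file()]


# The entry as added to model_list.json (only the fields used here).
coder_entry = {
    "name": "Qwen2.5-Coder-1.5b",
    "files": ["config.json", "model.q4nx", "tokenizer.json", "tokenizer_config.json"],
}

# Example call:
# missing_files(coder_entry, Path.home() / ".config/flm/models")
```

This only checks presence; it says nothing about whether the contents of each file match the converted model.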
Now flm list does show the new model, but trying to run it results in an error:
$ flm list
Models:
...
...
- qwen2.5-coder:1.5b ✅
- qwen2.5-it:3b ✅
...
...
$ flm run qwen2.5-coder:1.5b
[FLM] Loading model: /home/frapell/.config/flm/models/Qwen2.5-Coder-1.5b
model.embed_tokens.weight size mismatch: 466747392 != 622329856
I have no idea where this model.embed_tokens.weight comes from or how to change it. I tried looking for it in the different files but couldn't find it, so I suspect it is related to the conversion (I do see "Converted token_embd.weight to model.embed_tokens.weight" in the conversion logs, but I don't know whether I need to do anything about it).
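For what it's worth, the two numbers in the error can be reproduced with simple arithmetic. The sketch below assumes the embedding is stored at 2 bytes per element, and uses the vocabulary size (151936) and hidden sizes (1536 for 1.5B, 2048 for 3B) from the published Qwen2.5 configs; none of these values come from the FLM documentation.

```python
# ASSUMED values from the published Qwen2.5 model configs.
VOCAB_SIZE = 151936
HIDDEN_SIZE_1_5B = 1536
HIDDEN_SIZE_3B = 2048
BYTES_PER_ELEMENT = 2  # ASSUMPTION: 16-bit embedding weights


def embed_bytes(vocab_size, hidden_size, bytes_per_element=BYTES_PER_ELEMENT):
    """Size in bytes of a vocab_size x hidden_size embedding matrix."""
    return vocab_size * hidden_size * bytes_per_element


print(embed_bytes(VOCAB_SIZE, HIDDEN_SIZE_1_5B))  # 466747392
print(embed_bytes(VOCAB_SIZE, HIDDEN_SIZE_3B))    # 622329856
```

If those assumptions hold, the first number in the error matches a 1.5B embedding and the second matches a 3B one, which would suggest that something left over from the copied Qwen2.5-3B-Instruct-NPU2 directory (possibly config.json) still describes the 3B dimensions.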
Any advice?