This directory contains the conversion flow for the local HuggingFace model:
../models/gemma-3-270m-it
Gemma-3-270m-it is a text-only LLM, so the flow is simpler than Qwen3-VL.AXERA: there is no vision encoder export. The conversion produces LLM axmodels plus the files required by ax-llm.
From the repository root:
cd npu-codebase
uv sync --extra cpu
source script/npu_dev
build_cmodel -b ax650npu
cd ..

The scripts also try to source npu-codebase/script/npu_dev automatically when pulsar2 is not already in PATH.
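The automatic fallback mentioned above could look roughly like the following. This is only a sketch of the idea, not the scripts' actual code: the function name `ensure_pulsar2` and its optional path argument are hypothetical.

```bash
# Hypothetical helper: make pulsar2 resolvable, sourcing the env script if needed.
ensure_pulsar2() {
    # Path to the environment script; the default mirrors the path named above.
    local env_script="${1:-npu-codebase/script/npu_dev}"

    # Only source the script when pulsar2 cannot already be resolved.
    if ! command -v pulsar2 >/dev/null 2>&1; then
        if [ -f "$env_script" ]; then
            . "$env_script"
        fi
    fi

    # Succeed only if pulsar2 can be found after the attempt.
    command -v pulsar2 >/dev/null 2>&1
}
```

A caller might then use `ensure_pulsar2 || { echo "pulsar2 not found" >&2; exit 1; }` before starting the build.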
Default build:
cd gemma-3-270m.axera/model_convert
bash build_all.sh

Defaults:
INPUT_DIR=${REPO_ROOT}/../models/gemma-3-270m-it
PREFILL_STEP_SIZE=128
PREFILL_LEN=1024
MAX_CONTEXT=2048
CHIP=AX650
PARALLEL=8
WEIGHT_TYPE=s8
POST_WEIGHT_TYPE=s8
OUTPUT_DIR=../gemma-3-270m-it-${CHIP}-C128-P1024-CTX2047
MAX_CONTEXT is passed to pulsar2 llm_build2 --max_context. The actual maximum
KV cache length is MAX_CONTEXT - 1, so the default package name uses CTX2047.
INPUT_DIR defaults to ${REPO_ROOT}/../models/gemma-3-270m-it, where REPO_ROOT
is the repository root. The model package name is derived from INPUT_DIR by
default.
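As an illustration of how the defaults and the package name fit together, here is a minimal sketch. It assumes that the `C128` and `P1024` parts of the name come from PREFILL_STEP_SIZE and PREFILL_LEN respectively (inferred from the matching default values), and the helper name `package_name` is hypothetical; build_all.sh may compute this differently.

```bash
# Defaults from this README; each can be overridden from the environment.
REPO_ROOT="${REPO_ROOT:-$(pwd)}"
INPUT_DIR="${INPUT_DIR:-${REPO_ROOT}/../models/gemma-3-270m-it}"
PREFILL_STEP_SIZE="${PREFILL_STEP_SIZE:-128}"
PREFILL_LEN="${PREFILL_LEN:-1024}"
MAX_CONTEXT="${MAX_CONTEXT:-2048}"
CHIP="${CHIP:-AX650}"

# Hypothetical helper: build the package name from the settings above.
package_name() {
    local model_name kv_len
    model_name="$(basename "$INPUT_DIR")"   # last path component of INPUT_DIR
    kv_len=$((MAX_CONTEXT - 1))             # actual maximum KV cache length
    printf '%s-%s-C%s-P%s-CTX%s\n' \
        "$model_name" "$CHIP" "$PREFILL_STEP_SIZE" "$PREFILL_LEN" "$kv_len"
}

package_name
```

With the defaults listed above and no overrides in the environment, this yields gemma-3-270m-it-AX650-C128-P1024-CTX2047.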
Override any value through environment variables:
PREFILL_LEN=1152 MAX_CONTEXT=2048 PARALLEL=4 bash build_all.sh

AX630C build:
CHIP=AX630C bash build_all.sh

The current pulsar2 llm_build2 parser does not accept AX630C directly. The
build script maps CHIP=AX630C to --chip AX620E and keeps AX630C in the
package name.
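The chip mapping described above can be sketched as a small case statement. The helper name `compiler_chip` is hypothetical, and the real build script may implement the mapping differently.

```bash
# Hypothetical helper: translate the user-facing CHIP into the --chip value.
compiler_chip() {
    case "$1" in
        AX630C) echo "AX620E" ;;  # the parser does not accept AX630C directly
        *)      echo "$1" ;;      # other chip names pass through unchanged
    esac
}

CHIP="${CHIP:-AX650}"
echo "package chip: ${CHIP}, --chip $(compiler_chip "$CHIP")"
```

Keeping the user-facing name and the compiler name separate means the package directory still says AX630C even though the compiler is invoked with AX620E.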
build_all.sh runs:
1. pulsar2 llm_build2 to compile decoder/prefill/post axmodels.
2. prepare_axllm_package.py to export:
   - model.embed_tokens.weight.bfloat16.bin
   - tokenizer.txt
   - config.json
   - post_config.json
If axmodels were already compiled:
cd gemma-3-270m.axera/model_convert
OUTPUT_DIR=/path/to/gemma-3-270m-it-${CHIP}-C128-P1024-CTX2047 bash prepare_package.sh

Use STRICT_AXMODELS=1 to fail if expected axmodel files are missing.
After copying the generated model package to the target device:
axllm run /path/to/gemma-3-270m-it-${CHIP}-C128-P1024-CTX2047

or:
axllm serve /path/to/gemma-3-270m-it-${CHIP}-C128-P1024-CTX2047 --port 8000

The generated config.json sets:
{
  "tokenizer_type": "Gemma3",
  "sliding_window": 512,
  "layer_types": ["sliding_attention", "..."]
}

Gemma3 uses the Gemma turn-style chat template from chat_template.jinja.
layer_types and sliding_window are required so ax-llm can build the correct
full/sliding attention masks. RoPE local/global differences are handled in the
compiled axmodels by npu-codebase/yasched/llm_builder/gemma_test.py.
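To make the role of layer_types and sliding_window concrete, the sketch below prints which key positions a query at a given position could attend to under a causal mask. It is an illustration only, not ax-llm code: the helper name `visible_range` is hypothetical, any layer type other than `sliding_attention` is treated as full attention, and the window is assumed to include the query position itself. The exact boundary ax-llm uses is not stated here.

```bash
# Hypothetical helper: print the inclusive key range "start end" that a query
# at position $1 may attend to, for layer type $2 and window size $3.
visible_range() {
    local pos="$1" layer_type="$2" window="$3" start=0

    if [ "$layer_type" = "sliding_attention" ]; then
        # Assumption: the window covers the query itself plus window-1 earlier keys.
        start=$((pos - window + 1))
        if [ "$start" -lt 0 ]; then
            start=0
        fi
    fi

    # Causal masking: no key after the query position is visible.
    echo "${start} ${pos}"
}
```

For example, `visible_range 1000 sliding_attention 512` restricts the range to the most recent 512 positions, while a full-attention layer at the same position sees everything from position 0.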
The conversions in this workspace generated:
gemma-3-270m.axera/gemma-3-270m-it-AX650-C128-P1024-CTX2047/
gemma-3-270m.axera/gemma-3-270m-it-AX630C-C128-P1024-CTX2047/
The package contains 18 layer axmodels, gemma3_text_post.axmodel, the BF16
embedding table, tokenizer.txt, config.json, and post_config.json.
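A quick sanity check of a generated package could be sketched as follows. The helper name `check_package` is hypothetical, and the sketch assumes that the layer axmodels sit at the top level of the package directory with an `.axmodel` extension and that the only other axmodel there is gemma3_text_post.axmodel.

```bash
# Hypothetical helper: report missing files in a package directory.
# Returns 0 when everything expected is present, 1 otherwise.
check_package() {
    local pkg="$1" missing=0 layer_count=0 f

    # Files named in this README.
    for f in gemma3_text_post.axmodel model.embed_tokens.weight.bfloat16.bin \
             tokenizer.txt config.json post_config.json; do
        if [ ! -f "${pkg}/${f}" ]; then
            echo "missing: ${f}"
            missing=1
        fi
    done

    # Count axmodels other than the post model; 18 layer models are expected.
    for f in "$pkg"/*.axmodel; do
        [ -e "$f" ] || continue
        case "$(basename "$f")" in
            gemma3_text_post.axmodel) ;;
            *) layer_count=$((layer_count + 1)) ;;
        esac
    done
    if [ "$layer_count" -ne 18 ]; then
        echo "expected 18 layer axmodels, found ${layer_count}"
        missing=1
    fi

    return "$missing"
}
```

Running `check_package /path/to/package` before copying the package to the target device gives an early warning if the conversion stopped partway.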
- gemma-3-270m-it is instruction tuned, but it is still a very small 270M model. Keep prompts simple and expectations for answer quality modest.
- tools/llm_smoke in the current ax-llm branch does not parse layer_types/sliding_window; prefer axllm run or axllm serve for Gemma3 sliding-attention validation unless that smoke tool is updated.