Add VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B recipe #406
@microsoft-github-policy-service agree [company="AMD"]
@microsoft-github-policy-service agree company="AMD"
Pull request overview
Adds a new Olive “builtin” recipe to export and run the OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B vision-language model with ONNX Runtime GenAI, splitting the pipeline into vision, embedding+merge, and quantized text-decoder sub-models.
Changes:
- Introduces an export pipeline (`optimize.py`) that runs three Olive configs (vision / embedding / text) and generates GenAI + processor configs.
- Adds Olive user callbacks (`user_script.py`) to extract/patch the vision tower + projector and implement an ONNX-friendly embedding merge.
- Adds a minimal GenAI inference driver (`inference.py`), plus recipe metadata and CPU configs under `cpu_and_mobile/`.
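A minimal sketch of how an orchestrator like this could run the three exports one after another, each in its own process so that the memory used by one export is returned to the OS before the next starts. This is not the code from `optimize.py`: the helper names `build_commands` and `run_all` are hypothetical, and invoking Olive through `olive run --config` is an assumption about how the recipe drives it.

```python
import subprocess
from pathlib import Path


def build_commands(config_dir, vision_config="vision_image.json"):
    """Return one argv list per Olive run, in export order (hypothetical helper)."""
    names = [vision_config, "embedding.json", "text.json"]
    return [["olive", "run", "--config", str(Path(config_dir) / n)] for n in names]


def run_all(config_dir, vision_config="vision_image.json", runner=subprocess.run):
    """Run each export in a separate process; its memory is freed when it exits."""
    for argv in build_commands(config_dir, vision_config):
        # check=True raises CalledProcessError and stops at the first failing export.
        runner(argv, check=True)
```

Passing `vision_config="vision.json"` would select the 4-frame vision config instead of the single-image one. The `runner` parameter exists only so the loop can be exercised without Olive installed.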
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/LICENSE | Adds a per-recipe license file (Apache-2.0 text). |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/user_script.py | Olive callbacks and PyTorch wrappers for vision export + embedding merge logic. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/requirements.txt | Declares Python dependencies for running export + inference. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/README.md | Documents export and inference usage and directory layout. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/optimize.py | Orchestrates Olive runs, normalizes outputs, and writes GenAI/processor configs. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/info.yml | Adds recipe metadata (currently missing the required `recipes:` list). |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/inference.py | Provides a simple onnxruntime-genai inference CLI (text + image + interactive). |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/embedding.json | Olive config for exporting embedding+merge model to ONNX. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/text.json | Olive config for ModelBuilder INT4 text decoder. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/vision.json | Olive config for 4-frame vision export + graph surgeries + external data. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/vision_image.json | Olive config for single-image vision export variant. |
@xieofxie / @devang-ml pls review
please wait for microsoft/onnxruntime-genai#2147
This is merged now @xieofxie
please fix pre-commit |
Summary
Adds an Olive recipe for `OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B`, a vision-language model that pairs the InternVideo2-1B vision encoder with a Qwen2.5-7B decoder.
Pipeline
The recipe exports three sub-models targeting CPU:
- `vision.onnx`: `vision_image.json` (default, `--mode image`) takes one 224×224 frame and runs the projector with `compress=False`; `vision.json` (`--mode video`) takes 4 frames and uses ToMe16 (256→16 tokens/frame) for the same 64-token output.
- `embedding.onnx`: `embed_tokens` + visual feature merge. Replaces `masked_scatter` with an ONNX-compatible `cumsum + where` injection at `<|image_pad|>` positions.
- `text.onnx`: Qwen2.5-7B text decoder (`trust_remote_code`), INT4 via `ModelBuilder`.
What's included
- `optimize.py`: orchestrates the three Olive runs in isolated subprocesses to release memory between exports (the full HF model is ~15 GB), normalizes Olive's output filenames, and writes the GenAI runtime + processor configs.
- `inference.py`: minimal ONNX Runtime GenAI driver with text-only, single-image, and interactive modes.
- `user_script.py`: Olive callbacks for model loading, dummy inputs, and IO configs.
- `cpu_and_mobile/*.json`: Olive configs for the three sub-models (with `vision.json` and `vision_image.json` as alternative vision exports).
- `builtin/README.md`: prerequisites, usage, and directory layout.
- `info.yml` and `requirements.txt`.
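The `cumsum + where` merge described for `embedding.onnx` can be illustrated without any tensor library. The sketch below is not the code from `user_script.py`; it uses plain Python lists for a single sequence, and `IMAGE_PAD_ID` and `merge_visual` are hypothetical names. It only shows the index arithmetic: a running count over the pad mask gives, at each `<|image_pad|>` position, the row of the visual features to inject, and a final select keeps the text embedding everywhere else.

```python
from itertools import accumulate

IMAGE_PAD_ID = 9  # hypothetical token id standing in for <|image_pad|>


def merge_visual(input_ids, text_embeds, visual_feats, pad_id=IMAGE_PAD_ID):
    """Put visual_feats[k] at the k-th pad position; keep text rows elsewhere."""
    if not visual_feats:
        return list(text_embeds)  # text-only input: nothing to inject
    mask = [int(t == pad_id) for t in input_ids]
    # cumsum(mask) - 1 is the visual row index at each pad position.
    idx = [c - 1 for c in accumulate(mask)]
    last = len(visual_feats) - 1
    # Clamp so every position has a valid gather index; rows gathered at
    # non-pad positions are discarded by the select below.
    gathered = [visual_feats[min(max(i, 0), last)] for i in idx]
    # where(mask, gathered, text_embeds)
    return [g if m else t for m, g, t in zip(mask, gathered, text_embeds)]


ids = [1, 9, 9, 2]
text = [[0.1], [0.0], [0.0], [0.2]]
vis = [[5.0], [6.0]]
print(merge_visual(ids, text, vis))  # [[0.1], [5.0], [6.0], [0.2]]
```

In a tensor framework the same three steps map to a cumulative sum, a gather, and an elementwise select, all of which have fixed output shapes, which is what makes this form easier to export than a data-dependent scatter.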