Add VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B recipe #406

Open
SanjayAMD wants to merge 6 commits into microsoft:main from SanjayAMD:amd_vidflash

SanjayAMD commented May 8, 2026

Summary

Adds an Olive recipe for OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B, a vision-language model that pairs the InternVideo2-1B vision encoder with a Qwen2.5-7B decoder.

Pipeline

The recipe exports three sub-models targeting CPU:

| Sub-model | Source | Precision | Notes |
| --- | --- | --- | --- |
| vision.onnx | InternVideo2-1B vision tower + projector → 64 visual tokens | FP32 | One of two configs is selected at run time; both write to the same output. vision_image.json (default, --mode image) takes one 224×224 frame and runs the projector with compress=False; vision.json (--mode video) takes 4 frames and uses ToMe16 (256→16 tokens/frame) for the same 64-token output. |
| embedding.onnx | embed_tokens + visual feature merge | FP32 | Custom merge replaces masked_scatter with an ONNX-compatible cumsum + where injection at <\|image_pad\|> positions. |
| text.onnx | Qwen2.5-7B decoder (VideoChat-Flash variant, trust_remote_code) | INT4 | Quantized via Olive's ModelBuilder. |
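
The cumsum + where merge named in the embedding.onnx row can be illustrated with a minimal sketch. It is written in NumPy rather than PyTorch to keep it dependency-light, and it is not the code in user_script.py: the function name `merge_visual_features`, the placeholder id 9, and the unbatched shapes are all assumptions made for illustration.

```python
import numpy as np

IMAGE_PAD_ID = 9  # hypothetical token id standing in for <|image_pad|>


def merge_visual_features(input_ids, text_embeds, visual_feats, pad_id=IMAGE_PAD_ID):
    """Place visual_feats rows, in order, at the positions where input_ids == pad_id."""
    mask = input_ids == pad_id  # (seq,) True at placeholder positions
    # Running count of placeholders seen so far; subtracting 1 gives the row of
    # visual_feats that belongs at each placeholder position.
    idx = np.cumsum(mask.astype(np.int64)) - 1
    # Positions before the first placeholder hold -1; clip so the gather stays
    # in range. Those rows are discarded by the where() below anyway.
    idx = np.clip(idx, 0, visual_feats.shape[0] - 1)
    gathered = visual_feats[idx]  # (seq, hidden)
    return np.where(mask[:, None], gathered, text_embeds)


# Toy example: 4 tokens, hidden size 2, two placeholders in the middle.
input_ids = np.array([1, 9, 9, 2])
text_embeds = np.zeros((4, 2))
visual_feats = np.array([[1.0, 1.0], [2.0, 2.0]])
merged = merge_visual_features(input_ids, text_embeds, visual_feats)
print(merged.tolist())  # [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.0, 0.0]]
```

Every step here is an elementwise comparison, a cumulative sum, a gather, or a select with shapes that depend only on the input shapes, which is the property that makes this formulation a plausible stand-in for masked_scatter in an exported graph.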

What's included

  • optimize.py — orchestrates the three Olive runs in isolated subprocesses to release memory between exports (the full HF model is ~15 GB), normalizes Olive's output filenames, and writes the GenAI runtime + processor configs.
  • inference.py — minimal ONNX Runtime GenAI driver with text-only, single-image, and interactive modes.
  • user_script.py — Olive callbacks: model loading, dummy inputs, IO configs.
  • cpu_and_mobile/*.json — Olive configs for the three sub-models (with vision.json and vision_image.json as alternative vision exports).
  • builtin/README.md — prerequisites, usage, and directory layout.
  • info.yml and requirements.txt.
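
The subprocess isolation described for optimize.py could look roughly like the sketch below. It is an illustration only: the helper names (`configs_for`, `build_command`, `run_isolated`), the export order, and the use of the `olive run --config` command line are assumptions, and the actual script may invoke Olive differently.

```python
import subprocess
import sys

# Mode-to-config mapping as described in the pipeline table; "image" is the default.
VISION_CONFIGS = {"image": "vision_image.json", "video": "vision.json"}


def configs_for(mode="image"):
    """Return the three config file names to run (order is an assumption)."""
    return [VISION_CONFIGS[mode], "embedding.json", "text.json"]


def build_command(config_path):
    """Build the argv for one Olive run (assumes the `olive run --config` CLI)."""
    return ["olive", "run", "--config", str(config_path)]


def run_isolated(commands):
    """Run each command in its own child process, stopping at the first failure.

    When a child exits, the OS reclaims all of its memory, so the model loaded
    for one export is released before the next export begins.
    """
    for argv in commands:
        subprocess.run(argv, check=True)  # raises CalledProcessError on non-zero exit


# Harmless demonstration that does not require Olive: two throwaway interpreters.
run_isolated([[sys.executable, "-c", "pass"], [sys.executable, "-c", "pass"]])
```

A real export under these assumptions would pass `[build_command(f"cpu_and_mobile/{name}") for name in configs_for("video")]` to `run_isolated`. Running the exports in one long-lived process instead would leave it to the Python allocator and garbage collector to give memory back between steps, which is harder to rely on with a model of this size.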

Copilot AI review requested due to automatic review settings May 8, 2026 11:07
@SanjayAMD
Author

@SanjayAMD please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree [company="AMD"]

@SanjayAMD
Author

@SanjayAMD the command you issued was incorrect. Please try again.

Examples are:

@microsoft-github-policy-service agree

and

@microsoft-github-policy-service agree company="your company"

@microsoft-github-policy-service agree company="AMD"

Contributor

Copilot AI left a comment

Pull request overview

Adds a new Olive “builtin” recipe to export and run the OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B vision-language model with ONNX Runtime GenAI, splitting the pipeline into vision, embedding+merge, and quantized text-decoder sub-models.

Changes:

  • Introduces an export pipeline (optimize.py) that runs three Olive configs (vision / embedding / text) and generates GenAI + processor configs.
  • Adds Olive user callbacks (user_script.py) to extract/patch the vision tower + projector and implement an ONNX-friendly embedding merge.
  • Adds a minimal GenAI inference driver (inference.py), plus recipe metadata and CPU configs under cpu_and_mobile/.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
| --- | --- |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/LICENSE | Adds a per-recipe license file (Apache-2.0 text). |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/user_script.py | Olive callbacks and PyTorch wrappers for vision export + embedding merge logic. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/requirements.txt | Declares Python dependencies for running export + inference. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/README.md | Documents export and inference usage and directory layout. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/optimize.py | Orchestrates Olive runs, normalizes outputs, and writes GenAI/processor configs. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/info.yml | Adds recipe metadata (currently missing required recipes: list). |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/inference.py | Provides a simple onnxruntime-genai inference CLI (text + image + interactive). |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/embedding.json | Olive config for exporting embedding+merge model to ONNX. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/text.json | Olive config for ModelBuilder INT4 text decoder. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/vision.json | Olive config for 4-frame vision export + graph surgeries + external data. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/vision_image.json | Olive config for single-image vision export variant. |

Comment thread OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/info.yml Outdated
Comment thread OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/optimize.py Outdated
Comment thread OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/optimize.py Outdated
Comment thread OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/user_script.py Outdated
Comment thread OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/user_script.py Outdated
Comment thread OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/inference.py Outdated
@VishalX

VishalX commented May 11, 2026

@xieofxie / @devang-ml pls review

@xieofxie
Contributor

please wait for microsoft/onnxruntime-genai#2147

@VishalX

VishalX commented May 22, 2026

> please wait for microsoft/onnxruntime-genai#2147

This is merged now @xieofxie

@xieofxie
Contributor

please fix pre-commit
