Add VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B recipe by SanjayAMD · Pull Request #406 · microsoft/olive-recipes

SanjayAMD · 2026-05-08T11:07:16Z

Summary

Adds an Olive recipe for OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B, a vision-language model that pairs the InternVideo2-1B vision encoder with a Qwen2.5-7B decoder.

Pipeline

The recipe exports three sub-models targeting CPU:

| Sub-model | Source | Precision | Notes |
|---|---|---|---|
| `vision.onnx` | InternVideo2-1B vision tower + projector → 64 visual tokens | FP32 | One of two configs is selected at run time; both write to the same output file. `vision_image.json` (default, `--mode image`) takes one 224×224 frame and runs the projector with `compress=False`. `vision.json` (`--mode video`) takes 4 frames and applies ToMe16 (256→16 tokens per frame), giving the same 64-token output. |
| `embedding.onnx` | `embed_tokens` + visual feature merge | FP32 | A custom merge replaces `masked_scatter` with an ONNX-compatible `cumsum + where` injection at `<\|image_pad\|>` positions. |
| `text.onnx` | Qwen2.5-7B decoder (VideoChat-Flash variant, `trust_remote_code`) | INT4 | Quantized via Olive's `ModelBuilder`. |
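
The `cumsum + where` injection noted for `embedding.onnx` can be illustrated with a minimal sketch. This is not the recipe's code: it uses plain Python lists instead of tensors, and the function name `merge_visual_features`, the placeholder id `99`, and the one-dimensional embeddings are assumptions made for illustration. The idea is that a running count of placeholder positions gives, at each placeholder, the index of the visual row to inject, so an element-wise select can stand in for `masked_scatter`.

```python
from itertools import accumulate


def merge_visual_features(token_ids, text_embeds, visual_feats, pad_id):
    """Inject visual rows at placeholder positions using cumsum + where.

    Hypothetical helper: mirrors the masked_scatter semantics (placeholders
    consume visual rows in order) with list operations only.
    """
    # 1 where the token is the image placeholder, 0 elsewhere.
    mask = [1 if t == pad_id else 0 for t in token_ids]
    if sum(mask) != len(visual_feats):
        raise ValueError("placeholder count must match number of visual rows")
    if not visual_feats:
        return list(text_embeds)  # text-only input: nothing to inject

    # cumsum: running count of placeholders seen so far. Subtracting 1 turns
    # it into a row index; clamp at 0 for positions before the first one.
    running = list(accumulate(mask))
    src_idx = [max(count - 1, 0) for count in running]

    # Gather one visual row per position, then select per position
    # ("where"): visual row at placeholders, text embedding elsewhere.
    gathered = [visual_feats[i] for i in src_idx]
    return [g if m else e for m, g, e in zip(mask, gathered, text_embeds)]


if __name__ == "__main__":
    merged = merge_visual_features(
        token_ids=[10, 99, 99, 11],           # 99 is an assumed placeholder id
        text_embeds=[[1.0], [2.0], [3.0], [4.0]],
        visual_feats=[[100.0], [200.0]],
        pad_id=99,
    )
    print(merged)  # [[1.0], [100.0], [200.0], [4.0]]
```

In a tensor version the same three steps would map to a cumulative sum, an index gather, and an element-wise select, all of which have direct ONNX operators.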

What's included

- `optimize.py`: orchestrates the three Olive runs in isolated subprocesses to release memory between exports (the full HF model is ~15 GB), normalizes Olive's output filenames, and writes the GenAI runtime and processor configs.
- `inference.py`: minimal ONNX Runtime GenAI driver with text-only, single-image, and interactive modes.
- `user_script.py`: Olive callbacks for model loading, dummy inputs, and IO configs.
- `cpu_and_mobile/*.json`: Olive configs for the three sub-models (with `vision.json` and `vision_image.json` as alternative vision exports).
- `builtin/README.md`: prerequisites, usage, and directory layout.
- `info.yml` and `requirements.txt`.
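
The subprocess isolation described for `optimize.py` can be sketched as follows. This is a rough illustration under stated assumptions, not the script's actual code: the helper name `run_isolated` is hypothetical, and the stand-in commands only print a message where a real run would launch one Olive workflow per config. The point is that each stage runs in its own process, so whatever memory it held is returned to the OS when that process exits, before the next export starts.

```python
import subprocess
import sys


def run_isolated(stages):
    """Run each (name, command) stage in its own process, in order.

    Hypothetical helper: stops at the first failing stage so a later
    export never runs against a missing earlier output.
    """
    results = []
    for name, cmd in stages:
        # A fresh interpreter per stage; its memory is freed on exit.
        completed = subprocess.run(cmd, capture_output=True, text=True)
        if completed.returncode != 0:
            raise RuntimeError(f"{name} export failed:\n{completed.stderr}")
        results.append((name, completed.stdout.strip()))
    return results


if __name__ == "__main__":
    # Stand-in commands (assumption): each one only prints a message.
    stages = [
        (name, [sys.executable, "-c", f"print('exported {name}')"])
        for name in ("vision", "embedding", "text")
    ]
    for name, output in run_isolated(stages):
        print(name, "->", output)
```

Running the exports sequentially in one process would keep the loaded checkpoint reachable for longer; separate processes trade a little startup time for a hard upper bound on peak memory per stage.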

SanjayAMD · 2026-05-08T11:10:35Z

@SanjayAMD please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
@microsoft-github-policy-service agree [company="{your company}"]
Options:

(default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
(when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"
Contributor License Agreement

@microsoft-github-policy-service agree [company="AMD"]

SanjayAMD · 2026-05-08T11:11:17Z

@SanjayAMD the command you issued was incorrect. Please try again.

Examples are:
@microsoft-github-policy-service agree
and
@microsoft-github-policy-service agree company="your company"

@microsoft-github-policy-service agree company="AMD"

Copilot

Pull request overview

Adds a new Olive “builtin” recipe to export and run the OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B vision-language model with ONNX Runtime GenAI, splitting the pipeline into vision, embedding+merge, and quantized text-decoder sub-models.

Changes:

- Introduces an export pipeline (`optimize.py`) that runs three Olive configs (vision / embedding / text) and generates GenAI + processor configs.
- Adds Olive user callbacks (`user_script.py`) to extract/patch the vision tower + projector and implement an ONNX-friendly embedding merge.
- Adds a minimal GenAI inference driver (`inference.py`), plus recipe metadata and CPU configs under `cpu_and_mobile/`.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/LICENSE | Adds a per-recipe license file (Apache-2.0 text). |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/user_script.py | Olive callbacks and PyTorch wrappers for vision export + embedding merge logic. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/requirements.txt | Declares Python dependencies for running export + inference. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/README.md | Documents export and inference usage and directory layout. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/optimize.py | Orchestrates Olive runs, normalizes outputs, and writes GenAI/processor configs. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/info.yml | Adds recipe metadata (currently missing required `recipes:` list). |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/inference.py | Provides a simple onnxruntime-genai inference CLI (text + image + interactive). |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/embedding.json | Olive config for exporting embedding+merge model to ONNX. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/text.json | Olive config for ModelBuilder INT4 text decoder. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/vision.json | Olive config for 4-frame vision export + graph surgeries + external data. |
| OpenGVLab-VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B/builtin/cpu_and_mobile/vision_image.json | Olive config for single-image vision export variant. |

VishalX · 2026-05-11T10:06:25Z

@xieofxie / @devang-ml pls review

xieofxie · 2026-05-12T01:43:57Z

please wait for microsoft/onnxruntime-genai#2147

VishalX · 2026-05-22T23:19:26Z

> please wait for microsoft/onnxruntime-genai#2147

This is merged now @xieofxie

xieofxie · 2026-05-25T02:20:21Z

please fix pre-commit

Copilot AI review requested due to automatic review settings May 8, 2026 11:07

Copilot started reviewing on behalf of SanjayAMD May 8, 2026 11:08

Copilot AI reviewed May 8, 2026

VishalX mentioned this pull request May 8, 2026

Add VideoChat-Flash (OpenGVLab) language model support microsoft/onnxruntime-genai#2147

Merged

4 tasks

SanjayAMD and others added 3 commits May 19, 2026 03:50

Add VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B recipe

85a2d2e

Fix info.yml schema, drop broken --models-dir flag, fix docstring typo

5443097

Address review: fix info.yml schema, plumb --models-dir, fix docstring

00c4ada

SanjayAMD force-pushed the amd_vidflash branch from 5e508e9 to 00c4ada Compare May 19, 2026 08:50

Update VideoChat Flash builtin Olive pipeline and inference scripts

5b8dae7

Linter Fix

a8b3531

SanjayAMD force-pushed the amd_vidflash branch from f33f520 to a8b3531 Compare May 25, 2026 06:17

Merge branch 'main' into amd_vidflash

2ef34cf

SanjayAMD wants to merge 6 commits into microsoft:main from SanjayAMD:amd_vidflash

4 participants
