Add Fara-7b recipes #384
Open
apsonawane wants to merge 10 commits into
Pull request overview
This PR adds a new microsoft-Fara-7B/builtin recipe bundle to export/optimize the Fara-7B vision-language model to ONNX (via Olive) and run it with ONNX Runtime GenAI, including CPU and CUDA pipelines.
Changes:
- Added Olive pipeline JSONs for embedding / vision / text sub-model export + optimization for cpu_and_mobile/ and cuda/.
- Added Python orchestration and runtime scripts (optimize.py, inference.py, user_script.py) plus model code under codes/.
- Added supporting metadata/docs (README.md, info.yml) and local ignore rules (.gitignore).
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| microsoft-Fara-7B/builtin/user_script.py | Olive callbacks for loading the custom VL model + IO/dummy input definitions for export. |
| microsoft-Fara-7B/builtin/requirements.txt | Python dependencies to run Olive export/optimization and scripts. |
| microsoft-Fara-7B/builtin/optimize.py | Runs the three Olive configs and patches/writes GenAI runtime config files. |
| microsoft-Fara-7B/builtin/info.yml | Minimal metadata for the builtin recipe directory. |
| microsoft-Fara-7B/builtin/inference.py | Example ONNX Runtime GenAI inference script for text-only and image+text prompts. |
| microsoft-Fara-7B/builtin/cuda/embedding.json | CUDA Olive pipeline for exporting/optimizing embedding sub-model. |
| microsoft-Fara-7B/builtin/cuda/text.json | CUDA Olive pipeline for producing INT4 text decoder via ModelBuilder. |
| microsoft-Fara-7B/builtin/cuda/vision.json | CUDA Olive pipeline for exporting/optimizing vision encoder sub-model. |
| microsoft-Fara-7B/builtin/cpu_and_mobile/embedding.json | CPU/mobile Olive pipeline for exporting/quantizing embedding sub-model. |
| microsoft-Fara-7B/builtin/cpu_and_mobile/text.json | CPU/mobile Olive pipeline for producing INT4 text decoder via ModelBuilder. |
| microsoft-Fara-7B/builtin/cpu_and_mobile/vision.json | CPU/mobile Olive pipeline for exporting/quantizing vision encoder sub-model. |
| microsoft-Fara-7B/builtin/codes/modeling_qwen2_5_vl.py | Custom Qwen2.5-VL-derived PyTorch modeling code to enable ONNX export. |
| microsoft-Fara-7B/builtin/codes/__init__.py | Package marker for codes/. |
| microsoft-Fara-7B/builtin/README.md | End-to-end instructions and usage examples for export + inference. |
| microsoft-Fara-7B/builtin/.gitignore | Ignores generated artifacts and caches for the builtin workflow. |
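The table describes optimize.py as running three Olive configs per target. A minimal sketch of that orchestration pattern follows; it is not the PR's actual code. The helper names `config_paths` and `run_all`, the `runner` parameter, and the embedding, vision, text ordering are all hypothetical choices made for illustration. Only the directory and file names come from the table.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Sub-model config names and target directories, as listed in the file table.
SUB_MODELS = ("embedding", "vision", "text")
TARGETS = ("cpu_and_mobile", "cuda")


def config_paths(recipe_dir: Path, target: str) -> List[Path]:
    """Return the three per-sub-model config paths for one target (hypothetical helper)."""
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {TARGETS}")
    return [recipe_dir / target / f"{name}.json" for name in SUB_MODELS]


def run_all(
    recipe_dir: Path,
    target: str,
    runner: Callable[[str], object],
) -> Dict[str, object]:
    """Call `runner` once per config, in order, and collect results by sub-model name.

    `runner` is an injected callable (an assumption of this sketch), so the
    orchestration can be exercised without any export toolchain installed.
    """
    results: Dict[str, object] = {}
    for path in config_paths(recipe_dir, target):
        results[path.stem] = runner(str(path))
    return results
```

With Olive installed, its workflow entry point (`from olive.workflows import run`) could plausibly be passed as `runner`, for example `run_all(Path("microsoft-Fara-7B/builtin"), "cuda", run)`. Whether the real optimize.py invokes Olive this way, or in this order, is not stated in the PR overview.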
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Contributor
Please add a LICENSE file.
devang-ml reviewed May 12, 2026
This pull request introduces a complete ONNX Runtime GenAI example for the Fara-7B vision-language model, including documentation, configuration files for model export and optimization (for both CPU and CUDA), a Python inference script, and supporting metadata. The changes enable users to export, optimize, quantize, and run inference with Fara-7B using ONNX Runtime GenAI, supporting both text and image inputs.
Key changes:
1. Documentation and Metadata
- README.md describing the Fara-7B ONNX Runtime GenAI pipeline, setup instructions, usage examples, and directory structure.
- info.yml with metadata such as supported execution providers, devices, and keywords for discoverability.
2. Model Export and Optimization Pipelines
- CPU and mobile (cpu_and_mobile/embedding.json, cpu_and_mobile/vision.json, cpu_and_mobile/text.json) and CUDA (cuda/embedding.json, cuda/vision.json, cuda/text.json) pipelines, specifying model export, graph surgeries, optimizations, and quantization/precision steps for each sub-model (vision encoder, embedding, text decoder).
3. Inference Script
- inference.py, a Python script for running text or multimodal (image+text) inference with ONNX Runtime GenAI, supporting both batch and interactive modes.
4. Project Structure and Ignore Rules
- .gitignore to exclude generated models, cache, Python bytecode, and log files.

Together, these changes provide an end-to-end workflow for exporting, optimizing, quantizing, and running inference on the Fara-7B model with ONNX Runtime GenAI, so users can deploy and test the model on both CPU and GPU platforms.