
Translate gemma 4B recipes #404

Open
tanzeel-amd wants to merge 9 commits into microsoft:main from tanzeel-amd:translate-gemma-4b

Conversation

@tanzeel-amd

@tanzeel-amd tanzeel-amd commented May 8, 2026

Adds a complete Olive recipe to export and run google/translategemma-4b-it as a three-submodel VLM pipeline via ONNX Runtime GenAI. TranslateGemma is a Gemma3-based multimodal translation model supporting 55 languages with both text-to-text and image-to-text translation.

What's included

  • Export pipeline (builtin/optimize.py): Runs three Olive workflows to produce the text decoder (INT4 or FP32 via ModelBuilder), embedding (token embed + image feature scattering), and vision encoder (SigLIP 27-layer + AvgPool2d projector). Patches genai_config.json and generates processor_config.json for the OGA C++ image pipeline.
  • Two export configs: cpu_and_mobile (INT4 RTN, ~6.7 GB) and cpu_and_mobile_fp32 (FP32, ~19.2 GB)
  • Inference script (inference.py): Text and image translation with streaming output via ORT GenAI multimodal processor
  • WMT24++ benchmark (benchmark_wmt24pp.py): Evaluates translation quality across up to 55 language pairs using COMET scoring, with domain-stratified sampling and resume support
  • Custom export wrappers (builtin/user_script.py): Gemma3VisionModel and Gemma3EmbeddingModel wrappers for clean ONNX export of vision and embedding submodels
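The export step above runs one Olive workflow per sub-model for a chosen variant. As a rough sketch of that orchestration (not the actual `optimize.py`), the snippet below enumerates the three JSON configs for a variant and hands each to a runner. The directory and file names come from this PR's file list; `config_paths`, `run_exports`, and the injected `run_workflow` callable are hypothetical names used only for illustration, and whatever Olive entry point the real script calls would take the place of `run_workflow`.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Sub-model config names and variant directories as listed in this PR.
SUBMODELS = ("text", "embedding", "vision")
VARIANTS = ("cpu_and_mobile", "cpu_and_mobile_fp32")


def config_paths(recipe_dir: Path, variant: str) -> Dict[str, Path]:
    """Map each sub-model name to its Olive config path for one variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return {name: recipe_dir / variant / f"{name}.json" for name in SUBMODELS}


def run_exports(
    recipe_dir: Path,
    variant: str,
    run_workflow: Callable[[Path], None],  # hypothetical: stands in for the Olive call
) -> List[str]:
    """Run one workflow per sub-model and return the names that completed."""
    done = []
    for name, path in config_paths(recipe_dir, variant).items():
        run_workflow(path)
        done.append(name)
    return done


if __name__ == "__main__":
    # Dry run: print the config paths instead of invoking Olive.
    run_exports(Path("builtin"), "cpu_and_mobile", print)
```

Injecting the runner keeps the sketch testable without Olive installed; a real pipeline would still need the config patching steps described above after the exports finish.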

Copilot AI review requested due to automatic review settings May 8, 2026 10:32
Contributor

Copilot AI left a comment

Pull request overview

Adds a new Olive recipe for exporting and running the gated google/translategemma-4b-it multimodal translation model (text and image translation) via ONNX Runtime GenAI, including an optional WMT24++ + COMET benchmarking script.

Changes:

  • Added end-to-end export pipeline (builtin/optimize.py) plus Olive JSON configs to export text/vision/embedding ONNX sub-models.
  • Added runtime scripts for inference (inference.py) and WMT24++ benchmarking (benchmark_wmt24pp.py).
  • Added recipe documentation and included the Gemma Terms of Use.
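The PR description says the WMT24++ benchmark uses domain-stratified sampling. One way this could work is sketched below, assuming each row is a dict with a `domain` key; that field name, the `stratified_sample` helper, and its parameters are assumptions for illustration and are not taken from `benchmark_wmt24pp.py`.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_sample(rows: List[dict], per_domain: int, seed: int = 0) -> List[dict]:
    """Draw up to `per_domain` rows from each domain, reproducibly."""
    by_domain: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        by_domain[row["domain"]].append(row)  # assumed field name

    rng = random.Random(seed)  # local RNG so the global random state is untouched
    sample: List[dict] = []
    for domain in sorted(by_domain):  # sorted for a stable iteration order
        group = by_domain[domain]
        k = min(per_domain, len(group))  # small domains contribute all their rows
        sample.extend(rng.sample(group, k))
    return sample


if __name__ == "__main__":
    rows = [{"domain": "news", "source": f"n{i}"} for i in range(5)]
    rows += [{"domain": "social", "source": f"s{i}"} for i in range(2)]
    picked = stratified_sample(rows, per_domain=3)
    print(len(picked))
```

Sampling per domain rather than over the whole set keeps small domains represented, and the fixed seed means a rerun scores the same segments.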

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| google-translategemma-4b-it/README.md | Documents export/inference/benchmark usage and model architecture. |
| google-translategemma-4b-it/LICENSE | Adds Gemma Terms of Use text to the recipe directory. |
| google-translategemma-4b-it/inference.py | CLI for text/image translation using ORT GenAI multimodal pipeline. |
| google-translategemma-4b-it/benchmark_wmt24pp.py | CLI to run WMT24++ translations and score with COMET, with resume support. |
| google-translategemma-4b-it/builtin/optimize.py | Orchestrates Olive exports and patches GenAI/processor/tokenizer configs. |
| google-translategemma-4b-it/builtin/user_script.py | Provides PyTorch wrapper modules + IO/dummy-input helpers for Olive export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile/text.json | Olive config for INT4 RTN text decoder export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile/embedding.json | Olive config for embedding/scatter ONNX export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile/vision.json | Olive config for vision encoder ONNX export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile_fp32/text.json | Olive config for FP32 text decoder export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile_fp32/embedding.json | Olive config for FP32 embedding/scatter ONNX export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile_fp32/vision.json | Olive config for FP32 vision encoder ONNX export. |
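The benchmark script is described as having resume support. A minimal sketch of one common approach follows: append each finished translation to a JSONL file and skip already-recorded keys on restart. The record fields (`lp`, `segment_id`, `source`, `hypothesis`), the helper names, and the `translate` callable are all assumptions for illustration; the actual script may track progress differently.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Set, Tuple

Key = Tuple[str, int]


def load_done(path: Path) -> Set[Key]:
    """Collect (language pair, segment id) keys already written to the results file."""
    done: Set[Key] = set()
    if path.exists():
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    rec = json.loads(line)
                    done.add((rec["lp"], rec["segment_id"]))
    return done


def run_with_resume(
    items: Iterable[dict],
    translate: Callable[[str], str],  # hypothetical: wraps the model call
    path: Path,
) -> int:
    """Translate only items not yet recorded; return how many were newly written."""
    done = load_done(path)
    new = 0
    with path.open("a", encoding="utf-8") as f:
        for item in items:
            key = (item["lp"], item["segment_id"])
            if key in done:
                continue  # finished in an earlier run
            rec = {**item, "hypothesis": translate(item["source"])}
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
            f.flush()  # push each record out so an interrupted run loses little work
            done.add(key)
            new += 1
    return new
```

Writing one record per line means an interrupted run can at worst leave one partial trailing line, which a production version would need to detect and drop before resuming.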

Comment thread google-translategemma-4b-it/README.md Outdated
Comment thread google-translategemma-4b-it/README.md Outdated
Comment thread google-translategemma-4b-it/benchmark_wmt24pp.py Outdated
Comment thread google-translategemma-4b-it/benchmark_wmt24pp.py Outdated
Comment thread google-translategemma-4b-it/benchmark_wmt24pp.py Outdated
Comment thread google-translategemma-4b-it/inference.py Outdated
Comment thread google-translategemma-4b-it/builtin/user_script.py Outdated
@tanzeel-amd
Author

@microsoft-github-policy-service agree company="AMD"

@VishalX

VishalX commented May 11, 2026

@xieofxie / @devang-ml pls review

@tanzeel-amd tanzeel-amd force-pushed the translate-gemma-4b branch from 4d43b33 to 6fae78b Compare May 19, 2026 10:08
@VishalX

VishalX commented May 21, 2026

@xieofxie can you please review this?

Comment thread google-translategemma-4b-it/builtin/README.md
VishalX and others added 9 commits May 22, 2026 05:07
- Introduced benchmark_wmt24pp.py for evaluating translation quality on the WMT24++ dataset using COMET.
- Added inference.py for text and image translation capabilities.
- Created README.md with setup instructions, usage examples, and model architecture details.
- Implemented optimization pipeline in optimize.py for exporting sub-models (text decoder, vision encoder, embedding) with INT4 quantization.
- Developed user script in user_script.py for model integration.
- Configured export settings in JSON files for different model variants (INT4, AWQ, FP32).
- Included test images for demonstration purposes.
- README.md: Remove nonexistent AWQ config references, fix file tree to match actual repo layout (google-translategemma-4b-it/, no data/ dir)

- benchmark_wmt24pp.py: Default --hf-model-dir to HF model ID, remove unused stream param and dead new_pairs variable

- inference.py: Remove unused numpy import

- builtin/user_script.py: Remove unused math, os, glob imports

Co-authored-by: Cursor <cursoragent@cursor.com>
Moved README.md, inference.py, benchmark_wmt24pp.py into builtin/ to follow the common recipe file organization pattern. Added info.yml with recipe metadata. Updated default model paths in inference.py and benchmark_wmt24pp.py.

Co-authored-by: Cursor <cursoragent@cursor.com>
@tanzeel-amd tanzeel-amd force-pushed the translate-gemma-4b branch from 22328c6 to ada0ad0 Compare May 22, 2026 12:07
@tanzeel-amd tanzeel-amd requested a review from xieofxie May 25, 2026 05:16
