microsoft · devang-ml · May 26, 2026 · Apr 21, 2026 · Apr 22, 2026 · Apr 22, 2026
diff --git a/google-translategemma-4b-it/LICENSE b/google-translategemma-4b-it/LICENSE
@@ -0,0 +1,206 @@
+# Gemma Terms of Use
+
+The terms below apply to Gemma models listed in the Appendix at bottom of this page. For Gemma 4 terms, see the [Gemma 4 license](https://ai.google.dev/gemma/apache_2).
+
+Last modified: April 1, 2026
+
+By using, reproducing, modifying, distributing, performing or displaying any
+portion or element of Gemma, Model Derivatives including via any Hosted Service,
+(each as defined below) (collectively, the "**Gemma Services**") or otherwise
+accepting the terms of this Agreement, you agree to be bound by this Agreement.
+
+## Section 1: DEFINITIONS
+
+### 1.1 Definitions
+
+(a) "**Agreement** " or "**Gemma Terms of Use**" means these terms and conditions
+that govern the use, reproduction, Distribution or modification of the Gemma
+Services and any terms and conditions incorporated by reference.
+
+(b) "**Distribution** " or "**Distribute** " means any transmission, publication,
+or other sharing of Gemma or Model Derivatives to a third party, including by
+providing or making Gemma or its functionality available as a hosted service via
+API, web access, or any other electronic or remote means ("**Hosted Service**").
+
+(c) "**Gemma** " means the set of machine learning language models, trained model
+weights and parameters identified in the [Appendix](https://ai.google.dev/gemma/terms#appendix),
+regardless of the source that you obtained it from.
+
+(d) "**Google**" means Google LLC.
+
+(e) "**Model Derivatives**" means all (i) modifications to Gemma, (ii) works based
+on Gemma, or (iii) any other machine learning model which is created by transfer
+of patterns of the weights, parameters, operations, or Output of Gemma, to that
+model in order to cause that model to perform similarly to Gemma, including
+distillation methods that use intermediate data representations or methods based
+on the generation of synthetic data Outputs by Gemma for training that model.
+For clarity, Outputs are not deemed Model Derivatives.
+
+(f) "**Output**" means the information content output of Gemma or a Model
+Derivative that results from operating or otherwise using Gemma or the Model
+Derivative, including via a Hosted Service.
+
+### 1.2
+
+As used in this Agreement, "**including** " means
+"**including without limitation**".
+
+## Section 2: ELIGIBILITY AND USAGE
+
+### 2.1 Eligibility
+
+You represent and warrant that you have the legal capacity to enter into this
+Agreement (including being of sufficient age of consent). If you are accessing
+or using any of the Gemma Services for or on behalf of a legal entity, (a) you
+are entering into this Agreement on behalf of yourself and that legal entity,
+(b) you represent and warrant that you have the authority to act on behalf of
+and bind that entity to this Agreement and (c) references to "**you** " or
+"**your**" in the remainder of this Agreement refers to both you (as an
+individual) and that entity.
+
+### 2.2 Use
+
+You may use, reproduce, modify, Distribute, perform or display any of the Gemma
+Services only in accordance with the terms of this Agreement, and must not
+violate (or encourage or permit anyone else to violate) any term of this
+Agreement.
+
+## Section 3: DISTRIBUTION AND RESTRICTIONS
+
+### 3.1 Distribution and Redistribution
+
+You may reproduce or Distribute copies of Gemma or Model Derivatives if you meet
+all of the following conditions:
+
+1. You must include the use restrictions referenced in Section 3.2 as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Gemma or Model Derivatives and you must provide notice to subsequent users you Distribute to that Gemma or Model Derivatives are subject to the use restrictions in Section 3.2.
+2. You must provide all third party recipients of Gemma or Model Derivatives a copy of this Agreement.
+3. You must cause any modified files to carry prominent notices stating that you modified the files.
+4. All Distributions (other than through a Hosted Service) must be accompanied by a "**Notice** " text file that contains the following notice: "**Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms**".
+
+You may add your own intellectual property statement to your modifications and,
+except as set forth in this Section, may provide additional or different terms
+and conditions for use, reproduction, or Distribution of your modifications, or
+for any such Model Derivatives as a whole, provided your use, reproduction,
+modification, Distribution, performance, and display of Gemma otherwise complies
+with the terms and conditions of this Agreement. Any additional or different
+terms and conditions you impose must not conflict with the terms of this
+Agreement.
+
+### 3.2 Use Restrictions
+
+You must not use any of the Gemma Services:
+
+1. for the restricted uses set forth in the Gemma Prohibited Use Policy at [ai.google.dev/gemma/prohibited_use_policy](https://ai.google.dev/gemma/prohibited_use_policy) ("**Prohibited Use Policy**"), which is hereby incorporated by reference into this Agreement; or
+2. in violation of applicable laws and regulations.
+
+To the maximum extent permitted by law, Google reserves the right to restrict
+(remotely or otherwise) usage of any of the Gemma Services that Google
+reasonably believes are in violation of this Agreement.
+
+### 3.3 Generated Output
+
+Google claims no rights in Outputs you generate using Gemma. You and your users
+are solely responsible for Outputs and their subsequent uses.
+
+## Section 4: ADDITIONAL PROVISIONS
+
+### 4.1 Updates
+
+Google may update Gemma from time to time.
+
+### 4.2 Trademarks
+
+Nothing in this Agreement grants you any rights to use Google's trademarks,
+trade names, logos or to otherwise suggest endorsement or misrepresent the
+relationship between you and Google. Google reserves any rights not expressly
+granted herein.
+
+### 4.3 DISCLAIMER OF WARRANTY
+
+UNLESS REQUIRED BY APPLICABLE LAW, THE GEMMA SERVICES, AND OUTPUTS, ARE PROVIDED
+ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER
+EXPRESS OR IMPLIED, INCLUDING ANY WARRANTIES OR CONDITIONS OF TITLE,
+NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE
+SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING,
+MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE GEMMA SERVICES
+OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR USE OR DISTRIBUTION
+OF ANY OF THE GEMMA SERVICES OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND
+PERMISSIONS UNDER THIS AGREEMENT.
+
+### 4.4 LIMITATION OF LIABILITY
+
+TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO
+LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY,
+CONTRACT, OR OTHERWISE, UNLESS REQUIRED BY APPLICABLE LAW, SHALL GOOGLE OR ITS
+AFFILIATES BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT,
+SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL, OR PUNITIVE DAMAGES, OR LOST
+PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO, ANY OF THE GEMMA
+SERVICES OR OUTPUTS EVEN IF GOOGLE OR ITS AFFILIATES HAVE BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+### 4.5 Term, Termination, and Survival
+
+The term of this Agreement will commence upon your acceptance of this Agreement
+(including acceptance by your use, modification, or Distribution, reproduction,
+performance or display of any portion or element of the Gemma Services) and will
+continue in full force and effect until terminated in accordance with the terms
+of this Agreement. Google may terminate this Agreement if you are in breach of
+any term of this Agreement. Upon termination of this Agreement, you must delete
+and cease use and Distribution of all copies of Gemma and Model Derivatives in
+your possession or control. Sections 1, 2.1, 3.3, 4.2 to 4.9 shall survive the
+termination of this Agreement.
+
+### 4.6 Governing Law and Jurisdiction
+
+This Agreement will be governed by the laws of the State of California without
+regard to choice of law principles. The UN Convention on Contracts for the
+International Sale of Goods does not apply to this Agreement. The state and
+federal courts of Santa Clara County, California shall have exclusive
+jurisdiction of any dispute arising out of this Agreement.
+
+### 4.7 Severability
+
+If any provision of this Agreement is held to be invalid, illegal or
+unenforceable, the remaining provisions shall be unaffected thereby and remain
+valid as if such provision had not been set forth herein.
+
+### 4.8 Entire Agreement
+
+This Agreement states all the terms agreed between the parties and supersedes
+all other agreements between the parties as of the date of acceptance relating
+to its subject matter.
+
+### 4.9 No Waiver
+
+Google will not be treated as having waived any rights by not exercising (or
+delaying the exercise of) any rights under this Agreement.
+
+## Appendix
+
+- [Gemma 1](https://ai.google.dev/gemma/docs/core/model_card)
+- [Gemma 1.1](https://ai.google.dev/gemma/docs/core/model_card)
+- [Gemma 2](https://ai.google.dev/gemma/docs/core/model_card_2)
+- [Gemma 3](https://ai.google.dev/gemma/docs/core/model_card_3)
+- [Gemma 3n](https://ai.google.dev/gemma/docs/3n)
+- [FunctionGemma](https://ai.google.dev/gemma/docs/functiongemma)
+- [EmbeddingGemma](https://ai.google.dev/gemma/docs/embeddinggemma)
+- [PaliGemma](https://ai.google.dev/gemma/docs/paligemma/model-card)
+- [PaliGemma 2](https://ai.google.dev/gemma/docs/paligemma/model-card-2)
+- [ShieldGemma](https://ai.google.dev/gemma/docs/shieldgemma/model_card)
+- [ShieldGemma 2](https://ai.google.dev/gemma/docs/shieldgemma/model_card_2)
+- [CodeGemma](https://ai.google.dev/gemma/docs/codegemma/model_card)
+- [CodeGemma 1.1](https://ai.google.dev/gemma/docs/codegemma/model_card)
+- [Gemma 2 JPN](https://huggingface.co/google/gemma-2-2b-jpn-it)
+- [DataGemma RIG](https://www.kaggle.com/models/google/datagemma-rig)
+- [DataGemma RAG](https://www.kaggle.com/models/google/datagemma-rag)
+- [RecurrentGemma](https://ai.google.dev/gemma/docs/recurrentgemma/model_card)
+- [Gemma Scope](https://ai.google.dev/gemma/docs/gemma_scope)
+- [Gemma-APS](https://ai.google.dev/gemma/docs/gemma-aps)
+- [T5Gemma](https://www.kaggle.com/models/google/t5gemma)
+- [VaultGemma](https://www.kaggle.com/models/google/vaultgemma)
+- [FunctionGemma](https://www.kaggle.com/models/google/functiongemma)
+- [T5Gemma 2](https://www.kaggle.com/models/google/t5gemma-2)
+- [TranslateGemma](https://www.kaggle.com/models/google/translategemma)
+
+> [!NOTE]
+> **Note:** Previous versions of these Terms are [archived here](https://ai.google.dev/gemma/terms-archive).
diff --git a/google-translategemma-4b-it/builtin/README.md b/google-translategemma-4b-it/builtin/README.md
@@ -0,0 +1,147 @@
+# TranslateGemma-4B-IT ONNX Recipe
+
+Export and run [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) as a full vision-language model (VLM) with ONNX Runtime GenAI.
+
+TranslateGemma is a translation model supporting 55 languages, with both **text-to-text** and **image-to-text** translation capabilities. This recipe exports it as three separate ONNX sub-models (text decoder, vision encoder, embedding) that run together through the ORT GenAI multimodal pipeline.
+
+## Prerequisites
+
+```bash
+pip install olive-ai onnxruntime-genai transformers torch
+```
+
+## Quick Start
+
+### 1. Authenticate with Hugging Face
+
+TranslateGemma is a [gated model](https://huggingface.co/google/translategemma-4b-it). Accept the license on Hugging Face, then log in so models download automatically during export:
+
+```bash
+huggingface-cli login
+```
+
+### 2. Export to ONNX
+
+```bash
+# INT4 RTN text decoder + FP32 vision/embedding (recommended, ~6.7 GB total)
+python optimize.py --config-dir cpu_and_mobile
+
+# Full FP32 baseline (~19.2 GB total)
+python optimize.py --config-dir cpu_and_mobile_fp32
+```
+
+### 3. Run inference
+
+```bash
+# Text translation (default: cpu_and_mobile)
+python inference.py --source-lang en --target-lang es --text "Hello, how are you?"
+
+# Image translation
+python inference.py --source-lang en --target-lang fr --image <image-path>
+
+# Use FP32 model
+python inference.py --model-dir cpu_and_mobile_fp32/models --source-lang en --target-lang ja --text "Good morning"
+```
+
+## Export Configurations
+
+Two export configurations are provided, both producing the same three-ONNX-model VLM layout:
+
+| Config | Text Decoder | Embedding | Vision | Total Size |
+|---|---|---|---|---|
+| `cpu_and_mobile` | INT4 RTN (block 128) | FP32 | FP32 | ~6.7 GB |
+| `cpu_and_mobile_fp32` | FP32 | FP32 | FP32 | ~19.2 GB |
+
+Each produces a `models/` directory containing:
+
+```
+models/
+  text.onnx              # Text decoder (34 Gemma3 layers + LM head)
+  text.onnx.data         # External weights
+  embedding.onnx         # Token embedding + image feature scattering
+  embedding.onnx.data
+  vision.onnx            # SigLIP vision encoder + multimodal projector
+  vision.onnx.data
+  genai_config.json      # Runtime config for ORT GenAI
+  processor_config.json  # Image preprocessing pipeline
+  tokenizer.json         # Tokenizer files
+  tokenizer_config.json
+```
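The `INT4 RTN (block 128)` entry in the table above refers to round-to-nearest quantization applied to blocks of 128 weights. The following is a minimal sketch of that idea in plain Python, assuming symmetric signed 4-bit values and one floating-point scale per block. The function names are hypothetical, and the scheme the export actually uses (zero points, bit packing, tensor layout) may differ.

```python
def quantize_rtn_int4(weights, block_size=128):
    """Quantize a flat list of floats to signed 4-bit integers, block by block.

    Assumption: symmetric quantization with one scale per block and no zero point.
    """
    quantized, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the largest magnitude in the block to 7; guard against all-zero blocks.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer (Python rounds exact ties to even),
        # then clamp to the signed 4-bit range [-8, 7].
        quantized.extend(max(-8, min(7, round(w / scale))) for w in block)
    return quantized, scales


def dequantize_int4(quantized, scales, block_size=128):
    """Recover approximate float weights from 4-bit values and per-block scales."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]
```

Under these assumptions the reconstruction error of each weight is at most half of its block's scale, so smaller blocks track local weight ranges more closely at the cost of storing more scales.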
+
+## Architecture
+
+TranslateGemma is a `Gemma3ForConditionalGeneration` multimodal model with three components:
+
+```
+Image [B, 3, 896, 896]
+  |
+  v
+vision.onnx (SigLIP 27 layers + AvgPool2d projector)
+  |
+  v  image_features [B*256, 2560]
+  |
+  +--- input_ids [B, seq_len] --->  embedding.onnx (embed_tokens + scatter)
+                                      |
+                                      v  inputs_embeds [B, seq_len, 2560]
+                                      |
+                                      +---> text.onnx (34 Gemma3 decoder layers)
+                                              |
+                                              v  logits -> tokens -> translation
+```
+
+- **Vision**: The SigLIP encoder (27 layers, 1152-dim) splits each 896x896 image into 4096 patches, then a projector (AvgPool2d + RMSNorm + linear) compresses them to 256 tokens of dimension 2560.
+- **Embedding**: Looks up token embeddings (scaled by sqrt(2560)), then scatters the vision features into the image-token positions.
+- **Text**: A standard Gemma3 decoder with 34 layers and a mix of sliding-window and full attention layers, which generates translation tokens autoregressively.
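The numbers in the component descriptions above can be cross-checked with a little arithmetic, and the embedding step can be illustrated with a toy scatter. The sketch below assumes square patch and token grids and a pooling stride equal to the kernel size; `embed_and_scatter` and `image_token_id` are hypothetical names for illustration, not the recipe's actual wrapper code.

```python
import math

# Values stated in the README.
IMAGE_SIZE = 896
NUM_PATCHES = 4096
NUM_IMAGE_TOKENS = 256
HIDDEN_SIZE = 2560

# Derived values (assumption: square grids, pooling stride equal to kernel size).
patch_grid = math.isqrt(NUM_PATCHES)        # patches per image side
patch_size = IMAGE_SIZE // patch_grid       # pixels per patch side
token_grid = math.isqrt(NUM_IMAGE_TOKENS)   # image tokens per side after pooling
pool_kernel = patch_grid // token_grid      # implied AvgPool2d kernel size


def embed_and_scatter(input_ids, embed_table, image_features, image_token_id):
    """Toy embedding step for a single sequence.

    Text tokens are looked up in embed_table and scaled by sqrt(hidden_size);
    each image-token position is filled, in order, with the next vision
    feature row. Assumption: vision features are inserted unscaled.
    """
    hidden_size = len(embed_table[0])
    scale = math.sqrt(hidden_size)
    features = iter(image_features)
    embeds = []
    for token_id in input_ids:
        if token_id == image_token_id:
            embeds.append(list(next(features)))
        else:
            embeds.append([x * scale for x in embed_table[token_id]])
    return embeds
```

With the README's figures, the derived values describe a 64x64 grid of 14-pixel patches pooled down to a 16x16 grid of image tokens, which matches the `[B*256, 2560]` shape shown for `image_features` in the diagram.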
+
+## Supported Languages
+
+TranslateGemma supports translation across 55 languages including: Arabic, Bengali, Bulgarian, Catalan, Chinese (Simplified/Traditional), Czech, Danish, Dutch, English, Estonian, Farsi, Filipino, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Norwegian, Pashto, Polish, Portuguese (BR/PT), Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Zulu.
+
+## Benchmarking
+
+Evaluate translation quality against WMT24++ using COMET:
+
+```bash
+pip install unbabel-comet datasets
+
+# Quick check: 3 language pairs, 50 segments each (~10 min CPU)
+python benchmark_wmt24pp.py --lang-pairs en-de_DE en-es_MX en-fr_FR --max-segments 50
+
+# Full benchmark: all 55 pairs, stratified 150 segments each (~6h CPU)
+python benchmark_wmt24pp.py --lang-pairs all --max-segments 150 --seed 42
+
+# Benchmark the FP32 export and save its results for comparison with an INT4 run
+python benchmark_wmt24pp.py --model-dir cpu_and_mobile_fp32/models --output fp32_results.json
+```
+
+Model card reported scores (4B, WMT24++ 55 langs):
+- MetricX: 5.32 (lower is better)
+- COMET: 81.6 (higher is better)
+
+## File Structure
+
+```
+google-translategemma-4b-it/
+  LICENSE
+  builtin/
+    optimize.py                   # Export orchestration (3 Olive pipelines + config assembly)
+    user_script.py                # Vision/embedding wrapper modules for Olive export
+    inference.py                  # Text and image translation inference
+    benchmark_wmt24pp.py          # WMT24++ COMET evaluation
+    info.yml                      # Recipe metadata
+    README.md
+    cpu_and_mobile/               # INT4 RTN configs
+    cpu_and_mobile_fp32/          # FP32 configs
+```
+
+Models are downloaded automatically from Hugging Face during export. Ensure you have accepted the [Gemma license](https://huggingface.co/google/translategemma-4b-it) and are logged in via `huggingface-cli login`.
+
+## How It Works
+
+The export pipeline (`optimize.py`) runs three Olive workflows sequentially:
+
+1. **Text decoder** via `ModelBuilder` pass -- reads the PyTorch model and constructs an optimized ONNX graph with KV-cache support. For INT4 variants, weights are quantized during this step.
+2. **Embedding model** via `OnnxConversion` -- exports a custom `nn.Module` wrapper that combines the token embedding layer with image-feature scattering logic.
+3. **Vision model** via `OnnxConversion` -- exports the SigLIP vision tower and multimodal projector as a single ONNX graph.
+
+After export, `optimize.py` patches `genai_config.json` to register all three sub-models and creates `processor_config.json` for the C++ image preprocessing pipeline (resize to 896x896, normalize to [-1, 1], permute HWC to CHW).
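The normalization and layout steps listed above can be sketched in plain Python. This is an illustration only: it assumes 8-bit RGB input, a rescale by 1/255 followed by mean 0.5 and standard deviation 0.5 (one common way to reach [-1, 1]), and it omits the resize step. The function names are hypothetical and do not come from the recipe or from ORT GenAI.

```python
def normalize_pixel(value):
    """Map an 8-bit channel value in [0, 255] to the range [-1.0, 1.0]."""
    return (value / 255.0 - 0.5) / 0.5


def hwc_to_chw(image):
    """Reorder nested lists from image[h][w][c] to out[c][h][w]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


def preprocess(image):
    """Normalize every channel value, then permute HWC to CHW.

    Assumption: the image has already been resized to the target resolution.
    """
    normalized = [[[normalize_pixel(v) for v in pixel] for pixel in row] for row in image]
    return hwc_to_chw(normalized)
```

Calling `preprocess` on an image that is one pixel high and two pixels wide returns three channel planes, each with one row of two values, which mirrors the `[3, 896, 896]` per-image layout that `vision.onnx` expects at full resolution.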