Merged
206 changes: 206 additions & 0 deletions google-translategemma-4b-it/LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,206 @@
# Gemma Terms of Use

The terms below apply to Gemma models listed in the Appendix at bottom of this page. For Gemma 4 terms, see the [Gemma 4 license](https://ai.google.dev/gemma/apache_2).

Last modified: April 1, 2026

By using, reproducing, modifying, distributing, performing or displaying any
portion or element of Gemma, Model Derivatives including via any Hosted Service,
(each as defined below) (collectively, the "**Gemma Services**") or otherwise
accepting the terms of this Agreement, you agree to be bound by this Agreement.

## Section 1: DEFINITIONS

### 1.1 Definitions

(a) "**Agreement** " or "**Gemma Terms of Use**" means these terms and conditions
that govern the use, reproduction, Distribution or modification of the Gemma
Services and any terms and conditions incorporated by reference.

(b) "**Distribution** " or "**Distribute** " means any transmission, publication,
or other sharing of Gemma or Model Derivatives to a third party, including by
providing or making Gemma or its functionality available as a hosted service via
API, web access, or any other electronic or remote means ("**Hosted Service**").

(c) "**Gemma** " means the set of machine learning language models, trained model
weights and parameters identified in the [Appendix](https://ai.google.dev/gemma/terms#appendix),
regardless of the source that you obtained it from.

(d) "**Google**" means Google LLC.

(e) "**Model Derivatives**" means all (i) modifications to Gemma, (ii) works based
on Gemma, or (iii) any other machine learning model which is created by transfer
of patterns of the weights, parameters, operations, or Output of Gemma, to that
model in order to cause that model to perform similarly to Gemma, including
distillation methods that use intermediate data representations or methods based
on the generation of synthetic data Outputs by Gemma for training that model.
For clarity, Outputs are not deemed Model Derivatives.

(f) "**Output**" means the information content output of Gemma or a Model
Derivative that results from operating or otherwise using Gemma or the Model
Derivative, including via a Hosted Service.

### 1.2

As used in this Agreement, "**including** " means
"**including without limitation**".

## Section 2: ELIGIBILITY AND USAGE

### 2.1 Eligibility

You represent and warrant that you have the legal capacity to enter into this
Agreement (including being of sufficient age of consent). If you are accessing
or using any of the Gemma Services for or on behalf of a legal entity, (a) you
are entering into this Agreement on behalf of yourself and that legal entity,
(b) you represent and warrant that you have the authority to act on behalf of
and bind that entity to this Agreement and (c) references to "**you** " or
"**your**" in the remainder of this Agreement refers to both you (as an
individual) and that entity.

### 2.2 Use

You may use, reproduce, modify, Distribute, perform or display any of the Gemma
Services only in accordance with the terms of this Agreement, and must not
violate (or encourage or permit anyone else to violate) any term of this
Agreement.

## Section 3: DISTRIBUTION AND RESTRICTIONS

### 3.1 Distribution and Redistribution

You may reproduce or Distribute copies of Gemma or Model Derivatives if you meet
all of the following conditions:

1. You must include the use restrictions referenced in Section 3.2 as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Gemma or Model Derivatives and you must provide notice to subsequent users you Distribute to that Gemma or Model Derivatives are subject to the use restrictions in Section 3.2.
2. You must provide all third party recipients of Gemma or Model Derivatives a copy of this Agreement.
3. You must cause any modified files to carry prominent notices stating that you modified the files.
4. All Distributions (other than through a Hosted Service) must be accompanied by a "**Notice** " text file that contains the following notice: "**Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms**".

You may add your own intellectual property statement to your modifications and,
except as set forth in this Section, may provide additional or different terms
and conditions for use, reproduction, or Distribution of your modifications, or
for any such Model Derivatives as a whole, provided your use, reproduction,
modification, Distribution, performance, and display of Gemma otherwise complies
with the terms and conditions of this Agreement. Any additional or different
terms and conditions you impose must not conflict with the terms of this
Agreement.

### 3.2 Use Restrictions

You must not use any of the Gemma Services:

1. for the restricted uses set forth in the Gemma Prohibited Use Policy at [ai.google.dev/gemma/prohibited_use_policy](https://ai.google.dev/gemma/prohibited_use_policy) ("**Prohibited Use Policy**"), which is hereby incorporated by reference into this Agreement; or
2. in violation of applicable laws and regulations.

To the maximum extent permitted by law, Google reserves the right to restrict
(remotely or otherwise) usage of any of the Gemma Services that Google
reasonably believes are in violation of this Agreement.

### 3.3 Generated Output

Google claims no rights in Outputs you generate using Gemma. You and your users
are solely responsible for Outputs and their subsequent uses.

## Section 4: ADDITIONAL PROVISIONS

### 4.1 Updates

Google may update Gemma from time to time.

### 4.2 Trademarks

Nothing in this Agreement grants you any rights to use Google's trademarks,
trade names, logos or to otherwise suggest endorsement or misrepresent the
relationship between you and Google. Google reserves any rights not expressly
granted herein.

### 4.3 DISCLAIMER OF WARRANTY

UNLESS REQUIRED BY APPLICABLE LAW, THE GEMMA SERVICES, AND OUTPUTS, ARE PROVIDED
ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING ANY WARRANTIES OR CONDITIONS OF TITLE,
NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE
SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING,
MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE GEMMA SERVICES
OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR USE OR DISTRIBUTION
OF ANY OF THE GEMMA SERVICES OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND
PERMISSIONS UNDER THIS AGREEMENT.

### 4.4 LIMITATION OF LIABILITY

TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO
LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY,
CONTRACT, OR OTHERWISE, UNLESS REQUIRED BY APPLICABLE LAW, SHALL GOOGLE OR ITS
AFFILIATES BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT,
SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL, OR PUNITIVE DAMAGES, OR LOST
PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO, ANY OF THE GEMMA
SERVICES OR OUTPUTS EVEN IF GOOGLE OR ITS AFFILIATES HAVE BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

### 4.5 Term, Termination, and Survival

The term of this Agreement will commence upon your acceptance of this Agreement
(including acceptance by your use, modification, or Distribution, reproduction,
performance or display of any portion or element of the Gemma Services) and will
continue in full force and effect until terminated in accordance with the terms
of this Agreement. Google may terminate this Agreement if you are in breach of
any term of this Agreement. Upon termination of this Agreement, you must delete
and cease use and Distribution of all copies of Gemma and Model Derivatives in
your possession or control. Sections 1, 2.1, 3.3, 4.2 to 4.9 shall survive the
termination of this Agreement.

### 4.6 Governing Law and Jurisdiction

This Agreement will be governed by the laws of the State of California without
regard to choice of law principles. The UN Convention on Contracts for the
International Sale of Goods does not apply to this Agreement. The state and
federal courts of Santa Clara County, California shall have exclusive
jurisdiction of any dispute arising out of this Agreement.

### 4.7 Severability

If any provision of this Agreement is held to be invalid, illegal or
unenforceable, the remaining provisions shall be unaffected thereby and remain
valid as if such provision had not been set forth herein.

### 4.8 Entire Agreement

This Agreement states all the terms agreed between the parties and supersedes
all other agreements between the parties as of the date of acceptance relating
to its subject matter.

### 4.9 No Waiver

Google will not be treated as having waived any rights by not exercising (or
delaying the exercise of) any rights under this Agreement.

## Appendix

- [Gemma 1](https://ai.google.dev/gemma/docs/core/model_card)
- [Gemma 1.1](https://ai.google.dev/gemma/docs/core/model_card)
- [Gemma 2](https://ai.google.dev/gemma/docs/core/model_card_2)
- [Gemma 3](https://ai.google.dev/gemma/docs/core/model_card_3)
- [Gemma 3n](https://ai.google.dev/gemma/docs/3n)
- [FunctionGemma](https://ai.google.dev/gemma/docs/functiongemma)
- [EmbeddingGemma](https://ai.google.dev/gemma/docs/embeddinggemma)
- [PaliGemma](https://ai.google.dev/gemma/docs/paligemma/model-card)
- [PaliGemma 2](https://ai.google.dev/gemma/docs/paligemma/model-card-2)
- [ShieldGemma](https://ai.google.dev/gemma/docs/shieldgemma/model_card)
- [ShieldGemma 2](https://ai.google.dev/gemma/docs/shieldgemma/model_card_2)
- [CodeGemma](https://ai.google.dev/gemma/docs/codegemma/model_card)
- [CodeGemma 1.1](https://ai.google.dev/gemma/docs/codegemma/model_card)
- [Gemma 2 JPN](https://huggingface.co/google/gemma-2-2b-jpn-it)
- [DataGemma RIG](https://www.kaggle.com/models/google/datagemma-rig)
- [DataGemma RAG](https://www.kaggle.com/models/google/datagemma-rag)
- [RecurrentGemma](https://ai.google.dev/gemma/docs/recurrentgemma/model_card)
- [Gemma Scope](https://ai.google.dev/gemma/docs/gemma_scope)
- [Gemma-APS](https://ai.google.dev/gemma/docs/gemma-aps)
- [T5Gemma](https://www.kaggle.com/models/google/t5gemma)
- [VaultGemma](https://www.kaggle.com/models/google/vaultgemma)
- [FunctionGemma](https://www.kaggle.com/models/google/functiongemma)
- [T5Gemma 2](https://www.kaggle.com/models/google/t5gemma-2)
- [TranslateGemma](https://www.kaggle.com/models/google/translategemma)

> [!NOTE]
> **Note:** Previous versions of these Terms are [archived here](https://ai.google.dev/gemma/terms-archive).
147 changes: 147 additions & 0 deletions google-translategemma-4b-it/builtin/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
# TranslateGemma-4B-IT ONNX Recipe

Export and run [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) as a full vision-language model (VLM) with ONNX Runtime GenAI.

TranslateGemma is a translation model supporting 55 languages, with both **text-to-text** and **image-to-text** translation capabilities. This recipe exports it as three separate ONNX sub-models (text decoder, vision encoder, embedding) that run together through the ORT GenAI multimodal pipeline.

## Prerequisites

```bash
pip install olive-ai onnxruntime-genai transformers torch
```

## Quick Start

### 1. Authenticate with Hugging Face

TranslateGemma is a [gated model](https://huggingface.co/google/translategemma-4b-it). Accept the license on Hugging Face, then log in so models download automatically during export:

```bash
huggingface-cli login
```

### 2. Export to ONNX

```bash
# INT4 RTN text decoder + FP32 vision/embedding (recommended, ~6.7 GB total)
python optimize.py --config-dir cpu_and_mobile

# Full FP32 baseline (~19.2 GB total)
python optimize.py --config-dir cpu_and_mobile_fp32
```

### 3. Run inference

```bash
# Text translation (default: cpu_and_mobile)
python inference.py --source-lang en --target-lang es --text "Hello, how are you?"

# Image translation
python inference.py --source-lang en --target-lang fr --image <image-path>

# Use FP32 model
python inference.py --model-dir cpu_and_mobile_fp32/models --source-lang en --target-lang ja --text "Good morning"
```

## Export Configurations

Two export configurations are provided, both producing the same three-ONNX-model VLM layout:

| Config | Text Decoder | Embedding | Vision | Total Size |
|---|---|---|---|---|
| `cpu_and_mobile` | INT4 RTN (block 128) | FP32 | FP32 | ~6.7 GB |
| `cpu_and_mobile_fp32` | FP32 | FP32 | FP32 | ~19.2 GB |
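To make the "INT4 RTN (block 128)" entry concrete, here is a minimal pure-Python sketch of round-to-nearest quantization with one scale per block of 128 weights. It assumes a symmetric scheme with no zero point; the exporter's actual scheme and storage layout may differ, and the function names `quantize_rtn_int4` and `dequantize` are hypothetical.

```python
def quantize_rtn_int4(weights, block_size=128):
    """Round-to-nearest INT4 quantization with one scale per block.

    Assumption: symmetric quantization (no zero point). The real
    exporter may use a different scheme.
    """
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the largest magnitude in the block onto 7; guard against all-zero blocks.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the signed 4-bit range [-8, 7].
        quantized = [max(-8, min(7, round(w / scale))) for w in block]
        blocks.append((scale, quantized))
    return blocks


def dequantize(blocks):
    """Reconstruct approximate float weights from (scale, int4 values) blocks."""
    return [scale * q for scale, values in blocks for q in values]


# Toy data: 256 weights in [-1, 1) split into two blocks of 128.
weights = [((i * 37) % 256) / 128 - 1.0 for i in range(256)]
blocks = quantize_rtn_int4(weights)
restored = dequantize(blocks)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in 4 bits plus a shared per-block scale, which is why only the text decoder shrinks in the `cpu_and_mobile` configuration while the FP32 embedding and vision models keep their size.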

Each produces a `models/` directory containing:

```
models/
  text.onnx                # Text decoder (34 Gemma3 layers + LM head)
  text.onnx.data           # External weights
  embedding.onnx           # Token embedding + image feature scattering
  embedding.onnx.data
  vision.onnx              # SigLIP vision encoder + multimodal projector
  vision.onnx.data
  genai_config.json        # Runtime config for ORT GenAI
  processor_config.json    # Image preprocessing pipeline
  tokenizer.json           # Tokenizer files
  tokenizer_config.json
```
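A quick way to confirm that an export finished is to check that every file in the layout above exists. The helper below is an illustrative sketch and not part of the recipe; `missing_export_files` is a hypothetical name, and the `cpu_and_mobile/models` path is assumed from the `--config-dir` and `--model-dir` values used elsewhere in this README.

```python
from pathlib import Path

# File names taken from the models/ layout listed above.
EXPECTED_FILES = [
    "text.onnx",
    "text.onnx.data",
    "embedding.onnx",
    "embedding.onnx.data",
    "vision.onnx",
    "vision.onnx.data",
    "genai_config.json",
    "processor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_export_files(models_dir):
    """Return the expected file names that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed output location for the default INT4 configuration.
    missing = missing_export_files("cpu_and_mobile/models")
    print("Export complete" if not missing else f"Missing files: {missing}")
```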

## Architecture

TranslateGemma is a `Gemma3ForConditionalGeneration` multimodal model with three components:

```
Image [B, 3, 896, 896]
    |
    v
vision.onnx (SigLIP 27 layers + AvgPool2d projector)
    |
    v  image_features [B*256, 2560]
    |
    +--- input_ids [B, seq_len] ---> embedding.onnx (embed_tokens + scatter)
                                         |
                                         v  inputs_embeds [B, seq_len, 2560]
                                         |
                                         +---> text.onnx (34 Gemma3 decoder layers)
                                                   |
                                                   v  logits -> tokens -> translation
```
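The shapes in the diagram can be reproduced with a little arithmetic. The sketch below assumes a 14-pixel patch size and a 4x4 average-pooling kernel; neither value is stated in this recipe, but both are consistent with 896x896 inputs producing 256 image tokens. The function name is hypothetical.

```python
IMAGE_SIZE = 896    # from the diagram: [B, 3, 896, 896]
HIDDEN_SIZE = 2560  # from the diagram: image_features [B*256, 2560]
PATCH_SIZE = 14     # assumption: 896 / 14 = 64 patches per side
POOL_KERNEL = 4     # assumption: 64 / 4 = 16 tokens per side


def vision_token_shapes(batch_size):
    """Derive patch and token counts for a batch of images."""
    patches_per_side = IMAGE_SIZE // PATCH_SIZE
    num_patches = patches_per_side ** 2
    tokens_per_side = patches_per_side // POOL_KERNEL
    tokens_per_image = tokens_per_side ** 2
    return {
        "num_patches": num_patches,
        "tokens_per_image": tokens_per_image,
        "image_features": (batch_size * tokens_per_image, HIDDEN_SIZE),
    }


shapes = vision_token_shapes(batch_size=2)
```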

- **Vision**: SigLIP encoder (27 layers, 1152-dim) processes 896x896 images into 4096 patches, then a projector (AvgPool2d + RMSNorm + linear) compresses to 256 tokens at 2560-dim.
- **Embedding**: Looks up token embeddings (scaled by sqrt(2560)), then scatters vision features into image-token positions.
- **Text**: Standard Gemma3 decoder with 34 layers, sliding/full attention pattern, generating translation tokens autoregressively.
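The embedding step described above can be sketched in plain Python: look up each token, scale it by the square root of the hidden size, and overwrite image-token positions with vision features in order. This is a toy illustration and not the exported graph; `IMAGE_TOKEN_ID`, the 4-dimensional hidden size, and the function name are all hypothetical stand-ins.

```python
import math

HIDDEN_SIZE = 4       # toy value for readability; the real model uses 2560
IMAGE_TOKEN_ID = 99   # hypothetical placeholder id marking image-token positions


def embed_and_scatter(input_ids, embedding_table, image_features):
    """Return one embedding row per token, with image rows scattered in."""
    scale = math.sqrt(HIDDEN_SIZE)
    feature_iter = iter(image_features)
    output = []
    for token_id in input_ids:
        if token_id == IMAGE_TOKEN_ID:
            # Image-token position: take the next vision feature row as-is.
            output.append(list(next(feature_iter)))
        else:
            # Text token: scaled embedding lookup.
            output.append([scale * x for x in embedding_table[token_id]])
    return output


embedding_table = {0: [1.0, 0.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0, 0.0]}
image_features = [[9.0, 9.0, 9.0, 9.0]]
inputs_embeds = embed_and_scatter([0, IMAGE_TOKEN_ID, 1], embedding_table, image_features)
```

In the real pipeline each image contributes 256 consecutive feature rows, so the prompt must contain the same number of image-token positions as there are rows in `image_features`.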

## Supported Languages

TranslateGemma supports translation across 55 languages including: Arabic, Bengali, Bulgarian, Catalan, Chinese (Simplified/Traditional), Czech, Danish, Dutch, English, Estonian, Farsi, Filipino, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Norwegian, Pashto, Polish, Portuguese (BR/PT), Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Zulu.

## Benchmarking

Evaluate translation quality against WMT24++ using COMET:

```bash
pip install unbabel-comet datasets

# Quick check: 3 language pairs, 50 segments each (~10 min CPU)
python benchmark_wmt24pp.py --lang-pairs en-de_DE en-es_MX en-fr_FR --max-segments 50

# Full benchmark: all 55 pairs, stratified 150 segments each (~6h CPU)
python benchmark_wmt24pp.py --lang-pairs all --max-segments 150 --seed 42

# Benchmark the FP32 export and save the results for comparison against INT4
python benchmark_wmt24pp.py --model-dir cpu_and_mobile_fp32/models --output fp32_results.json
```

Scores reported in the model card (4B, WMT24++, 55 languages):
- MetricX: 5.32 (lower is better)
- COMET: 81.6 (higher is better)

## File Structure

```
google-translategemma-4b-it/
  LICENSE
  builtin/
    optimize.py            # Export orchestration (3 Olive pipelines + config assembly)
    user_script.py         # Vision/embedding wrapper modules for Olive export
    inference.py           # Text and image translation inference
    benchmark_wmt24pp.py   # WMT24++ COMET evaluation
    info.yml               # Recipe metadata
    README.md
    cpu_and_mobile/        # INT4 RTN configs
    cpu_and_mobile_fp32/   # FP32 configs
```

Models are downloaded automatically from Hugging Face during export. Ensure you have accepted the [Gemma license](https://huggingface.co/google/translategemma-4b-it) and are logged in via `huggingface-cli login`.

## How It Works

The export pipeline (`optimize.py`) runs three Olive workflows sequentially:

1. **Text decoder** via `ModelBuilder` pass -- reads the PyTorch model and constructs an optimized ONNX graph with KV-cache support. For INT4 variants, weights are quantized during this step.
2. **Embedding model** via `OnnxConversion` -- exports a custom `nn.Module` wrapper that combines the token embedding layer with image-feature scattering logic.
3. **Vision model** via `OnnxConversion` -- exports the SigLIP vision tower and multimodal projector as a single ONNX graph.

After export, `optimize.py` patches `genai_config.json` to register all three sub-models and creates `processor_config.json` for the C++ image preprocessing pipeline (resize to 896x896, normalize to [-1, 1], permute HWC to CHW).
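The last two preprocessing steps can be illustrated in plain Python. This sketch assumes the image has already been resized and that normalization uses a mean and standard deviation of 0.5 per channel, which is one common way to reach the [-1, 1] range; the actual values live in `processor_config.json`, and the function name is hypothetical.

```python
def normalize_and_permute(image_hwc):
    """Convert an H x W x C image of 0-255 values to C x H x W floats in [-1, 1].

    Assumption: rescale by 1/255, then apply (x - 0.5) / 0.5 per channel.
    """
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [
            [(image_hwc[y][x][c] / 255.0 - 0.5) / 0.5 for x in range(width)]
            for y in range(height)
        ]
        for c in range(channels)
    ]


# A 1 x 2 RGB image: one black pixel and one white pixel.
tiny_image = [[[0, 0, 0], [255, 255, 255]]]
chw = normalize_and_permute(tiny_image)
```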