
Quantizations of https://huggingface.co/google/gemma-2-2b-jpn-it

Inference Clients/UIs
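
The GGUF files here can be loaded by any llama.cpp-based client (llama.cpp, llama-cpp-python, LM Studio, etc.). A minimal sketch with llama-cpp-python follows; the quant filename is illustrative, so substitute whichever bit-width you downloaded:

# Hypothetical example: load one of this repo's GGUF quants with llama-cpp-python.
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-2-2b-jpn-it.Q4_K_M.gguf",  # illustrative filename
    n_ctx=4096,
)

messages = [
    # "Please write a poem about machine learning."
    {"role": "user", "content": "マシーンラーニングについての詩を書いてください。"},
]
out = llm.create_chat_completion(messages=messages, max_tokens=256)
print(out["choices"][0]["message"]["content"])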


From original readme

Gemma is a series of best-in-class open models and draws inspiration and technological lineage from the Gemini family of models. They are text-to-text, decoder-only large language models with open weights. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.

Gemma-2-JPN is a Gemma 2 2B model fine-tuned on Japanese text. It supports the Japanese language at the same level of performance as English-only queries on Gemma 2.

Usage

Below are some code snippets to help you get started running the model. First, install the Transformers library with:

pip install -U transformers

Then, copy the snippet from the section that is relevant to your use case.

Running with the pipeline API

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="google/gemma-2-2b-jpn-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    # "Please write a poem about machine learning."
    {"role": "user", "content": "マシーンラーニングについての詩を書いてください。"},
]

outputs = pipe(messages, return_full_text=False, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"].strip()
print(assistant_response)
Example output
## マシーンラーニングの詩

**1.** 
データの海、深淵の広がり、
複雑なパターン、隠された知識。
機械学習、その力強さ、
未来を予測、その道を開く。

**2.** 
ニューラルネットワーク、複雑な枝、
学習の旅、その過程は静か。
データから学び、進化する姿、
予測の精度、その力強さ。

**3.** 
教師あり学習、正解を導く、
教師なし学習、未知の世界へ。
機械学習、その進化は止まらない、
未来の扉を開く、新たな時代へ。

**4.** 
画像認識、音声認識、
複雑なタスク、その答えを見つける。
機械学習、その力強さ、
未来の技術、その可能性を語る。

It can also be used for translation, as follows:

translation_input_text = f"Translate the following poem from Japanese to English:\n\n{assistant_response}"
messages = [
    {"role": "user", "content": translation_input_text},
]

outputs = pipe(messages, return_full_text=False, max_new_tokens=1024)
translated_response = outputs[0]["generated_text"].strip()
print(translated_response)
Example output
## A Poem About Machine Learning

**1.**
A vast ocean of data, a deep expanse,
Complex patterns, hidden knowledge.
Machine learning, its strength so vast,
Predicting the future, opening the way.

**2.**
A neural network, with branches intricate,
A journey of learning, its process serene.
Learning from data, evolving in its form,
The precision of prediction, its strength.

**3.**
Supervised learning, guiding the correct answer,
Unsupervised learning, venturing into the unknown.
Machine learning, its evolution never ends,
Opening the doors to the future, a new era.

**4.**
Image recognition, speech recognition,
Complex tasks, finding the answer.
Machine learning, its strength so vast,
The possibilities of future technology, a story to be told.

**Explanation:**

The poem uses vivid imagery and metaphors to describe the power and potential of machine learning. 

* **Data as an ocean:**  Represents the vast amount of information available for learning.
* **Complex patterns:**  Highlights the intricate nature of data and the challenges of extracting meaningful insights.
* **Future prediction:**  Emphasizes the ability of machine learning to analyze data and make predictions about the future.
* **Neural network as a tree:**  Represents the interconnectedness and complexity of the learning process.
* **Learning from data:**  Focuses on the core principle of machine learning, where algorithms learn from data to improve their performance.

The poem concludes by highlighting the diverse applications of machine learning, such as image and speech recognition, and emphasizes its potential to shape the future of technology.

Running the model on a single / multi GPU

# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-jpn-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-jpn-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "user", "content": "マシーンラーニングについての詩を書いてください。"},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True, return_dict=True).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
generated_text = tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0]
print(generated_text.strip())
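
If you want tokens printed as they are generated, transformers' TextStreamer can be attached to generate. This is not part of the original card; a minimal sketch reusing the model, tokenizer, and inputs from the snippet above:

from transformers import TextStreamer

# Print tokens to stdout as they are produced, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=256)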

Running the model on a GPU using different precisions

The native weights of this model were exported in bfloat16 precision.

You can also run the model in float32 by omitting the dtype, but this brings no precision gain: the bfloat16 weights are simply upcast to float32. See the examples below.

  • Upcasting to torch.float32
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-jpn-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-jpn-it",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "マシーンラーニングについての詩を書いてください。"},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True, return_dict=True).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
generated_text = tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0]
print(generated_text.strip())
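
Equivalently, the upcast can be requested explicitly via torch_dtype (not shown in the original card); a minimal sketch:

import torch
from transformers import AutoModelForCausalLM

# Explicitly request float32; same result as omitting the dtype above.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-jpn-it",
    device_map="auto",
    torch_dtype=torch.float32,
)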

GGUF quantizations (gemma2 architecture, 2.61B params) are provided in 1-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit variants.
