---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
---

## The world's first Gemma fine-tune based on the openchat-3.5-0106 data and method (C-RLFT). It performs almost the same as the Mistral-based OpenChat and much better than Gemma-7B and Gemma-7B-it.

Please refer to [openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) for details.

> P.S.: 6T pre-training tokens + 0.003 init std dev + C-RLFT is the secret sauce?
>
> P.P.S.: @Google team, we know your model is great, but please use an OSI-approved license like Mistral (or even Phi and Orca).

## Benchmarks

| Model                       | # Params | Average  | MT-Bench | HumanEval | BBH MC   | AGIEval  | TruthfulQA | MMLU     | GSM8K    | BBH CoT  |
|-----------------------------|----------|----------|----------|-----------|----------|----------|------------|----------|----------|----------|
| **OpenChat-3.5-0106 Gemma** | **7B**   | 64.4     | 7.83     | 67.7      | **52.7** | **50.2** | 55.4       | 65.7     | **81.5** | 63.7     |
| OpenChat-3.5-0106 Mistral   | **7B**   | **64.5** | 7.8      | **71.3**  | 51.5     | 49.1     | **61.0**   | 65.8     | 77.4     | 62.2     |
| ChatGPT (March)             | ???B     | 61.5     | **7.94** | 48.1      | 47.6     | 47.1     | 57.7       | **67.3** | 74.9     | **70.1** |
|                             |          |          |          |           |          |          |            |          |          |          |
| Gemma-7B                    | 7B       | -        | -        | 32.3      | -        | 41.7     | -          | 64.3     | 46.4     | -        |
| Gemma-7B-it *               | 7B       | 25.4     | -        | 28.0      | 38.4     | 32.5     | 34.1       | 26.5     | 10.8     | 7.6      |
| OpenHermes 2.5              | 7B       | 59.3     | 7.54     | 48.2      | 49.4     | 46.5     | 57.5       | 63.8     | 73.5     | 59.9     |

*: `Gemma-7b-it` failed to understand and follow most few-shot templates.
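
The Average column is not defined in this card. The reported figures are consistent with a plain mean over the available benchmarks, with MT-Bench rescaled from its 0-10 range to 0-100. That reading is inferred from the numbers in the table, not stated by the authors, so the sketch below should be taken as a sanity check rather than the official formula.

```python
# Scores copied from the table, in column order:
# MT-Bench, HumanEval, BBH MC, AGIEval, TruthfulQA, MMLU, GSM8K, BBH CoT.
# None marks a score that the table reports as "-".
SCORES = {
    "OpenChat-3.5-0106 Gemma": [7.83, 67.7, 52.7, 50.2, 55.4, 65.7, 81.5, 63.7],
    "OpenChat-3.5-0106 Mistral": [7.8, 71.3, 51.5, 49.1, 61.0, 65.8, 77.4, 62.2],
    "ChatGPT (March)": [7.94, 48.1, 47.6, 47.1, 57.7, 67.3, 74.9, 70.1],
    "Gemma-7B-it": [None, 28.0, 38.4, 32.5, 34.1, 26.5, 10.8, 7.6],
    "OpenHermes 2.5": [7.54, 48.2, 49.4, 46.5, 57.5, 63.8, 73.5, 59.9],
}


def average(scores):
    """Mean over available scores, with MT-Bench rescaled to 0-100.

    ASSUMPTION: this rescaling and averaging rule is inferred from the
    table; the card itself does not define the Average column.
    """
    mt_bench, *rest = scores
    values = list(rest)
    if mt_bench is not None:
        values.append(mt_bench * 10)
    return round(sum(values) / len(values), 1)


for model, scores in SCORES.items():
    print(f"{model}: {average(scores)}")
```

Under this assumption, the missing MT-Bench score for `Gemma-7B-it` is simply left out of its mean instead of being counted as zero.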

## Usage

To use this model, we highly recommend installing the OpenChat package by following the [installation guide](https://github.com/imoneoi/openchat#installation) in our repository, then starting the OpenChat OpenAI-compatible API server with the serving command from the table below. The server is optimized for high-throughput deployment using [vLLM](https://github.com/vllm-project/vllm) and can run on a consumer GPU with 24 GB of memory. To enable tensor parallelism, append `--tensor-parallel-size N` to the serving command.

Once started, the server listens at `localhost:18888` for requests and is compatible with the [OpenAI ChatCompletion API specifications](https://platform.openai.com/docs/api-reference/chat). See the example request below. You can also use the [OpenChat Web UI](https://github.com/imoneoi/openchat#web-ui) for a user-friendly experience.

If you want to deploy the server as an online service, use `--api-keys sk-KEY1 sk-KEY2 ...` to specify the allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` to log only to a file. For security, we recommend placing an [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server.

| Model                   | Size | Context | Weights                                                                | Serving                                                                                                                |
|-------------------------|------|---------|------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| OpenChat-3.5-0106-Gemma | 7B   | 8192    | [Huggingface](https://huggingface.co/openchat/openchat-3.5-0106-gemma) | `python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106-gemma --engine-use-ray --worker-use-ray` |

<details>
  <summary>Example request (click to expand)</summary>

```bash
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_3.5_gemma_new",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'
```

</details>
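
For callers who prefer Python to `curl`, the same request can be sketched with the standard library alone. The helper names `build_request` and `chat` are hypothetical. The `Authorization: Bearer` header is an assumption based on the usual OpenAI convention for servers started with `--api-keys`, and the response parsing assumes the standard ChatCompletion response shape.

```python
import json
import urllib.request

# Address and model name taken from the curl example above.
SERVER_URL = "http://localhost:18888/v1/chat/completions"
MODEL_NAME = "openchat_3.5_gemma_new"


def build_request(prompt, model=MODEL_NAME, api_key=None, url=SERVER_URL):
    """Build a single-turn ChatCompletion POST request (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # ASSUMPTION: keys are passed as an OpenAI-style bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # ASSUMPTION: the reply follows the OpenAI ChatCompletion response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Write a poem to describe yourself")` should return the generated text. Pass `api_key="sk-KEY1"` if the server was started with `--api-keys`.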

## Conversation template

⚠️ **Notice:** This template differs from the Mistral version. The end-of-turn token is now `<end_of_turn>` (the Mistral version uses `<|end_of_turn|>`). Remember to set `<end_of_turn>` as the end-of-generation token.

```
GPT4 Correct User: Hello<end_of_turn>GPT4 Correct Assistant: Hi<end_of_turn>GPT4 Correct User: How are you today?<end_of_turn>GPT4 Correct Assistant:
```

With a system message (**NOT** recommended, as it may degrade performance):

```
You are a helpful assistant.<end_of_turn>GPT4 Correct User: Hello<end_of_turn>GPT4 Correct Assistant: Hi<end_of_turn>GPT4 Correct User: How are you today?<end_of_turn>GPT4 Correct Assistant:
```
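
When calling the model directly instead of through the API server, the prompt has to be assembled by hand. The sketch below rebuilds the two templates above from a list of messages. The function name `build_prompt` and the message format are hypothetical, and any beginning-of-sequence token is assumed to be added by the tokenizer, so it is not handled here.

```python
END_OF_TURN = "<end_of_turn>"

# Role prefixes copied from the conversation template above.
ROLE_PREFIXES = {
    "user": "GPT4 Correct User:",
    "assistant": "GPT4 Correct Assistant:",
}


def build_prompt(messages, system=None):
    """Format chat messages into a single prompt string (hypothetical helper).

    `messages` is a list of {"role": ..., "content": ...} dicts with roles
    "user" or "assistant". The result ends with the assistant prefix so the
    model generates the next assistant turn.
    """
    parts = []
    if system:
        # Optional system message; the card advises against using one.
        parts.append(f"{system}{END_OF_TURN}")
    for message in messages:
        prefix = ROLE_PREFIXES[message["role"]]
        parts.append(f"{prefix} {message['content']}{END_OF_TURN}")
    parts.append(ROLE_PREFIXES["assistant"])
    return "".join(parts)


conversation = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"},
]
print(build_prompt(conversation))
```

Generation should stop when the model emits `<end_of_turn>`, so that token needs to be configured as a stop token in whichever inference library is used.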