---
license: apache-2.0
---

# Model Card for Soniox-7B-v1.0

Soniox 7B is a large language model that supports English and code with an 8K context window.
It matches GPT-4 performance on some benchmarks.
It is built on top of Mistral 7B and enhanced with additional pre-training and fine-tuning for strong problem-solving capabilities.
The model is released under the Apache 2.0 license.
For more details, please read our [blog post](https://soniox.com/news/soniox-7B).

## Usage in Transformers

The model can be loaded with the Hugging Face Transformers library and used as follows:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in half precision, along with the matching tokenizer.
model_path = "soniox/Soniox-7B-v1.0"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Move the model to the GPU.
device = "cuda"
model.to(device)

# Conversation history; the last user turn is the one the model answers.
messages = [
    {"role": "user", "content": "12 plus 21?"},
    {"role": "assistant", "content": "33."},
    {"role": "user", "content": "Five minus one?"},
]
# Render the conversation with the model's chat template and tokenize it.
tok_prompt = tokenizer.apply_chat_template(messages, return_tensors="pt")

# Sample up to 1000 new tokens and decode the full sequence (prompt included).
model_inputs = tok_prompt.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
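
Note that `generate` returns the prompt tokens followed by the newly generated ones, so `decoded[0]` above contains the whole conversation and not just the reply. The following is a minimal sketch of keeping only the reply. It uses plain Python lists with made-up token ids so that it runs without a GPU, and `strip_prompt` is a hypothetical helper name, not part of Transformers. With the tensors from the example above, the equivalent slice would be `generated_ids[:, model_inputs.shape[1]:]`.

```python
from typing import List


def strip_prompt(generated: List[List[int]], prompt_len: int) -> List[List[int]]:
    """Drop the first `prompt_len` token ids from every generated sequence.

    Hypothetical helper for illustration: decoder-only models return the
    prompt followed by the continuation, so slicing leaves only new tokens.
    """
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    return [sequence[prompt_len:] for sequence in generated]


# Toy ids standing in for real tokenizer output (values are made up).
prompt_ids = [1, 733, 16289]
generated_ids = [prompt_ids + [28781, 28723, 2]]

reply_ids = strip_prompt(generated_ids, len(prompt_ids))
print(reply_ids)  # [[28781, 28723, 2]]
```

Passing `skip_special_tokens=True` to `tokenizer.batch_decode` additionally removes special tokens, such as the end-of-sequence marker, from the printed text.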

## Inference deployment

Refer to our [documentation](https://docs.soniox.com) for inference with vLLM and other
deployment options.