---
license: mit
language: de
pipeline_tag: text-generation
widget:
- text: "In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde Einhörner, die in "
  example_title: "Einhörner ..."
- text: |-
   Definiere folgende Wörter
   
   Wort: Einhorn
   Definition: Das Einhorn ist ein Fabelwesen von Pferde- oder Ziegengestalt mit einem geraden Horn auf der Stirnmitte. 
   
   Wort: Regierungschef
   Definition: Der Regierungschef ist der Leiter der Regierung eines Staates (z. B. National- oder Gliedstaat).
   
   Wort: Waffendrill
   Definition:
  example_title: "Definiere ..."
---

# German GPT2-XL (1.5B)

- trained with [BigScience's Megatron-DeepSpeed code base](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
- word embeddings initialized with [WECHSEL](https://arxiv.org/abs/2112.06598); all other weights taken from the English [gpt2-xl](https://huggingface.co/gpt2-xl)
- ~3 days on 16× A100 GPUs (~80 TFLOPS per GPU)
- stopped after 100k steps
- 26.2B tokens
- less than one epoch on `oscar_unshuffled_deduplicated_de` (excluding the validation set; the original model was trained for 75 epochs on less data)
- bf16
- ZeRO stage 0
- tensor/pipeline parallelism (TP/PP) = 1
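As a rough consistency check on the figures above, the token budget can be divided by the number of steps. This sketch assumes a sequence length of 1,024 tokens (GPT-2's context size), which the card does not state; under that assumption the numbers imply a global batch of roughly 256 sequences per step.

```python
# Back-of-the-envelope check of the training budget listed above.
# ASSUMPTION: sequence length of 1024 tokens; the card states neither the
# sequence length nor the global batch size.
TOTAL_TOKENS = 26.2e9   # from the card
STEPS = 100_000         # from the card
SEQ_LEN = 1024          # assumed

tokens_per_step = TOTAL_TOKENS / STEPS
sequences_per_step = tokens_per_step / SEQ_LEN

print(f"tokens per step:    {tokens_per_step:,.0f}")
print(f"sequences per step: {sequences_per_step:.1f}")
```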


## How to use

You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we
set a seed for reproducibility:

```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='malteos/gpt2-xl-wechsel-german')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)

[{'generated_text': "Hello, I'm a language model, a language for thinking, a language for expressing thoughts."},
 {'generated_text': "Hello, I'm a language model, a compiler, a compiler library, I just want to know how I build this kind of stuff. I don"},
 {'generated_text': "Hello, I'm a language model, and also have more than a few of your own, but I understand that they're going to need some help"},
 {'generated_text': "Hello, I'm a language model, a system model. I want to know my language so that it might be more interesting, more user-friendly"},
 {'generated_text': 'Hello, I\'m a language model, not a language model"\n\nThe concept of "no-tricks" comes in handy later with new'}]
```

Here is how to use this model to get the features of a given text in PyTorch:

```python
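from transformers import GPT2Tokenizer, GPT2Model

# Load the German tokenizer and the base model (without the language-modeling head)
tokenizer = GPT2Tokenizer.from_pretrained('malteos/gpt2-xl-wechsel-german')
model = GPT2Model.from_pretrained('malteos/gpt2-xl-wechsel-german')

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

# Token-level features, shape (batch_size, sequence_length, hidden_size)
features = output.last_hidden_state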
```

## Evaluation

| Model (size) | Perplexity (PPL, lower is better) |
|---|---|
| `gpt2-xl-wechsel-german` (1.5B) | **14.5** |
| `gpt2-wechsel-german-ds-meg` (117M) | 26.4 |
| `gpt2-wechsel-german` (117M) | 26.8 |
| `gpt2` (retrained from scratch) (117M) | 27.63 |
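The card does not say which evaluation set or tokenization these perplexities were computed on. For reference, perplexity is the exponential of the mean per-token negative log-likelihood (natural log). A minimal sketch of that definition, where `perplexity` is a hypothetical helper rather than part of any library:

```python
import math

def perplexity(token_nlls):
    """Return exp(mean per-token negative log-likelihood, natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

# Toy example: a model that assigns probability 1/14.5 to every token
# has a perplexity of 14.5.
nlls = [-math.log(1 / 14.5)] * 10
print(round(perplexity(nlls), 2))  # 14.5
```

With `transformers`, passing `labels=input_ids` to a causal LM such as `GPT2LMHeadModel` returns the mean cross-entropy as `loss`, so `math.exp(loss.item())` gives the perplexity over that input.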


## Other German language models

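- [malteos/bloom-1b5-clp-german](https://huggingface.co/malteos/bloom-1b5-clp-german)
- [malteos/bloom-6b4-clp-german](https://huggingface.co/malteos/bloom-6b4-clp-german)
- [malteos/bloom-6b4-clp-german-oasst-v0.1](https://huggingface.co/malteos/bloom-6b4-clp-german-oasst-v0.1)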

## License

MIT