Safetensors
PyTorch
English
gpt_neox
causal-lm
pythia
File size: 7,791 Bytes
6ae2998
147bb29
 
 
 
 
 
 
 
 
6ae2998
 
 
 
147bb29
 
6ae2998
4787b01
 
 
 
 
 
 
 
 
6ae2998
 
147bb29
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6ae2998
147bb29
6ae2998
147bb29
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6ae2998
 
 
147bb29
6ae2998
147bb29
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
---
language:
- en
tags:
- pytorch
- causal-lm
- pythia
license: apache-2.0
datasets:
- EleutherAI/pile
---

# Model Card for Model ID

The Pythia 160m model is part of a collection of models developed to facilitate 
interpretability research [(see repository)](https://huggingface.co/EleutherAI/pythia-160m/edit/main/README.md) trained on the Pile. We have evalutated it on hellaswag using the Eleuther evaluation harness.

<figure>
  
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.2872|±  |0.0045|
|         |       |none  |     0|acc_norm|↑  |0.3082|±  |0.0046|
<figcaption>Evaluation results.</figcaption>
</figure>

## Model Details

- Developed by: [EleutherAI](http://eleuther.ai)
- Model type: Transformer-based Language Model
- Language: English
- Learn more: [Pythia's GitHub repository](https://github.com/EleutherAI/pythia)
 for training procedure, config files, and details on how to use.
 [See paper](https://arxiv.org/pdf/2304.01373.pdf) for more evals and implementation
 details.
- Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
- License: Apache 2.0
- Contact: to ask questions about this model, join the [EleutherAI 
Discord](https://discord.gg/zBGx3azzUn), and post them in `#release-discussion`.
 Please read the existing *Pythia* documentation before asking about it in the 
 EleutherAI Discord. For general correspondence: [contact@eleuther.
 ai](mailto:[email protected]).

<figure>
  
| Pythia model | Non-Embedding Params | Layers | Model Dim | Heads | Batch Size | Learning Rate         | Equivalent Models      |
| -----------: | -------------------: | :----: | :-------: | :---: | :--------: | :-------------------: | :--------------------: |
| 160M         | 85,056,000           | 12     | 768       | 12    | 2M         | 6.0 x 10<sup>-4</sup> | GPT-Neo 125M, OPT-125M |
<figcaption>Engineering details for the <i>Pythia Suite</i>. Deduped and 
non-deduped models of a given size have the same hyperparameters. “Equivalent” 
models have <b>exactly</b> the same architecture, and the same number of 
non-embedding parameters.</figcaption>
</figure>

### Model Description

This is the model card of Pythia 160m evaluated on the Eleuther evaluation harness.

- **Developed by:** [EleutherAI](http://eleuther.ai)
- **Model type:** Pythia 160m
- **Language(s) (NLP):** EN
- **License:** Apache 2.0

### Model Sources

- **Repository:** https://huggingface.co/EleutherAI/pythia-160m/edit/main/README.md
  
## Uses and Limitations

### Intended Use

The primary intended use of Pythia is research on the behavior, functionality, 
and limitations of large language models. This suite is intended to provide 
a controlled setting for performing scientific experiments. We also provide 
154 checkpoints per model: initial `step0`, 10 log-spaced checkpoints 
`step{1,2,4...512}`, and 143 evenly-spaced checkpoints from `step1000` to 
`step143000`. These checkpoints are hosted on Hugging Face as branches. Note 
that branch `143000` corresponds exactly to the model checkpoint on the `main` 
branch of each model.

You may also further fine-tune and adapt Pythia-160M for deployment, 
as long as your use is in accordance with the Apache 2.0 license. Pythia 
models work with the Hugging Face [Transformers 
Library](https://huggingface.co/docs/transformers/index). If you decide to use 
pre-trained Pythia-160M as a basis for your fine-tuned model, please 
conduct your own risk and bias assessment. 

### Out-of-scope use

The Pythia Suite is **not** intended for deployment. It is not a in itself 
a product and cannot be used for human-facing interactions. For example, 
the model may generate harmful or offensive text. Please evaluate the risks
associated with your particular use case.

Pythia models are English-language only, and are not suitable for translation 
or generating text in other languages.

Pythia-160M has not been fine-tuned for downstream contexts in which 
language models are commonly deployed, such as writing genre prose, 
or commercial chatbots. This means Pythia-160M will **not** 
respond to a given prompt the way a product like ChatGPT does. This is because,
 unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement 
Learning from Human Feedback (RLHF) to better “follow” human instructions.

### Limitations and biases

The core functionality of a large language model is to take a string of text 
and predict the next token. The token used by the model need not produce the 
most “accurate” text. Never rely on Pythia-160M to produce factually accurate 
output.

This model was trained on [the Pile](https://pile.eleuther.ai/), a dataset 
known to contain profanity and texts that are lewd or otherwise offensive. 
See [Section 6 of the Pile paper](https://arxiv.org/abs/2101.00027) for a 
discussion of documented biases with regards to gender, religion, and race. 
Pythia-160M may produce socially unacceptable or undesirable text, *even if* 
the prompt itself does not include anything explicitly offensive. 

If you plan on using text generated through, for example, the Hosted Inference 
API, we recommend having a human curate the outputs of this language model 
before presenting it to other people. Please inform your audience that the 
text was generated by Pythia-160M.

## Training

### Training data

[The Pile](https://pile.eleuther.ai/) is a 825GiB general-purpose dataset in 
English. It was created by EleutherAI specifically for training large language 
models. It contains texts from 22 diverse sources, roughly broken down into 
five categories: academic writing (e.g. arXiv), internet (e.g. CommonCrawl), 
prose (e.g. Project Gutenberg), dialogue (e.g. YouTube subtitles), and 
miscellaneous (e.g. GitHub, Enron Emails). See [the Pile 
paper](https://arxiv.org/abs/2101.00027) for a breakdown of all data sources, 
methodology, and a discussion of ethical implications. Consult [the 
datasheet](https://arxiv.org/abs/2201.07311) for more detailed documentation 
about the Pile and its component datasets. The Pile can be downloaded from 
the [official website](https://pile.eleuther.ai/), or from a [community 
mirror](https://the-eye.eu/public/AI/pile/).<br>
The Pile was **not** deduplicated before being used to train Pythia-160M.

### Training procedure

All models were trained on the exact same data, in the exact same order. Each 
model saw 299,892,736,000 tokens during training, and 143 checkpoints for each 
model are saved every 2,097,152,000 tokens, spaced evenly throughout training, 
from `step1000` to `step143000` (which is the same as `main`). In addition, we 
also provide frequent early checkpoints: `step0` and `step{1,2,4...512}`.
This corresponds to training for just under 1 epoch on the Pile for 
non-deduplicated models, and about 1.5 epochs on the deduplicated Pile.

All *Pythia* models trained for 143000 steps at a batch size 
of 2M (2,097,152 tokens).<br>
See [GitHub](https://github.com/EleutherAI/pythia) for more details on training
 procedure, including [how to reproduce 
 it](https://github.com/EleutherAI/pythia/blob/main/README.md#reproducing-training).<br>
Pythia uses the same tokenizer as [GPT-NeoX-
20B](https://huggingface.co/EleutherAI/gpt-neox-20b).

## Evaluation

This model has been evaluated on hellaswag using the Eleuther evaluation harness.

<figure>
  
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.2872|±  |0.0045|
|         |       |none  |     0|acc_norm|↑  |0.3082|±  |0.0046|
<figcaption>Evaluation results.</figcaption>
</figure>