Update README.md
README.md CHANGED
@@ -14,7 +14,7 @@ library_name: transformers
 license: cc-by-4.0
 
 
-#
+# Palmyra-small
 
 <style>
 img {
@@ -27,14 +27,7 @@ img {
 
 ## Model Description
 
-
-
-
-# GPT-J 6B
-
-## Model Description
-
-GPT-J 6B is a transformer model trained using Ben Wang's [Mesh Transformer JAX](https://github.com/kingoflolz/mesh-transformer-jax/). "GPT-J" refers to the class of model, while "6B" represents the number of trainable parameters.
+Palmyra-small 128M is a transformer-based language model. GPT refers to a class of decoder-only transformer models similar to GPT-2 and GPT-3. It has Tensor Parallelism (TP) of 1 and Pipeline Parallelism (PP) of 1, and should fit on a single NVIDIA GPU.
 
 <figure>
 
@@ -60,15 +53,11 @@ GPT-2/GPT-3.
 
 ## Training data
 
-
-
-## Training procedure
-
-This model was trained for 402 billion tokens over 383,500 steps on TPU v3-256 pod. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
+Palmyra-small 128M was trained on
 
 ## Intended Use and Limitations
 
-
+Palmyra-small learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is, however, best at what it was pretrained for: generating text from a prompt.
 
 ### How to use
 
@@ -77,8 +66,8 @@ This model can be easily loaded using the `AutoModelForCausalLM` functionality:
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-tokenizer = AutoTokenizer.from_pretrained("EleutherAI/
-model = AutoModelForCausalLM.from_pretrained("EleutherAI/
+tokenizer = AutoTokenizer.from_pretrained("Writer/palmyra-small")
+model = AutoModelForCausalLM.from_pretrained("Writer/palmyra-small")
 ```
 
 ### Limitations and Biases
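The removed GPT-J passage describes the training objective both models share: autoregressive language modeling with a cross-entropy loss over next-token predictions. As a toy illustration of what that objective computes (a hypothetical sketch in plain Python, not the actual training code of either model):

```python
import math

def next_token_cross_entropy(logits, targets):
    """Average cross-entropy for next-token prediction.

    logits[t] scores every vocabulary item at position t, and
    targets[t] is the id of the true next token at that position.
    """
    total = 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]  # -log p(target | context)
    return total / len(targets)

# Uniform logits over a 4-token vocabulary: loss is exactly ln(4)
print(next_token_cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2]))  # → 1.3862943611198906
```

A handy sanity check follows from this: an untrained model produces near-uniform logits, so its loss starts near the log of the vocabulary size (roughly ln(50000) ≈ 10.8 for a GPT-style vocabulary) and falls as training maximizes the likelihood of the true next token.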