Upload Stories260K model
Adds the Stories260K model, a tiny model intended for unit tests and the like.
Includes the files needed for inference both in Python and in run.c.
See readme.md for more details.
- stories260K/readme.md +54 -0
- stories260K/stories260K.bin +3 -0
- stories260K/stories260K.pt +3 -0
- stories260K/tok512.bin +3 -0
- stories260K/tok512.model +3 -0
stories260K/readme.md
ADDED
@@ -0,0 +1,54 @@
This 260K model is tiny and was trained as follows:

```
python train.py \
    --out_dir="outmini" \
    --batch_size=128 \
    --max_seq_len=512 \
    --gradient_accumulation_steps=1 \
    --vocab_source="custom" \
    --vocab_size=512 \
    --dim=64 \
    --n_layers=5 \
    --n_heads=8 \
    --n_kv_heads=4 \
    --multiple_of=4 \
    --learning_rate=1e-3 \
    --dropout=0.05 \
    --weight_decay=0.01 \
    --max_iters=100000 \
    --beta2=0.99 \
    --warmup_iters=1000 \
    --eval_interval=2000 \
    --eval_iters=100 \
    --compile=True
```

You'll notice that `n_kv_heads` is 4 while `n_heads` is 8, so every two query heads share one set of key/value projections, i.e. this model is 2X multiquery (grouped-query attention with groups of two). You'll also notice that we're using a custom tokenizer with 512 tokens. The model trained for ~10 minutes (?) on my A100 and achieves a validation loss of 1.2968.
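
To make the head sharing concrete, here is a minimal sketch of which query head reads from which key/value head. It assumes consecutive query heads share a KV head (the usual grouped-query layout); `kv_head_for` and `n_rep` are names made up for this illustration and are not taken from `train.py` or run.c.

```python
# Values taken from the training command above.
dim = 64
n_heads = 8
n_kv_heads = 4

# Assumption: query heads are grouped consecutively onto KV heads.
n_rep = n_heads // n_kv_heads      # query heads per KV head
head_dim = dim // n_heads          # width of a single head

# kv_head_for[h] is the KV head used by query head h.
kv_head_for = [h // n_rep for h in range(n_heads)]

print(n_rep)        # 2
print(head_dim)     # 8
print(kv_head_for)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Under this layout the key and value projections are each half the width of the query projection, which is where the savings come from.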

Sampling this model at temperature 0.0 (i.e. deterministic greedy argmax sampling) gives:

```
$ ./run stories260K/stories260K.bin -z stories260K/tok512.bin -t 0.0
Once upon a time, there was a little girl named Lily. She loved to play outside in the park. One day, she saw a big, red ball. She wanted to play with it, but it was too high.
Lily's mom said, "Lily, let's go to the park." Lily was sad and didn't know what to do. She said, "I want to play with your ball, but I can't find it."
Lily was sad and didn't know what to do. She said, "I'm sorry, Lily. I didn't know what to do."
Lily didn't want to help her mom, so she said, "I'm sorry, mom. I didn't know what to do." Her mom said, "Don't worry, Lily. We can help you.
```

You can reproduce the greedy sample above in Python by running `sample.py`:

```
$ python sample.py --checkpoint=stories260K/stories260K.pt --tokenizer=stories260K/tok512.model --temperature=0.0 --max_new_tokens=257
```

I set max tokens to 257 by hand because the `sample.py` script doesn't currently terminate on the special BOS token like run.c does. Sampling at temperature 1.0 with top-p of 0.9 gives somewhat more reasonable samples:

```
$ ./run stories260K/stories260K.bin -z stories260K/tok512.bin -t 1.0 -p 0.9 -s 133742
Once upon a time, there was a little boy named Timmy. Timmy loved to play with his toys and eat sandwiches. One day, Timmy's mom told him it was time to rest for a while. Timmy's friend Billy came over and took him a down.
Timmy's mom saw that Timmy was sad, but Timmy said, "I didn't understand what is it! We need to find some leafs." Timmy thought about it and took a deep breath on a spoon. He hoped it was important to be kind and continued to find its image next time.
After they finished getting, Timmy's dad came up to his house and promised to help Timmy.
```
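
For intuition about the `-t` and `-p` flags, here is a rough sketch of the two sampling modes used above: greedy argmax at temperature 0.0, and top-p (nucleus) sampling otherwise. This is an illustration under my own assumptions, not the code in run.c or `sample.py`; `sample_next` and its parameters are hypothetical names.

```python
import math
import random


def softmax(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, temperature=1.0, top_p=0.9, rng=random):
    """Pick the next token id from a list of logits (hypothetical helper)."""
    if temperature == 0.0:
        # Greedy: always take the highest-scoring token, so output is deterministic.
        return max(range(len(logits)), key=lambda i: logits[i])

    probs = softmax(logits, temperature)
    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample within the kept set, renormalised by its total mass.
    r = rng.random() * cumulative
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r < acc:
            return i
    return kept[-1]  # guard against floating-point rounding


logits = [0.1, 2.0, -1.0]
print(sample_next(logits, temperature=0.0))  # 1
print(sample_next(logits, temperature=1.0, top_p=0.9, rng=random.Random(133742)))
```

Passing a seeded `random.Random` plays the role of the `-s` flag: the same seed gives the same draw.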

Hey, you can't expect too much from a 260K-parameter model. I'm even mildly shocked we get this far :D
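
As a sanity check on the 260K figure, the parameter count can be estimated from the training flags. The sketch assumes a Llama-2-style block (feed-forward with three weight matrices, two norm weight vectors per layer, output classifier sharing the token embedding) and the hidden-size rounding rule shown in the comments; none of this is stated in the readme, so treat it all as assumptions. The file-size line additionally assumes the `.bin` is a header of 7 four-byte integers followed by float32 weights plus two RoPE tables.

```python
# Values taken from the training command in this readme.
dim, n_layers, n_heads, n_kv_heads = 64, 5, 8, 4
vocab_size, multiple_of, max_seq_len = 512, 4, 512

head_dim = dim // n_heads
kv_dim = n_kv_heads * head_dim  # K and V are narrower because heads are shared

# Assumed rule: hidden size is 2/3 of 4*dim, rounded up to a multiple of multiple_of.
hidden_dim = int(2 * (4 * dim) / 3)
hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

attn = 2 * dim * dim + 2 * dim * kv_dim  # query/output plus key/value projections
ffn = 3 * dim * hidden_dim               # three feed-forward matrices (assumed)
norms = 2 * dim                          # two norm weight vectors per layer (assumed)
per_layer = attn + ffn + norms

# Token embedding (assumed shared with the classifier) + layers + final norm.
n_params = vocab_size * dim + n_layers * per_layer + dim
print(n_params)  # 260032

# Assumed file layout: 28-byte header, float32 weights, cos and sin RoPE tables.
rope_floats = 2 * max_seq_len * (head_dim // 2)
file_bytes = 7 * 4 + 4 * (n_params + rope_floats)
print(file_bytes)  # 1056540
```

Under these assumptions the byte count equals the `size 1056540` recorded for `stories260K/stories260K.bin` in this commit, which suggests the assumed layout is at least consistent with the uploaded file.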
stories260K/stories260K.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b0a507e7ad0f626624f17112325e66691f9076d622e1d3274d103d00299f2696
size 1056540
stories260K/stories260K.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eec953f9d0f139e894ef8996302680e64b24813c7a98425424f5c85f7cf4abb1
size 1061090
stories260K/tok512.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e6e45b754b603ab1fb1a31e59c1ebbee92a789504c3ddf6debb6bd3c106222d6
size 5075
stories260K/tok512.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dfff07d929db979913f166ec94a6f5ecad4c70cfed8eb5c9cbe7e464455e46f5
size 7645