Commit 9c4b3dc by adamelliotfields (parent: 0dc08e6): Update README

README.md

---
library_name: keras
license: mit
datasets:
- karpathy/tiny_shakespeare
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- lstm
---

## Model description

LSTM trained on Andrej Karpathy's [`tiny_shakespeare`](https://huggingface.co/datasets/karpathy/tiny_shakespeare) dataset, from his blog post [The Unreasonable Effectiveness of Recurrent Neural Networks](https://karpathy.github.io/2015/05/21/rnn-effectiveness/).
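
The exact layer configuration is shown in the Model Plot section below; purely as an illustrative sketch (the layer sizes here are assumptions, not the trained model's values), a character-level next-character LSTM in Keras might look like:

```py
import tensorflow as tf

VOCAB_SIZE = 67  # assumed size of the character vocabulary, incl. padding and OOV

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),  # map character ids to vectors
    tf.keras.layers.LSTM(128),                  # summarize the input sequence
    tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),  # next-character distribution
])
```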

## Intended uses & limitations

The model predicts the next character based on a variable-length input sequence. After `18` epochs of training, the model is generating text that is somewhat coherent.

```py
import tensorflow as tf


def generate_text(model, encoder, text, n):
    """Greedily generate `n` characters, one at a time."""
    vocab = encoder.get_vocabulary()
    generated_text = text
    for _ in range(n):
        # Encode the running text and predict the next-character distribution.
        encoded = encoder([generated_text])
        pred = model.predict(encoded, verbose=0)
        # Pick the most likely character id and map it back to its character.
        pred = tf.squeeze(tf.argmax(pred, axis=-1)).numpy()
        generated_text += vocab[pred]
    return generated_text


sample = "M"
print(generate_text(model, encoder, sample, 100))
```

```
MQLUS:
I will be so that the street of the state,
And then the street of the street of the state,
And
```

## Training and evaluation data

[![Weights & Biases](https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg)](https://wandb.ai/adamelliotfields/shakespeare)

## Training procedure

The dataset consists of various works of William Shakespeare concatenated into a single file, with individual speeches separated by `\n\n`.

The tokenizer is a Keras `TextVectorization` preprocessor that uses a simple character-based vocabulary, as sketched below.
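
A minimal sketch of how such a character-level encoder could be built; the loading code and the exact `TextVectorization` arguments here are assumptions, not the model's actual preprocessing script:

```py
import tensorflow as tf
from datasets import load_dataset

# Load the corpus as one long string (the dataset stores it as a single example).
text = load_dataset("karpathy/tiny_shakespeare", split="train")["text"][0]

# Character-level vocabulary: split into individual characters and skip
# standardization so that case and punctuation are preserved.
encoder = tf.keras.layers.TextVectorization(standardize=None, split="character")
encoder.adapt([text])

print(len(encoder.get_vocabulary()))  # distinct tokens, including padding and OOV
```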

To construct the training set, `100` characters are taken as the input, with the next character used as the target. This is repeated for each position in the text and results in **1,115,294** shuffled training examples.
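
One way to build those windows with `tf.data`, continuing the encoder sketch above (the shuffle buffer and prefetching are assumptions):

```py
import tensorflow as tf

SEQ_LEN = 100  # input window length; the character after the window is the target

# `encoder` and `text` come from the encoder sketch above.
ids = tf.squeeze(encoder([text]))

dataset = (
    tf.data.Dataset.from_tensor_slices(ids)
    .window(SEQ_LEN + 1, shift=1, drop_remainder=True)
    .flat_map(lambda w: w.batch(SEQ_LEN + 1))
    .map(lambda chunk: (chunk[:-1], chunk[-1]))  # (100 input ids, 1 target id)
    .shuffle(10_000)
    .batch(1024)
    .prefetch(tf.data.AUTOTUNE)
)
```

The corpus is roughly 1,115,394 characters, so a length-101 window slid one character at a time yields 1,115,394 - 100 = 1,115,294 examples, matching the figure above.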

*TODO: upload encoder*

### Training hyperparameters

| Hyperparameter  | Value     |
| :-------------- | :-------- |
| `epochs`        | `18`      |
| `batch_size`    | `1024`    |
| `optimizer`     | `AdamW`   |
| `weight_decay`  | `0.001`   |
| `learning_rate` | `0.00025` |
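
A rough sketch of how these settings map onto a Keras training run (the loss function and any arguments not listed in the table are assumptions):

```py
import tensorflow as tf

optimizer = tf.keras.optimizers.AdamW(learning_rate=2.5e-4, weight_decay=1e-3)

model.compile(
    optimizer=optimizer,
    loss="sparse_categorical_crossentropy",  # integer character ids as targets
    metrics=["accuracy"],
)
model.fit(dataset, epochs=18)  # `dataset` from the sketch above, batched at 1024
```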

## Model Plot