---
datasets:
- karpathy/tiny_shakespeare
library_name: tf-keras
license: mit
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- lstm
---
## Model description
LSTM trained on Andrej Karpathy's [`tiny_shakespeare`](https://huggingface.co/datasets/karpathy/tiny_shakespeare) dataset, from his blog post, [The Unreasonable Effectiveness of Recurrent Neural Networks](https://karpathy.github.io/2015/05/21/rnn-effectiveness/).
Made to experiment with Hugging Face and W&B.
## Intended uses & limitations
The model predicts the next character from a variable-length input sequence. After `18` epochs of training, it generates text that is somewhat coherent.
```py
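import tensorflow as tf


def generate_text(model, encoder, text, n):
    """Greedily append `n` predicted characters to `text`."""
    vocab = encoder.get_vocabulary()
    generated_text = text
    for _ in range(n):
        # Encode everything generated so far; the model accepts variable-length input.
        encoded = encoder([generated_text])
        pred = model.predict(encoded, verbose=0)
        # Pick the index of the most likely next character.
        pred = tf.squeeze(tf.argmax(pred, axis=-1)).numpy()
        generated_text += vocab[pred]
    return generated_text


# `model` and `encoder` must already be loaded.
sample = "M"
print(generate_text(model, encoder, sample, 100))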
```
```
MQLUS:
I will be so that the street of the state,
And then the street of the street of the state,
And
```
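The sample output repeats itself ("the street of the street of the state"), which is typical of greedy decoding: `generate_text` always takes the `argmax`. One common alternative, which is not part of this model's code, is to sample the next character from the predicted distribution with a temperature. The sketch below is plain Python and assumes `probs` holds softmax probabilities for the next character; `sample_index`, `temperature`, and `rng` are hypothetical names chosen for illustration.

```py
import math
import random


def sample_index(probs, temperature=1.0, rng=random):
    """Sample an index from `probs` after rescaling by `temperature`.

    Assumes `probs` is a sequence of probabilities with at least one
    non-zero entry (for example, a softmax output).
    """
    # Work in log space; zero-probability entries map to -inf.
    logits = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    # random.choices normalizes the weights itself.
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]
```

A temperature near `0` behaves like `argmax`, while `1.0` samples from the distribution as predicted. To try it, the `argmax` line in `generate_text` could be swapped for a call to `sample_index` on the squeezed prediction, assuming the model's last layer is a softmax.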
## Training and evaluation data
[![Weights & Biases](https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg)](https://wandb.ai/adamelliotfields/shakespeare)
## Training procedure
The dataset consists of various works of William Shakespeare concatenated into a single file. The resulting file consists of individual speeches separated by `\n\n`.
The tokenizer is a Keras `TextVectorization` preprocessor that uses a simple character-based vocabulary.
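As a rough illustration of what a character-level vocabulary does, here is a dependency-free sketch. It mimics the `TextVectorization` convention of reserving index `0` for padding and index `1` for unknown tokens (`[UNK]`), with the remaining tokens ordered by frequency. `build_vocab` and `encode` are hypothetical helpers, and the alphabetical tie-breaking is a choice made here for determinism; this is not the encoder used to train the model.

```py
from collections import Counter

PAD, UNK = "", "[UNK]"


def build_vocab(text):
    """Return a list of tokens: padding, unknown, then characters by frequency."""
    counts = Counter(text)
    # Most frequent first; ties broken by character so the order is deterministic.
    chars = sorted(counts, key=lambda c: (-counts[c], c))
    return [PAD, UNK] + chars


def encode(text, vocab):
    """Map each character to its index, falling back to the [UNK] index (1)."""
    index = {token: i for i, token in enumerate(vocab)}
    return [index.get(c, 1) for c in text]


vocab = build_vocab("hello world")
ids = encode("hold!", vocab)  # "!" is not in the vocabulary, so it maps to 1
decoded = "".join(vocab[i] for i in ids)
```

Decoding is just a list lookup, which is how `generate_text` turns a predicted index back into a character via `encoder.get_vocabulary()`.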
To construct the training set, a window of `100` characters is taken as the input and the character that follows it is used as the target. The window advances one character at a time across the text, which yields **1,115,294** training examples; these are then shuffled.
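The windowing can be sketched in plain Python. This is an illustration under assumptions, not the actual training pipeline: `make_examples` is a hypothetical helper, and the demo uses a window of `4` on a toy string. A text of `N` characters yields `N - window` pairs, which would match 1,115,294 examples if the source file has 1,115,394 characters.

```py
import random


def make_examples(text, window=100):
    """Return (input, target) pairs: `window` characters and the one that follows."""
    return [(text[i:i + window], text[i + window]) for i in range(len(text) - window)]


examples = make_examples("First Citizen", window=4)
# Before shuffling, examples[0] is ("Firs", "t") and examples[1] is ("irst", " ").

# Shuffle so consecutive batches do not come from adjacent positions in the text.
random.Random(0).shuffle(examples)
```

Materializing every window as a Python string is only practical for small inputs; for the full dataset a streaming approach would likely be preferable.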
*TODO: upload encoder*
### Training hyperparameters
| Hyperparameter | Value |
| :---------------- | :-------- |
| `epochs` | `18` |
| `batch_size` | `1024` |
| `optimizer` | `AdamW` |
| `weight_decay` | `0.001` |
| `learning_rate` | `0.00025` |
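For a sense of scale, the values in the table imply roughly how many optimizer steps were taken. This back-of-the-envelope calculation assumes all 1,115,294 examples were used for training in every epoch and that the final partial batch was kept; neither is stated in this card.

```py
import math

num_examples = 1_115_294  # from the training procedure section
batch_size = 1024
epochs = 18

# Assumption: the last, smaller batch is kept rather than dropped.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 1090 19620
```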
## Model Plot
<details>
<summary>View Model Plot</summary>
![Model Image](./model.png)
</details>