Update README.md

README.md CHANGED
@@ -58,22 +58,22 @@ Import the tokenizer and model:
 
 ```python
 tokenizer = tokun.huggingface.ByteTokenizer()
-model = hh.from_pretrained_keras('tokun/variants/
+model = hh.from_pretrained_keras('tokun/variants/16x4/')
 ```
 
 ### With Base Tensorflow / Keras
 
 You can directly load the weights [from the repository](../models/).
 
-For the most performant variant of the model, `
+For the most performant variant of the model, `16x4`:
 
 ```python
 import tensorflow as tf
 import tokun.model
 import urllib.request
 
-urllib.request.urlretrieve('https://github.com/apehex/tokun/raw/main/models/
-model = tf.keras.models.load_model('model.keras')
+urllib.request.urlretrieve('https://github.com/apehex/tokun/raw/main/models/16x4/1/7.7.keras', 'model.keras')
+model = tf.keras.models.load_model('model.keras', compile=False)
 ```
 
 ## Usage

@@ -121,7 +121,7 @@ print(__p.shape) # back to x shape
 ### With Base Tensorflow / Keras
 
 ```python
-__x = tokun.pipeline.preprocess(text=__s, groups=[
+__x = tokun.pipeline.preprocess(text=__s, groups=[16, 4], expand=[1], flatten=True)
 __e = model._encoder(__x) # final embedding = input for another model
 # these embeddings would be the input of a LLM
 __o = llm(__e) # replace with your LLM

@@ -178,10 +178,6 @@ Notes on each iteration:
 - `tokun-4`: [Github][article-file-tokun-4]
 - `tokun-16`: [Github][article-file-tokun-16]
 
-## TODO
-
-See [TODO](TODO.md).
-
 ## Credits
 
 This project was inspired by a video from Andrej Karpathy, ["Let's build the GPT tokenizer"][youtube-karpathy-tokenizer].
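For context on the `groups=[16, 4]` argument used with `tokun.pipeline.preprocess`: the `16x4` variant compresses raw bytes in nested groups, so the padded input length must be a multiple of the group product (16 × 4 = 64). A minimal pure-Python sketch of that padding rule, assuming UTF-32-BE byte encoding (an assumption; the actual `tokun.pipeline.preprocess` handles the encoding and padding internally, and `pad_bytes` is a hypothetical helper, not part of tokun):

```python
import math

def pad_bytes(text: str, groups=(16, 4)) -> list:
    # hypothetical helper, not part of tokun: encode the text to UTF-32-BE
    # (4 bytes per codepoint) and zero-pad to a multiple of the group product
    data = list(text.encode('utf-32-be'))
    unit = math.prod(groups)  # 16 * 4 = 64 for the 16x4 variant
    padding = (-len(data)) % unit
    return data + [0] * padding

sample = pad_bytes('tokun')
print(len(sample))  # 5 codepoints -> 20 bytes, zero-padded to 64
```

With this shape guarantee, the flattened byte sequence can be regrouped into blocks of 4 and then 16 by the encoder, which is what produces one embedding per 64 input bytes.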