apehex committed on
Commit
b66338b
1 Parent(s): 583c638

Update README.md

Files changed (1)
  1. README.md +5 -9
README.md CHANGED
@@ -58,22 +58,22 @@ Import the tokenizer and model:
 
 ```python
 tokenizer = tokun.huggingface.ByteTokenizer()
-model = hh.from_pretrained_keras('tokun/variants/4x16/')
+model = hh.from_pretrained_keras('tokun/variants/16x4/')
 ```
 
 ### With Base Tensorflow / Keras
 
 You can directly load the weights [from the repository](../models/).
 
-For the most performant variant of the model, `4x16`:
+For the most performant variant of the model, `16x4`:
 
 ```python
 import tensorflow as tf
 import tokun.model
 import urllib.request
 
-urllib.request.urlretrieve('https://github.com/apehex/tokun/raw/main/models/4x16/1/6.3.keras', 'model.keras')
-model = tf.keras.models.load_model('model.keras')
+urllib.request.urlretrieve('https://github.com/apehex/tokun/raw/main/models/16x4/1/7.7.keras', 'model.keras')
+model = tf.keras.models.load_model('model.keras', compile=False)
 ```
 
 ## Usage
@@ -121,7 +121,7 @@ print(__p.shape) # back to x shape
 ### With Base Tensorflow / Keras
 
 ```python
-__x = tokun.pipeline.preprocess(text=__s, groups=[4, 16], expand=[1], flatten=True)
+__x = tokun.pipeline.preprocess(text=__s, groups=[16, 4], expand=[1], flatten=True)
 __e = model._encoder(__x) # final embedding = input for another model
 # these embeddings would be the input of a LLM
 __o = llm(__e) # replace with your LLM
@@ -178,10 +178,6 @@ Notes on each iteration:
 - `tokun-4`: [Github][article-file-tokun-4]
 - `tokun-16`: [Github][article-file-tokun-16]
 
-## TODO
-
-See [TODO](TODO.md).
-
 ## Credits
 
 This project was inspired by a video from Andrej Karpathy, ["Let's build the GPT tokenizer"][youtube-karpathy-tokenizer].