Update README.md
README.md
CHANGED
@@ -9,15 +9,14 @@ Special thanks to https://huggingface.co/fahadh4ilyas
 convert_v2.py
 ```
 
-Training Notes:
+Training Notes/Observations:
+
+1. dbrx trains like a much smaller model (~7B)
 ```
-# 1. dbrx trains like a much smaller model (~7B)
 # start with this as reference point and move up or down based on eval/train loss
 learning_rate = 1.5e-5
-
-# 2. due to BPE (tiktoken) nature, tokenizer expansion/resize is not very friendly to training
-# use text based special tokens if you need/use extra tokens to avoid bad train/eval losses
 ```
+2. Due to nature of BPE (tiktoken), tokenizer expansion/resize is not very friendly to training. Use text based special tokens if you need/use extra tokens to avoid bad train/eval losses
 
 Known Issues:
 
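Note on item 2 of the training notes: a minimal sketch of what "text based special tokens" could look like in a data-preparation step. It assumes the goal is to mark conversation roles with ordinary text that the existing tokenizer already handles, so no tokens are added and no embedding resize is needed. The marker strings and the `format_example` helper are hypothetical names chosen for illustration; they do not come from the README or from any library.

```python
# Hypothetical plain-text role markers (an assumption for illustration).
# They are ordinary strings, so the existing BPE vocabulary encodes them
# without any tokenizer expansion.
ROLE_MARKERS = {
    "system": "### System:",
    "user": "### User:",
    "assistant": "### Assistant:",
}


def format_example(turns):
    """Render (role, text) pairs into one training string.

    Only plain-text markers are used, so the vocabulary size is unchanged.
    """
    blocks = []
    for role, text in turns:
        if role not in ROLE_MARKERS:
            raise ValueError(f"unknown role: {role!r}")
        blocks.append(f"{ROLE_MARKERS[role]}\n{text.strip()}")
    return "\n\n".join(blocks)


sample = format_example([("user", "Hi"), ("assistant", "Hello")])
print(sample)
```

The resulting string can be passed to the unmodified tokenizer like any other text. Whether these particular markers train well is untested here; they only show the shape of the approach.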