The learning rate was not displayed as it should. (#3)
- The learning rate was not displayed as it should. (0e69ea820f60a8bf7cf7f8b69c31df8d09e2e60c)
Co-authored-by: kapllan <[email protected]>
README.md
CHANGED
@@ -101,7 +101,7 @@ For further details see [Niklaus et al. 2023](https://arxiv.org/abs/2306.02069?u
 - batche size: 512 samples
 - Number of steps: 1M/500K for the base/large model
 - Warm-up steps for the first 5\% of the total training steps
-- Learning rate: (linearly increasing up to)
+- Learning rate: (linearly increasing up to) 1e-4
 - Word masking: increased 20/30\% masking rate for base/large models respectively
 
 ## Evaluation
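
To make the corrected line concrete, here is a minimal sketch of a linear warm-up over the first 5% of training steps up to a peak of 1e-4, the values stated in the README. The function name `linear_warmup_lr` and its parameters are hypothetical. The README does not say what happens after warm-up, so this sketch assumes the rate simply stays at the peak. It is an illustration, not the authors' training code.

```python
def linear_warmup_lr(step, total_steps, peak_lr=1e-4, warmup_fraction=0.05):
    """Return the learning rate at `step` under a linear warm-up to `peak_lr`.

    Hypothetical helper: peak_lr and warmup_fraction default to the values
    listed in the README (1e-4 and 5% of the total training steps).
    """
    warmup_steps = round(total_steps * warmup_fraction)
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 at step 0 towards peak_lr at the end of warm-up.
        return peak_lr * step / warmup_steps
    # Assumption: hold the peak after warm-up; the README gives no decay rule.
    return peak_lr


if __name__ == "__main__":
    # Base model: 1M total steps, so warm-up covers the first 50K steps.
    for s in (0, 25_000, 50_000, 1_000_000):
        print(s, linear_warmup_lr(s, total_steps=1_000_000))
```

For the large model (500K steps), the same call with `total_steps=500_000` would put the end of warm-up at step 25,000.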