DarwinAnim8or committed
Commit 2692696 • 1 Parent(s): 868208d
Update README.md

README.md CHANGED
@@ -18,7 +18,7 @@ As mentioned, a few updates are planned:
 * Fine-tuning the resulting model for instruct, code and storywriting. These will then be combined using MergeKit to create a MoE model.
 * Release a GGUF version and an extended context version of the base model

-
+# Model Performance Tracking

 This table tracks the performance of our model on various tasks over time.

@@ -26,25 +26,16 @@ This table tracks the performance of our model on various tasks over time.
 |-------------------|----------|---------------|---------------|---------------|---------------| ---- |
 | 2024-07-27 | acc | 27.40% ± 0.92% | 25.52% ± 0.44% | 52.71% ± 3.01% | 39.52% ± 1.11% | 36.29% |

-
+## Legend
 - Date: The date of each evaluation run
-- Metric: The evaluation metric used (acc = accuracy
+- Metric: The evaluation metric used (acc = accuracy)
 - Task columns: Results for each task in the format "Percentage ± Standard Error"

-
+## Notes
 - All accuracy values are presented as percentages
 - Empty cells indicate that the task was not evaluated on that date or for that metric
 - Standard errors are also converted to percentages for consistency

-### Legend
-- Task: The name of the evaluation task
-- Metric: The evaluation metric used (acc = accuracy, acc_norm = normalized accuracy)
-- Date columns: The date of each evaluation run, with results in the format "Value ± Standard Error"
-
-### Notes
-- All accuracy values are on a scale from 0 to 1
-- Empty cells indicate that the task was not evaluated on that date
-
 # Tokenizer
 Our tokenizer was trained from scratch on 500,000 samples from the Openwebtext dataset. Like Mistral, we use the LlamaTokenizerFast as our tokenizer class, in legacy mode.
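For the MoE step in the roadmap above, mergekit's `mergekit-moe` tool consumes a small YAML config listing the expert models. Below is a minimal sketch, assuming a Mixtral-style merge; every model id is a hypothetical placeholder for the planned instruct, code, and storywriting fine-tunes:

```python
# Sketch only: writes a mergekit-moe config and invokes the CLI.
# All model ids are hypothetical placeholders, not published repos.
import subprocess
import textwrap

config = textwrap.dedent("""\
    base_model: DarwinAnim8or/base-model          # hypothetical base
    gate_mode: hidden    # route tokens by hidden-state similarity to the prompts
    dtype: bfloat16
    experts:
      - source_model: DarwinAnim8or/base-instruct # hypothetical instruct tune
        positive_prompts: ["Answer the following question."]
      - source_model: DarwinAnim8or/base-code     # hypothetical code tune
        positive_prompts: ["Write a Python function that"]
      - source_model: DarwinAnim8or/base-story    # hypothetical story tune
        positive_prompts: ["Write a short story about"]
    """)

with open("moe.yaml", "w") as f:
    f.write(config)

# mergekit-moe assembles a Mixtral-style MoE from the listed experts.
subprocess.run(["mergekit-moe", "moe.yaml", "./moe-out"], check=True)
```

`gate_mode: hidden` initializes the routers from hidden-state similarity to the positive prompts; `cheap_embed` and `random` are the usual lighter-weight alternatives.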
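The table's header row is not captured in the diff, but the final column is consistent with an unweighted mean of the four task accuracies; a quick check under that assumption:

```python
# Per-task accuracies from the 2024-07-27 row of the results table.
scores = [27.40, 25.52, 52.71, 39.52]
average = sum(scores) / len(scores)  # 36.2875
print(f"{average:.2f}%")             # -> 36.29%, matching the last column
```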
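The tokenizer note maps directly onto the transformers API; here is a minimal loading sketch, with a hypothetical repo id standing in for the actual model:

```python
from transformers import LlamaTokenizerFast

# "DarwinAnim8or/base-model" is a hypothetical placeholder repo id.
# legacy=True selects the original Llama tokenization behaviour
# that the README's "in legacy mode" refers to.
tokenizer = LlamaTokenizerFast.from_pretrained(
    "DarwinAnim8or/base-model",
    legacy=True,
)
print(tokenizer.tokenize("Hello, world!"))
```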