Commit aa13ef7 (parent c37485e) by RoboApocalypse: Update README.md

README.md (updated):
- wikimedia/wikipedia
library_name: transformers
---

# mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast

This repository contains the **mini-mistral-360M** model, a 360-million-parameter version of the Mistral architecture trained for a single epoch on a dataset combining Wikipedia articles and the OpenHermes dataset. The model is still in its early stages and not particularly useful yet, but it serves as an experimental showcase of integrating the Grokfast algorithm into the training process.

## Model Details

- **Architecture**: Mistral
- **Parameters**: 360 million
- **Training Duration**: 1 epoch
- **Training Dataset**: Wikipedia articles and OpenHermes dataset
- **Training Method**: Grokfast-enhanced Transformers

## Purpose

The primary goal of this experiment was to observe the impact of the Grokfast algorithm on the training dynamics of a 360M-parameter Mistral model. During training, the evaluation loss tracked the training loss closely, an intriguing behavior that warrants further investigation.
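
For context, the Grokfast algorithm amplifies the slow-varying component of the parameter gradients: an exponential moving average (EMA) of each gradient is scaled and added back before the optimizer step. Below is a minimal PyTorch sketch of that EMA filter based on the Grokfast paper's description; the function name, hyperparameter values, and surrounding loop are illustrative assumptions, not the exact configuration used to train this model.

```python
def gradfilter_ema(model, grads=None, alpha=0.98, lamb=2.0):
    # Grokfast-EMA sketch: amplify the low-frequency component of the
    # gradients. Call after loss.backward() and before optimizer.step().
    # alpha (EMA decay) and lamb (amplification) are assumed defaults.
    if grads is None:
        # Initialize the EMA state from the first observed gradients.
        grads = {n: p.grad.detach().clone()
                 for n, p in model.named_parameters() if p.grad is not None}
    for n, p in model.named_parameters():
        if p.grad is not None:
            # Update the running average of this parameter's gradient...
            grads[n] = alpha * grads[n] + (1.0 - alpha) * p.grad.detach()
            # ...and add the amplified slow component back into the gradient.
            p.grad = p.grad + lamb * grads[n]
    return grads

# Usage inside a standard training loop (sketch):
#   loss.backward()
#   ema_state = gradfilter_ema(model, grads=ema_state)
#   optimizer.step()
```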

## Usage

To use this model, you can load it with the `transformers` library from HuggingFace:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast")
model = AutoModel.from_pretrained("RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast")

# Example usage
input_text = "Hello, world!"
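
# The rest of the example is cut off in this diff view. A minimal,
# assumed completion (not necessarily the author's original code):
# AutoModel loads the base model, so the output is hidden states,
# not generated text.
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)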
```
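
Since `AutoModel` returns hidden states rather than text, generation would normally go through `AutoModelForCausalLM` instead. A short sketch, assuming the uploaded checkpoint includes the causal-LM head:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Encode a prompt and sample a short continuation.
inputs = tokenizer("Hello, world!", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```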

This model is licensed under the OpenRAIL License.

---

Feel free to check out the model and experiment with it [here](https://huggingface.co/RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast). Your feedback and insights are welcome as I try to figure out what I'm doing.