Commit aa13ef7 (parent c37485e) by RoboApocalypse: Update README.md

README.md (updated):
- wikimedia/wikipedia
library_name: transformers
---

# mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast

This repository contains the **mini-mistral-360M** model, a 360-million-parameter version of the Mistral architecture trained for a single epoch on a dataset combining Wikipedia articles and the OpenHermes dataset. The model is still in its early stages and not particularly useful yet, but it serves as an experimental showcase of integrating the Grokfast algorithm into the training process.

## Model Details

- **Architecture**: Mistral
- **Parameters**: 360 million
- **Training Duration**: 1 epoch
- **Training Dataset**: Wikipedia articles and OpenHermes dataset
- **Training Method**: Grokfast-enhanced Transformers

## Purpose

The primary goal of this experiment was to observe the impact of the Grokfast algorithm on the training dynamics of a 360M-parameter Mistral model. During training, the evaluation loss tracked the training loss closely, an intriguing behavior that warrants further investigation.
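
For context, the Grokfast algorithm amplifies the slow-varying component of the parameter gradients: an exponential moving average (EMA) of each gradient is scaled and added back before the optimizer step. Below is a minimal PyTorch sketch of that EMA filter based on the Grokfast paper's description; the function name, hyperparameter values, and surrounding loop are illustrative assumptions, not the exact configuration used to train this model.

```python
def gradfilter_ema(model, grads=None, alpha=0.98, lamb=2.0):
    # Grokfast-EMA sketch: amplify the low-frequency component of the
    # gradients. Call after loss.backward() and before optimizer.step().
    # alpha (EMA decay) and lamb (amplification) are assumed defaults.
    if grads is None:
        # Initialize the EMA state from the first observed gradients.
        grads = {n: p.grad.detach().clone()
                 for n, p in model.named_parameters() if p.grad is not None}
    for n, p in model.named_parameters():
        if p.grad is not None:
            # Update the running average of this parameter's gradient...
            grads[n] = alpha * grads[n] + (1.0 - alpha) * p.grad.detach()
            # ...and add the amplified slow component back into the gradient.
            p.grad = p.grad + lamb * grads[n]
    return grads

# Usage inside a standard training loop (sketch):
#   loss.backward()
#   ema_state = gradfilter_ema(model, grads=ema_state)
#   optimizer.step()
```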

## Usage

To use this model, you can load it with the `transformers` library from HuggingFace:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast")
model = AutoModel.from_pretrained("RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast")

# Example usage
input_text = "Hello, world!"
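
# The rest of the example is cut off in this diff view. A minimal,
# assumed completion (not necessarily the author's original code):
# AutoModel loads the base model, so the output is hidden states,
# not generated text.
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)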
```
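
Since `AutoModel` returns hidden states rather than text, generation would normally go through `AutoModelForCausalLM` instead. A short sketch, assuming the uploaded checkpoint includes the causal-LM head:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Encode a prompt and sample a short continuation.
inputs = tokenizer("Hello, world!", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```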

This model is licensed under the OpenRAIL License.

---

Feel free to check out the model and experiment with it [here](https://huggingface.co/RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast). Your feedback and insights are welcome as I try to figure out what I'm doing.