---
tags:
- trl
---
# Fireball-Mistral-Nemo-12B-Philos

Supervised fine-tuned on a dataset of philosophy, math, coding, and languages.
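
A minimal usage sketch with Hugging Face `transformers` is shown below. The repo id and the use of the tokenizer's chat template are assumptions inferred from the model name and its Mistral-Nemo base, not tested against this checkpoint.

```python
# Minimal sketch, assuming the repo id "EpistemeAI/Fireball-Mistral-Nemo-12B-Philos"
# (inferred from the model name) and a standard Mistral-Nemo chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EpistemeAI/Fireball-Mistral-Nemo-12B-Philos"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 12B parameters; bf16 halves memory vs fp32
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize Kant's categorical imperative in two sentences."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```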

# Original Model Card

# Model Card for Mistral-Nemo-Instruct-2407

The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407). Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models of smaller or similar size.

For more details about this model, please refer to our release [blog post](https://mistral.ai/news/mistral-nemo/).

## Key features

- Released under the **Apache 2 License**
- Pre-trained and instructed versions
- Trained with a **128k context window**
- Trained on a large proportion of **multilingual and code data**
- Drop-in replacement of Mistral 7B

## Model Architecture

Mistral Nemo is a transformer model with the following architecture choices:

- **Layers:** 40
- **Dim:** 5,120
- **Head dim:** 128
- **Hidden dim:** 14,336
- **Activation Function:** SwiGLU
- **Number of heads:** 32
- **Number of kv-heads:** 8 (GQA)
- **Vocabulary size:** 2^17 = 131,072 (~128k)
- **Rotary embeddings:** theta = 1M
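
For illustration, here is how those choices map onto a `transformers` configuration. This is a sketch assuming `MistralConfig` (the class Mistral-Nemo checkpoints load with in `transformers`), not the shipped `config.json`, and an explicit `head_dim` argument requires a recent `transformers` release.

```python
# Sketch: the architecture bullets above expressed as a transformers MistralConfig.
# Illustrative only; assumes MistralConfig and a transformers version that accepts
# an explicit head_dim (here it is not hidden_size / num_attention_heads).
from transformers import MistralConfig

config = MistralConfig(
    num_hidden_layers=40,     # Layers
    hidden_size=5120,         # Dim
    head_dim=128,             # Head dim (note: 5120 / 32 = 160, so it is set explicitly)
    intermediate_size=14336,  # Hidden dim of the SwiGLU feed-forward block
    hidden_act="silu",        # the gated SiLU used by SwiGLU
    num_attention_heads=32,   # Number of heads
    num_key_value_heads=8,    # Number of kv-heads (grouped-query attention)
    vocab_size=131072,        # 2**17
    rope_theta=1_000_000.0,   # Rotary embeddings, theta = 1M
)
```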

# Uploaded model

- **Developed by:** EpistemeAI