Artples committed on
Commit 6753fb1
1 Parent(s): 8446bc8

Upload 3 files

Files changed (3)
  1. README(7).md +46 -0
  2. config(1).json +24 -0
  3. generation_config(1).json +6 -0
README(7).md ADDED
@@ -0,0 +1,46 @@
+ ---
+ license: apache-2.0
+ pipeline_tag: text-generation
+ language:
+ - en
+ tags:
+ - pretrained
+ inference:
+   parameters:
+     temperature: 0.7
+ ---
+
+ # Model Card for Mistral-7B-v0.1
+
+ The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters.
+ Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested.
+
+ For full details of this model please read our [paper](https://arxiv.org/abs/2310.06825) and [release blog post](https://mistral.ai/news/announcing-mistral-7b/).
+
+ ## Model Architecture
+
+ Mistral-7B-v0.1 is a transformer model, with the following architecture choices:
+ - Grouped-Query Attention
+ - Sliding-Window Attention
+ - Byte-fallback BPE tokenizer
+
+ ## Troubleshooting
+
+ - If you see the following error:
+ ```
+ KeyError: 'mistral'
+ ```
+ - Or:
+ ```
+ NotImplementedError: Cannot copy out of meta tensor; no data!
+ ```
+
+ Ensure you are using a stable version of Transformers, 4.34.0 or newer.
+
+ ## Notice
+
+ Mistral 7B is a pretrained base model and therefore does not have any moderation mechanisms.
+
+ ## The Mistral AI Team
+
+ Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
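
The troubleshooting note in the README above comes down to the Transformers version. Below is a minimal loading sketch, assuming Transformers >= 4.34.0, an installed `accelerate`, and the upstream repo id `mistralai/Mistral-7B-v0.1` (substitute the actual id of this repository); the temperature matches the README front matter.

```python
# Minimal sketch, not part of the commit: load and sample from the model
# once a recent enough Transformers (>= 4.34.0) is installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed repo id; replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches "torch_dtype": "bfloat16" in config(1).json
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("My favourite condiment is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```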
config(1).json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 32768,
+   "model_type": "mistral",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 10000.0,
+   "sliding_window": 4096,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.34.0.dev0",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
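
This config encodes the architecture choices listed in the README: Grouped-Query Attention (32 query heads sharing 8 key/value heads) and a 4096-token sliding attention window. A minimal sketch of inspecting those values with Transformers is below, again assuming the `mistralai/Mistral-7B-v0.1` repo id as a stand-in for this repository.

```python
# Minimal sketch: read the uploaded config with Transformers (>= 4.34.0).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")  # assumed repo id

# Grouped-Query Attention: 32 query heads share 8 key/value heads (4 queries per KV head).
print(config.num_attention_heads, config.num_key_value_heads)

# Sliding-Window Attention: each token attends to at most the previous 4096 positions,
# while positional embeddings extend to 32768.
print(config.sliding_window, config.max_position_embeddings)
```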
generation_config(1).json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.34.0.dev0"
+ }
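
The generation config only pins the BOS/EOS token ids; sampling settings such as the temperature of 0.7 from the README front matter are supplied at call time. A small sketch, purely illustrative and not part of this upload:

```python
# Minimal sketch: merge the defaults from generation_config(1).json with
# explicit sampling settings at generation time.
from transformers import GenerationConfig

gen_config = GenerationConfig(
    bos_token_id=1,
    eos_token_id=2,
    do_sample=True,
    temperature=0.7,  # from the README's inference parameters, not this file
)
# model.generate(**inputs, generation_config=gen_config)  # with a loaded model
```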