Aarifkhan committed
Commit b9459fe • 1 Parent(s): 05ebd1e

Upload README.md

Files changed (1)
  1. README.md +104 -3
README.md CHANGED
@@ -1,3 +1,104 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ library_name: transformers
6
+ pipeline_tag: text-generation
7
+ ---
8
+
9
+ # BabyMistral Model Card
10
+
11
+ ## Model Overview
12
+
13
+ **BabyMistral** is a compact language model for text generation. It is built on the Mistral architecture and has 1.5 billion parameters.
14
+
15
+ ### Key Specifications
16
+
17
+ - **Parameters:** 1.5 billion
18
+ - **Training Data:** 1.5 trillion tokens
19
+ - **Architecture:** Based on Mistral
20
+ - **Training Duration:** 70 days
21
+ - **Hardware:** 4x NVIDIA A100 GPUs
22
+
23
+ ## Model Details
24
+
25
+ ### Architecture
26
+
27
+ BabyMistral uses the Mistral architecture from Mistral AI. The model scales this architecture to 1.5 billion parameters to balance capability against computational cost.
28
+
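As a rough illustration of what 1.5 billion parameters means in practice, the sketch below estimates the memory needed just to hold the weights at a few common precisions. The helper name `weight_memory_gib` and the precision table are illustrative assumptions, not part of the model card, and the estimate ignores activations, the KV cache, and framework overhead.

```python
# Rough weight-memory estimate for a 1.5B-parameter model.
# Assumption: only the weights are counted (no activations, KV cache, or overhead).

NUM_PARAMS = 1.5e9  # parameter count stated in the model card

# Bytes per parameter for common storage precisions (illustrative values).
BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to store the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>17}: {weight_memory_gib(NUM_PARAMS, nbytes):.2f} GiB")
```

Under these assumptions the weights alone take a little under 3 GiB in 16-bit precision, which is a lower bound on what inference actually requires.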
29
+ ### Training
30
+ - **Dataset Size:** 1.5 trillion tokens
31
+ - **Training Approach:** Trained from scratch
32
+ - **Hardware:** 4x NVIDIA A100 GPUs
33
+ - **Duration:** 70 days of continuous training
34
+
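To relate the training figures above to one another, the arithmetic below derives the average throughput they imply. It is only a sketch: it assumes all 70 days were spent training with no downtime, and the variable names are illustrative rather than taken from any training script.

```python
# Average throughput implied by the training figures stated in the model card.
# Assumption: the full 70 days were spent training, with no downtime.

TOKENS = 1.5e12   # training tokens
PARAMS = 1.5e9    # model parameters
DAYS = 70         # training duration
NUM_GPUS = 4      # NVIDIA A100 GPUs

SECONDS = DAYS * 24 * 60 * 60  # total wall-clock seconds

tokens_per_second = TOKENS / SECONDS                      # across all GPUs
tokens_per_second_per_gpu = tokens_per_second / NUM_GPUS  # per GPU
tokens_per_param = TOKENS / PARAMS                        # data-to-model ratio

if __name__ == "__main__":
    print(f"Total throughput:   {tokens_per_second:,.0f} tokens/s")
    print(f"Per-GPU throughput: {tokens_per_second_per_gpu:,.0f} tokens/s")
    print(f"Tokens per parameter: {tokens_per_param:,.0f}")
```

These are averages derived from the stated totals, not measured values.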
35
+ ### Capabilities
36
+
37
+ BabyMistral is designed for a wide range of natural language processing tasks, including:
38
+
39
+ - Text completion and generation
40
+ - Creative writing assistance
41
+ - Dialogue systems
42
+ - Question answering
43
+ - Language understanding tasks
44
+
45
+ ## Usage
46
+
47
+ ### Getting Started
48
+
49
+ To use BabyMistral with the Hugging Face Transformers library:
50
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the model and tokenizer from the Hugging Face Hub
+ model = AutoModelForCausalLM.from_pretrained("Aarifkhan/BabyMistral")
+ tokenizer = AutoTokenizer.from_pretrained("Aarifkhan/BabyMistral")
+
+ # Define the chat input (uncomment the system message to set a persona)
+ chat = [
+     # {"role": "system", "content": "You are BabyMistral"},
+     {"role": "user", "content": "Hey there! How are you? 😊"},
+ ]
+
+ # Format the conversation with the model's chat template and tokenize it
+ inputs = tokenizer.apply_chat_template(
+     chat,
+     add_generation_prompt=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ # Generate text with temperature and top-p (nucleus) sampling
+ outputs = model.generate(
+     inputs,
+     max_new_tokens=256,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.9,
+     eos_token_id=tokenizer.eos_token_id,
+ )
+
+ # Decode only the newly generated tokens, skipping the prompt
+ response = outputs[0][inputs.shape[-1]:]
+ print(tokenizer.decode(response, skip_special_tokens=True))
+
+ # Example output (sampling is random, so results vary):
+ # I am doing well! How can I assist you today? 😊
+ ```
89
+
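The usage example passes `temperature=0.6` and `top_p=0.9` to `generate`. To show what these two settings do to the next-token distribution, here is a simplified pure-Python sketch of temperature scaling followed by top-p (nucleus) filtering. It is not the Transformers implementation; the function name `top_p_filter` and the toy logits are assumptions made for illustration.

```python
import math


def top_p_filter(logits, temperature=0.6, top_p=0.9):
    """Return {token_index: probability} for the tokens kept by nucleus sampling."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical scores for four tokens
    print(top_p_filter(toy_logits, temperature=1.0))  # keeps three tokens
    print(top_p_filter(toy_logits, temperature=0.6))  # keeps two tokens
```

With these toy logits, lowering the temperature from 1.0 to 0.6 sharpens the distribution enough that the 0.9 probability mass is covered by two tokens instead of three, so sampling becomes less varied.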
90
+ ### Ethical Considerations
91
+
92
+ Users should be aware of BabyMistral's limitations and potential biases:
93
+
94
+ - The model may reproduce biases present in its training data
95
+ - It should not be used as a sole source of factual information
96
+ - Generated content should be reviewed for accuracy and appropriateness
97
+
98
+
99
+ ### Limitations
100
+
101
+ - May struggle with very specialized or technical domains
102
+ - Lacks real-time knowledge beyond its training data
103
+ - Potential for generating plausible-sounding but incorrect information
104
+