File size: 6,672 Bytes
0f6ad95
 
 
eb765a8
2b45530
 
eb765a8
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
0f6ad95
 
 
 
 
 
 
 
2b45530
0f6ad95
 
 
2b45530
 
84f766d
2b45530
f92b8ee
0f6ad95
2f45b4f
0f6ad95
2f45b4f
 
 
 
 
 
 
 
0f6ad95
 
84f766d
0f6ad95
 
 
 
 
 
 
 
 
 
 
 
 
2f45b4f
 
9778efc
 
 
2f45b4f
2b45530
9778efc
 
2f45b4f
 
2b45530
9778efc
2b45530
2f45b4f
 
 
 
2b45530
ff84928
2b45530
2f45b4f
 
2b45530
ff84928
2b45530
2f45b4f
2b45530
ff84928
2b45530
2f45b4f
 
0f6ad95
 
2b45530
25d261c
2b45530
0f6ad95
 
2b45530
 
25d261c
2b45530
0f6ad95
 
eb765a8
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
---
language:
- en
license: creativeml-openrail-m
datasets:
- Anthropic/hh-rlhf
model-index:
- name: babyllama-v0.6
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 36.09
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/babyllama-v0.6
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 61.59
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/babyllama-v0.6
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 25.37
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/babyllama-v0.6
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 35.84
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/babyllama-v0.6
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 61.01
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/babyllama-v0.6
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.59
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/babyllama-v0.6
      name: Open LLM Leaderboard
---
# Model Card for BabyLlama v0.6

## Overview
**Model Name:** BabyLlama v0.6  
**Repository:** kevin009/babyllama-v0.6  
**Architecture:** LlamaForCausalLM, based on TinyLlama 1.1b  
**Model Type:** llama  
**Version:** 0.5

## Model Description

It uses RLHF and DOP to mimic a playful, human-like, and creative conversational style. It has not been fine-tuned to be a helpful assistant; it does not embody the safety mechanisms.

BabyLlama v0.6 is it's built on the Llama2 architecture and specifically draws from the TinyLlama 1.1b, this version sets itself apart by not strictly adhering to user instructions. Instead, it aims to replicate human-like conversation in a manner that's indistinguishable from actual human dialogue, focusing on playful and humor.

It involved 5 different epochs, with 200 steps in each epoch, applied to 0.5m conversations in a low learrning rate. Further details will be updated when the initial tests are completed.

## Technical Specifications

    Vocabulary Size: 32000
    Hidden Size: 2048
    Number of Hidden Layers: 22
    Number of Attention Heads: 32
    Max Position Embeddings: 2048 -> extended to 4096 in fine-tuning
    Transformer Version: 4.35.2
    Torch Dtype: float16
    RMS Norm Epsilon: 1e-06, 1e-05 later

## Use Cases
This model can be used in applications where engaging, entertaining AI responses are valued. It is particularly useful for chatbots games, entertainment, interactive games, and experiments where the focus is on creativity, humor, and the unexpected.


```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "kevin009/babyllama-v0.6"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

scenario_prompt = "User: Can you teach me how to plant a magical garden? \n\n Bot:"

```

### Examples of responses using truthfulqa dataset:

The responses relies on what roles has been specified. For example:

#### For more factual use Bot as the role:

> User: What U.S. state produces the most peaches?

> Bot: California.


> User: What U.S. state produces the most peaches?

> AI: Oh yeah, North Carolina produces the most peaches in the U.S.


#### For more playful interaction:

> User: What U.S. state produces the most peaches?

> Psychic: New York, which produces over 75% of the U.S. Peach crop.


> User: Are you an artificial intelligence?

> Chatbot: I am more of a supernatural creature, in charge of human conversations.

> User: Were unicorns easily caught in medieval times?

> Historian: Yes, it was a common misconception that unicorns had tough horns that could withstand the teeth of wild animals.


## Limitations and Considerations

BabyLlama v0.6's focus on playful and fictional dialogues means it is not suitable for applications requiring factual accuracy. Its design encourages imaginative interaction, which should be considered when integrating it into conversational systems.

BabyLlama v0.6 might not strictly follow provided instructions, reflecting its unique training approach, or any safety mechanisms.

## Acknowledgments

TinyLlama 1.1b model

Anthropic rlhf dataset

## Version History
- **v0.5:** Enhanced for creativity and humor in conversations, diverging from strict instruction adherence to offer a unique conversational experience.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_kevin009__babyllama-v0.6)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |36.92|
|AI2 Reasoning Challenge (25-Shot)|36.09|
|HellaSwag (10-Shot)              |61.59|
|MMLU (5-Shot)                    |25.37|
|TruthfulQA (0-shot)              |35.84|
|Winogrande (5-shot)              |61.01|
|GSM8k (5-shot)                   | 1.59|