---
base_model: unsloth/gemma-2-9b-it
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- gemma2
- trl
model-index:
- name: N3N_gemma-2-9b-it_20241029_1532 
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 67.52
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 40.99
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 20.47
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 12.08
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 16.39
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 34.69
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
---





# N3N_gemma-2-9b-it_20241029_1532

## Model Overview
- **Base Model**: unsloth/gemma-2-9b-it
- **License**: apache-2.0
- **Parameters**: 10.2B
- **Language**: English
- **Training Framework**: [Unsloth](https://github.com/unslothai/unsloth) + Huggingface TRL

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

> **Achievement**: Ranked #1 among 9B and 12B LLMs (November 8, 2024)

## Introduction
N3N_gemma-2-9b-it_20241029_1532 is a 10.2B-parameter open-source model built on Gemma-2-9B-Instruct through continued fine-tuning. What sets it apart is its fine-tuning data: a high-quality dataset derived from 1.6 million arXiv papers.

### Key Features
- **High-quality Dataset**: Fine-tuned on a comprehensive dataset compiled from 1.6 million arXiv papers, supporting robust performance across a range of real-world applications.
- **Superior Reasoning**: Strong results on mathematical reasoning and complex problem-solving tasks, outperforming comparable models in these areas.

This model represents our commitment to advancing language model capabilities through meticulous dataset preparation and continuous model enhancement.




## Quickstart

Here is a code snippet using `apply_chat_template`, showing how to load the tokenizer and model and generate a response. `apply_chat_template` structures the conversation and automatically appends the generation prompt the model expects.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "nhyha/N3N_gemma-2-9b-it_20241029_1532",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("nhyha/N3N_gemma-2-9b-it_20241029_1532")

# Note: Gemma-2 chat templates typically reject a separate "system" role,
# so any system instruction should be folded into the user message.
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]

# `apply_chat_template` formats the conversation and appends the
# generation prompt the model expects
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Passing the full inputs keeps the attention mask alongside the input ids
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
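
If GPU memory is limited, the same checkpoint can also be loaded in 4-bit via bitsandbytes. This is a minimal sketch, not part of the original card; it assumes `bitsandbytes` is installed alongside `transformers`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization roughly quarters the weight memory of the 9B model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

model = AutoModelForCausalLM.from_pretrained(
    "nhyha/N3N_gemma-2-9b-it_20241029_1532",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("nhyha/N3N_gemma-2-9b-it_20241029_1532")
```

Generation then works exactly as in the snippet above.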





## Training Details
### Hyperparameters
```python
{
    "seed": 3407,
    "warmup_steps": 50,
    "total_train_batch_size": 512,
    "total_eval_batch_size": 64,
    "learning_rate": 5e-05,
    "optimizer": "adamw_8bit",
    "lr_scheduler_type": "cosine",
    "num_epochs": 3,
    "r": 32,
    "lora_alpha": 32,
    "rs_lora": True,
    "weight_decay": 0.01
}
```
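
For readers reproducing a similar setup, here is a hedged sketch of how the LoRA-related values above (`r`, `lora_alpha`, `rs_lora`) could map onto Unsloth's PEFT API. The maximum sequence length, target modules, and training data are not stated in this card, so those parts are assumptions, not the authors' actual configuration.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2-9b-it",  # base model listed above
    max_seq_length=4096,                 # assumption: not stated in this card
    load_in_4bit=True,                   # assumption: common Unsloth default
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,             # LoRA rank from the hyperparameters above
    lora_alpha=32,    # from the hyperparameters above
    use_rslora=True,  # "rs_lora": True, i.e. rank-stabilized LoRA scaling
    # assumption: the usual attention/MLP projections; not stated in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```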

  

## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)


|      Metric       |Value|
|-------------------|----:|
|Avg.               |32.02|
|IFEval (0-Shot)    |67.52|
|BBH (3-Shot)       |40.99|
|MATH Lvl 5 (4-Shot)|20.47|
|GPQA (0-shot)      |12.08|
|MuSR (0-shot)      |16.39|
|MMLU-PRO (5-shot)  |34.69|
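
The reported average is simply the unweighted mean of the six benchmark scores:

```python
scores = [67.52, 40.99, 20.47, 12.08, 16.39, 34.69]
print(round(sum(scores) / len(scores), 2))  # 32.02
```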



## Business & Collaboration
### Contact
Are you looking for customized LLMs tailored to your business needs? Jikji Labs offers advanced infrastructure, including 8× NVIDIA H100 GPU clusters, for model training and deployment. Our expertise spans:

- Large-scale data processing
- High-performance GPU computing
- Custom model development and training

We welcome collaborations and are always eager to hear your feedback or discuss potential partnerships. Visit our website to learn how our infrastructure and expertise can drive your AI initiatives forward.

### Collaborations
We are actively seeking support and investment to further our development of robust language models. Our focus is on building high-quality, specialized datasets for a wide range of applications, and our expertise in dataset generation lets us create models that are precise and adaptable to specific business requirements. If you are excited by the opportunity to collaborate and navigate future challenges with us, please visit [our website](https://www.n3n.ai/) for more information.


## Acknowledgement
Special thanks to [Google](https://huggingface.co/google) for releasing the base model to the open-source community.