---
base_model: unsloth/gemma-2-9b-it
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- gemma2
- trl
model-index:
- name: N3N_gemma-2-9b-it_20241029_1532
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 67.52
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 40.99
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 20.47
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 12.08
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 16.39
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 34.69
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=nhyha/N3N_gemma-2-9b-it_20241029_1532
      name: Open LLM Leaderboard
---
# N3N_gemma-2-9b-it_20241029_1532
## Model Overview
- **Base Model**: unsloth/gemma-2-9b-it
- **License**: apache-2.0
- **Parameters**: 10.2B
- **Language**: English
- **Training Framework**: [Unsloth](https://github.com/unslothai/unsloth) + Huggingface TRL
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
> **Achievement**: #1 Ranking for 9B and 12B LLMs (November 8, 2024)
## Introduction
N3N_gemma-2-9b-it_20241029_1532 is a 10.2B-parameter open-source model built on Gemma2-9B-Instruct through continued fine-tuning. What sets it apart is its training data: a high-quality dataset derived from 1.6 million arXiv papers.
### Key Features
- **High-quality Dataset**: The model has been fine-tuned on a comprehensive dataset compiled from 1.6 million arXiv papers, supporting robust performance across a range of real-world applications.
- **Strong Reasoning**: The model performs particularly well on mathematical reasoning and complex problem-solving tasks, outperforming similarly sized models on these benchmarks as of November 2024.
This model represents our commitment to advancing language model capabilities through meticulous dataset preparation and continuous model enhancement.
## Quickstart
Here is a code snippet showing how to load the tokenizer and model and generate content with `apply_chat_template`, which structures the conversation and appends the generation prompt automatically. Note that the Gemma 2 chat template does not accept a separate `system` role, so any system-style instructions are folded into the user message.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to run generation on

model = AutoModelForCausalLM.from_pretrained(
    "nhyha/N3N_gemma-2-9b-it_20241029_1532",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("nhyha/N3N_gemma-2-9b-it_20241029_1532")

# Gemma 2's chat template typically rejects a separate "system" role,
# so the instruction is placed directly in the user message.
prompt = "You are a helpful assistant. Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]

# `apply_chat_template` formats the conversation and appends the generation prompt
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
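For GPUs with limited memory, the model can also be loaded in 4-bit. The snippet below is a minimal sketch, assuming `bitsandbytes` is installed; it is not part of the original card and uses an illustrative NF4 configuration.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute (assumes bitsandbytes is installed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "nhyha/N3N_gemma-2-9b-it_20241029_1532",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("nhyha/N3N_gemma-2-9b-it_20241029_1532")
```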
## Training Details
### Hyperparameters
```python
{
"seed": 3407,
"warmup_steps": 50,
"total_train_batch_size": 512,
"total_eval_batch_size": 64,
"learning_rate": 5e-05,
"optimizer": "adamw_8bit",
"lr_scheduler_type": "cosine",
"num_epochs": 3,
"r": 32,
"lora_alpha": 32,
"rs_lora": True,
"weight_decay": 0.01
}
```
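For reference, these hyperparameters map roughly onto an Unsloth + TRL fine-tuning setup as sketched below. This is an illustrative sketch, not the actual training script: the dataset path, sequence length, target modules, and the per-device batch size / gradient-accumulation split behind the total batch size of 512 are assumptions.
```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model (sequence length is an assumption, not stated in the card)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2-9b-it",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters with the r / alpha / rank-stabilized settings listed above
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    use_rslora=True,  # corresponds to "rs_lora": True
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    random_state=3407,
)

# Hypothetical dataset placeholder; the arXiv-derived training set is not published
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        seed=3407,
        warmup_steps=50,
        per_device_train_batch_size=8,   # 8 x 8 GPUs x 8 accumulation = 512 (assumed split)
        gradient_accumulation_steps=8,
        learning_rate=5e-5,
        optim="adamw_8bit",
        lr_scheduler_type="cosine",
        num_train_epochs=3,
        weight_decay=0.01,
        output_dir="outputs",
    ),
)
trainer.train()
```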
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
| Metric |Value|
|-------------------|----:|
|Avg. |32.02|
|IFEval (0-Shot) |67.52|
|BBH (3-Shot) |40.99|
|MATH Lvl 5 (4-Shot)|20.47|
|GPQA (0-shot) |12.08|
|MuSR (0-shot) |16.39|
|MMLU-PRO (5-shot) |34.69|
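These scores come from the Open LLM Leaderboard evaluation harness. A rough way to reproduce them locally is with `lm-evaluation-harness`, sketched below; the `leaderboard` task group and settings are assumptions about a recent harness release, not part of the original card, and exact numbers may differ from the leaderboard's own runs.
```python
import lm_eval

# Evaluate the Open LLM Leaderboard v2 task group locally
# (assumes a recent lm-evaluation-harness that ships the `leaderboard` group)
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nhyha/N3N_gemma-2-9b-it_20241029_1532,dtype=bfloat16",
    tasks=["leaderboard"],
    batch_size=4,
)
print(results["results"])
```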
## Business & Collaboration
### Contact
Are you looking for customized LLMs tailored to your business needs? Jikji Labs operates dedicated infrastructure, including 8× H100 GPU clusters, for model training and deployment. Our expertise spans:
- Large-scale data processing
- High-performance GPU computing
- Custom model development and training
We welcome collaborations and are always eager to hear your feedback or discuss potential partnerships. Visit our website to learn how our infrastructure and expertise can drive your AI initiatives forward.
### Collaborations
We are actively seeking support and investment to further the development of robust language models, with a focus on building high-quality, specialized datasets for a wide range of applications. Our expertise in dataset generation enables us to create models that are precise and adaptable to specific business requirements. If you are interested in collaborating with us, please visit [our website](https://www.n3n.ai/) for more information.
## Acknowledgement
Special thanks to [google](https://huggingface.co/google) for providing the base model to the open-source community.