---
language:
- ja
tags:
- merge
- mergekit
- lazymergekit
- SakanaAI/EvoLLM-JP-A-v1-7B
- stabilityai/japanese-stablelm-base-gamma-7b
base_model:
- SakanaAI/EvoLLM-JP-A-v1-7B
- stabilityai/japanese-stablelm-base-gamma-7b
---

# Hinoki-Sak-Sta-slerp-7B

Hinoki-Sak-Sta-slerp-7B is a merge of the following models, made with [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing) by [Maxime Labonne](https://huggingface.co/mlabonne), which is powered by [MergeKit](https://github.com/arcee-ai/mergekit) from [Arcee AI](https://www.arcee.ai):

* [SakanaAI/EvoLLM-JP-A-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-A-v1-7B) (Base model)
* [stabilityai/japanese-stablelm-base-gamma-7b](https://huggingface.co/stabilityai/japanese-stablelm-base-gamma-7b)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: SakanaAI/EvoLLM-JP-A-v1-7B
        layer_range: [0, 32]
      - model: stabilityai/japanese-stablelm-base-gamma-7b
        layer_range: [0, 32]
merge_method: slerp
base_model: SakanaAI/EvoLLM-JP-A-v1-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
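
To make the `t` values in the configuration more concrete, here is a minimal pure-Python sketch of spherical linear interpolation (slerp) between two weight vectors, plus one way a list such as `[0, 0.5, 0.3, 0.7, 1]` could be spread over the 32 layers. This is an illustration only, not MergeKit's actual code. The helper names `slerp` and `layer_t` are hypothetical, and the schedule (anchor values spaced evenly across the layers and interpolated linearly) is an assumption about how such gradients are applied. With `t = 0` the result equals the base model's weights and with `t = 1` it equals the other model's weights.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    t=0 returns v0 (the base model), t=1 returns v1 (the other model).
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the vectors, clamped against rounding error.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # (Anti)parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def layer_t(gradient, layer_idx, num_layers):
    """Assumed schedule: spread the anchor values evenly over the layers
    and interpolate linearly between neighbouring anchors."""
    if len(gradient) == 1 or num_layers == 1:
        return gradient[0]
    pos = layer_idx / (num_layers - 1) * (len(gradient) - 1)
    lo = min(int(pos), len(gradient) - 2)
    frac = pos - lo
    return (1 - frac) * gradient[lo] + frac * gradient[lo + 1]


# Values copied from the configuration above.
SELF_ATTN_T = [0, 0.5, 0.3, 0.7, 1]
MLP_T = [1, 0.5, 0.7, 0.3, 0]

for layer in (0, 8, 16, 24, 31):
    print(layer, layer_t(SELF_ATTN_T, layer, 32), layer_t(MLP_T, layer, 32))

# Toy two-dimensional "weights" blended halfway between the two models.
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Under this reading, the `self_attn` and `mlp` schedules mirror each other: early attention layers stay close to the base model while early MLP layers lean toward the other model, and the balance reverses in the last layers. All remaining tensors use the constant `t = 0.5`.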

## 💻 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "AkimfromParis/Hinoki-Sak-Sta-slerp-7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
model.eval()

requests = [
    "大谷翔平選手について教えてください",  # "Please tell me about Shohei Ohtani"
]

# Prompt template: a system preamble followed by the user turn.
system_message = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {user_input} ASSISTANT:"

for req in requests:
    input_req = system_message.format(user_input=req)
    input_ids = tokenizer.encode(input_req, return_tensors="pt").to(device=model.device)
    tokens = model.generate(
        input_ids,
        max_new_tokens=1024,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, skipping the echoed prompt.
    out = tokenizer.decode(tokens[0][len(input_ids[0]):], skip_special_tokens=True)
    print("USER:\n" + req)
    print("ASSISTANT:\n" + out)
    print()
```

# Citation

```
@article{goddard2024arcee,
  title={Arcee's MergeKit: A Toolkit for Merging Large Language Models},
  author={Goddard, Charles and Siriwardhana, Shamane and Ehghaghi, Malikeh and Meyers, Luke and Karpukhin, Vlad and Benedict, Brian and McQuade, Mark and Solawetz, Jacob},
  journal={arXiv preprint arXiv:2403.13257},
  year={2024}
}
```

arxiv.org/abs/2403.13257