
youri-2x7b_dev

This model is a Mixture of Experts (MoE) merge of the following two models:

- rinna/youri-7b-chat
- rinna/youri-7b-instruction

๐Ÿ† Evaluation

All benchmark scores were evaluated with the Stability-AI/lm-evaluation-harness. The raw results are stored in benchmark_scores. For details on the scores and the conditions under which they were obtained, please refer to this link.
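As a rough illustration of how such scores are produced, the harness exposes a `simple_evaluate` entry point. The snippet below is a minimal sketch, not the exact invocation used for this card; in particular, the task identifier (including its version and prompt-template suffix) is an assumption that depends on the harness revision.

```python
# Hedged sketch: scoring one task with Stability-AI/lm-evaluation-harness.
# The task name below (version / prompt-template suffix) is an assumption;
# check the harness documentation for the identifiers in your revision.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=HachiML/youri-2x7b_dev",
    tasks=["jcommonsenseqa-1.1-0.3"],  # hypothetical identifier for JCommonsenseQA
    num_fewshot=3,                     # matches the 3-shot setting in the table below
    device="cuda",
)
print(results["results"])
```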

| Model | JCommonsenseQA (3-shot, acc.) | JNLI (3-shot, balanced acc.) | MARC-ja (0-shot, balanced acc.) | JSQuAD (2-shot, F1) | 4-AVERAGE |
|---|---|---|---|---|---|
| youri-2x7b_dev | 91.15 | 71.03 | 95.90 | 91.30 | 87.34 |
| youri-7b-instruction *1 | 88.83 | 63.56 | 93.78 | 92.19 | 84.59 |
| youri-7b-chat *1 | 91.78 | 70.35 | 96.69 | 79.62 | 84.61 |

| Model | jaqket-v2 (1-shot, F1) | xlsum (1-shot, ROUGE-2) *2 | 6-AVERAGE |
|---|---|---|---|
| youri-2x7b_dev | 84.59 | 25.62 | 76.59 |
| youri-7b-instruction *1 | 83.92 | 24.67 | 75.13 |
| youri-7b-chat *1 | 83.71 | 24.21 | 75.33 |

| Model | xwinograd (0-shot, acc.) *2 | mgsm (5-shot, acc.) *2 | JCoLA (2-shot, balanced acc.) *2 | 9-AVERAGE |
|---|---|---|---|---|
| youri-2x7b_dev | 81.43 | 24.80 | 59.09 | 69.43 |
| youri-7b-instruction *1 | 78.94 | 17.20 | 54.04 | 66.35 |
| youri-7b-chat *1 | 80.92 | 25.20 | 53.78 | 67.36 |

*1 Scores taken from rinna's LM Benchmark.
*2 Because rinna's LM Benchmark does not state the prompt-template versions for these tasks, these scores were computed without specifying a template.
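The n-AVERAGE columns are the unweighted means of the task scores accumulated up to that table: for youri-2x7b_dev, 4-AVERAGE = (91.15 + 71.03 + 95.90 + 91.30) / 4 ≈ 87.34, and 9-AVERAGE averages all nine task scores listed above.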

🧩 Configuration

This model was built with a custom version of the mergekit library (mixtral branch), using the following configuration:

```yaml
base_model: rinna/youri-7b-chat
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: bfloat16 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: rinna/youri-7b-chat
    positive_prompts:
      - "質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。" # Take a question and answer choices as input and pick the answer from the choices.
      - "前提と仮説の関係を含意、矛盾、中立の中から回答してください。" # Answer whether the premise/hypothesis relation is entailment, contradiction, or neutral.
      - "以下のテキストを、ポジティブまたはネガティブの感情クラスのいずれかに分類してください。" # Classify the text as positive or negative sentiment.
      - "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。" # Alpaca-style instruction/input header.
  - source_model: rinna/youri-7b-instruction
    positive_prompts:
      - "質問に対する回答を題名と文章から一言で抽出してください。回答は名詞で答えてください。" # Extract a one-word (noun) answer to the question from the title and passage.
      - "与えられたニュース記事を要約してください。" # Summarize the given news article.
      - "与えられた文が文法的であるかを回答してください。" # Answer whether the given sentence is grammatical.
```

The positive_prompts in the configuration above are taken from the instructions of the benchmarks at which each source model excels. For the per-model benchmark results, see rinna's LM Benchmark; those results give a detailed picture of the areas where each individual model performs particularly well, which guides the effective use of the merged model across natural language processing tasks.
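As a rough illustration of how `gate_mode: hidden` uses these prompts, the sketch below pools the base model's hidden states over each expert's positive_prompts to form one routing vector per expert. This is a conceptual sketch only, not mergekit's actual implementation; the layer choice and mean-pooling are assumptions made for illustration.

```python
# Conceptual sketch (NOT mergekit's code): derive a routing vector per expert
# from the base model's hidden states over that expert's positive_prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "rinna/youri-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")

def expert_routing_vector(prompts, layer=-1):
    """Average the chosen layer's last-token hidden state over a list of prompts (assumed pooling)."""
    vecs = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        vecs.append(out.hidden_states[layer][0, -1].float())
    return torch.stack(vecs).mean(dim=0)

chat_vec = expert_routing_vector(["質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。"])
instruction_vec = expert_routing_vector(["与えられたニュース記事を要約してください。"])

# Stacking one vector per expert gives a (num_experts, hidden_size) matrix that
# could seed router weights; mergekit derives its gates per transformer layer.
gate_init = torch.stack([chat_vec, instruction_vec])
```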

💻 Usage

```
!pip install -q --upgrade transformers einops accelerate bitsandbytes
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HachiML/youri-2x7b_dev"
torch.set_default_device("cuda")

# Load the model (4-bit) and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

# Create the input (Alpaca-style instruction / input / response template)
instruction = "次の日本語を英語に翻訳してください。"  # "Translate the following Japanese into English."
input = "大規模言語モデル(だいきぼげんごモデル、英: large language model、LLM)は、多数のパラメータ(数千万から数十億)を持つ人工ニューラルネットワークで構成されるコンピュータ言語モデルで、膨大なラベルなしテキストを使用して自己教師あり学習または半教師あり学習によって訓練が行われる。"
prompt = f"""
以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。

### 指示:
{instruction}

### 入力:
{input}

### 応答:
"""

# Tokenize the input string
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Generate text using the model
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        do_sample=True,
        temperature=0.5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

# Decode and print the output
output = tokenizer.decode(output_ids.tolist()[0])
print(output)
```
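Note that newer versions of transformers deprecate passing `load_in_4bit=True` directly to `from_pretrained`. If that keyword is rejected in your environment, the equivalent `BitsAndBytesConfig` route below should work; it is a minimal sketch assuming a recent transformers release with bitsandbytes installed.

```python
# Alternative 4-bit loading via BitsAndBytesConfig (newer transformers API)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for the 4-bit weights
)
model = AutoModelForCausalLM.from_pretrained(
    "HachiML/youri-2x7b_dev",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```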