
This is a Mistral-format version of the Qwen2-7B-Instruct model by Alibaba Cloud. The conversion is based on the original script at https://github.com/hiyouga/LLaMA-Factory/blob/main/tests/llamafy_qwen.py, which I modified to make it compatible with Qwen2. This model was converted with https://github.com/Minami-su/character_AI_open/blob/main/mistral_qwen2.py

Special note

1. Before using this model, you need to modify modeling_mistral.py in the transformers library.

2. Open the file in your environment, for example: vim /root/anaconda3/envs/train/lib/python3.9/site-packages/transformers/models/mistral/modeling_mistral.py

3. Find the MistralAttention class.

4. In the q, k, v, and o projections, change bias=False to bias=config.attention_bias.
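
The steps above can be sketched in Python as follows. This is a minimal illustration, not the author's tooling: the helper names `locate_module_file` and `patch_attention_bias` are hypothetical, and the regular expression assumes each projection is defined on a single line of the form `self.q_proj = nn.Linear(..., bias=False)`, which may not hold for every transformers version. The sketch only returns the patched text; it does not write to the installed file.

```python
import importlib.util
import re
from typing import Optional

# Module named in the steps above.
MODULE = "transformers.models.mistral.modeling_mistral"

# Assumption: each attention projection is created on one line via nn.Linear(...).
PROJ_LINE = re.compile(r"self\.[qkvo]_proj\s*=\s*nn\.Linear\(")


def locate_module_file(module_name: str) -> Optional[str]:
    """Return the source path of an importable module, or None if it is unavailable."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


def patch_attention_bias(source: str) -> str:
    """Replace bias=False with bias=config.attention_bias on q/k/v/o projection lines only."""
    patched = []
    for line in source.splitlines(keepends=True):
        if PROJ_LINE.search(line):
            line = line.replace("bias=False", "bias=config.attention_bias")
        patched.append(line)
    return "".join(patched)


if __name__ == "__main__":
    path = locate_module_file(MODULE)
    if path is None:
        print("transformers is not installed in this environment")
    else:
        print("File to edit:", path)
        with open(path, encoding="utf-8") as f:
            original = f.read()
        changed = patch_attention_bias(original) != original
        print("Patch would change the file:", changed)
```

Locating the file through the import system avoids hard-coding an environment path such as the one in step 2. Review the patched text by hand before saving it over the installed file.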

[Images: modeling_mistral.py before and after the change]

Differences between qwen2 mistral and qwen2 llamafy

Compared to qwen2 llamafy, qwen2 mistral can use sliding window attention, is faster, and handles long contexts better.
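
To illustrate what sliding window attention means, here is a small pure-Python sketch of a causal mask restricted to a fixed window. It is only a conceptual illustration: the function name `sliding_window_causal_mask` is hypothetical, the window size is arbitrary, and the exact boundary convention used inside transformers may differ by one position.

```python
from typing import List


def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    A position attends to itself and to at most (window - 1) earlier positions.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print the mask for 6 positions with a window of 3 ("x" = visible, "." = masked).
    for row in sliding_window_causal_mask(6, 3):
        print("".join("x" if visible else "." for visible in row))
```

With full causal attention every position sees all earlier positions, so the cost per token grows with the sequence length; with a window, each position sees at most `window` keys.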

Usage:


from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s)
tokenizer = AutoTokenizer.from_pretrained("Minami-su/Qwen2-7B-Instruct-mistral")
model = AutoModelForCausalLM.from_pretrained("Minami-su/Qwen2-7B-Instruct-mistral", torch_dtype="auto", device_map="auto")
# Stream generated text to stdout as it is produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [
    {"role": "user", "content": "Who are you?"}
]
# Apply the chat template and return the prompt as a tensor of token ids
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
# Move the prompt to the same device as the model instead of hard-coding "cuda"
inputs = inputs.to(model.device)
generate_ids = model.generate(inputs, max_length=2048, streamer=streamer)