metadata

library_name: peft
base_model: Qwen/Qwen-7B-Chat
datasets:
  - nlpai-lab/kullm-v2
language:
  - ko
  - en
  - zh

Model Card for Model ID

Korean Chatbot based on Alibaba's QWEN (keep in mind that basic colab runtime with T4 GPU will lead to OOM error. Fine-tuned version of Qwen-14b-Chat-Int4 will not have this issue)

Model Details

Model Description

This is a instruction-tuned model of 🤗Qwen/Qwen-7B-Chat. The instruction data for fine-tuning was 🤗nlpai-lab/kullm-v2. The model is intended to communicate via Korean, although it preserves its high proficiency in English & Chinese. Currently working on fine-tuning 🤗Qwen-14B-Chat via QLoRA, and this will have better results without OOM error in basic colab Runtime, so STAY TUNED!

알리바바의 QWEN으로 fine-tune한 AI비서 Ko-QWEN을 소개합니다! nlpai-lab/kullm-v2 데이터로 Instruction-tune을 진행했습니다. 한국어로 대화하는 데 초점을 맞췄지만, 영어와 중국어 성능도 여전히 훌륭합니다. 현재 Qwen-14B-Chat-Int4 모델을 QLoRA 학습을 진행중이며, 이는 더 좋은 성능을 보이며 기본 colab runtime에서도 gpu OOM에러가 없을 것으로 예상됩니다!

Model Sources

Paper [optional]: read the QWEN team's technical report.
Demo [optional]:

Model

Fine-Tuned by: Jungwon Chang
Model type: Chatbot(AI assistant)
Language(s) (NLP): Korean, English, Chinese
License: Please refer to the original QWEN license.txt
Finetuned from model [optional]: Qwen/Qwen-7B-Chat

Uses

The Ko-QWEN AI assistant is designed to serve as a versatile tool for individuals and organizations looking to enhance their communication and productivity. The intended uses of the model include:

Language Learning and Practice: Assisting users in learning and practicing Korean, English, and Chinese through natural language conversation.
Cultural Exchange: Facilitating cross-cultural communication and understanding by providing insights into language nuances and cultural context.
Informational Assistance: Offering prompt responses to inquiries across various domains, helping users obtain information efficiently.
Task Automation: Aiding in routine tasks such as scheduling, reminders, and managing emails, thereby improving personal and workplace productivity.
Accessibility Services: Supporting users with disabilities by providing an alternative means of communication and interaction with digital content.
Research and Development: Serving as a platform for researchers in computational linguistics, natural language processing, and AI to conduct experiments and develop new technologies.
Users of this model are encouraged to apply it in ways that foster positive engagement and contribute to the advancement of language technologies and AI.

Out-of-Scope Use

This model is intended for constructive and ethical applications, such as language learning, cultural exchange, and providing assistance with information and tasks within the bounds of lawful behavior. Uses that are explicitly out of scope and strongly discouraged include:

Illegal Activities: Any form of support for illegal activities, including hacking, phishing, or fraud.
Harmful Content Creation: Generating content that is abusive, defamatory, harassing, hateful, obscene, or otherwise intended to harm others.
Misinformation: Disseminating false information or contributing to the spread of rumors.
Impersonation: Attempting to impersonate others or create misleading representations of individuals or entities.
Bias and Discrimination: Using the model to promote bias, discrimination, or unfair treatment of individuals or groups based on race, gender, ethnicity, religion, or any other personal characteristics.
Commercial Use: Utilizing the model for commercial purposes without proper licensing or agreement.

Bias, Risks, and Limitations

This model has been trained on a diverse dataset intended to minimize bias; however, as with any AI system, there is a risk of unintended bias in language generation, which could reinforce stereotypes or marginalize certain groups. Furthermore, while the model is proficient in multiple languages, there may be nuances and idiomatic expressions that it does not fully capture, potentially affecting the quality of interaction in certain cultural contexts. The model's responses are also based on patterns in data and may not always provide accurate or reliable information, especially in fast-evolving or specialized knowledge domains.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. To address these issues, we recommend the following:

Critical Engagement: Users should critically evaluate the information provided by the model and cross-reference with authoritative sources when necessary.
Diverse Testing: Continuously testing the model with a diverse set of inputs can help identify and mitigate biases.
Contextual Awareness: Users should provide clear context when interacting with the model to improve the relevance and accuracy of its responses.
Regular Updates: Periodic retraining of the model with current data can help maintain its relevance and accuracy over time.
Transparency: Clearly communicating the model's limitations to users can help set appropriate expectations regarding its performance.
Ethical Use Monitoring: Implement mechanisms to monitor and prevent the model's use in unethical or harmful ways.

How to Get Started with the Model

Use the code below to get started with the model, or play around with the model in the colab demo

# How to load model and tokenizer
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else torch.device("cpu")
model_path = "Jungwonchang/Ko-QWEN-7B-Chat-LoRA"

config = PeftConfig.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, trust_remote_code=True, resume_download=True)
model = PeftModel.from_pretrained(model, model_path).to(device)

tokenizer = AutoTokenizer.from_pretrained(
        config.base_model_name_or_path, trust_remote_code=True, resume_download=True,
    )

# Chat by simply using the Qwen model's .chat() method.
query = "Any question you want to ask"
response, _ = model.chat(query)

# You can also change your system prompt
response, _ = model.chat(query, system="아래는 작업을 설명하는 명령어입니다. 요청을 적절히 완료하는 응답을 작성하세요.")

# You can make your custom chat function if you want to play with various hyperparameters.
# alpaca style single-turn Chat
def qwen_chat_single_turn(model, tokenizer, device, query, system_message="You are a helpful assistant."):
    # Define special tokens and newline token
    im_start_id = tokenizer.im_start_id
    im_end_id = tokenizer.im_end_id
    nl_token = tokenizer.encode("\n")[0]

    # Function to tokenize individual parts of the conversation
    def tokenize_conversation(role, content):
        return [im_start_id] + tokenizer.encode(role) + [nl_token] + tokenizer.encode(content) + [im_end_id]

    # Start constructing the full conversation context
    context_tokens = tokenize_conversation("system", system_message) + [nl_token, nl_token]

    # Add the current user query
    context_tokens += tokenize_conversation("user", query) + [nl_token, nl_token]

    # Add a token indicating the start of the assistant's response
    context_tokens += [im_start_id] + tokenizer.encode("assistant") + [nl_token]

    # Convert context tokens to a tensor and generate a response
    input_ids = torch.tensor(context_tokens).unsqueeze(0).to(device)
    generated_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        early_stopping=True,
        do_sample=True,
        top_k=20,
        top_p=0.92,
        no_repeat_ngram_size=3,
        eos_token_id=im_end_id,
        repetition_penalty=1.2,
        num_beams=3
    )

    # Decode the generated response
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

    # Extract only the assistant's response
    response = generated_text.split("assistant\n")[-1].strip()

    return response

query="Question you want to ask"
response = qwen_chat_single_turn(model, tokenizer, device, query=query,
                                 system_message="아래는 작업을 설명하는 명령어입니다. 요청을 적절히 완료하는 응답을 작성하세요."
                                )

Training Details

Training Data

🤗nlpai-lab/kullm-v2: 150k amount of Korean instruction dataset

Training Procedure

The model was fine-tuned using LoRA (Low-Rank Adaptation), which allows for efficient training of large language models by updating only a small set of parameters. The fine-tuning process was conducted on a single node with 2 GPUs, utilizing distributed training to enhance the training efficiency and speed. The lora rank was set to 32, for I only had limited time to access the GPUs.

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

A instruction fine-tuned(or SFT) version on Alibaba's QWEN/QWEN-7b

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: 2 NVIDIA RTX A6000
Hours used: approximately 72 hours
Cloud Provider: PaperSpace
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]

Technical Specifications

Model Architecture and Objective

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

Jungwon Chang

Model Card Contact

[email protected] [email protected]

Training procedure

Framework versions

PEFT 0.6.1