library_name: peft
base_model: Qwen/Qwen-7B-Chat
datasets:
- nlpai-lab/kullm-v2
language:
- ko
- en
- zh
Model Card for Model ID
Korean Chatbot based on Alibaba's QWEN (keep in mind that basic colab runtime with T4 GPU will lead to OOM error. Fine-tuned version of Qwen-14b-Chat-Int4 will not have this issue)
Model Details
Model Description
This is a instruction-tuned model of π€Qwen/Qwen-7B-Chat. The instruction data for fine-tuning was π€nlpai-lab/kullm-v2. The model is intended to communicate via Korean, although it preserves its high proficiency in English & Chinese. Currently working on fine-tuning π€Qwen-14B-Chat via QLoRA, and this will have better results without OOM error in basic colab Runtime, so STAY TUNED!
μ리λ°λ°μ QWENμΌλ‘ fine-tuneν AIλΉμ Ko-QWENμ μκ°ν©λλ€! nlpai-lab/kullm-v2 λ°μ΄ν°λ‘ Instruction-tuneμ μ§ννμ΅λλ€. νκ΅μ΄λ‘ λννλ λ° μ΄μ μ λ§μ·μ§λ§, μμ΄μ μ€κ΅μ΄ μ±λ₯λ μ¬μ ν νλ₯ν©λλ€. νμ¬ Qwen-14B-Chat-Int4 λͺ¨λΈμ QLoRA νμ΅μ μ§νμ€μ΄λ©°, μ΄λ λ μ’μ μ±λ₯μ 보μ΄λ©° κΈ°λ³Έ colab runtimeμμλ gpu OOMμλ¬κ° μμ κ²μΌλ‘ μμλ©λλ€!
Model Sources
- Paper [optional]: read the QWEN team's technical report.
- Demo [optional]:
Model
- Fine-Tuned by: Jungwon Chang
- Model type: Chatbot(AI assistant)
- Language(s) (NLP): Korean, English, Chinese
- License: Please refer to the original QWEN license.txt
- Finetuned from model [optional]: Qwen/Qwen-7B-Chat
Uses
The Ko-QWEN AI assistant is designed to serve as a versatile tool for individuals and organizations looking to enhance their communication and productivity. The intended uses of the model include:
- Language Learning and Practice: Assisting users in learning and practicing Korean, English, and Chinese through natural language conversation.
- Cultural Exchange: Facilitating cross-cultural communication and understanding by providing insights into language nuances and cultural context.
- Informational Assistance: Offering prompt responses to inquiries across various domains, helping users obtain information efficiently.
- Task Automation: Aiding in routine tasks such as scheduling, reminders, and managing emails, thereby improving personal and workplace productivity.
- Accessibility Services: Supporting users with disabilities by providing an alternative means of communication and interaction with digital content.
- Research and Development: Serving as a platform for researchers in computational linguistics, natural language processing, and AI to conduct experiments and develop new technologies.
- Users of this model are encouraged to apply it in ways that foster positive engagement and contribute to the advancement of language technologies and AI.
Out-of-Scope Use
This model is intended for constructive and ethical applications, such as language learning, cultural exchange, and providing assistance with information and tasks within the bounds of lawful behavior. Uses that are explicitly out of scope and strongly discouraged include:
- Illegal Activities: Any form of support for illegal activities, including hacking, phishing, or fraud.
- Harmful Content Creation: Generating content that is abusive, defamatory, harassing, hateful, obscene, or otherwise intended to harm others.
- Misinformation: Disseminating false information or contributing to the spread of rumors.
- Impersonation: Attempting to impersonate others or create misleading representations of individuals or entities.
- Bias and Discrimination: Using the model to promote bias, discrimination, or unfair treatment of individuals or groups based on race, gender, ethnicity, religion, or any other personal characteristics.
- Commercial Use: Utilizing the model for commercial purposes without proper licensing or agreement.
Bias, Risks, and Limitations
This model has been trained on a diverse dataset intended to minimize bias; however, as with any AI system, there is a risk of unintended bias in language generation, which could reinforce stereotypes or marginalize certain groups. Furthermore, while the model is proficient in multiple languages, there may be nuances and idiomatic expressions that it does not fully capture, potentially affecting the quality of interaction in certain cultural contexts. The model's responses are also based on patterns in data and may not always provide accurate or reliable information, especially in fast-evolving or specialized knowledge domains.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. To address these issues, we recommend the following:
- Critical Engagement: Users should critically evaluate the information provided by the model and cross-reference with authoritative sources when necessary.
- Diverse Testing: Continuously testing the model with a diverse set of inputs can help identify and mitigate biases.
- Contextual Awareness: Users should provide clear context when interacting with the model to improve the relevance and accuracy of its responses.
- Regular Updates: Periodic retraining of the model with current data can help maintain its relevance and accuracy over time.
- Transparency: Clearly communicating the model's limitations to users can help set appropriate expectations regarding its performance.
- Ethical Use Monitoring: Implement mechanisms to monitor and prevent the model's use in unethical or harmful ways.
How to Get Started with the Model
Use the code below to get started with the model, or play around with the model in the colab demo
# How to load model and tokenizer
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" if torch.cuda.is_available() else torch.device("cpu")
model_path = "Jungwonchang/Ko-QWEN-7B-Chat-LoRA"
config = PeftConfig.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, trust_remote_code=True, resume_download=True)
model = PeftModel.from_pretrained(model, model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(
config.base_model_name_or_path, trust_remote_code=True, resume_download=True,
)
# Chat by simply using the Qwen model's .chat() method.
query = "Any question you want to ask"
response, _ = model.chat(query)
# You can also change your system prompt
response, _ = model.chat(query, system="μλλ μμ
μ μ€λͺ
νλ λͺ
λ Ήμ΄μ
λλ€. μμ²μ μ μ ν μλ£νλ μλ΅μ μμ±νμΈμ.")
# You can make your custom chat function if you want to play with various hyperparameters.
# alpaca style single-turn Chat
def qwen_chat_single_turn(model, tokenizer, device, query, system_message="You are a helpful assistant."):
# Define special tokens and newline token
im_start_id = tokenizer.im_start_id
im_end_id = tokenizer.im_end_id
nl_token = tokenizer.encode("\n")[0]
# Function to tokenize individual parts of the conversation
def tokenize_conversation(role, content):
return [im_start_id] + tokenizer.encode(role) + [nl_token] + tokenizer.encode(content) + [im_end_id]
# Start constructing the full conversation context
context_tokens = tokenize_conversation("system", system_message) + [nl_token, nl_token]
# Add the current user query
context_tokens += tokenize_conversation("user", query) + [nl_token, nl_token]
# Add a token indicating the start of the assistant's response
context_tokens += [im_start_id] + tokenizer.encode("assistant") + [nl_token]
# Convert context tokens to a tensor and generate a response
input_ids = torch.tensor(context_tokens).unsqueeze(0).to(device)
generated_ids = model.generate(
input_ids=input_ids,
max_new_tokens=1024,
early_stopping=True,
do_sample=True,
top_k=20,
top_p=0.92,
no_repeat_ngram_size=3,
eos_token_id=im_end_id,
repetition_penalty=1.2,
num_beams=3
)
# Decode the generated response
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
# Extract only the assistant's response
response = generated_text.split("assistant\n")[-1].strip()
return response
query="Question you want to ask"
response = qwen_chat_single_turn(model, tokenizer, device, query=query,
system_message="μλλ μμ
μ μ€λͺ
νλ λͺ
λ Ήμ΄μ
λλ€. μμ²μ μ μ ν μλ£νλ μλ΅μ μμ±νμΈμ."
)
Training Details
Training Data
π€nlpai-lab/kullm-v2: 150k amount of Korean instruction dataset
Training Procedure
The model was fine-tuned using LoRA (Low-Rank Adaptation), which allows for efficient training of large language models by updating only a small set of parameters. The fine-tuning process was conducted on a single node with 2 GPUs, utilizing distributed training to enhance the training efficiency and speed. The lora rank was set to 32, for I only had limited time to access the GPUs.
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
A instruction fine-tuned(or SFT) version on Alibaba's QWEN/QWEN-7b
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: 2 NVIDIA RTX A6000
- Hours used: approximately 72 hours
- Cloud Provider: PaperSpace
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications
Model Architecture and Objective
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
Jungwon Chang
Model Card Contact
[email protected] [email protected]
Training procedure
Framework versions
- PEFT 0.6.1