Model Card for AlberBshara/ar_llama3.1
Model Details
- This model fine-tunes Llama 3.1 on a dataset of Arabic conversations to improve its support for the Arabic language.
Fine-tuning a large language model (LLM) such as Llama 3.1 on text in an additional language, here Arabic, improves its ability to understand and generate that language. The model picks up the grammar, vocabulary, and cultural context specific to Arabic, so it produces more coherent and contextually relevant Arabic text and its multilingual coverage broadens.
Model Description
Base model: Llama 3.1 8B
Context window: 128k tokens
Developed by: [Alber Bshara]
Language(s) (NLP): [Arabic (Ar), English (En)]
License: [NeptoneAI]
Finetuned from model: [meta-llama/Llama-3.1-8B]
Model Sources [optional]
- Core Model: [https://ai.meta.com/blog/meta-llama-3-1/]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
How to Get Started with the Model
- To use this model, see the usage example at the bottom of this page.
Training Details
Training Data
Mixed Arabic Datasets (MAD): https://huggingface.co/M-A-D
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 8
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 5
- mixed_precision_training: Native AMP
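As a sanity check on the values above, the effective batch size and the shape of a linear schedule with warmup can be reproduced in a few lines. This is a minimal sketch, not the training script: it assumes a single GPU (which is what makes 2 × 4 = 8), and `total_steps` is a hypothetical value because this card does not state the number of training steps.

```python
# Sketch only: reproduces the arithmetic implied by the listed hyperparameters.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 5
NUM_DEVICES = 1  # assumption: single-GPU training


def effective_batch_size(per_device: int, accumulation: int, devices: int) -> int:
    """Number of samples that contribute to each optimizer step."""
    return per_device * accumulation * devices


def linear_lr(step: int, total_steps: int,
              base_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total_steps = 100  # hypothetical; not stated in this card
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))
    for step in (0, 5, 50, 100):
        print(step, linear_lr(step, total_steps))
```

With only 5 warmup steps, the learning rate reaches its peak almost immediately and spends nearly the whole run decaying.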
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
Example prompt and model response, translated from Arabic:
''' question = "How can I find the greatest common divisor of the two numbers 81 and 27?"
ans=
We can use the "Euclidean algorithm" to find the greatest common divisor of 81 and 27. This algorithm relies on repeated division: we divide the larger number by the smaller one, then divide the smaller number by the remainder of the previous division, and repeat this process until the remainder is zero.
Step 1: Divide 81 by 27: 81 ÷ 27 = 3 with remainder R = 0.
Final result: When the remainder is zero, the number we divided by in that step is the greatest common divisor (GCD). In this case, when the remainder became zero, the divisor was 27, so the GCD of 81 and 27 is 27. '''
[More Information Needed]
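The procedure described in the example response above is the Euclidean algorithm, which makes the answer easy to verify independently. A minimal Python sketch follows; the function name `gcd_euclid` is illustrative and is not part of the model or its evaluation code.

```python
def gcd_euclid(a: int, b: int) -> int:
    """Greatest common divisor by repeated division (Euclidean algorithm)."""
    a, b = abs(a), abs(b)
    while b != 0:
        # Replace (a, b) with (b, a mod b) until the remainder is zero.
        a, b = b, a % b
    return a


if __name__ == "__main__":
    # 81 divided by 27 leaves remainder 0, so the loop stops after one step.
    print(gcd_euclid(81, 27))
```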
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
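The calculator's estimate is, in essence, the energy consumed multiplied by the carbon intensity of the local grid. The sketch below shows that arithmetic so the fields above can be filled in once the real values are known. Every input in the example call is a hypothetical placeholder, not a measurement from this training run.

```python
def estimate_co2_kg(gpu_power_watts: float, hours: float,
                    grid_kg_co2_per_kwh: float, num_gpus: int = 1) -> float:
    """Estimate emissions as energy (kWh) times grid carbon intensity (kg CO2eq/kWh)."""
    energy_kwh = gpu_power_watts / 1000.0 * hours * num_gpus
    return energy_kwh * grid_kg_co2_per_kwh


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; replace with measured values.
    print(estimate_co2_kg(gpu_power_watts=70.0, hours=10.0, grid_kg_co2_per_kwh=0.4))
```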
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
[More Information Needed]
Hardware
- Can run on a T4 or L4 GPU, or on other more powerful GPUs.
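A rough back-of-the-envelope estimate shows why 4-bit loading matters on these cards. The sketch below counts weight storage only. It assumes a round 8 billion parameters and ignores activations, the KV cache, and quantization overhead, so real memory usage will be higher.

```python
# Approximate storage cost per parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / (1024 ** 3)


if __name__ == "__main__":
    num_params = 8e9  # assumption: round figure for an 8B-parameter model
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gib(num_params, precision):.1f} GiB")
```

Under these assumptions, half-precision weights alone come close to the 16 GB available on a T4, while 4-bit weights leave room for activations and the KV cache.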
Software
Framework versions
- PEFT 0.12.0
- Transformers 4.44.2
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]
How to Use it:
import os
from typing import Any, Dict, List, Tuple

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from unsloth.chat_templates import get_chat_template


class LLM:
    def __init__(self, load_in_4bit: bool = True,
                 load_cpu_mem_usage: bool = True,
                 hf_model_path: str = "AlberBshara/ar_llama3.1",
                 max_new_tokens: int = 4096):
        """
        Args:
            load_in_4bit (bool): Use 4-bit quantization. Defaults to True.
            load_cpu_mem_usage (bool): Reduce CPU memory usage. Defaults to True.
            hf_model_path (str): The path of your model on HuggingFace-Hub like "your-user-name/model-name".
            max_new_tokens (int): Maximum number of tokens to generate per call. Defaults to 4096.
        """
        assert torch.cuda.is_available(), "CUDA is not available. An NVIDIA GPU is required."
        # Assumption: the Hugging Face access token is read from the
        # HUGGING_FACE_API_TOKEN environment variable.
        hf_auth_token = os.environ.get("HUGGING_FACE_API_TOKEN")
        # Specify the quantization config
        self._bnb_config = BitsAndBytesConfig(load_in_4bit=load_in_4bit)
        # Load model directly with quantization config
        self.model = AutoModelForCausalLM.from_pretrained(
            hf_model_path,
            low_cpu_mem_usage=load_cpu_mem_usage,
            quantization_config=self._bnb_config,
            token=hf_auth_token,  # `use_auth_token` is deprecated in recent Transformers releases
        )
        # Load the tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(
            hf_model_path,
            token=hf_auth_token,
        )
        # Apply the Llama 3 chat template, mapping ShareGPT-style keys ("from"/"value")
        self.__tokenizer = get_chat_template(
            self.tokenizer,
            chat_template="llama-3",
            mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
        )
        self._load_in_4bit = load_in_4bit
        self._load_cpu_mem_usage = load_cpu_mem_usage
        self._hf_model_path = hf_model_path
        self._EOS_TOKEN_ID = self.__tokenizer.eos_token_id
        self.max_new_tokens = max_new_tokens
        # Prompt template: answer strictly from the supplied context
        self._prompt = lambda context, question: f"""
Please provide a detailed answer to the question using only the information provided in the context. Do not include any information that is not explicitly mentioned in the context.
Context: [{context}]
- If the context is in Arabic, answer in Arabic; otherwise, answer in English.
Question: [{question}]
Your answer should be comprehensive, thoroughly explaining the topic while staying within the boundaries of the provided context.
"""

    def invoke(self, context: str, question: str) -> Tuple[str, List[Dict[str, str]]]:
        """Generate an answer and return (decoded_text, messages)."""
        if not question or not question.strip():
            raise ValueError("question cannot be empty or None")
        if not context or not context.strip():
            raise ValueError("context cannot be empty or None")
        inputs = self._prompt(context, question)
        messages = [{"from": "human", "value": inputs}]
        inputs = self.__tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,  # Must add for generation
            return_tensors="pt",
        ).to("cuda")
        # max_new_tokens caps the response length; raise it for more detailed answers
        output_ids = self.model.generate(
            inputs,
            max_new_tokens=self.max_new_tokens,
            pad_token_id=self.__tokenizer.pad_token_id,
        )
        output_ids = output_ids.tolist()[0] if output_ids.size(0) == 1 else output_ids.tolist()
        output_text = self.__tokenizer.decode(output_ids, skip_special_tokens=True)
        # Release cached GPU memory
        del inputs
        del output_ids
        torch.cuda.empty_cache()
        return output_text, messages

    def extract_answer(self, response: str) -> str:
        """Return the text that follows the ".assistant" marker in the decoded output."""
        start_with: str = ".assistant"
        start_index = response.find(start_with)
        # If the marker is found, extract the substring from that point onward
        if start_index != -1:
            # Move start_index to the end of the marker
            start_index += len(start_with)
            return response[start_index:]
        else:
            return response

    def get_metadata(self) -> Dict[str, Any]:
        return {
            "class_name": self.__class__.__name__,
            "init_params": {
                "load_in_4bit": self._load_in_4bit,
                "load_cpu_mem_usage": self._load_cpu_mem_usage,
                "hf_model_path": self._hf_model_path,
                "hf_auth_token": "--%$%--",  # masked on purpose
                "max_new_tokens": self.max_new_tokens
            },
            "methods": ["invoke", "extract_answer"]
        }


llm = LLM()
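The answer-extraction step can be tried without a GPU. The sketch below copies the logic of `extract_answer` into a standalone function and runs it on a hand-written string. The sample string is made up for illustration and only approximates what the decoded output looks like once special tokens are skipped.

```python
def extract_answer(response: str, marker: str = ".assistant") -> str:
    """Return the text after the first occurrence of marker, or the input unchanged."""
    start_index = response.find(marker)
    if start_index == -1:
        return response
    return response[start_index + len(marker):]


if __name__ == "__main__":
    # Hypothetical decoded output: prompt text, then the marker, then the answer.
    decoded = "user\n\nQuestion: [What is 2 + 2?] ... the provided context.assistant\n\n4"
    print(extract_answer(decoded).strip())
```

With the model loaded, the same step would follow a call such as `response, messages = llm.invoke(context, question)`, passing `response` to `llm.extract_answer`.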
Model tree for AlberBshara/ar_llama3.1
Base model
meta-llama/Llama-3.1-8B