---
library_name: peft
base_model: EleutherAI/gpt-neo-2.7B
---

Model Details

Original Model: EleutherAI/gpt-neo-2.7B

Fine-Tuned For: Azerbaijani language understanding and generation

Dataset Used: Azerbaijani translation of the Stanford Alpaca dataset

Fine-Tuning Method: Instruction fine-tuning on data generated with the self-instruct method

This model is part of the ["project/Barbarossa"](https://github.com/Alas-Development-Center/project-barbarossa) initiative, which aims to improve natural language processing for the Azerbaijani language. We fine-tuned it on the Azerbaijani translation of the Stanford Alpaca dataset, an instruction-following dataset generated with the self-instruct method, to improve the base model's understanding and generation of Azerbaijani text.

__Our primary objective with this model is to offer insight into the feasibility and outcomes of fine-tuning large language models (LLMs) for the Azerbaijani language. The fine-tuning was carried out with limited resources, so it yielded valuable lessons rather than a production-ready model. We therefore recommend treating this model as a reference for understanding the potential and challenges of fine-tuning LLMs for specific languages. It is a foundation for further research and development, not a direct solution for production environments.__

This project is a proud product of the [Alas Development Center (ADC)](https://az.linkedin.com/company/alas-development-center?trk=ppro_cprof). We are thrilled to offer these fine-tuned large language models to the public, free of charge.

How to use?

```
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_path = "alasdevcenter/az-gptneo"

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# max_length counts the prompt tokens as well as the generated ones.
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

# Instruction in Azerbaijani ("Protection of nature").
instruction = "Təbiətin qorunması "
# Alpaca-style prompt: "Below is an instruction that provides more context.
# Write a response that adequately completes the request."
formatted_prompt = f"""Aşağıda daha çox kontekst təmin edən təlimat var. Sorğunu adekvat şəkildə tamamlayan cavab yazın.
### Təlimat:
{instruction}
### Cavab:
"""

result = pipe(formatted_prompt)
# generated_text contains the prompt followed by the model's continuation.
print(result[0]['generated_text'])
```
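Because the pipeline returns the prompt together with the continuation, it can be convenient to build the prompt and strip it from the output with small helpers. The following is a minimal sketch, not part of the released code: the names `build_prompt` and `extract_response`, and the choice to split on the first `### Cavab:` marker, are assumptions made for illustration. It also strips surrounding whitespace from the instruction, which the example above does not do.

```python
# Hypothetical helpers (not part of the model repository); they only assume
# the prompt template shown in the usage example above.
PROMPT_HEADER = (
    "Aşağıda daha çox kontekst təmin edən təlimat var. "
    "Sorğunu adekvat şəkildə tamamlayan cavab yazın."
)
RESPONSE_MARKER = "### Cavab:"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca-style Azerbaijani template."""
    return f"{PROMPT_HEADER}\n### Təlimat:\n{instruction.strip()}\n{RESPONSE_MARKER}\n"


def extract_response(generated_text: str) -> str:
    """Return the text after the first response marker.

    If the marker is missing, the whole text is returned (stripped).
    """
    _, marker, tail = generated_text.partition(RESPONSE_MARKER)
    return tail.strip() if marker else generated_text.strip()


# Usage with a placeholder continuation instead of real model output.
prompt = build_prompt("Təbiətin qorunması")
print(extract_response(prompt + "PLACEHOLDER"))
```

With the pipeline from the example above, the same helpers would be used as `extract_response(pipe(build_prompt(instruction))[0]['generated_text'])`.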