|
|
|
--- |
|
|
|
license: mit |
|
datasets: |
|
- mlabonne/FineTome-100k |
|
- efederici/capybara-claude-15k-ita |
|
language: |
|
- it |
|
- en |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
base_model: microsoft/Phi-3.5-mini-instruct |
|
tags: |
|
- trl |
|
- phi3 |
|
- spectrum |
|
|
|
--- |
|
|
|
![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ) |
|
|
|
# QuantFactory/Phi-3.5-mini-ITA-GGUF |
|
This is a quantized version of [anakin87/Phi-3.5-mini-ITA](https://huggingface.co/anakin87/Phi-3.5-mini-ITA), created using llama.cpp.
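If you want to try the GGUF files directly from Python, here is a minimal sketch using `llama-cpp-python`. The `*Q4_K_M.gguf` filename pattern is an assumption: check the repository's file list for the quantizations actually available.

```python
# pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

# Download a GGUF file from this repo and load it.
# The filename glob is an assumption; adjust it to one of the published quantizations.
llm = Llama.from_pretrained(
    repo_id="QuantFactory/Phi-3.5-mini-ITA-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=4096,  # context window for this session
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Presentati brevemente in italiano."}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```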
|
|
|
# Original Model Card |
|
|
|
|
|
<img src="./assets/phi_35_mini_ita.png" width="450"/>
|
# Phi-3.5-mini-ITA |
|
|
|
Fine-tuned version of [Microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) optimized for better performance in Italian. |
|
|
|
- Small yet powerful model with 3.82 billion parameters |
|
- Supports 128k context length |
|
|
|
[💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)
|
|
|
## 🏆 Evaluation
|
|
|
| Model | Parameters | Average | MMLU_IT | ARC_IT | HELLASWAG_IT | |
|
| ------------------------------------- | ---------- | ------- | ------- | ------ | ------------ | |
|
| **anakin87/Phi-3.5-mini-ITA** | **3.82 B** |**57.67** | 59.93 | 51.5 | 61.57 | |
|
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B | 56.97 | 58.43 | 48.42 | 64.07 | |
|
| microsoft/Phi-3.5-mini-instruct | 3.82 B | 56.82 | 60.03 | 49.19 | 61.25 | |
|
|
|
For a detailed comparison of model performance, check out the [Leaderboard for Italian Language Models](https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard). |
|
|
|
## 🎮 Model in action
|
### Demo |
|
[💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)
|
|
|
### Text generation with Transformers |
|
The model is small, so it runs smoothly on Colab. It can also be loaded with quantization (a sketch is shown after the example output below).
|
|
|
With `transformers==4.44.2`, `trust_remote_code=True` is needed to incorporate a minor bug fix in `Phi3ForCausalLM`. |
|
Read [this discussion](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/discussions/9) for more details. |
|
|
|
⚡ *The model is compatible with Flash Attention 2, which accelerates inference. To enable it, uncomment the `attn_implementation` parameter in the code snippet below.*
|
|
|
```python |
|
# pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "anakin87/Phi-3.5-mini-ITA"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",  # UNCOMMENT TO USE FLASH ATTENTION 2
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

user_input = "Puoi spiegarmi brevemente la differenza tra imperfetto e passato prossimo in italiano e quando si usano?"
messages = [{"role": "user", "content": user_input}]
outputs = pipe(messages, max_new_tokens=500, do_sample=True, temperature=0.001)
# The pipeline returns the whole conversation; the assistant's reply is the last message.
print(outputs[0]["generated_text"][-1]["content"])
|
``` |
|
|
|
Example output: |
|
``` |
|
Certamente! Imperfetto e passato prossimo sono due tempi verbali in italiano che si riferiscono a azioni passate, ma hanno sfumature diverse. |
|
|
|
Imperfetto: |
|
- L'imperfetto è usato per descrivere azioni o situazioni passate che erano continue o ripetute nel tempo.

- Indica un'azione senza una fine specifica o un'azione che si svolgeva abitualmente.

- È spesso usato per descrivere situazioni, condizioni o stati passati.
|
- Esempio: "Quando ero bambino, giocavo spesso nel parco." |
|
|
|
Passato Prossimo: |
|
- Il passato prossimo è usato per descrivere azioni passate che sono state completate o che hanno avuto una durata specifica.

- Indica un'azione che è avvenuta in un momento specifico nel passato.

- È spesso usato per descrivere eventi o azioni che hanno una durata definita o che si sono svolte in un momento specifico.
|
- Esempio: "Ieri ho finito il libro." |
|
|
|
In sintesi, l'imperfetto si usa per azioni continue o abituali nel passato, mentre il passato prossimo si usa per azioni completate o avvenute in un momento specifico nel passato. |
|
``` |
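As mentioned above, the model can also be loaded with quantization when GPU memory is limited. Below is a minimal sketch using 4-bit quantization via `bitsandbytes`; the quantization settings are illustrative, not an official recommendation.

```python
# pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "anakin87/Phi-3.5-mini-ITA"

# Illustrative 4-bit NF4 configuration; adjust to your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [{"role": "user", "content": "Raccontami una breve storia."}]
outputs = pipe(messages, max_new_tokens=200)
print(outputs[0]["generated_text"][-1]["content"])
```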
|
|
|
### Build AI applications |
|
You can use the model to create a variety of AI applications. |
|
|
|
I recommend using the [🏗️ Haystack LLM framework](https://haystack.deepset.ai/) for orchestration.
|
(spoiler: I work on it and it is open-source 😄)
|
|
|
This model is compatible with [`HuggingFaceLocalGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator) and [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator) components. |
|
You can also deploy the model with a TGI container and then use it with [`HuggingFaceAPIGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator) and the related Chat Generator. |
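For a quick start with the local chat generator, a minimal sketch (assuming a recent `haystack-ai` 2.x release) could look like this:

```python
# pip install haystack-ai accelerate
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage

# Load the model locally through Haystack's wrapper around the HF text-generation pipeline.
generator = HuggingFaceLocalChatGenerator(
    model="anakin87/Phi-3.5-mini-ITA",
    generation_kwargs={"max_new_tokens": 500},
)
generator.warm_up()

result = generator.run(messages=[ChatMessage.from_user("Qual è la capitale d'Italia?")])
print(result["replies"][0])
```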
|
|
|
Some examples you can draw inspiration from:
|
- [RAG with local open models](https://haystack.deepset.ai/blog/guide-to-using-zephyr-with-haystack2) |
|
- [Summarization from a Website](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/hackernews-custom-component-rag.ipynb) |
|
- [Multilingual RAG](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/multilingual_rag_podcast.ipynb) |
|
|
|
|
|
## 🔧 Training details
|
This model was fine-tuned using HF TRL. |
|
It underwent 2 epochs of instruction fine-tuning on the [FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) and [Capybara-Claude-15k-ita](https://huggingface.co/datasets/efederici/capybara-claude-15k-ita) datasets. 🙏 Thanks to the authors for providing these datasets.
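To give a rough idea of the setup, here is a hedged sketch of instruction fine-tuning with TRL's `SFTTrainer` on FineTome-100k. The hyperparameters are illustrative placeholders rather than the actual training recipe, and the conversion step assumes FineTome's ShareGPT-style `conversations` column.

```python
# pip install trl transformers datasets accelerate
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_model = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")

# Convert ShareGPT-style "conversations" into the role/content "messages"
# format that SFTTrainer handles natively.
role_map = {"human": "user", "gpt": "assistant", "system": "system"}

def to_messages(example):
    return {"messages": [{"role": role_map.get(turn["from"], "user"), "content": turn["value"]}
                         for turn in example["conversations"]]}

dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

training_args = SFTConfig(
    output_dir="phi-3.5-mini-ita-sft",
    num_train_epochs=2,              # two epochs, as described above
    per_device_train_batch_size=2,   # illustrative values, not the actual recipe
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    bf16=True,
    max_seq_length=2048,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL versions name this argument processing_class
)
trainer.train()
```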
|
|
|
I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). |
|
The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
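In practice, Spectrum produces a list of high-SNR modules to keep trainable, and the freezing step boils down to something like the sketch below. The module names here are hypothetical placeholders, not the actual Spectrum selection for this model.

```python
# pip install transformers torch
from transformers import AutoModelForCausalLM

# Hypothetical subset of high-SNR modules; the real list comes from
# Spectrum's SNR analysis of the target model.
trainable_prefixes = [
    "model.layers.5.self_attn.qkv_proj",
    "model.layers.17.mlp.gate_up_proj",
    "model.layers.30.self_attn.o_proj",
]

model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3.5-mini-instruct")

# Freeze everything, then unfreeze only the selected high-SNR modules.
for name, param in model.named_parameters():
    param.requires_grad = any(name.startswith(prefix) for prefix in trainable_prefixes)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
```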
|
|
|
Training required about 14 hours on a single A40 GPU. |
|
|
|
I may release a guide/tutorial soon. Stay tuned! 📻
|
|