Quantization made by Richard Erkhov.
Phi-3.5-mini-ITA - bnb 4bits
- Model creator: https://huggingface.co/anakin87/
- Original model: https://huggingface.co/anakin87/Phi-3.5-mini-ITA/
Original model description:
license: mit
datasets:
- mlabonne/FineTome-100k
- efederici/capybara-claude-15k-ita
language:
- it
- en
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
tags:
- trl
- phi3
- spectrum
Phi-3.5-mini-ITA
Fine-tuned version of Microsoft/Phi-3.5-mini-instruct optimized for better performance in Italian.
- Small yet powerful model with 3.82 billion parameters
- Supports 128k context length
Do you want to understand how the model was trained? Check out the full walkthrough article and the accompanying notebook.
Evaluation
Model | Parameters | Average | MMLU_IT | ARC_IT | HELLASWAG_IT |
---|---|---|---|---|---|
anakin87/Phi-3.5-mini-ITA | 3.82 B | 57.67 | 59.93 | 51.5 | 61.57 |
meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B | 56.97 | 58.43 | 48.42 | 64.07 |
microsoft/Phi-3.5-mini-instruct | 3.82 B | 56.82 | 60.03 | 49.19 | 61.25 |
For a detailed comparison of model performance, check out the Leaderboard for Italian Language Models.
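The Average column appears to be the unweighted mean of the three benchmark scores. The source does not state this, so treat it as an assumption; the short check below reproduces the reported averages for the three rows in the table. The helper name `average_score` is introduced here for illustration only.

```python
# Scores copied from the evaluation table above.
scores = {
    "anakin87/Phi-3.5-mini-ITA": {"MMLU_IT": 59.93, "ARC_IT": 51.5, "HELLASWAG_IT": 61.57},
    "meta-llama/Meta-Llama-3.1-8B-Instruct": {"MMLU_IT": 58.43, "ARC_IT": 48.42, "HELLASWAG_IT": 64.07},
    "microsoft/Phi-3.5-mini-instruct": {"MMLU_IT": 60.03, "ARC_IT": 49.19, "HELLASWAG_IT": 61.25},
}


def average_score(benchmarks):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(benchmarks.values()) / len(benchmarks), 2)


# Assumption: Average = mean(MMLU_IT, ARC_IT, HELLASWAG_IT).
averages = {model: average_score(b) for model, b in scores.items()}
for model, avg in averages.items():
    print(f"{model}: {avg}")
```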
Model in action
Demo
Chat with the model on Hugging Face Spaces
Text generation with Transformers
The model is small, so it runs smoothly on Colab. It is also fine to load the model using quantization.
With transformers==4.44.2, trust_remote_code=True is needed to incorporate a minor bug fix in Phi3ForCausalLM.
Read this discussion for more details.
The model is compatible with Flash Attention 2, which accelerates inference. To enable it, uncomment the attn_implementation parameter in the code snippet below.
# pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "anakin87/Phi-3.5-mini-ITA"

# Load the weights in bfloat16 and let accelerate place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",  # UNCOMMENT TO USE FLASH ATTENTION 2
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

user_input = "Puoi spiegarmi brevemente la differenza tra imperfetto e passato prossimo in italiano e quando si usano?"
messages = [{"role": "user", "content": user_input}]
# Pass the chat messages directly; the pipeline applies the model's chat template.
outputs = pipe(messages, max_new_tokens=500, do_sample=True, temperature=0.001)
# With chat input, "generated_text" holds the whole conversation; the last entry is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])
Example output:
Certamente! Imperfetto e passato prossimo sono due tempi verbali in italiano che si riferiscono a azioni passate, ma hanno sfumature diverse.
Imperfetto:
- L'imperfetto è usato per descrivere azioni o situazioni passate che erano continue o ripetute nel tempo.
- Indica un'azione senza una fine specifica o un'azione che si svolgeva abitualmente.
- È spesso usato per descrivere situazioni, condizioni o stati passati.
- Esempio: "Quando ero bambino, giocavo spesso nel parco."
Passato Prossimo:
- Il passato prossimo è usato per descrivere azioni passate che sono state completate o che hanno avuto una durata specifica.
- Indica un'azione che è avvenuta in un momento specifico nel passato.
- È spesso usato per descrivere eventi o azioni che hanno una durata definita o che si sono svolte in un momento specifico.
- Esempio: "Ieri ho finito il libro."
In sintesi, l'imperfetto si usa per azioni continue o abituali nel passato, mentre il passato prossimo si usa per azioni completate o avvenute in un momento specifico nel passato.
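The earlier remark that the model can also be loaded with quantization can be sketched as follows. This is a minimal example, not the procedure used to produce this repository: it assumes the `bitsandbytes` package and a CUDA GPU are available, the NF4 settings are illustrative choices that do not come from the source, and `load_4bit_model` is a hypothetical helper name.

```python
# pip install transformers accelerate bitsandbytes

MODEL_ID = "anakin87/Phi-3.5-mini-ITA"

# Illustrative 4-bit settings (assumptions, not taken from the model card).
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    # BitsAndBytesConfig also accepts the dtype name as a string.
    "bnb_4bit_compute_dtype": "bfloat16",
}


def load_4bit_model(model_id=MODEL_ID):
    """Hypothetical helper: load model and tokenizer with on-the-fly 4-bit quantization."""
    # Imported lazily so the heavy dependencies are only needed when a model is loaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        quantization_config=BitsAndBytesConfig(**QUANT_KWARGS),
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    return model, tokenizer
```

Calling `model, tokenizer = load_4bit_model()` downloads the weights; the returned objects can then be passed to `pipeline("text-generation", model=model, tokenizer=tokenizer)` in the same way as the unquantized model.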
Build AI applications
You can use the model to create a variety of AI applications.
I recommend using the Haystack LLM framework for orchestration. (spoiler: I work on it and it is open-source)
This model is compatible with HuggingFaceLocalGenerator and HuggingFaceLocalChatGenerator components.
You can also deploy the model with a TGI container and then use it with HuggingFaceAPIGenerator and the related Chat Generator.
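As a rough illustration of the local chat component mentioned above, the sketch below wraps HuggingFaceLocalChatGenerator in a small function. It assumes Haystack 2.x (`haystack-ai`) is installed; `ask` is a hypothetical helper name and the generation settings are illustrative, not taken from the source.

```python
# pip install haystack-ai transformers accelerate

MODEL_ID = "anakin87/Phi-3.5-mini-ITA"
GENERATION_KWARGS = {"max_new_tokens": 500}  # illustrative value


def ask(question, model_id=MODEL_ID):
    """Hypothetical helper: send one user message to a local Haystack chat generator."""
    # Imported lazily so Haystack is only required when the helper is called.
    from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
    from haystack.dataclasses import ChatMessage

    generator = HuggingFaceLocalChatGenerator(
        model=model_id,
        generation_kwargs=GENERATION_KWARGS,
    )
    generator.warm_up()  # loads the model into memory
    result = generator.run(messages=[ChatMessage.from_user(question)])
    return result["replies"][0]  # a ChatMessage holding the assistant reply
```

For example, `print(ask("Qual è la capitale d'Italia?"))` would load the model and print the reply message. Creating the generator inside the function keeps the sketch short; a real application would build it once and reuse it.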
Some examples you can take inspiration from:
Training details
This model was fine-tuned using HF TRL. It underwent 2 epochs of instruction fine-tuning on the FineTome-100k and Capybara-Claude-15k-ita datasets. Thanks to the authors for providing these datasets.
I adopted a relatively new technique for parameter-efficient learning: Spectrum. The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and freeze the rest.
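The selection step behind this idea can be sketched in a few lines. This is a simplified illustration and not the Spectrum implementation: the SNR values are made up, the layer names are hypothetical, and the source does not say which fraction of layers was kept or how the SNR was computed.

```python
def select_trainable_layers(snr_by_layer, top_fraction=0.3):
    """Return the names of the layers with the highest SNR; all others would be frozen."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    # Rank layer names from highest to lowest SNR.
    ranked = sorted(snr_by_layer, key=snr_by_layer.get, reverse=True)
    # Keep at least one layer so that something remains trainable.
    keep = max(1, round(len(ranked) * top_fraction))
    return set(ranked[:keep])


# Made-up SNR values for four hypothetical layers.
snr_by_layer = {
    "layers.0.mlp": 0.8,
    "layers.1.mlp": 2.4,
    "layers.2.mlp": 1.1,
    "layers.3.mlp": 3.0,
}
trainable = select_trainable_layers(snr_by_layer, top_fraction=0.5)
frozen = set(snr_by_layer) - trainable
print("train:", sorted(trainable))
print("freeze:", sorted(frozen))
```

In a PyTorch training script, freezing would then amount to setting `requires_grad = False` on every parameter that belongs to a frozen layer.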
Training required about 14 hours on a single A6000 GPU.
For complete training details, check out the full walkthrough article and the accompanying notebook.