metadata

license: llama2
language:
  - it
tags:
  - text-generation-inference

Model Card for LLaMAntino-2-70b-hf-UltraChat-ITA 🇮🇹 🌟

Last Update: 02/02/2024

Model description

LLaMAntino-2-70b-hf-UltraChat-ITA is a Large Language Model (LLM): an instruction-tuned version of LLaMAntino-2-70b (an Italian-adapted LLaMA 2 - 70B). This model aims to provide Italian NLP researchers with an improved model for Italian dialogue use cases.

The model was trained with QLoRA on the UltraChat dataset, translated into Italian with Argos Translate. For more details on the training procedure, the code we used can be found at the following link:

Repository: https://github.com/swapUniba/LLaMAntino

NOTICE: the code has not been released yet. We apologize for the delay; it will be available as soon as possible!

Developed by: Pierpaolo Basile, Elio Musacchio, Marco Polignano, Lucia Siciliani, Giuseppe Fiameni, Giovanni Semeraro
Funded by: PNRR project FAIR - Future AI Research
Compute infrastructure: Leonardo supercomputer
Model type: LLaMA-2
Language(s) (NLP): Italian
License: Llama 2 Community License
Finetuned from model: swap-uniba/meta-llama/Llama-2-70b-hf

Prompt Format

The following prompt format was used. It is based on the LLaMA 2 prompt template, adapted to the Italian language:

" [INST] <<SYS>>\n" \
"Sei un assistente disponibile, rispettoso e onesto di nome Llamantino. " \
"Rispondi sempre nel modo più utile possibile, pur essendo sicuro. " \
"Le risposte non devono includere contenuti dannosi, non etici, razzisti, sessisti, tossici, pericolosi o illegali. " \
"Assicurati che le tue risposte siano socialmente imparziali e positive. " \
"Se una domanda non ha senso o non è coerente con i fatti, spiegane il motivo invece di rispondere in modo non corretto. " \
"Se non conosci la risposta a una domanda, non condividere informazioni false.\n" \
"<</SYS>>\n\n" \
f"{user_msg_1} [/INST] {model_answer_1} </s> <s> [INST] {user_msg_2} [/INST] {model_answer_2} </s> ... <s> [INST] {user_msg_N} [/INST] {model_answer_N} </s>"

We recommend using the same prompt format at inference time to obtain the best results!
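As an illustration of the format above, here is a minimal sketch that assembles a multi-turn prompt as a plain string. The names `SYSTEM_PROMPT` and `build_prompt` are hypothetical helpers introduced only for this example; they are not part of the model's code. The sketch assumes that the tokenizer adds the initial `<s>` token on its own, so only the later turns get an explicit `<s>`.

```python
# Hypothetical helper: builds the prompt string described in this section.
SYSTEM_PROMPT = (
    "Sei un assistente disponibile, rispettoso e onesto di nome Llamantino. "
    "Rispondi sempre nel modo più utile possibile, pur essendo sicuro. "
    "Le risposte non devono includere contenuti dannosi, non etici, razzisti, "
    "sessisti, tossici, pericolosi o illegali. "
    "Assicurati che le tue risposte siano socialmente imparziali e positive. "
    "Se una domanda non ha senso o non è coerente con i fatti, spiegane il motivo "
    "invece di rispondere in modo non corretto. "
    "Se non conosci la risposta a una domanda, non condividere informazioni false."
)


def build_prompt(turns, bos="<s>", eos="</s>"):
    """Build a prompt from a list of (user_msg, model_answer) pairs.

    Pass None as the last model_answer to leave the prompt open,
    so that the model generates the next reply.
    """
    parts = []
    for i, (user_msg, answer) in enumerate(turns):
        if i == 0:
            # The first turn carries the system prompt inside <<SYS>> ... <</SYS>>.
            parts.append(
                f" [INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n{user_msg} [/INST]"
            )
        else:
            # Later turns start with an explicit BOS token.
            parts.append(f" {bos} [INST] {user_msg} [/INST]")
        if answer is not None:
            parts.append(f" {answer} {eos}")
    return "".join(parts)


prompt = build_prompt([
    ("Cosa sono i word embeddings?", "Sono rappresentazioni vettoriali delle parole."),
    ("Puoi farmi un esempio?", None),
])
print(prompt)
```

Leaving the last answer as `None` ends the string with `[/INST]`, which is the point where the model is expected to continue.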

How to Get Started with the Model

Below you can find an example of model usage:

from transformers import AutoTokenizer
import transformers
import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

model = "swap-uniba/LLaMAntino-2-70b-hf-UltraChat-ITA"

tokenizer = AutoTokenizer.from_pretrained(model)
tokenizer.add_special_tokens({"pad_token":"<unk>"})
tokenizer.chat_template =   "{% set ns = namespace(i=0) %}" \
                            "{% for message in messages %}" \
                                "{% if message['role'] == 'user' and ns.i == 0 %}" \
                                       "{{ bos_token +' [INST] <<SYS>>\n' }}" \
                                       "{{ 'Sei un assistente disponibile, rispettoso e onesto di nome Llamantino. ' }}" \
                                       "{{ 'Rispondi sempre nel modo più utile possibile, pur essendo sicuro. ' }}" \
                                       "{{ 'Le risposte non devono includere contenuti dannosi, non etici, razzisti, sessisti, tossici, pericolosi o illegali. ' }}" \
                                       "{{ 'Assicurati che le tue risposte siano socialmente imparziali e positive. ' }}" \
                                       "{{ 'Se una domanda non ha senso o non è coerente con i fatti, spiegane il motivo invece di rispondere in modo non corretto. ' }}" \
                                       "{{ 'Se non conosci la risposta a una domanda, non condividere informazioni false.\n' }}" \
                                       "{{ '<</SYS>>\n\n' }}" \
                                       "{{ message['content'] + ' [/INST]' }}" \
                                "{% elif message['role'] == 'user' and ns.i != 0 %} " \
                                    "{{ bos_token + ' [INST] ' + message['content'] + ' [/INST]' }}" \
                                "{% elif message['role'] == 'assistant' %}" \
                                    "{{ ' '  + message['content'] + ' ' + eos_token + ' ' }}" \
                                "{% endif %}" \
                                "{% set ns.i = ns.i+1 %}" \
                            "{% endfor %}"



pipe = transformers.pipeline(model=model,
    device_map="balanced",
    tokenizer=tokenizer,
    return_full_text=False,  # return only the newly generated text, without the prompt
    task='text-generation',
    max_new_tokens=512,  # maximum number of tokens to generate in the output
    temperature=0.8  # sampling temperature
)
messages = [{"role": "user", "content": "Cosa sono i word embeddings?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False)

sequences = pipe(text)
for seq in sequences:
    print(f"{seq['generated_text']}")

If you run into issues when loading the model, you can try loading it quantized:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("swap-uniba/LLaMAntino-2-70b-hf-UltraChat-ITA", load_in_8bit=True)
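To give a rough idea of why 8-bit loading can help, the sketch below estimates the memory needed for the weights alone. It assumes a nominal 70 billion parameters (read off the "70b" in the model name, not an exact count) and ignores activations, the KV cache, and any framework overhead, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
# Back-of-the-envelope estimate of weight memory only.
# Assumption: a nominal 70e9 parameters, taken from the "70b" in the model name.
NOMINAL_PARAMS = 70e9

# Bytes used to store one parameter in each format.
BYTES_PER_PARAM = {
    "float16": 2.0,
    "int8": 1.0,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, nbytes):.0f} GB")
```

Under these assumptions, 8-bit weights take about half the memory of 16-bit weights.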

Note:

The model loading strategy above requires the bitsandbytes and accelerate libraries.
By default, the tokenizer adds the BOS token (<s>) at the beginning of the prompt. If it does not do so in your setup, add the <s> string at the start of the prompt yourself.
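For the case where the tokenizer does not add the BOS token, a minimal sketch of the manual fix could look like this. The helper `ensure_bos` is hypothetical and works on the prompt string only. It should be used only when the tokenizer is known not to add `<s>` itself; otherwise the token would appear twice.

```python
def ensure_bos(prompt: str, bos_token: str = "<s>") -> str:
    """Prepend bos_token unless the prompt already starts with it."""
    if prompt.lstrip().startswith(bos_token):
        return prompt
    return bos_token + prompt


print(ensure_bos(" [INST] Cosa sono i word embeddings? [/INST]"))
```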

Evaluation

Coming soon!

Citation

If you use this model in your research, please cite the following:

@misc{basile2023llamantino,
      title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language}, 
      author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
      year={2023},
      eprint={2312.09993},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}