---
model-index:
- name: lince-zero
  results: []
license: apache-2.0
language:
- es
thumbnail: https://huggingface.co/mrm8488/falcoder-7b/resolve/main/falcoder.png
pipeline_tag: text-generation
---

# Lince Zero

**Lince** is a model fine-tuned on a large, original corpus of Spanish instructions.

## Model description 🧠

TBA

## Training and evaluation data 📚

We created an instruction dataset following the format of popular datasets in the field, such as *Alpaca* and *Dolly*, and augmented it to reach **80k** samples.

### Training hyperparameters ⚙

TBA

### Training results 🗒️

TBA

### Example of usage 👩‍💻

```py
from typing import Optional

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_id = "clibrain/lince-zero"

# Load the tokenizer and the model, then move the model to the GPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")


def create_instruction(
    instruction: str,
    input_data: Optional[str] = None,
    context: Optional[str] = None,
) -> str:
    """Build the Spanish prompt, skipping any section that is None."""
    sections = {
        "Instrucción": instruction,
        "Entrada": input_data,
        "Contexto": context,
    }

    system_prompt = (
        "A continuación hay una instrucción que describe una tarea, "
        "junto con una entrada que proporciona más contexto. "
        "Escriba una respuesta que complete adecuadamente la solicitud.\n\n"
    )
    prompt = system_prompt

    for title, content in sections.items():
        if content is not None:
            prompt += f"### {title}:\n{content}\n\n"

    # The model writes its answer after this header.
    prompt += "### Respuesta:\n"

    return prompt


def generate(
    instruction,
    input=None,
    context=None,
    max_new_tokens=128,
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=4,
    **kwargs
):
    prompt = create_instruction(instruction, input, context)
    print(prompt)

    # Tokenize the prompt and move the tensors to the GPU.
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].to("cuda")
    attention_mask = inputs["attention_mask"].to("cuda")

    generation_config = GenerationConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
        **kwargs,
    )

    # Inference only, so gradients are not needed.
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
            early_stopping=True
        )

    # Decode the best sequence and keep only the text after the answer header.
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    return output.split("### Respuesta:")[1].lstrip("\n")


# "Give me a list of places to visit in Spain."
instruction = "Dame una lista de lugares a visitar en España."
print(generate(instruction))
```
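
The prompt layout produced by `create_instruction` can be inspected without a GPU or the model weights. The sketch below is a minimal, standalone illustration: `build_prompt` is a hypothetical helper that mirrors `create_instruction`, and the sample record with its field names (`instruction`, `input`, `context`) follows the Alpaca convention as an assumption, not the actual schema of the Lince dataset.

```py
# Hypothetical Alpaca-style record; the field names are an assumption,
# not the published schema of the Lince training data.
record = {
    "instruction": "Traduce la frase al inglés.",
    "input": "Buenos días",
}

# Same system prompt as in the usage example above.
SYSTEM_PROMPT = (
    "A continuación hay una instrucción que describe una tarea, "
    "junto con una entrada que proporciona más contexto. "
    "Escriba una respuesta que complete adecuadamente la solicitud.\n\n"
)


def build_prompt(record: dict) -> str:
    """Mirror create_instruction: emit only the sections that are present."""
    sections = {
        "Instrucción": record.get("instruction"),
        "Entrada": record.get("input"),
        "Contexto": record.get("context"),
    }
    prompt = SYSTEM_PROMPT
    for title, content in sections.items():
        if content is not None:
            prompt += f"### {title}:\n{content}\n\n"
    # The model is expected to continue after this header.
    return prompt + "### Respuesta:\n"


prompt = build_prompt(record)
print(prompt)
```

Sections with no content are omitted, so an instruction-only record yields just the `### Instrucción:` and `### Respuesta:` headers.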