Iker
/

Llama-3-Neurona-8b

@@ -1,201 +1,447 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+tags:
+- synthetic
+license: llama3
+datasets:
+- pinzhenchen/alpaca-cleaned-es
+- Danielbrdz/Barcenas-Economia
+- HiTZ/casimedicos-exp
+- somosnlp/coser_resumenes
+- csebuetnlp/CrossSum
+- Iker/Document-Translation-en-es
+- somosnlp/es-inclusive-language-it
+- FreedomIntelligence/evol-instruct-spanish
+- glaiveai/glaive-code-assistant-v3
+- glaiveai/glaive-function-calling-v2
+- Iker/InstructTranslation-EN-ES
+- somosnlp/lenguaje-claro-dataset
+- somosnlp/LingComp_QA
+- bltlab/lr-sum
+- Iker/NoticIA
+- xaviviro/oasst2_es_gpt
+- teknium/OpenHermes-2.5
+- Iker/OpenHermes-2.5-Spanish
+- Helsinki-NLP/opus-100
+- projecte-aina/RAG_Multilingual
+- sem_eval_2018_task_1
+- davidstap/ted_talks
+- HiTZ/This-is-not-a-dataset
+- wikipedia
+language:
+- es
+- en
+pipeline_tag: text-generation
+base_model: meta-llama/Meta-Llama-3-8B
 ---
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/614a1ebb8f82f1df64d55126/2i_CasoeJTgQPNoBIfA8E.jpeg)
+# Neurona 8B Beta: Un Modelo de Lenguage en Español
+> Esta es una versión preliminar del dataset card. El modelo está en desarrollo y no es la versión final. Si quieres saber más sobre este modelo, escribe a [email protected]
+Neurona 8B es un modelo de lenguaje en Español. Esta es la primera iteración y un experimento para poner a punto los scripts y la infraestructura.
+Neurona 8B ha sido entrenado con los siguiente datasets. No en todos los casos se ha usado el dataset completo
+- [pinzhenchen/alpaca-cleaned-es](https://huggingface.co/datasets/pinzhenchen/alpaca-cleaned-es)
+- [Danielbrdz/Barcenas-Economia](https://huggingface.co/datasets/Danielbrdz/Barcenas-Economia)
+- [HiTZ/casimedicos-exp](https://huggingface.co/datasets/HiTZ/casimedicos-exp)
+- [somosnlp/coser_resumenes](https://huggingface.co/datasets/somosnlp/coser_resumenes)
+- [csebuetnlp/CrossSum en + es](https://huggingface.co/datasets/csebuetnlp/CrossSum)
+- [Iker/Document-Translation-en-es](https://huggingface.co/datasets/Iker/Document-Translation-en-es)
+- [somosnlp/es-inclusive-language-it](https://huggingface.co/datasets/somosnlp/es-inclusive-language-it)
+- [FreedomIntelligence/evol-instruct-spanish](https://huggingface.co/datasets/FreedomIntelligence/evol-instruct-spanish)
+- [glaiveai/glaive-code-assistant-v3](https://huggingface.co/datasets/glaiveai/glaive-code-assistant-v3)
+- [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2)
+- [Iker/InstructTranslation-EN-ES](https://huggingface.co/datasets/Iker/InstructTranslation-EN-ES)
+- [somosnlp/lenguaje-claro-dataset](https://huggingface.co/datasets/somosnlp/lenguaje-claro-dataset)
+- [somosnlp/LingComp_QA](https://huggingface.co/datasets/somosnlp/LingComp_QA)
+- [bltlab/lr-sum](https://huggingface.co/datasets/bltlab/lr-sum)
+- [Iker/NoticIA](https://huggingface.co/datasets/Iker/NoticIA)
+- [xaviviro/oasst2_es_gpt](https://huggingface.co/datasets/xaviviro/oasst2_es_gpt)
+- [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
+- [Iker/OpenHermes-2.5-Spanish](https://huggingface.co/datasets/Iker/OpenHermes-2.5-Spanish)
+- [Helsinki-NLP/opus-100 en es](https://huggingface.co/datasets/Helsinki-NLP/opus-100)
+- [projecte-aina/RAG_Multilingual](https://huggingface.co/datasets/projecte-aina/RAG_Multilingual)
+- [sem_eval_2018_task_1](https://huggingface.co/datasets/sem_eval_2018_task_1)
+- [davidstap/ted_talks](https://huggingface.co/datasets/davidstap/ted_talks)
+- [HiTZ/This-is-not-a-dataset](https://huggingface.co/datasets/HiTZ/This-is-not-a-dataset)
+- [wikipedia es](https://huggingface.co/datasets/wikipedia)
+Esta mezcla de datasets en Inglés y Español, permite al modelo adquirir diferentes capacidades, como RAG, function calling, code assistant, question answering, summarization... tanto en Inglés como en Español.
+# Entrenamiento
+Este modelo se ha entrado usando 4xNvidia A100 80Gb y axolotl
+[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+Esta es la configuración usada
+```yaml
+base_model: /ikerlariak/igarcia945/Mortadelo-Filemon/Meta-Llama-3-8B-Spanish/base_model
+model_type: AutoModelForCausalLM
+tokenizer_type: AutoTokenizer
+is_falcon_derived_model:
+is_llama_derived_model:
+is_qwen_derived_model:
+is_mistral_derived_model:
+load_in_8bit: false
+load_in_4bit: false
+strict: false
+device_map: null
+datasets:
+  - path: /ikerlariak/igarcia945/InstructDatasets/alpaca-cleaned-es.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/Barcenas-Economia.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/casimedicos.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/coser_resumene.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/CrossSum_en.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/CrossSum_es.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/Document-Translation-en-es.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/es-inclusive-language.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/evol-instruct-spanish.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/glaive-code-assistant-v3-small.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/glaive-function-calling-v2.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+        - tool
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/InstructTranslation-EN-ES.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/lenguaje-claro-dataset.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/LingComp_QA.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/lr-sum-es.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/NoticIA.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/NoticIA-large.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/NoticIA-summary.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/oasst2_es_gpt.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/OpenHermes-2.5-English.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/OpenHermes-2.5-Spanish.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/opus-100-en-es.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/RAG_Multilingual-es.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/sem_eval_2018_task_1.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/ted_talks-es_en.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/This-is-not-a-dataset.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+  - path: /ikerlariak/igarcia945/InstructDatasets/wikipedia-es.jsonl
+    type: sharegpt
+    conversation: llama3
+    field: conversations
+    roles:
+      input:
+        - system
+        - gpt
+      output:
+        - human
+chat_template: llama3
+dataset_prepared_path: /ikerlariak/igarcia945/Mortadelo-Filemon/Meta-Llama-3-8B-Spanish/dataset
+shuffle_merged_datasets: true
+val_set_size: 0.005
+output_dir: /ikerlariak/igarcia945/Mortadelo-Filemon/Meta-Llama-3-8B-Spanish
+adapter:
+lora_model_dir:
+sequence_len: 8192
+sample_packing: true
+eval_sample_packing: false
+pad_to_sequence_len: false
+tokens:
+  - "<tool_call>"
+  - "<tool_response>"
+  - "<tools>"
+  - "</tool_call>"
+  - "</tool_response>"
+  - "</tools>"
+  - "<reserved1>"
+  - "<reserved2>"
+neftune_noise_alpha: 5
+wandb_project: Mortadelo&Filemon
+wandb_entity: igarciaf
+wandb_watch:
+wandb_name: meta-llama-3-8B-spanish
+wandb_log_model:
+gradient_accumulation_steps: 32
+micro_batch_size: 2
+eval_batch_size: 2
+num_epochs: 2
+optimizer: adamw_torch_fused
+lr_scheduler: cosine
+learning_rate: 0.00007
+train_on_inputs: false
+group_by_length: false
+bf16: true
+fp16: false
+tf32: false
+gradient_checkpointing: true
+early_stopping_patience:
+resume_from_checkpoint:
+local_rank:
+logging_steps: 1
+xformers_attention:
+flash_attention: true
+warmup_ratio: 0.03
+evals_per_epoch: 4
+eval_table_size:
+save_strategy: "no"
+debug:
+deepspeed: /ikerlariak/igarcia945/Mortadelo-Filemon/train_configs/deepspeed_zero3.json
+weight_decay: 0.0
+fsdp:
+fsdp_config:
+seed: 33
+```