javier-ab-bsc committed on
Commit 4ce4dd5
1 Parent(s): 72dc82e

Update README.md

README.md CHANGED
@@ -541,28 +541,6 @@ The dataset does not allow for external contributions.
 
 </details>
 
-### Finetuning Data
-
-This instruction-tuned variant has been trained with a mixture of 276k English, Spanish, and Catalan multi-turn instructions gathered from open datasets:
-| Dataset | ca | en | es |
-|-----------------------|:------:|:------:|:------:|
-| alpaca-cleaned | - | 50,000 | - |
-| aya-dataset | - | 3,944 | 3,854 |
-| CoQCat | 4,797 | - | - |
-| databricks-dolly-15k | - | 15,011 | - |
-| dolly-3k-ca | 3,232 | - | - |
-| flores-instr | 1,994 | 1,994 | 3,988 |
-| MentorCA | 7,122 | - | - |
-| MentorES | - | - | 7,122 |
-| no-robots | - | 9,499 | - |
-| oasst-ca | 2,518 | - | - |
-| oasst2 | 750 | 31,086 | 15,438 |
-| open-orca | - | 50,000 | - |
-| RagMultilingual | 16,043 | 14,997 | 11,263 |
-| tower-blocks | - | 19,895 | 2,000 |
-| **Total** | **36,456** | **196,426** | **43,665** |
-
----
 
 ## Evaluation
 
@@ -595,14 +573,16 @@ within the framework of [ILENIA Project](https://proyectoilenia.es/) with refere
 
 ### Acknowledgements
 
-This project benefited from the contributions of many teams and institutions, including:
-Senado de España, Parlament de Catalunya, Òmnium Cultural, Dialnet, Institut d’Estudis Aranesos,
-Fundación Elcano, Universidad de Las Palmas de Gran Canaria, Occiglot, Common Crawl, the Welsh Government,
-the German Research Center for Artificial Intelligence (DFKI) and the partners of Proyecto ILENIA.
-Their valuable efforts have been instrumental in the development of this work.
 
-
-
+This project has benefited from the contributions of numerous teams and institutions, mainly through data contributions, knowledge transfer or technical support.
+
+In Catalonia, many institutions have been involved in the project. Our thanks to Òmnium Cultural, Parlament de Catalunya, Institut d'Estudis Aranesos, Racó Català, Vilaweb, ACN, Nació Digital, El món and Aquí Berguedà.
+
+At national level, we are especially grateful to our ILENIA project partners: CENID, HiTZ and CiTIUS for their participation. We also extend our genuine gratitude to the Spanish Senate and Congress, Fundación Dialnet, Fundación Elcano and the ‘Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)’ of the University of Las Palmas de Gran Canaria.
+
+At the international level, we thank the Welsh government, DFKI, Occiglot project, especially Malte Ostendorff, and The Common Crawl Foundation, especially Pedro Ortiz, for their collaboration. We would also like to give special thanks to the NVIDIA team, with whom we have met regularly, specially to: Ignacio Sarasua, Adam Henryk Grzywaczewski, Oleg Sudakov, Sergio Perez, Miguel Martinez, Felipes Soares and Meriem Bendris. Their constant support has been especially appreciated throughout the entire process.
+
+Their valuable efforts have been instrumental in the development of this work.
 
 ### Disclaimer
 Be aware that the model may contain biases or other unintended distortions.
@@ -613,15 +593,8 @@ including those governing the use of Artificial Intelligence.
 The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
 
 ### Citation
-
-
-@article{salamandra,
-title={Salamandra Technical Report},
-author={LangTech@BSC},
-year={2024},
-url = {}
-}
-```
+
+Technical report and paper coming soon.
 
 ### License
 [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)