javier-ab-bsc committed on
Commit
4ce4dd5
1 Parent(s): 72dc82e

Update README.md

Files changed (1)
  1. README.md +11 -38
README.md CHANGED
@@ -541,28 +541,6 @@ The dataset does not allow for external contributions.
541
 
542
  </details>
543
 
544
- ### Finetuning Data
545
-
546
- This instruction-tuned variant has been trained with a mixture of 276k English, Spanish, and Catalan multi-turn instructions gathered from open datasets:
547
- | Dataset | ca | en | es |
548
- |-----------------------|:------:|:------:|:------:|
549
- | alpaca-cleaned | - | 50,000 | - |
550
- | aya-dataset | - | 3,944 | 3,854 |
551
- | CoQCat | 4,797 | - | - |
552
- | databricks-dolly-15k | - | 15,011 | - |
553
- | dolly-3k-ca | 3,232 | - | - |
554
- | flores-instr | 1,994 | 1,994 | 3,988 |
555
- | MentorCA | 7,122 | - | - |
556
- | MentorES | - | - | 7,122 |
557
- | no-robots | - | 9,499 | - |
558
- | oasst-ca | 2,518 | - | - |
559
- | oasst2 | 750 | 31,086 | 15,438 |
560
- | open-orca | - | 50,000 | - |
561
- | RagMultilingual | 16,043 | 14,997 | 11,263 |
562
- | tower-blocks | - | 19,895 | 2,000 |
563
- | **Total** | **36,456** | **196,426** | **43,665** |
564
-
565
- ---
566
 
567
  ## Evaluation
568
 
@@ -595,14 +573,16 @@ within the framework of [ILENIA Project](https://proyectoilenia.es/) with refere
595
 
596
  ### Acknowledgements
597
 
598
- This project benefited from the contributions of many teams and institutions, including:
599
- Senado de España, Parlament de Catalunya, Òmnium Cultural, Dialnet, Institut d’Estudis Aranesos,
600
- Fundación Elcano, Universidad de Las Palmas de Gran Canaria, Occiglot, Common Crawl, the Welsh Government,
601
- the German Research Center for Artificial Intelligence (DFKI) and the partners of Proyecto ILENIA.
602
- Their valuable efforts have been instrumental in the development of this work.
603
 
604
- A special acknowledgment is reserved for the NVIDIA Team with whom we have been meeting on a regular basis.
605
- Their consistent support has been particularly appreciated throughout the process.
 
 
 
 
 
 
 
606
 
607
  ### Disclaimer
608
  Be aware that the model may contain biases or other unintended distortions.
@@ -613,15 +593,8 @@ including those governing the use of Artificial Intelligence.
613
  The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
614
 
615
  ### Citation
616
- <span style="color:red">Work in progress, paper coming soon.</span>
617
- ```bibtext
618
- @article{salamandra,
619
- title={Salamandra Technical Report},
620
- author={LangTech@BSC},
621
- year={2024},
622
- url = {}
623
- }
624
- ```
625
 
626
  ### License
627
  [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
541
 
542
  </details>
543
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
544
 
545
  ## Evaluation
546
 
 
573
 
574
  ### Acknowledgements
575
 
 
 
 
 
 
576
 
577
+ This project has benefited from the contributions of numerous teams and institutions, mainly through data contributions, knowledge transfer or technical support.
578
+
579
+ In Catalonia, many institutions have been involved in the project. Our thanks to Òmnium Cultural, Parlament de Catalunya, Institut d'Estudis Aranesos, Racó Català, Vilaweb, ACN, Nació Digital, El món and Aquí Berguedà.
580
+
581
+ At the national level, we are especially grateful to our ILENIA project partners, CENID, HiTZ and CiTIUS, for their participation. We also extend our sincere gratitude to the Spanish Senate and Congress, Fundación Dialnet, Fundación Elcano and the Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI) of the University of Las Palmas de Gran Canaria.
582
+
583
+ At the international level, we thank the Welsh Government, DFKI, the Occiglot project (especially Malte Ostendorff) and the Common Crawl Foundation (especially Pedro Ortiz) for their collaboration. We would also like to give special thanks to the NVIDIA team, with whom we have met regularly, especially Ignacio Sarasua, Adam Henryk Grzywaczewski, Oleg Sudakov, Sergio Perez, Miguel Martinez, Felipes Soares and Meriem Bendris. Their constant support has been greatly appreciated throughout the entire process.
584
+
585
+ Their valuable efforts have been instrumental in the development of this work.
586
 
587
  ### Disclaimer
588
  Be aware that the model may contain biases or other unintended distortions.
 
593
  The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
594
 
595
  ### Citation
596
+
597
+ Technical report and paper coming soon.
 
 
 
 
 
 
 
598
 
599
  ### License
600
  [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)