ccasimiro committed on
Commit
fe06035
1 Parent(s): a3c852d

Update README.md

Files changed (1)
  1. README.md +25 -1
README.md CHANGED
@@ -46,7 +46,31 @@ F1 Score: 0.8340
46
  For evaluation details visit our [GitHub repository](https://github.com/PlanTL-GOB-ES/lm-biomedical-clinical-es).
47
 
48
  ## Citing
49
- To be announced soon!
49
+ If you use these models, please cite our work:
50
+
51
+ ```bibtex
52
+ @inproceedings{carrino-etal-2022-pretrained,
53
+ title = "Pretrained Biomedical Language Models for Clinical {NLP} in {S}panish",
54
+ author = "Carrino, Casimiro Pio and
55
+ Llop, Joan and
56
+ P{\`a}mies, Marc and
57
+ Guti{\'e}rrez-Fandi{\~n}o, Asier and
58
+ Armengol-Estap{\'e}, Jordi and
59
+ Silveira-Ocampo, Joaqu{\'\i}n and
60
+ Valencia, Alfonso and
61
+ Gonzalez-Agirre, Aitor and
62
+ Villegas, Marta",
63
+ booktitle = "Proceedings of the 21st Workshop on Biomedical Language Processing",
64
+ month = may,
65
+ year = "2022",
66
+ address = "Dublin, Ireland",
67
+ publisher = "Association for Computational Linguistics",
68
+ url = "https://aclanthology.org/2022.bionlp-1.19",
69
+ doi = "10.18653/v1/2022.bionlp-1.19",
70
+ pages = "193--199",
71
+ abstract = "This work presents the first large-scale biomedical Spanish language models trained from scratch, using large biomedical corpora consisting of a total of 1.1B tokens and an EHR corpus of 95M tokens. We compared them against general-domain and other domain-specific models for Spanish on three clinical NER tasks. As main results, our models are superior across the NER tasks, rendering them more convenient for clinical NLP applications. Furthermore, our findings indicate that when enough data is available, pre-training from scratch is better than continual pre-training when tested on clinical tasks, raising an exciting research question about which approach is optimal. Our models and fine-tuning scripts are publicly available at HuggingFace and GitHub.",
72
+ }
73
+ ```
74
 
75
  ## Funding
76
  This work was partially funded by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) within the framework of the Plan-TL, and the Future of Computing Center, a Barcelona Supercomputing Center and IBM initiative (2020).