Dan Velasco committed on
Commit
b542d6e
1 Parent(s): 6825190

Create README.md

Files changed (1)
  1. README.md +48 -0
README.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ language: tl
+ tags:
+ - roberta
+ - tagalog
+ - filipino
+ - sentence-transformers
+ datasets: newsph_nli
+ license: cc-by-sa-4.0
+ ---
+
+ # Filipino Sentence RoBERTa
+ We finetuned [RoBERTa Tagalog Base (finetuned on COHFIE)](https://huggingface.co/danjohnvelasco/roberta-tagalog-base-cohfie-v1) on [NewsPH-NLI](https://huggingface.co/datasets/newsph_nli) to encode Filipino/Tagalog sentences into sentence embeddings. We used [sentence-transformers](https://www.SBERT.net) to finetune the model. All model details, training setups, and corpus details can be found in this paper: [Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings](https://arxiv.org/abs/2204.03251).
+
+ ## Intended uses & limitations
+ This model is intended for extracting sentence embeddings to be used for clustering. It may not be safe for production use because we did not examine it for biases. Please use it with caution.
+
+ ## How to use
+ Using this model is easier when you have [sentence-transformers](https://www.SBERT.net) installed:
+ ```
+ pip install -U sentence-transformers
+ ```
+
+ Here is how to use this model to encode sentences into sentence embeddings using `SentenceTransformer`:
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download (on first use) and load the model from the Hugging Face Hub
+ model = SentenceTransformer("danjohnvelasco/filipino-sentence-roberta-v1")
+ sentence_list = ["sentence 1", "sentence 2", "sentence 3"]
+ # encode() returns one embedding vector per input sentence
+ sentence_embeddings = model.encode(sentence_list)
+ print(sentence_embeddings)
+ ```
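Since the stated intended use is clustering, the following is a minimal sketch of how embeddings such as `sentence_embeddings` might be grouped by cosine similarity. It is an illustration only, not the clustering method from the paper: the helper names `cosine_similarity` and `group_by_similarity`, the greedy grouping strategy, and the `0.8` threshold are all assumptions made for this example. Toy 2-D vectors stand in for real embeddings so the block runs without downloading the model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(embeddings, threshold: float = 0.8) -> List[List[int]]:
    """Greedy grouping (hypothetical helper, not from the paper).

    Each vector joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    Returns groups as lists of indices into `embeddings`.
    """
    groups: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        for group in groups:
            if cosine_similarity(embeddings[group[0]], emb) >= threshold:
                group.append(i)
                break
        else:
            # No existing group was similar enough.
            groups.append([i])
    return groups


# Toy vectors: the first two point in nearly the same direction.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_groups = group_by_similarity(toy_embeddings, threshold=0.8)
print(toy_groups)  # [[0, 1], [2]]
```

With the real model, the array returned by `model.encode(sentence_list)` can be passed in place of `toy_embeddings`. The threshold would need tuning on actual data, and a dedicated clustering library may be a better fit for larger sentence sets.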
+
+ ## BibTeX entry and citation info
+ If you use this model, please cite our work:
+
+ ```
+ @misc{https://doi.org/10.48550/arxiv.2204.03251,
+ doi = {10.48550/ARXIV.2204.03251},
+ url = {https://arxiv.org/abs/2204.03251},
+ author = {Velasco, Dan John and Alba, Axel and Pelagio, Trisha Gail and Ramirez, Bryce Anthony and Cruz, Jan Christian Blaise and Cheng, Charibeth},
+ keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
+ title = {Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings},
+ publisher = {arXiv},
+ year = {2022},
+ copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```