Create README.md
Browse files
README.md
ADDED
@@ -0,0 +1,49 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
tags:
|
3 |
+
- word2vec
|
4 |
+
language: sah
|
5 |
+
license: gpl-3.0
|
6 |
+
---
|
7 |
+
|
8 |
+
## Description
|
9 |
+
Word embedding model from the Polyglot project, trained by Al-Rfou et al. (2013) — see the citation below.
|
10 |
+
|
11 |
+
|
12 |
+
## How to use
|
13 |
+
|
14 |
+
```python
|
15 |
+
import pickle

from numpy import dot
from numpy.linalg import norm

from huggingface_hub import hf_hub_download

# NOTE(review): the front matter declares `language: sah`, but this example
# downloads the English embeddings (`_en`) — confirm which file this card describes.
path = hf_hub_download(
    repo_id="Word2vec/polyglot_words_embeddings_en",
    filename="words_embeddings_en.pkl",
)
# SECURITY: pickle.load executes code embedded in the file — only load
# pickles from repositories you trust.
with open(path, "rb") as f:  # context manager: the original leaked the file handle
    words, embeddings = pickle.load(f, encoding="latin1")

word = "Irish"
# Hoisted out of the loop: the original recomputed this O(n) list scan on
# every iteration, making the similarity loop accidentally O(n^2).
target_idx = words.index(word)
a = embeddings[target_idx]
norm_a = norm(a)  # loop-invariant, compute once

# Cosine similarity of `word` against every other vocabulary entry;
# the target's own slot gets 0 so it can never win the argmax below.
most_similar = []
for i in range(len(embeddings)):
    if i != target_idx:
        b = embeddings[i]
        most_similar.append(dot(a, b) / (norm_a * norm(b)))
    else:
        most_similar.append(0)

# Vocabulary word whose embedding is closest (highest cosine similarity) to `word`.
words[most_similar.index(max(most_similar))]
|
33 |
+
```
|
34 |
+
|
35 |
+
## Citation
|
36 |
+
|
37 |
+
```
|
38 |
+
@InProceedings{polyglot:2013:ACL-CoNLL,
|
39 |
+
author = {Al-Rfou, Rami and Perozzi, Bryan and Skiena, Steven},
|
40 |
+
title = {Polyglot: Distributed Word Representations for Multilingual NLP},
|
41 |
+
booktitle = {Proceedings of the Seventeenth Conference on Computational Natural Language Learning},
|
42 |
+
month = {August},
|
43 |
+
year = {2013},
|
44 |
+
address = {Sofia, Bulgaria},
|
45 |
+
publisher = {Association for Computational Linguistics},
|
46 |
+
pages = {183--192},
|
47 |
+
url = {http://www.aclweb.org/anthology/W13-3520}
|
48 |
+
}
|
49 |
+
```
|