dayyass/universal-sentence-encoder-multilingual-large-3-pytorch

Feature Extraction

sentence-transformers

Sentence Transformers

sentence-similarity

arxiv:1803.11175

arxiv:1907.04307

Inference Endpoints


dayyass committed on May 22

Commit 891847a • 1 Parent(s): 4ca70aa

Update README.md

Files changed (1)

README.md +43 -3


@@ -1,3 +1,43 @@
----
-license: apache-2.0
----

+---
+license: mit
+---
+# Convert MUSE from TensorFlow to PyTorch and ONNX
+Read more about the project: [GitHub](https://github.com/dayyass/muse_tf2pt/tree/main).
+> [!IMPORTANT]
+> **The PyTorch model can be used not only for inference, but also for additional training and fine-tuning**.
+# Usage
+The model is available on [HF Models](https://huggingface.co/dayyass/universal-sentence-encoder-multilingual-large-3-pytorch/tree/main) and is loaded directly through `torch` (*the `transformers` library does not currently support it natively*).
+Model initialization and usage code:
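+```python
+import torch
+from functools import partial
+
+# `src` is assumed to be the package from the GitHub project linked above
+from src.architecture import MUSE
+from src.tokenizer import get_tokenizer, tokenize
+
+PATH_TO_PT_MODEL = "model.pt"
+PATH_TO_TF_MODEL = "universal-sentence-encoder-multilingual-large-3"
+
+# The tokenizer is read from the original TF Hub checkpoint
+tokenizer = get_tokenizer(PATH_TO_TF_MODEL)
+tokenize = partial(tokenize, tokenizer=tokenizer)
+
+model_torch = MUSE(
+    num_embeddings=128010,
+    embedding_dim=512,
+    d_model=512,
+    num_heads=8,
+)
+# map_location="cpu" lets the weights load on machines without a GPU
+model_torch.load_state_dict(
+    torch.load(PATH_TO_PT_MODEL, map_location="cpu")
+)
+model_torch.eval()  # inference mode; skip this when fine-tuning
+
+sentence = "Hello, world!"
+with torch.no_grad():  # no gradients are needed for inference
+    res = model_torch(tokenize(sentence))
+```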
+> [!NOTE]
+> Tokenization currently relies on the checkpoint of the original TF Hub model, which is why the code above loads it (`PATH_TO_TF_MODEL`).
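
The repository is tagged `sentence-similarity`, so a natural next step after the usage code in the README is to compare two embeddings. The sketch below is not part of the commit and makes assumptions: it treats each embedding as a flat sequence of floats (the exact shape and type of `res` are not stated in the README), and it uses short placeholder vectors instead of real model output. The helper name `cosine_similarity` is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder 4-dimensional vectors standing in for real sentence embeddings
emb_a = [1.0, 0.0, 2.0, 0.0]
emb_b = [2.0, 0.0, 4.0, 0.0]  # same direction as emb_a
emb_c = [0.0, 3.0, 0.0, 1.0]  # orthogonal to emb_a

sim_same = cosine_similarity(emb_a, emb_b)  # close to 1: same direction
sim_orth = cosine_similarity(emb_a, emb_c)  # 0: no shared components
```

If the model output is a `torch` tensor, `torch.nn.functional.cosine_similarity` performs the same computation on batches without leaving `torch`.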