cecilemacaire
committed on
Commit 9725eed • 1 Parent(s): 350c797
Update README.md
README.md
CHANGED
@@ -18,7 +18,7 @@ example_title : "A simple sentence"
 
 # t2p-t5-large-orféo
 
-
+*t2p-t5-large-orféo* is a text-to-pictograms translation model built by fine-tuning the [t5-large](https://huggingface.co/google-t5/t5-large) model on a dataset of pairs of transcriptions / pictogram token sequence (each token is linked to a pictogram image from [ARASAAC](https://arasaac.org/)).
 
 ## Training details
 
@@ -37,6 +37,25 @@ T2P-t5-large-orféo is a text-to-pictograms translation model built by fine-tuni
 ## Using t2p-t5-large-orféo model with HuggingFace transformers
 
 ```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+import numpy as np
+
+source_lang = "fr"
+target_lang = "frp"
+max_input_length = 128
+max_target_length = 128
+
+def load_model(checkpoint):
+    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
+    model = model.to("cuda:0")
+    return tokenizer, model
+
+def generate(sentence, tokenizer, model):
+    inputs = tokenizer("Je mange une pomme", return_tensors="pt").input_ids
+    outputs = model.generate(inputs.to("cuda:0"), max_new_tokens=40, do_sample=True, top_k=30, top_p=0.95)
+    pred = tokenizer.decode(outputs[0], skip_special_tokens=True)
 ```
 
 - **Language(s):** French
 
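Review note on the snippet added in this commit: as committed, `generate` tokenizes the hard-coded string `"Je mange une pomme"` instead of its `sentence` argument, never returns `pred`, and assumes a GPU at `"cuda:0"`. The `torch` and `numpy` imports and the four module-level constants are not used by either function. The following is a minimal sketch of one possible correction, not the author's final version. The optional `device` and `max_new_tokens` parameters and the CPU fallback are assumptions added for illustration, and the imports are placed inside `load_model` only so that `generate` can be exercised without the heavy dependencies installed.

```python
def load_model(checkpoint, device=None):
    """Load the tokenizer and seq2seq model for `checkpoint` (a hub id or local path)."""
    # Imported lazily (an assumption of this sketch) so that `generate`
    # can be tested without torch/transformers being installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    if device is None:
        # Fall back to CPU when no GPU is visible, instead of assuming "cuda:0".
        device = "cuda:0" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)
    return tokenizer, model


def generate(sentence, tokenizer, model, max_new_tokens=40):
    """Translate `sentence` into a pictogram token sequence and return it as a string."""
    # Tokenize the caller's sentence rather than a hard-coded example.
    inputs = tokenizer(sentence, return_tensors="pt").input_ids
    outputs = model.generate(
        inputs.to(model.device),  # keep the inputs on the same device as the model
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling parameters copied from the committed snippet
        top_k=30,
        top_p=0.95,
    )
    # Return the decoded prediction so the caller can actually use it.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Usage would look like `tokenizer, model = load_model(checkpoint)` followed by `generate("Je mange une pomme", tokenizer, model)`, where `checkpoint` is the hub id or local path of the model (not stated in this diff). Because `do_sample=True`, the output can differ between runs.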