Update app.py
app.py
CHANGED
@@ -6,7 +6,7 @@ os.environ["TOKENIZERS_PARALLELISM"] = "false"
 
 article='''
 # Spanish Nahuatl Automatic Translation
-Nahuatl is the most widely spoken indigenous language in Mexico. However, training a neural network for the neural machine translation task is challenging due to the lack of structured data. The most popular datasets, such as the Axolot and bible-corpus, only consist of ~16,000 and ~7,000 samples, respectively. Moreover, there are multiple variants of Nahuatl, which makes this task even more difficult. For example, it is possible to find a single word from the Axolot dataset written in more than three different ways. Therefore, we leverage the T5 text-to-text prefix training strategy to compensate for the lack of data. We first train the multilingual model to learn Spanish and then adapt
+Nahuatl is the most widely spoken indigenous language in Mexico. However, training a neural network for the neural machine translation task is challenging due to the lack of structured data. The most popular datasets, such as the Axolot and bible-corpus, only consist of ~16,000 and ~7,000 samples, respectively. Moreover, there are multiple variants of Nahuatl, which makes this task even more difficult. For example, it is possible to find a single word from the Axolot dataset written in more than three different ways. Therefore, we leverage the T5 text-to-text prefix training strategy to compensate for the lack of data. We first train the multilingual model to learn Spanish and then adapt it to Nahuatl. The resulting T5 Transformer successfully translates short sentences. Finally, we report Chrf and BLEU results.
 
 ## Motivation
 
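The updated article text describes translating via the T5 text-to-text prefix strategy. A minimal sketch of how such prefix-based inference typically looks with the Hugging Face transformers library is shown below; the checkpoint name and prefix string are placeholders, not necessarily the ones this app uses.

```python
# Minimal sketch of prefix-based T5 inference with Hugging Face transformers.
# MODEL_NAME is a hypothetical checkpoint identifier, used only for illustration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "your-org/t5-small-spanish-nahuatl"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def translate(text: str) -> str:
    # T5 conditions generation on a task prefix prepended to the source sentence.
    inputs = tokenizer("translate Spanish to Nahuatl: " + text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translate("Quiero agua, por favor."))
```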