---
license: apache-2.0
datasets:
- vietgpt/wikipedia_vi
- oscar-corpus/OSCAR-2301
language:
- vi
- en
pipeline_tag: text-generation
---
# Concept of open-llama-7b-vi
This is an OpenLLaMA model fine-tuned on Vietnamese-language text.
## Model architecture
The model architecture is the same as that of the original OpenLLaMA model.
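As a rough illustration of what "the same architecture" implies for size, the sketch below counts parameters for a LLaMA-style decoder. It assumes the fine-tune keeps the standard LLaMA-7B hyperparameters used by OpenLLaMA 7B (vocabulary 32,000, hidden size 4,096, MLP size 11,008, 32 layers, untied input/output embeddings) and does not extend the tokenizer for Vietnamese, which this card does not state. `LlamaShape` and `count_parameters` are hypothetical names for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaShape:
    # Assumed LLaMA-7B / OpenLLaMA-7B hyperparameters (not confirmed by this card).
    vocab_size: int = 32000
    hidden_size: int = 4096
    intermediate_size: int = 11008
    num_layers: int = 32
    tie_embeddings: bool = False


def count_parameters(cfg: LlamaShape) -> int:
    """Count weights in a LLaMA-style decoder (no biases, RMSNorm, SwiGLU MLP)."""
    d, f, v = cfg.hidden_size, cfg.intermediate_size, cfg.vocab_size
    attention = 4 * d * d  # q, k, v and output projections
    mlp = 3 * d * f        # gate, up and down projections
    norms = 2 * d          # one RMSNorm weight vector before attention, one before the MLP
    per_layer = attention + mlp + norms
    embeddings = v * d
    lm_head = 0 if cfg.tie_embeddings else v * d
    final_norm = d
    return cfg.num_layers * per_layer + embeddings + lm_head + final_norm


if __name__ == "__main__":
    print(f"{count_parameters(LlamaShape()):,}")
```

Under these assumptions the count comes to roughly 6.7 billion weights, which is where the "7b" in the model name comes from; a vocabulary extension for Vietnamese would raise it by `2 * hidden_size` per added token.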
## Training data
The model is trained on the Vietnamese version of Wikipedia.
The generated corpus files total 1.5 GB and contain approximately 1.3M sentences.
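The card does not say how the corpus size and sentence count were measured. The sketch below shows one plausible way to compute such figures, assuming size means UTF-8 bytes and sentences are split naively on `.`, `!` or `?` followed by whitespace or end of text. `corpus_stats` is a hypothetical helper, and the splitting rule is an illustrative assumption rather than the authors' method.

```python
import re
from typing import Iterable, Tuple

# Assumed naive sentence boundary: a run of ., ! or ? followed by whitespace or end of text.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def corpus_stats(texts: Iterable[str]) -> Tuple[int, int]:
    """Return (total UTF-8 bytes, approximate sentence count) for an iterable of documents."""
    total_bytes = 0
    total_sentences = 0
    for text in texts:
        # Vietnamese diacritics encode to 2-3 bytes each, so bytes exceed characters.
        total_bytes += len(text.encode("utf-8"))
        total_sentences += len(_SENTENCE_END.findall(text))
    return total_bytes, total_sentences


if __name__ == "__main__":
    print(corpus_stats(["Việt Nam là một quốc gia. Thủ đô là Hà Nội!"]))
```

Because the byte count depends on encoding and the sentence count on the splitter, figures like "1.5 GB" and "1.3M sentences" are only comparable across corpora when both choices are stated.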