s-mizuki-nlp committed on
Commit e84dea0
1 Parent(s): 77483b4

edited introduction for SFT datasets

Files changed (1):
README.md (+2 -2)
README.md CHANGED
@@ -13,8 +13,8 @@ model_type: llama
 Llama 3.1 Swallow is a series of large language models (8B, 70B) that were built by continual pre-training on the [Meta Llama 3.1](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f) models.
 Llama 3.1 Swallow enhanced the Japanese language capabilities of the original Llama 3.1 while retaining the English language capabilities.
 We use approximately 200 billion tokens that were sampled from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and mathematical and
-coding contents, etc (see the Training Datasets section) for continual pre-training.
-The instruction-tuned models (Instruct) were built by supervised fine-tuning (SFT) on the synthetic data specially built for Japanese.
+coding contents, etc for continual pre-training.
+The instruction-tuned models (Instruct) were built by supervised fine-tuning (SFT) on the synthetic data specially built for Japanese (see the Training Datasets section for details).
 See the Swallow Model Index section to find other model variants.
 
 # Release History
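
For context on the instruction-tuned (Instruct) models the edited introduction describes, here is a minimal sketch (not part of this commit) of loading and prompting one such variant with Hugging Face transformers. The repo id `tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1` and the Japanese prompt are assumptions for illustration.

```python
# Minimal sketch: load an Instruct (SFT) variant and run one chat turn.
# The repo id below is an assumption, not confirmed by this commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The SFT models expect a chat-formatted prompt; the template is
# supplied by the tokenizer config.
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Strip the prompt tokens and decode only the generated reply.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same sketch should apply to the 70B Instruct variant by swapping the repo id, since the chat template travels with each model's tokenizer config.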