Commit 4ba6318 by ancatmara
1 parent: 7d3a237

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -12,6 +12,8 @@ tags:
 
 [SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing (Kudo et al., 2018)](https://arxiv.org/pdf/1808.06226.pdf) treats the input as a raw input stream, thus including the space in the set of characters to use. It then uses the BPE or unigram algorithm to construct the appropriate vocabulary. It helps process languages that don't separate words. All transformer models in the `transformers` library that use SentencePiece use it in combination with unigram. Examples of models using SentencePiece are [ALBERT](https://huggingface.co/docs/transformers/en/model_doc/albert), [XLNet](https://huggingface.co/docs/transformers/en/model_doc/xlnet), [Marian](https://huggingface.co/docs/transformers/en/model_doc/marian), and [T5](https://huggingface.co/docs/transformers/en/model_doc/t5).
 
+This tokenizer was trained with `vocab_size=25000` and `min_frequency=2`.
+
 ### Use
 
 ```python
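The README paragraph in this diff says SentencePiece treats the input as a raw stream with the space as an ordinary symbol, and the added line gives two training parameters. The toy pure-Python sketch below illustrates both ideas under stated assumptions. It uses the BPE variant rather than unigram (which the quoted text says `transformers` models pair with SentencePiece), because `min_frequency` reads like a BPE-style merge threshold. The commit does not say which library or trainer produced this tokenizer. `SPACE`, `train_bpe`, `encode`, and `decode` are hypothetical names, and the sketch omits SentencePiece's normalization and dummy-prefix handling.

```python
from __future__ import annotations

from collections import Counter

# SentencePiece's meta symbol for whitespace: U+2581 LOWER ONE EIGHTH BLOCK.
SPACE = "\u2581"


def to_symbols(text: str) -> list[str]:
    """Treat the input as a raw character stream; spaces become ordinary symbols."""
    return list(text.replace(" ", SPACE))


def count_pairs(seq: list[str]) -> Counter:
    """Count adjacent pairs, skipping pairs whose right side starts a new word,
    so SPACE can only ever appear at the start of a piece (assumed to mimic
    SentencePiece's default of not letting pieces span whitespace)."""
    return Counter(p for p in zip(seq, seq[1:]) if not p[1].startswith(SPACE))


def merge_pair(seq: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every non-overlapping occurrence of `pair` with the merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_bpe(text: str, vocab_size: int, min_frequency: int = 2):
    """Greedily merge the most frequent pair until the vocabulary holds
    `vocab_size` symbols or no pair occurs at least `min_frequency` times."""
    seq = to_symbols(text)
    vocab = sorted(set(seq))
    merges: list[tuple[str, str]] = []
    while len(vocab) < vocab_size:
        pairs = count_pairs(seq)
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < min_frequency:
            break  # every remaining pair is too rare to merge
        seq = merge_pair(seq, pair)
        merges.append(pair)
        new_symbol = pair[0] + pair[1]
        if new_symbol not in vocab:
            vocab.append(new_symbol)
    return vocab, merges


def encode(text: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply the learned merges in training order."""
    seq = to_symbols(text)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


def decode(pieces: list[str]) -> str:
    """Detokenize losslessly: concatenate pieces, then turn SPACE back into " "."""
    return "".join(pieces).replace(SPACE, " ")


if __name__ == "__main__":
    vocab, merges = train_bpe("low low lower", vocab_size=100, min_frequency=2)
    pieces = encode("low lower", merges)
    print(pieces)
    print(decode(pieces))
```

On the tiny corpus `"low low lower"`, the trainer learns the pieces `lo`, `low`, and `▁low`, then stops because every remaining pair occurs only once (below `min_frequency=2`). Because the space is kept as a symbol, `decode` restores the input exactly. Reaching `vocab_size=25000` as in this commit would require a real corpus.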