hkeshhk committed on
Commit
24a5bc1
1 Parent(s): f6d5ab6

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -1,8 +1,14 @@
+ ---
+ license: mit
+ language:
+ - en
+ ---
  # bpetokenizer

  A Byte Pair Encoding (BPE) tokenizer that algorithmically follows the GPT tokenizer. It can handle special tokens and uses a customizable regex pattern for tokenization (the GPT-4 regex pattern is included), and it supports `save` and `load` of tokenizers in `json` and `file` formats.


+
  ### Overview

  The Byte Pair Encoding (BPE) algorithm is a simple yet powerful method for building a vocabulary of subword units from a given text corpus. This tokenizer can be used to train your own LLM tokenizer on text corpora in various languages.
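
The diff above only adds repository metadata, but the README's Overview describes BPE only at a high level. Below is a minimal, self-contained sketch of the core idea, repeatedly merging the most frequent adjacent pair of byte ids into a new token id. It illustrates the general algorithm, not the package's actual implementation; the pre-tokenization regex is a simplified stand-in for the GPT-4 pattern the README mentions, and the function names are hypothetical.

```python
import re
from collections import Counter

# Simplified pre-tokenization pattern; the real GPT-4 pattern is more involved.
# This stand-in just splits words, whitespace runs, and punctuation runs.
PRE_TOKENIZE = re.compile(r"\w+|\s+|[^\w\s]+")

def get_pair_counts(chunks):
    """Count adjacent token-id pairs across all pre-tokenized chunks."""
    counts = Counter()
    for ids in chunks:
        counts.update(zip(ids, ids[1:]))
    return counts

def merge_pair(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with the new token id."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules over the UTF-8 bytes of `text`."""
    chunks = [list(m.group().encode("utf-8")) for m in PRE_TOKENIZE.finditer(text)]
    merges = {}        # (id, id) -> new id
    next_id = 256      # ids 0..255 are the raw bytes
    for _ in range(num_merges):
        counts = get_pair_counts(chunks)
        if not counts:
            break
        pair = counts.most_common(1)[0][0]
        merges[pair] = next_id
        chunks = [merge_pair(ids, pair, next_id) for ids in chunks]
        next_id += 1
    return merges

merges = train_bpe("low lower lowest newer newest", num_merges=10)
print(merges)
```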
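The README also advertises `save` and `load` of tokenizers in a `json` format. A small sketch of how learned merges could be round-tripped through JSON follows; the file name, schema, and helper names are assumptions for illustration, not the package's actual on-disk format.

```python
import json

# Example merge table like the one produced by the training sketch above
# (the specific pairs and ids here are made up for illustration).
merges = {(108, 111): 256, (256, 119): 257}

def save_merges(merges, path):
    """Persist merge rules as JSON; integer-tuple keys become 'a,b' strings."""
    serializable = {f"{a},{b}": new_id for (a, b), new_id in merges.items()}
    with open(path, "w") as f:
        json.dump(serializable, f, indent=2)

def load_merges(path):
    """Load merge rules back, restoring the integer-tuple keys."""
    with open(path) as f:
        raw = json.load(f)
    return {tuple(int(x) for x in key.split(",")): v for key, v in raw.items()}

save_merges(merges, "tokenizer.json")
assert load_merges("tokenizer.json") == merges
```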