Sources:
- https://github.com/THUDM/GLM/tree/main/chinese_sentencepiece
- https://huggingface.co/THUDM/glm-10b-chinese/
## HF
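```python
from transformers import AutoTokenizer

# trust_remote_code=True allows transformers to run tokenization_glm.py from the model repo
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b-chinese", trust_remote_code=True)
```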
## Tokenizer
In `tokenizer_config.json`:
```
"AutoTokenizer": [
  "tokenization_glm.GLMChineseTokenizer",
  null
]
```
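The `AutoTokenizer` entry is a two-element list of class references in `module.ClassName` form. The first names the slow (pure Python) tokenizer and the second the fast tokenizer. `null` means no fast class is provided, so `AutoTokenizer` falls back to the slow `GLMChineseTokenizer`. Below is a minimal sketch of reading that entry with the standard library. The fragment is wrapped in braces so it parses on its own. The helpers `parse_auto_tokenizer_entry` and `pick_class` are hypothetical: they only mimic the selection rule and are not part of `transformers`.

```python
import json

# The fragment from tokenizer_config.json, wrapped in braces so it is a valid JSON object.
FRAGMENT = """
{
  "AutoTokenizer": [
    "tokenization_glm.GLMChineseTokenizer",
    null
  ]
}
"""


def parse_auto_tokenizer_entry(entry):
    """Hypothetical helper: split the [slow, fast] class refs into (module, class) pairs.

    A null ref (None after json.loads) is kept as None.
    """
    result = {}
    for kind, ref in zip(("slow", "fast"), entry):
        if ref is None:
            result[kind] = None
        else:
            module, _, cls = ref.rpartition(".")
            result[kind] = (module, cls)
    return result


def pick_class(entry, use_fast=True):
    """Hypothetical helper: prefer the fast ref when requested and present, else the slow one."""
    slow, fast = entry
    return fast if use_fast and fast is not None else slow


config = json.loads(FRAGMENT)
entry = config["AutoTokenizer"]
print(parse_auto_tokenizer_entry(entry))
# {'slow': ('tokenization_glm', 'GLMChineseTokenizer'), 'fast': None}
print(pick_class(entry))
# tokenization_glm.GLMChineseTokenizer
```

Because the fast slot is `null`, asking for a fast tokenizer still resolves to the slow class here.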
Here, `GLMChineseTokenizer` is defined in:
- https://huggingface.co/THUDM/glm-10b-chinese/blob/main/tokenization_glm.py
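The class reference `tokenization_glm.GLMChineseTokenizer` names a module file in the model repository. That is why the class source lives at the `tokenization_glm.py` URL above. The sketch below shows this mapping. It assumes the module is a single `.py` file at the repository root and that the Hub's `blob/<revision>/<path>` page layout applies. `remote_code_url` is a hypothetical helper, not a Hub or `transformers` API.

```python
def remote_code_url(repo_id, class_ref, revision="main"):
    """Hypothetical helper: map a "module.ClassName" ref to the Hub page of its source file.

    Assumes the module is a single .py file at the repository root (as with
    tokenization_glm.py); nested package paths are not handled.
    """
    module, _, class_name = class_ref.rpartition(".")
    if not module:
        raise ValueError(f"expected 'module.ClassName', got {class_ref!r}")
    url = f"https://huggingface.co/{repo_id}/blob/{revision}/{module}.py"
    return url, class_name


url, cls = remote_code_url("THUDM/glm-10b-chinese", "tokenization_glm.GLMChineseTokenizer")
print(url)  # https://huggingface.co/THUDM/glm-10b-chinese/blob/main/tokenization_glm.py
print(cls)  # GLMChineseTokenizer
```

This only builds the link. Actually downloading and executing that file is what `trust_remote_code=True` permits when loading the tokenizer.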
## Vocabulary
Taken from