update readme
README.md ADDED
@@ -0,0 +1,37 @@
---
language: ko
---

# KoBigBird

Pretrained BigBird Model for Korean (**kobigbird-bert-base**)

## About

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences.

BigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost than BERT. It has achieved SOTA results on various tasks involving very long sequences, such as long document summarization and question answering with long contexts.
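
As a quick sanity check, these settings can be read from the model configuration. A minimal sketch (the values in the comments are expectations based on the description above, not verified output from this checkpoint):

```python
from transformers import AutoConfig

# Load only the configuration to inspect the sparse-attention settings.
config = AutoConfig.from_pretrained("monologg/kobigbird-bert-base")

print(config.attention_type)           # expected: "block_sparse"
print(config.max_position_embeddings)  # expected: 4096 (maximum sequence length)
print(config.block_size)               # block size used by block sparse attention
print(config.num_random_blocks)        # random blocks each query block attends to
```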

The model is warm-started from a Korean BERT checkpoint.

## How to use

**Warning:** Please use `BertTokenizer` instead of `BigBirdTokenizer`; the example below loads the tokenizer via `AutoTokenizer`.

```python
from transformers import AutoModel, AutoTokenizer

# By default the model is in `block_sparse` mode with num_random_blocks=3, block_size=64.
model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")

# You can switch `attention_type` to full attention like this:
model = AutoModel.from_pretrained("monologg/kobigbird-bert-base", attention_type="original_full")

# You can change `block_size` & `num_random_blocks` like this:
model = AutoModel.from_pretrained("monologg/kobigbird-bert-base", block_size=16, num_random_blocks=2)

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
text = "한국어 BigBird 모델을 공개합니다!"  # "We are releasing a Korean BigBird model!"
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
```
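
Since the main benefit of KoBigBird is the 4096-token limit, here is a minimal long-input sketch (the repeated placeholder text and the shape noted in the comment are illustrative assumptions, not output from this checkpoint):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")

# Placeholder long document; truncate to the model's 4096-token limit.
long_text = "한국어 BigBird 모델을 공개합니다! " * 1000
encoded_input = tokenizer(long_text, max_length=4096, truncation=True, return_tensors="pt")

with torch.no_grad():
    output = model(**encoded_input)

print(output.last_hidden_state.shape)  # e.g. torch.Size([1, 4096, 768]) for a base-size model
```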