SeungAhSon's picture
Update README.md
0898a84 verified
|
raw
history blame
4.02 kB
metadata
license: apache-2.0
language:
  - ko
tags:
  - deberta-v3

deberta-v3-base-korean

Model Details

DeBERTa는 Disentangled Attention과 Enhanced Masked Language Model을 통해 BERT의 성능을 향상시킨 모델입니다. 그중 DeBERTa V3은 ELECTRA-Style Pre-Training에 Gradient-Disentangled Embedding Sharing을 적용하여 DeBERTA를 개선했습니다.

이 연구는 구글의 TPU Research Cloud(TRC)를 통해 지원받은 Cloud TPU로 학습되었습니다.

How to Get Started with the Model

from transformers import AutoTokenizer, DebertaV2ForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("team-lucid/deberta-v3-base-korean")
model = DebertaV2ForSequenceClassification.from_pretrained("team-lucid/deberta-v3-base-korean")

inputs = tokenizer("안녕, 세상!", return_tensors="pt")
outputs = model(**inputs)

Evaluation

Backbone
Parameters(M)
NSMC
(acc)
PAWS
(acc)
KorNLI
(acc)
KorSTS
(spearman)
Question Pair
(acc)
DistilKoBERT 22M 88.41 62.55 70.55 73.21 92.48
KoBERT 85M 89.63 80.65 79.00 79.64 93.93
XLM-Roberta-Base 85M 89.49 82.95 79.92 79.09 93.53
KcBERT-Base 85M 89.62 66.95 74.85 75.57 93.93
KcBERT-Large 302M 90.68 70.15 76.99 77.49 94.06
KoELECTRA-Small-v3 9.4M 89.36 77.45 78.60 80.79 94.85
KoELECTRA-Base-v3 85M 90.63 84.45 82.24 85.53 95.25
Ours
DeBERTa-xsmall 22M 91.21 84.40 82.13 83.90 95.38
DeBERTa-small 43M 91.34 83.90 81.61 82.97 94.98
DeBERTa-base 86M 91.22 85.5 82.81 84.46 95.77

* 다른 모델의 결과는 KcBERT-FinetuneKoELECTRA를 참고했으며, Hyperparameter 역시 다른 모델과 유사하게 설정습니다.

Model Memory Requirements

dtype Largest Layer or Residual Group Total Size Training using Adam
float32 187.79 MB 513.77 MB 2.01 GB
float16/bfloat16 93.9 MB 256.88 MB 1.0 GB
int8 46.95 MB 128.44 MB 513.77 MB
int4 23.47 MB 64.22 MB 256.88 MB