dahara1 committed on
Commit
baf4576
1 Parent(s): 7400549

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -67,8 +67,7 @@ Also, the score may change as a result of more tuning.
 
  * **Japanese benchmark**
 
- - *We used [Stability-AI/lm-evaluation-harness + gakada's AutoGPTQ PR](https://github.com/webbigdata-jp/lm-evaluation-harness) for evaluation.*
- - [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable) + [gakada's AutoGPTQ PR](https://github.com/EleutherAI/lm-evaluation-harness/pull/519)*
+ - *We used [Stability-AI/lm-evaluation-harness + gakada's AutoGPTQ PR](https://github.com/webbigdata-jp/lm-evaluation-harness) for evaluation. ([Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable) + [gakada's AutoGPTQ PR](https://github.com/EleutherAI/lm-evaluation-harness/pull/519))*
  - *The 4-task average accuracy is based on results of JCommonsenseQA-1.1, JNLI-1.1, MARC-ja-1.1, and JSQuAD-1.1.*
  - *model loading is performed with gptq_use_triton=True, and evaluation is performed with template version 0.3 using the few-shot in-context learning.*
  - *The number of few-shots is 3,3,3,2.*
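
The benchmark notes in this hunk name four tasks and four few-shot counts. The sketch below is one way to pair them and to compute a 4-task average. It rests on two assumptions the README does not state: that the counts `3,3,3,2` follow the same order as the task list, and that the average is an unweighted mean. The helper names `few_shot_plan` and `four_task_average` are hypothetical and are not part of lm-evaluation-harness.

```python
from statistics import mean

# Task names and few-shot counts as listed in the README notes.
# Assumption: the counts "3,3,3,2" are given in the same order as the tasks.
TASKS = ["JCommonsenseQA-1.1", "JNLI-1.1", "MARC-ja-1.1", "JSQuAD-1.1"]
FEW_SHOTS = [3, 3, 3, 2]


def few_shot_plan(tasks, shots):
    """Pair each task with its few-shot count (hypothetical helper)."""
    if len(tasks) != len(shots):
        raise ValueError("need exactly one few-shot count per task")
    return dict(zip(tasks, shots))


def four_task_average(scores):
    """Unweighted mean over the four tasks (assumed averaging rule).

    `scores` maps each task name to the accuracy you measured yourself.
    """
    missing = [task for task in TASKS if task not in scores]
    if missing:
        raise KeyError(f"missing scores for: {missing}")
    return mean(scores[task] for task in TASKS)


# Rebuild the comma-separated few-shot string quoted in the README.
few_shot_arg = ",".join(str(n) for n in FEW_SHOTS)
```

Calling `few_shot_plan(TASKS, FEW_SHOTS)` gives a per-task mapping, and `four_task_average` takes a dictionary of your own per-task results; no scores are supplied here because this diff does not contain any.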