Update README.md
README.md
CHANGED
@@ -69,11 +69,12 @@ The format for TinyCoT was:
 
 | Model | Size | Data | Method | GSM8K (5-shot) | AGIEval (English/Nous subset, acc_norm) | BIG Bench Hard (CoT, few-shot*) |
 |:-----------------------------------------------------------------------|--------|:--------------------|---------------|:---------------|:----------------------------------------|:------------------------------ |
-| [StableLM 3B Base](https://hf.co/stabilityai/stablelm-3b-4e1t) | 3B | Base
-| [StableHermes 3B](https://hf.co/cxllin/StableHermes-3b) | 3B | GPT | SFT | 3.64% | 24.31% | **37.28%**
+| [StableLM 3B Base](https://hf.co/stabilityai/stablelm-3b-4e1t) | 3B | Base | Base | 2.05% | 25.14% | 36.75% |
+| [StableHermes 3B](https://hf.co/cxllin/StableHermes-3b) | 3B | GPT | SFT | 3.64% | 24.31% | **37.28%** |
 | [MPT 7B Instruct](https://hf.co/mosaicml/mpt-7b-instruct) | **7B** | **Human**+Anthropic | SFT | 2.05% | 24.12% | 11.01% |
 | [OpenLLaMA 7B v2 open-instruct](http://hf.co/VMware/open-llama-7b-v2-open-instruct) | **7B** | **Human** (nearly: ecqa is an exception) | SFT | 8.64% | 23.21% | 29.84% |
-| [StableLM Zephyr 3B](https://hf.co/stabilityai/stablelm-zephyr-3b) | 3B | GPT | DPO | possibly contaminated (45.72%) | **33.31%**
+| [StableLM Zephyr 3B](https://hf.co/stabilityai/stablelm-zephyr-3b) | 3B | GPT | DPO | possibly contaminated (45.72%) | **33.31%** | 0.91% |
+| [LIMA LLaMA 2 7B](https://huggingface.co/heegyu/LIMA2-7b-hf) | **7B** | **Human** | SFT | 4.55% | 24.55% | 36.29% |
 | [**Memphis-CoT 3B**](https://hf.co/euclaise/Memphis-CoT-3B) | 3B | **Human** | Self-teaching | **18.8%** | *27.22%* | *36.92%* |
 
 *5-shot, as performed automatically by LM Evaluation Harness bbh_cot_fewshot even with num_fewshot=0