angelahzyuan
committed on
Commit 2076437
1 Parent(s): 0c324fb
Update README.md
README.md
CHANGED
@@ -33,7 +33,7 @@ This model was developed using [Self-Play Preference Optimization](https://arxiv
 |-------------------------------------------|:------------:|:--------:|:-----------:|
 |[Llama-3-8B-SPPO Iter1](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter1) |31.73 |31.74 | 1962
 |[Llama-3-8B-SPPO Iter2](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter2) |35.15 |35.98 | 2021
-|[Llama-3-8B-SPPO Iter3](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3)
+|[Llama-3-8B-SPPO Iter3](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3) |**38.77** |**39.85** | 2066
 
 
 
@@ -45,7 +45,7 @@ Results are reported by using [lm-evaluation-harness](https://github.com/Eleuthe
 |--------|---------------|----------------|------------|-------|-----------|-------|---------|
 |[Llama-3-8B-SPPO Iter1](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter1) | 63.82 | 54.96 | 76.40 | 75.44 | 79.80 | 65.65 | 69.35
 |[Llama-3-8B-SPPO Iter2](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter2) | 64.93 | 56.48 | 76.87 | 75.13 | 80.39 | 65.67 | 69.91
-|[Llama-3-8B-SPPO Iter3](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3) | 65.19 | 58.04 | 77.11 | 74.91 | 80.86 | 65.60 | 70.29
+|[Llama-3-8B-SPPO Iter3](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3) | 65.19 | 58.04 | 77.11 | 74.91 | 80.86 | 65.60 | **70.29**
 
 ### Training hyperparameters
 The following hyperparameters were used during training: