iqwiki-kor/Qwen2.5-3B-MP-RM

Tags: Text Classification, Generated from Trainer, text-generation-inference, Inference Endpoints


nlee-208 committed 6 days ago

Commit 63f7508 (1 parent: 2c8eee5)

Update README.md

Files changed (1)

README.md +6 -22

README.md CHANGED

@@ -16,21 +16,15 @@ should probably proofread and complete it, then remove this comment. -->
 # Qwen2.5-3B-MP-RM
-This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) on the iqwiki-kor/MP-86k dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
+This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) on the [iqwiki-kor/MP-86k](https://huggingface.co/datasets/iqwiki-kor/MP-86k) dataset.
+## RewardBench Evaluation
+| Model                                                                                               | Chat | Chat-Hard | Safety | Reasoning | Avg. |
+|-----------------------------------------------------------------------------------------------------|---------:|-------:|------:|--------:|--------:|
+| [iqwiki-kor/Qwen2.5-3B-MP-RM](https://huggingface.co/iqwiki-kor/Qwen2.5-3B-MP-RM)        |89.1| 75.2| 87.3| 95.4| 86.8|
+| [RLHFlow/ArmoRM-Llama3-8B-v0.1](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1)                         |96.9 |76.8 |90.5 |97.3 |90.4|
 ### Training hyperparameters
@@ -49,13 +43,3 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 1
-### Training results
-### Framework versions
-- Transformers 4.45.1
-- Pytorch 2.4.0
-- Datasets 2.21.0
-- Tokenizers 0.20.0
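For both rows of the RewardBench table added in this commit, the Avg. column matches the unweighted mean of the four category scores (Chat, Chat-Hard, Safety, Reasoning), rounded to one decimal place. The README does not say how the average was computed. Treating it as a plain four-way mean is therefore an assumption, checked only against these two rows. The Python sketch below reproduces that check. It parses the scores as decimal strings so that binary floating-point error cannot flip the rounding of a value such as 86.75. The helper name `category_average` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Category scores (Chat, Chat-Hard, Safety, Reasoning) and the reported Avg.,
# copied from the RewardBench table in the README diff. Strings keep the
# one-decimal values exact.
TABLE = {
    "iqwiki-kor/Qwen2.5-3B-MP-RM": (["89.1", "75.2", "87.3", "95.4"], "86.8"),
    "RLHFlow/ArmoRM-Llama3-8B-v0.1": (["96.9", "76.8", "90.5", "97.3"], "90.4"),
}


def category_average(scores: list[str]) -> Decimal:
    """Hypothetical helper: unweighted mean of the category scores, rounded to 0.1.

    Assumes Avg. is a plain mean of the four categories (not stated in the README).
    """
    values = [Decimal(s) for s in scores]
    raw = sum(values) / len(values)
    return raw.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for model, (scores, reported) in TABLE.items():
    computed = category_average(scores)
    status = "matches" if computed == Decimal(reported) else "differs from"
    print(f"{model}: computed {computed} {status} reported {reported}")
```

For the first row the exact mean is 347.0 / 4 = 86.75, which rounds to 86.8. For the second it is 361.5 / 4 = 90.375, which rounds to 90.4.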
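The visible hyperparameters include `lr_scheduler_warmup_ratio: 0.1` and `num_epochs: 1`. As we understand the Hugging Face Trainer (the card is tagged "Generated from Trainer"), a warmup ratio becomes a step count of ceil(total_training_steps × ratio) when no explicit warmup step count is set. The learning rate then ramps up over those first steps. The Python sketch below assumes that rule. The dataset size and effective batch size are hypothetical placeholders, because the corresponding hyperparameter lines are elided from this diff.

```python
import math

# Hypothetical values: the real dataset size and batch settings are not shown
# in the visible part of the diff.
NUM_EXAMPLES = 10_000      # hypothetical number of training examples
EFFECTIVE_BATCH_SIZE = 64  # hypothetical: per-device batch x devices x grad. accumulation

WARMUP_RATIO = 0.1  # lr_scheduler_warmup_ratio from the model card
NUM_EPOCHS = 1      # num_epochs from the model card


def total_training_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Approximate optimizer steps for the run, counting a final partial batch."""
    return math.ceil(num_examples / batch_size) * epochs


def warmup_steps(num_training_steps: int, warmup_ratio: float) -> int:
    """Warmup length in steps, rounded up to a whole step (assumed Trainer rule)."""
    return math.ceil(num_training_steps * warmup_ratio)


steps = total_training_steps(NUM_EXAMPLES, EFFECTIVE_BATCH_SIZE, NUM_EPOCHS)
print(f"total steps: {steps}, warmup steps: {warmup_steps(steps, WARMUP_RATIO)}")
```

With these placeholder numbers the run has 157 optimizer steps, and the first 16 are warmup. The real Trainer derives steps per epoch from the dataloader length and the gradient accumulation setting, so treat `total_training_steps` as an approximation.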