---
language:
- en
license: other
tags:
- code
datasets:
- reciprocate/dpo_ultra-capybara-code_filtered-best
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-7B-Chat/blob/main/LICENSE
pipeline_tag: text-generation
model-index:
- name: Coder1.8-ORPO-TEST
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 38.82
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=raincandy-u/Coder1.8-ORPO-TEST
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 60.48
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=raincandy-u/Coder1.8-ORPO-TEST
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 46.7
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=raincandy-u/Coder1.8-ORPO-TEST
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 41.38
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=raincandy-u/Coder1.8-ORPO-TEST
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 59.75
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=raincandy-u/Coder1.8-ORPO-TEST
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 27.45
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=raincandy-u/Coder1.8-ORPO-TEST
      name: Open LLM Leaderboard
---

# Coder1.8-ORPO-TEST

## Model Description

Test model for the ORPO fine-tuning method, trained on ~20k code examples for 1 epoch on 2 x A40 cards with 4-bit QLoRA (LoRA rank = LoRA alpha = 16).

## Disclaimer

This is a test model and may generate incorrect responses. Use at your own risk.

## Training Details

- Base: Qwen1.5-1.8B
- Training Data: ~20k [code examples](https://huggingface.co/datasets/reciprocate/dpo_ultra-capybara-code_filtered-best)
- Epochs: 1
- Method: ORPO
- Hardware: 2 x A40
- Quantization: 4-bit QLoRA
- LoRA Rank/Alpha: 16

# Limitations

The model was trained on only ~20k examples for a single epoch with 4-bit quantization, both of which may limit performance.

# Join the Discussion

Have questions or feedback? Join our [Discord server](https://discord.gg/KugcbJX5).

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_raincandy-u__Coder1.8-ORPO-TEST).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 45.76 |
| AI2 Reasoning Challenge (25-Shot) | 38.82 |
| HellaSwag (10-Shot)               | 60.48 |
| MMLU (5-Shot)                     | 46.70 |
| TruthfulQA (0-shot)               | 41.38 |
| Winogrande (5-shot)               | 59.75 |
| GSM8k (5-shot)                    | 27.45 |
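
The "Avg." row appears to be the unweighted mean of the six benchmark scores. As a quick sanity check, the short sketch below recomputes it from the values in the table; the function name `leaderboard_average` is only an illustrative helper, not part of any leaderboard tooling.

```python
from statistics import mean

# Scores copied from the leaderboard table in this card.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 38.82,
    "HellaSwag (10-Shot)": 60.48,
    "MMLU (5-Shot)": 46.70,
    "TruthfulQA (0-shot)": 41.38,
    "Winogrande (5-shot)": 59.75,
    "GSM8k (5-shot)": 27.45,
}


def leaderboard_average(results):
    """Return the unweighted mean of the scores, rounded to two decimals.

    Assumption: every benchmark carries equal weight in the average.
    """
    return round(mean(list(results.values())), 2)


if __name__ == "__main__":
    print(leaderboard_average(scores))  # 45.76, matching the "Avg." row
```

The scores sum to 274.58, and dividing by six gives roughly 45.763, which rounds to the reported 45.76.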