---
library_name: transformers
datasets:
- argilla/distilabel-capybara-dpo-7k-binarized
---

# CapyLake-7B-v2-laser

This model is a fine-tune of [cognitivecomputations/WestLake-7B-v2-Laser](https://huggingface.co/cognitivecomputations/WestLake-7B-v2-laser) on [argilla/distilabel-capybara-dpo-7k-binarized](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized).

<div align="center">

![image/webp](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/kx2uwS_kZ-rTAJiusSrAW.webp)

[<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-dark.png" alt="Built with Distilabel" width="200" height="32"/>](https://github.com/argilla-io/distilabel)

</div>

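Since the preference data is public, the pairs this model was tuned on can be inspected directly. Below is a minimal sketch using the `datasets` library; the `chosen`/`rejected` columns are the usual layout for a binarized DPO dataset and are an assumption here, so the snippet checks for them before printing:

```python
from datasets import load_dataset

# Pull the ~7k binarized preference pairs used for this fine-tune
ds = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")
print(ds)  # Shows the actual column names and example count

# Assumed DPO-style columns; only printed if they actually exist
example = ds[0]
for column in ("chosen", "rejected"):
    if column in example:
        print(column, "->", str(example[column])[:200])
```
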
## Process

+ Realigned the chat template to ChatML (see the sketch after this list)
+ Completed 1 epoch
+ 5e-05 learning rate
+ Training time was about 2 hours on a single H100
+ Cost was ~$8

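Because the template is ChatML, prompts should be rendered through the tokenizer rather than assembled by hand. Here is a minimal sketch of what the rendered prompt looks like; the exact special tokens come from the tokenizer config shipped with the model:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("macadeliccc/CapyLake-7B-v2-laser")

messages = [{"role": "user", "content": "Summarize ChatML in one line."}]

# Render the prompt as a string (no tokenization) and append the
# assistant header that cues the model to start its reply
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # Expect <|im_start|> / <|im_end|> ChatML markers
```
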
## Code Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/CapyLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Half precision keeps the 7B model manageable
    device_map="auto",           # Places the model on GPU if one is available
)

# Format the request with the model's ChatML chat template
messages = [
    {"role": "user", "content": "Create an idea for a TV show and write a short pilot script"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Adding hyperparameters to the generation call
outputs = model.generate(
    input_ids,
    max_new_tokens=4096,     # Controls the maximum length of the new tokens created
    do_sample=True,          # Enables sampling so the settings below take effect
    temperature=0.7,         # Adjust for creativity (lower is less random)
    top_k=50,                # Keeps the top k tokens for sampling
    top_p=0.95,              # Uses nucleus sampling with this cumulative probability
    no_repeat_ngram_size=2,  # Prevents repeating n-grams to ensure diversity
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

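Note that `do_sample=True` is what lets `temperature`, `top_k`, and `top_p` take effect; without it, `generate` falls back to greedy decoding and ignores those settings. Loading with `device_map="auto"` assumes the `accelerate` package is installed; drop it and the `torch_dtype` argument to load on CPU instead.
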
## Other Capy Models

SOLAR-10.7B-Capy-v1.0 is also on the way. There could be more depending on performance!

## Evaluations

| Model                                                                          |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|--------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[CapyLake-7B-v2-laser](https://huggingface.co/macadeliccc/CapyLake-7B-v2-laser)| 44.34| 77.77| 68.47| 47.92| 59.62|

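To sanity-check these numbers, an evaluation can be scripted against `lm-evaluation-harness`. The sketch below is an assumption rather than the exact setup used for this table: task names and scoring details vary across harness versions, and the task list is simply the GPT4All group read off the results below.

```python
import lm_eval

# Hypothetical re-run of the GPT4All group on this model;
# task names and availability depend on the installed harness version
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=macadeliccc/CapyLake-7B-v2-laser,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])
```
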
### AGIEval

| Task                          |Version| Metric |Value|   |Stderr|
|-------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat               |      0|acc     |28.35|±  |  2.83|
|                               |       |acc_norm|25.98|±  |  2.76|
|agieval_logiqa_en              |      0|acc     |38.86|±  |  1.91|
|                               |       |acc_norm|39.02|±  |  1.91|
|agieval_lsat_ar                |      0|acc     |25.22|±  |  2.87|
|                               |       |acc_norm|24.35|±  |  2.84|
|agieval_lsat_lr                |      0|acc     |50.39|±  |  2.22|
|                               |       |acc_norm|51.57|±  |  2.22|
|agieval_lsat_rc                |      0|acc     |65.06|±  |  2.91|
|                               |       |acc_norm|63.94|±  |  2.93|
|agieval_sat_en                 |      0|acc     |78.64|±  |  2.86|
|                               |       |acc_norm|78.64|±  |  2.86|
|agieval_sat_en_without_passage |      0|acc     |40.78|±  |  3.43|
|                               |       |acc_norm|40.78|±  |  3.43|
|agieval_sat_math               |      0|acc     |33.64|±  |  3.19|
|                               |       |acc_norm|30.45|±  |  3.11|

Average: 44.34%

### GPT4All

| Task         |Version| Metric |Value|   |Stderr|
|--------------|------:|--------|----:|---|-----:|
|arc_challenge |      0|acc     |66.89|±  |  1.38|
|              |       |acc_norm|67.49|±  |  1.37|
|arc_easy      |      0|acc     |86.70|±  |  0.70|
|              |       |acc_norm|81.90|±  |  0.79|
|boolq         |      1|acc     |88.10|±  |  0.57|
|hellaswag     |      0|acc     |71.45|±  |  0.45|
|              |       |acc_norm|87.78|±  |  0.33|
|openbookqa    |      0|acc     |39.80|±  |  2.19|
|              |       |acc_norm|49.80|±  |  2.24|
|piqa          |      0|acc     |82.86|±  |  0.88|
|              |       |acc_norm|84.87|±  |  0.84|
|winogrande    |      0|acc     |84.45|±  |  1.02|

Average: 77.77%

### TruthfulQA

| Task        |Version|Metric|Value|   |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc|      1|mc1   |53.98|±  |  1.74|
|             |       |mc2   |68.47|±  |  1.53|

Average: 68.47%

### Bigbench

| Task                                           |Version|        Metric       |Value|   |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|59.47|±  |  3.57|
|bigbench_date_understanding                     |      0|multiple_choice_grade|64.50|±  |  2.49|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|44.96|±  |  3.10|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|22.84|±  |  2.22|
|                                                |       |exact_str_match      | 2.79|±  |  0.87|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|30.80|±  |  2.07|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|21.57|±  |  1.56|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|56.67|±  |  2.87|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|51.60|±  |  2.24|
|bigbench_navigate                               |      0|multiple_choice_grade|51.00|±  |  1.58|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|70.35|±  |  1.02|
|bigbench_ruin_names                             |      0|multiple_choice_grade|51.79|±  |  2.36|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|35.97|±  |  1.52|
|bigbench_snarks                                 |      0|multiple_choice_grade|79.01|±  |  3.04|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|75.66|±  |  1.37|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|47.90|±  |  1.58|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|23.84|±  |  1.21|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|18.00|±  |  0.92|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|56.67|±  |  2.87|

Average: 47.92%

Average score: 59.62%

Elapsed time: 01:57:56