mpasila committed
Commit: 21bfddc
Parent: 83aa736

Update README.md

Files changed (1):
1. README.md (+42, -2)
README.md CHANGED
@@ -19,6 +19,8 @@ LoRA trained in 4-bit with 2k context using [LumiOpen/Viking-7B](https://hugging
 
 Dataset used is [mpasila/Finnish-Alpaca-Small](https://huggingface.co/datasets/mpasila/Finnish-Alpaca-Small).
 
+ Re-trained because I wasn't sure whether the original run used the fully trained Viking-7B or a partially trained checkpoint, since the final model had apparently only just been released. (After re-training, the score dropped noticeably, so I may have made a mistake somewhere.)
+ 
 ### Prompt format: Alpaca
 It uses Alpaca format but with a translated instruction at the start:
 ```
@@ -32,8 +34,8 @@ It uses Alpaca format but with a translated instruction at the start:
 
 | Model | Size | Type | FIN-bench (score) |
 |-------|------|------|-------|
- | **mpasila/Finnish-Alpaca-Small-7B** | 7B | Instruct | |
- | [mpasila/Finnish-Alpaca-Tiny-V2-7B](https://huggingface.co/mpasila/Finnish-Alpaca-Tiny-V2-7B) | 7B | Instruct | 0.4654 |
+ | **mpasila/Finnish-Alpaca-Small-7B** | 7B | Instruct | 0.3586 |
+ | [mpasila/Finnish-Alpaca-Tiny-V2-7B](https://huggingface.co/mpasila/Finnish-Alpaca-Tiny-V2-7B) | 7B | Instruct | **0.4654** |
 | [mpasila/Alpacazord-Viking-7B](https://huggingface.co/mpasila/Alpacazord-Viking-7B) | 7B | Instruct | 0.4123 |
 | [mpasila/NordicAlpaca-Finnish-V1-7B](https://huggingface.co/mpasila/NordicAlpaca-Finnish-V1-7B) | 7B | Instruct | 0.3891 |
 | [mpasila/Finnish-Viking-Alpaca-V1-7B](https://huggingface.co/mpasila/Finnish-Viking-Alpaca-V1-7B) | 7B | Instruct | 0.3943 |
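The Alpaca template referenced above puts the standard `### Instruction:` / `### Response:` sections under a Finnish-translated preamble. The exact translated text is truncated in this diff, so in the sketch below `SYSTEM_LINE` is a placeholder and `build_prompt` is a hypothetical helper, not the model card's literal strings; copy the real preamble from the rendered README.

```python
# Sketch of an Alpaca-style prompt with a translated (Finnish) preamble.
# SYSTEM_LINE is a placeholder: the actual Finnish text is not shown in
# this diff and should be taken from the rendered README.
SYSTEM_LINE = "<Finnish translation of the Alpaca preamble>"

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble the standard Alpaca sections under the translated preamble."""
    prompt = f"{SYSTEM_LINE}\n\n### Instruction:\n{instruction}\n\n"
    if input_text:
        prompt += f"### Input:\n{input_text}\n\n"
    prompt += "### Response:\n"
    return prompt

print(build_prompt("Kerro lyhyesti Suomen historiasta."))
```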
@@ -47,6 +49,44 @@ It uses Alpaca format but with a translated instruction at the start:
 
 #### FIN-bench scores:
 
+ | Task |Version| Metric |Value | |Stderr|
+ |------------------------------------------------|------:|---------------------|-----:|---|-----:|
+ |bigbench_analogies | 0|multiple_choice_grade|0.5923|± |0.0433|
+ |bigbench_arithmetic_1_digit_addition | 0|multiple_choice_grade|0.2700|± |0.0446|
+ |bigbench_arithmetic_1_digit_division | 0|multiple_choice_grade|0.4783|± |0.1065|
+ |bigbench_arithmetic_1_digit_multiplication | 0|multiple_choice_grade|0.2600|± |0.0441|
+ |bigbench_arithmetic_1_digit_subtraction | 0|multiple_choice_grade|0.2200|± |0.0416|
+ |bigbench_arithmetic_2_digit_addition | 0|multiple_choice_grade|0.1700|± |0.0378|
+ |bigbench_arithmetic_2_digit_division | 0|multiple_choice_grade|0.3600|± |0.0482|
+ |bigbench_arithmetic_2_digit_multiplication | 0|multiple_choice_grade|0.2000|± |0.0402|
+ |bigbench_arithmetic_2_digit_subtraction | 0|multiple_choice_grade|0.1300|± |0.0338|
+ |bigbench_arithmetic_3_digit_addition | 0|multiple_choice_grade|0.3100|± |0.0465|
+ |bigbench_arithmetic_3_digit_division | 0|multiple_choice_grade|0.2100|± |0.0409|
+ |bigbench_arithmetic_3_digit_multiplication | 0|multiple_choice_grade|0.1600|± |0.0368|
+ |bigbench_arithmetic_3_digit_subtraction | 0|multiple_choice_grade|0.2300|± |0.0423|
+ |bigbench_arithmetic_4_digit_addition | 0|multiple_choice_grade|0.3900|± |0.0490|
+ |bigbench_arithmetic_4_digit_division | 0|multiple_choice_grade|0.2300|± |0.0423|
+ |bigbench_arithmetic_4_digit_multiplication | 0|multiple_choice_grade|0.2100|± |0.0409|
+ |bigbench_arithmetic_4_digit_subtraction | 0|multiple_choice_grade|0.4500|± |0.0500|
+ |bigbench_arithmetic_5_digit_addition | 0|multiple_choice_grade|0.4800|± |0.0502|
+ |bigbench_arithmetic_5_digit_division | 0|multiple_choice_grade|0.0700|± |0.0256|
+ |bigbench_arithmetic_5_digit_multiplication | 0|multiple_choice_grade|0.1700|± |0.0378|
+ |bigbench_arithmetic_5_digit_subtraction | 0|multiple_choice_grade|0.5800|± |0.0496|
+ |bigbench_cause_and_effect_one_sentence | 0|multiple_choice_grade|0.6275|± |0.0684|
+ |bigbench_cause_and_effect_one_sentence_no_prompt| 0|multiple_choice_grade|0.6667|± |0.0667|
+ |bigbench_cause_and_effect_two_sentences | 0|multiple_choice_grade|0.5098|± |0.0707|
+ |bigbench_emotions | 0|multiple_choice_grade|0.3312|± |0.0373|
+ |bigbench_empirical_judgments | 0|multiple_choice_grade|0.3333|± |0.0476|
+ |bigbench_general_knowledge | 0|multiple_choice_grade|0.2857|± |0.0544|
+ |bigbench_hhh_alignment_harmless | 0|multiple_choice_grade|0.3793|± |0.0643|
+ |bigbench_hhh_alignment_helpful | 0|multiple_choice_grade|0.3559|± |0.0629|
+ |bigbench_hhh_alignment_honest | 0|multiple_choice_grade|0.3559|± |0.0629|
+ |bigbench_hhh_alignment_other | 0|multiple_choice_grade|0.5349|± |0.0770|
+ |bigbench_intent_recognition | 0|multiple_choice_grade|0.1546|± |0.0138|
+ |bigbench_misconceptions | 0|multiple_choice_grade|0.5448|± |0.0432|
+ |bigbench_paraphrase | 0|multiple_choice_grade|0.5300|± |0.0354|
+ |bigbench_sentence_ambiguity | 0|multiple_choice_grade|0.4333|± |0.0645|
+ |bigbench_similarities_abstraction | 0|multiple_choice_grade|0.6974|± |0.0530|
 
 # Uploaded model
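The headline 0.3586 in the comparison table is consistent with an unweighted mean of the 36 per-task `multiple_choice_grade` values added above. A quick check (the uniform task weighting is an inference; the harness output does not state it explicitly):

```python
# Sanity check: the 36 per-task multiple_choice_grade values above average
# to the headline FIN-bench score (assumes uniform task weighting).
scores = [
    0.5923, 0.2700, 0.4783, 0.2600, 0.2200, 0.1700, 0.3600, 0.2000, 0.1300,
    0.3100, 0.2100, 0.1600, 0.2300, 0.3900, 0.2300, 0.2100, 0.4500, 0.4800,
    0.0700, 0.1700, 0.5800, 0.6275, 0.6667, 0.5098, 0.3312, 0.3333, 0.2857,
    0.3793, 0.3559, 0.3559, 0.5349, 0.1546, 0.5448, 0.5300, 0.4333, 0.6974,
]
print(f"{len(scores)} tasks, mean = {sum(scores) / len(scores):.4f}")
# -> 36 tasks, mean = 0.3586
```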
92