mpasila committed
Commit: 21bfddc
Parent: 83aa736

Update README.md

Files changed (1):
1. README.md (+42, -2)
README.md CHANGED
@@ -19,6 +19,8 @@ LoRA trained in 4-bit with 2k context using [LumiOpen/Viking-7B](https://hugging
 
 Dataset used is [mpasila/Finnish-Alpaca-Small](https://huggingface.co/datasets/mpasila/Finnish-Alpaca-Small).
 
+ Re-trained because I wasn't sure whether the original run used the fully trained Viking-7B or a partially trained checkpoint, since the final model had apparently only just been released. (After re-training, the score dropped noticeably, so I may have made a mistake somewhere.)
+ 
 ### Prompt format: Alpaca
 It uses Alpaca format but with a translated instruction at the start:
 ```
@@ -32,8 +34,8 @@ It uses Alpaca format but with a translated instruction at the start:
 
 | Model | Size | Type | FIN-bench (score) |
 |-------|------|------|-------|
- | **mpasila/Finnish-Alpaca-Small-7B** | 7B | Instruct | |
- | [mpasila/Finnish-Alpaca-Tiny-V2-7B](https://huggingface.co/mpasila/Finnish-Alpaca-Tiny-V2-7B) | 7B | Instruct | 0.4654 |
+ | **mpasila/Finnish-Alpaca-Small-7B** | 7B | Instruct | 0.3586 |
+ | [mpasila/Finnish-Alpaca-Tiny-V2-7B](https://huggingface.co/mpasila/Finnish-Alpaca-Tiny-V2-7B) | 7B | Instruct | **0.4654** |
 | [mpasila/Alpacazord-Viking-7B](https://huggingface.co/mpasila/Alpacazord-Viking-7B) | 7B | Instruct | 0.4123 |
 | [mpasila/NordicAlpaca-Finnish-V1-7B](https://huggingface.co/mpasila/NordicAlpaca-Finnish-V1-7B) | 7B | Instruct | 0.3891 |
 | [mpasila/Finnish-Viking-Alpaca-V1-7B](https://huggingface.co/mpasila/Finnish-Viking-Alpaca-V1-7B) | 7B | Instruct | 0.3943 |
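The Alpaca template referenced above puts the standard `### Instruction:` / `### Response:` sections under a Finnish-translated preamble. The exact translated text is truncated in this diff, so in the sketch below `SYSTEM_LINE` is a placeholder and `build_prompt` is a hypothetical helper, not the model card's literal strings; copy the real preamble from the rendered README.

```python
# Sketch of an Alpaca-style prompt with a translated (Finnish) preamble.
# SYSTEM_LINE is a placeholder: the actual Finnish text is not shown in
# this diff and should be taken from the rendered README.
SYSTEM_LINE = "<Finnish translation of the Alpaca preamble>"

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble the standard Alpaca sections under the translated preamble."""
    prompt = f"{SYSTEM_LINE}\n\n### Instruction:\n{instruction}\n\n"
    if input_text:
        prompt += f"### Input:\n{input_text}\n\n"
    prompt += "### Response:\n"
    return prompt

print(build_prompt("Kerro lyhyesti Suomen historiasta."))
```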
@@ -47,6 +49,44 @@ It uses Alpaca format but with a translated instruction at the start:
 
 #### FIN-bench scores:
 
+ | Task |Version| Metric |Value | |Stderr|
+ |------------------------------------------------|------:|---------------------|-----:|---|-----:|
+ |bigbench_analogies | 0|multiple_choice_grade|0.5923|± |0.0433|
+ |bigbench_arithmetic_1_digit_addition | 0|multiple_choice_grade|0.2700|± |0.0446|
+ |bigbench_arithmetic_1_digit_division | 0|multiple_choice_grade|0.4783|± |0.1065|
+ |bigbench_arithmetic_1_digit_multiplication | 0|multiple_choice_grade|0.2600|± |0.0441|
+ |bigbench_arithmetic_1_digit_subtraction | 0|multiple_choice_grade|0.2200|± |0.0416|
+ |bigbench_arithmetic_2_digit_addition | 0|multiple_choice_grade|0.1700|± |0.0378|
+ |bigbench_arithmetic_2_digit_division | 0|multiple_choice_grade|0.3600|± |0.0482|
+ |bigbench_arithmetic_2_digit_multiplication | 0|multiple_choice_grade|0.2000|± |0.0402|
+ |bigbench_arithmetic_2_digit_subtraction | 0|multiple_choice_grade|0.1300|± |0.0338|
+ |bigbench_arithmetic_3_digit_addition | 0|multiple_choice_grade|0.3100|± |0.0465|
+ |bigbench_arithmetic_3_digit_division | 0|multiple_choice_grade|0.2100|± |0.0409|
+ |bigbench_arithmetic_3_digit_multiplication | 0|multiple_choice_grade|0.1600|± |0.0368|
+ |bigbench_arithmetic_3_digit_subtraction | 0|multiple_choice_grade|0.2300|± |0.0423|
+ |bigbench_arithmetic_4_digit_addition | 0|multiple_choice_grade|0.3900|± |0.0490|
+ |bigbench_arithmetic_4_digit_division | 0|multiple_choice_grade|0.2300|± |0.0423|
+ |bigbench_arithmetic_4_digit_multiplication | 0|multiple_choice_grade|0.2100|± |0.0409|
+ |bigbench_arithmetic_4_digit_subtraction | 0|multiple_choice_grade|0.4500|± |0.0500|
+ |bigbench_arithmetic_5_digit_addition | 0|multiple_choice_grade|0.4800|± |0.0502|
+ |bigbench_arithmetic_5_digit_division | 0|multiple_choice_grade|0.0700|± |0.0256|
+ |bigbench_arithmetic_5_digit_multiplication | 0|multiple_choice_grade|0.1700|± |0.0378|
+ |bigbench_arithmetic_5_digit_subtraction | 0|multiple_choice_grade|0.5800|± |0.0496|
+ |bigbench_cause_and_effect_one_sentence | 0|multiple_choice_grade|0.6275|± |0.0684|
+ |bigbench_cause_and_effect_one_sentence_no_prompt| 0|multiple_choice_grade|0.6667|± |0.0667|
+ |bigbench_cause_and_effect_two_sentences | 0|multiple_choice_grade|0.5098|± |0.0707|
+ |bigbench_emotions | 0|multiple_choice_grade|0.3312|± |0.0373|
+ |bigbench_empirical_judgments | 0|multiple_choice_grade|0.3333|± |0.0476|
+ |bigbench_general_knowledge | 0|multiple_choice_grade|0.2857|± |0.0544|
+ |bigbench_hhh_alignment_harmless | 0|multiple_choice_grade|0.3793|± |0.0643|
+ |bigbench_hhh_alignment_helpful | 0|multiple_choice_grade|0.3559|± |0.0629|
+ |bigbench_hhh_alignment_honest | 0|multiple_choice_grade|0.3559|± |0.0629|
+ |bigbench_hhh_alignment_other | 0|multiple_choice_grade|0.5349|± |0.0770|
+ |bigbench_intent_recognition | 0|multiple_choice_grade|0.1546|± |0.0138|
+ |bigbench_misconceptions | 0|multiple_choice_grade|0.5448|± |0.0432|
+ |bigbench_paraphrase | 0|multiple_choice_grade|0.5300|± |0.0354|
+ |bigbench_sentence_ambiguity | 0|multiple_choice_grade|0.4333|± |0.0645|
+ |bigbench_similarities_abstraction | 0|multiple_choice_grade|0.6974|± |0.0530|
 
 # Uploaded model
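The headline 0.3586 in the comparison table is consistent with an unweighted mean of the 36 per-task `multiple_choice_grade` values added above. A quick check (the uniform task weighting is an inference; the harness output does not state it explicitly):

```python
# Sanity check: the 36 per-task multiple_choice_grade values above average
# to the headline FIN-bench score (assumes uniform task weighting).
scores = [
    0.5923, 0.2700, 0.4783, 0.2600, 0.2200, 0.1700, 0.3600, 0.2000, 0.1300,
    0.3100, 0.2100, 0.1600, 0.2300, 0.3900, 0.2300, 0.2100, 0.4500, 0.4800,
    0.0700, 0.1700, 0.5800, 0.6275, 0.6667, 0.5098, 0.3312, 0.3333, 0.2857,
    0.3793, 0.3559, 0.3559, 0.5349, 0.1546, 0.5448, 0.5300, 0.4333, 0.6974,
]
print(f"{len(scores)} tasks, mean = {sum(scores) / len(scores):.4f}")
# -> 36 tasks, mean = 0.3586
```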
92