Update README.md
README.md
@@ -91,6 +91,19 @@ Our results, with `result < 0.1, %:` being well below 0.9, indicate that our dat
*All benchmarks were performed with a sliding window of 4096. New benchmarks with the sliding window set to null are coming soon.

**German RAG LLM Evaluation**

Corrected results after the fix in https://github.com/huggingface/lighteval/pull/171:
```
|                         Task                         |Version|Metric|Value|   |Stderr|
|------------------------------------------------------|------:|------|----:|---|-----:|
|all                                                   |       |acc   |0.975|±  |0.0045|
|community:german_rag_eval:_average:0                  |       |acc   |0.975|±  |0.0045|
|community:german_rag_eval:choose_context_by_question:0|      0|acc   |0.953|±  |0.0067|
|community:german_rag_eval:choose_question_by_context:0|      0|acc   |0.998|±  |0.0014|
|community:german_rag_eval:context_question_match:0    |      0|acc   |0.975|±  |0.0049|
|community:german_rag_eval:question_answer_match:0     |      0|acc   |0.974|±  |0.0050|
```
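The numbers in the table suggest that the `_average` row is a plain, unweighted mean of the four subtask rows: both the accuracy (0.975) and the stderr (0.0045) match a simple average of the subtask values. This is an inference from the numbers, not something confirmed against lighteval's source. The minimal sketch below recomputes both values from the table under that assumption.

```python
# Sanity check, ASSUMING the `_average` row is an unweighted mean of the
# four subtask rows (inferred from the numbers, not from lighteval's code).
from statistics import mean

# (acc, stderr) per subtask, copied from the results table
subtasks = {
    "choose_context_by_question": (0.953, 0.0067),
    "choose_question_by_context": (0.998, 0.0014),
    "context_question_match": (0.975, 0.0049),
    "question_answer_match": (0.974, 0.0050),
}

# Plain mean of the accuracies and of the reported standard errors
avg_acc = mean(acc for acc, _ in subtasks.values())
avg_stderr = mean(se for _, se in subtasks.values())

print(f"acc={avg_acc:.3f} stderr={avg_stderr:.4f}")  # prints: acc=0.975 stderr=0.0045
```

A plain mean of standard errors is not the standard error of the mean accuracy. If the four subtask estimates were independent, that would be the square root of the sum of squared stderrs divided by 4, which is smaller.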
## Disclaimer

We must inform users that, despite our best efforts in data cleansing, we cannot entirely rule out the possibility that uncensored content has slipped through.
Consequently, we cannot guarantee consistently appropriate behavior. If you encounter any issues or come across inappropriate content, please let us know through the contact information provided.