In this section we compare the perplexity of Qra models on Polish texts with other Polish and English LLMs.
Note that perplexity values obtained with different text segmentations are not directly comparable. Therefore, conclusions can only be drawn from comparisons between models that use the same tokenizer, such as Qra and the original Llama / TinyLlama.
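To see why segmentation matters, consider that perplexity is a per-token quantity: the same text, assigned the same total log-probability, yields different perplexities depending on how many tokens the tokenizer splits it into. A minimal sketch with illustrative numbers:

```python
import math

# Illustrative numbers only: the same text with the same total log-probability,
# split into a different number of tokens, yields a different perplexity.
total_log_prob = -20.0                         # log-probability of the whole text
ppl_coarse = math.exp(-total_log_prob / 5)     # segmented into 5 tokens
ppl_fine = math.exp(-total_log_prob / 10)      # segmented into 10 tokens
print(ppl_coarse, ppl_fine)                    # same model fit, different perplexity
```

A tokenizer that produces more tokens for the same text spreads the same total log-probability over more units, lowering the per-token perplexity, which is why cross-tokenizer comparisons are misleading.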
### PolEval-2018
In 2018, the PolEval competition included a language modeling task, for which training and test sets totaling over 20 million Polish sentences were made available. We used the first 10k sentences from the test set to evaluate modern neural language models. To calculate the perplexity, we used a script from the [HuggingFace Evaluate](https://huggingface.co/spaces/evaluate-metric/perplexity) library.
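The metric computed by the HuggingFace Evaluate perplexity script is the exponential of the mean negative log-likelihood per token. A minimal sketch of that definition (the `perplexity` helper below is illustrative, not the library's actual implementation):

```python
import math

def perplexity(token_log_probs):
    # Perplexity = exp of the mean negative log-probability per token.
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model assigning probability 0.25 to each of 4 tokens has perplexity 4
# (up to floating-point rounding).
logps = [math.log(0.25)] * 4
print(perplexity(logps))
```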