anakin87
posted an update Jun 17
🧪 RAG Evaluation with 🔥 Prometheus 2 + Haystack

📝 Blog post: https://haystack.deepset.ai/blog/rag-evaluation-with-prometheus-2
📓 Notebook: https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/prometheus2_evaluation.ipynb

─── ⋆⋅☆⋅⋆ ───

When evaluating LLMs' responses, 𝐩𝐫𝐨𝐩𝐫𝐢𝐞𝐭𝐚𝐫𝐲 𝐦𝐨𝐝𝐞𝐥𝐬 like GPT-4 are commonly used due to their strong performance.
However, relying on closed models presents challenges related to data privacy 🔒, transparency, controllability, and cost 💸.

On the other hand, 𝐨𝐩𝐞𝐧 𝐦𝐨𝐝𝐞𝐥𝐬 typically do not correlate well with human judgments and lack flexibility.


🔥 Prometheus 2 is a new family of open-source models designed to address these gaps:
🔹 two variants: prometheus-eval/prometheus-7b-v2.0; prometheus-eval/prometheus-8x7b-v2.0
🔹 trained on open-source data
🔹 high correlation with human evaluations and proprietary models
🔹 highly flexible: capable of performing direct assessments and pairwise rankings, and allowing the definition of custom evaluation criteria.
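
To make the two evaluation modes a bit more concrete, here is a minimal sketch of the plumbing around an evaluator model. It assumes the evaluator closes its feedback with a `[RESULT]` marker followed by a 1-5 score (direct assessment) or by `A`/`B` (pairwise ranking). The prompt wording and all function names (`build_direct_assessment_prompt`, `parse_score`, `parse_preference`, `evaluate`) are illustrative, not the official Prometheus 2 template or the Haystack API; check the model card and the notebook for the exact prompts.

```python
import re
from typing import Callable, Dict, Optional


def build_direct_assessment_prompt(
    question: str, answer: str, criteria: str, score_rubric: Dict[int, str]
) -> str:
    """Assemble an illustrative 1-5 grading prompt (NOT the official template)."""
    rubric_lines = "\n".join(
        f"Score {score}: {text}" for score, text in sorted(score_rubric.items())
    )
    return (
        "###Task Description:\n"
        "Assess the response strictly against the score rubric. Write feedback, "
        "then end with '[RESULT]' followed by an integer from 1 to 5.\n\n"
        f"###Instruction:\n{question}\n\n"
        f"###Response to evaluate:\n{answer}\n\n"
        f"###Score Rubric:\n[{criteria}]\n{rubric_lines}\n\n"
        "###Feedback:"
    )


def parse_score(evaluator_output: str) -> Optional[int]:
    """Extract the 1-5 score after '[RESULT]'; None if no verdict is found."""
    match = re.search(r"\[RESULT\]\s*([1-5])\b", evaluator_output)
    return int(match.group(1)) if match else None


def parse_preference(evaluator_output: str) -> Optional[str]:
    """Extract the pairwise winner ('A' or 'B') after '[RESULT]'."""
    match = re.search(r"\[RESULT\]\s*([AB])\b", evaluator_output)
    return match.group(1) if match else None


def evaluate(
    generate: Callable[[str], str],
    question: str,
    answer: str,
    criteria: str,
    score_rubric: Dict[int, str],
) -> Dict[str, object]:
    """Run one direct assessment; `generate` is any prompt -> text function."""
    prompt = build_direct_assessment_prompt(question, answer, criteria, score_rubric)
    output = generate(prompt)
    return {"feedback": output, "score": parse_score(output)}


# Stand-in for a real model call (assumption: swap in your own generator,
# e.g. one backed by prometheus-eval/prometheus-7b-v2.0).
def fake_generate(prompt: str) -> str:
    return "Feedback: The answer is supported by the context. [RESULT] 4"


result = evaluate(
    fake_generate,
    question="Who wrote 'The Divine Comedy'?",
    answer="Dante Alighieri wrote it in the early 14th century.",
    criteria="Is the answer factually correct and grounded in the context?",
    score_rubric={1: "Entirely incorrect.", 3: "Partially correct.", 5: "Fully correct."},
)
print(result["score"])
```

Keeping `generate` as a plain callable means the same parsing logic works whether the model runs locally or behind an API, and a missing verdict surfaces as `None` instead of a silently wrong score.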

See my experiments with RAG evaluation in the links above.