Using LLM-as-a-judge 🧑‍⚖️ for an automated and versatile evaluation
Evaluating LLM outputs is often hard, since many tasks require open-ended answers for which no deterministic metric works: for instance, when asking a model to summarize a text, there could be hundreds of correct ways to do it. The most versatile way to grade these outputs is then human evaluation, but it is very time-consuming and therefore costly.
🤔 Then why not ask another LLM to do the evaluation, by providing it with relevant rating criteria? This is the idea behind LLM-as-a-judge.
To implement an LLM judge correctly, you need a few tricks.
So I've just published a new notebook showing how to implement it properly in our Hugging Face Cookbook! (You can run it instantly in Google Colab.)
➡️ LLM-as-a-judge cookbook: https://huggingface.co/learn/cookbook/llm_judge
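To give a rough flavor of the idea, here is a minimal sketch. It is not the notebook's code: the prompt wording, the 1 to 4 scale, and the names `JUDGE_PROMPT`, `build_judge_prompt`, `parse_rating`, `judge`, and `fake_llm` are all assumptions made for illustration. The judge model is passed in as a plain callable, so you can plug in whichever LLM client you use.

```python
import re
from typing import Callable, List, Optional

# Hypothetical prompt template: the wording and the 1-4 scale are illustrative choices.
JUDGE_PROMPT = """You will be given a source text and a summary of it.
Rate the summary on a scale of 1 to 4 using these criteria:
{criteria}

Write your reasoning first, then finish with a line of the form:
Rating: <number from 1 to 4>

Source text:
{source}

Summary:
{summary}
"""


def build_judge_prompt(source: str, summary: str, criteria: List[str]) -> str:
    """Fill the template with the text to grade and the rating criteria."""
    bullet_list = "\n".join(f"- {c}" for c in criteria)
    return JUDGE_PROMPT.format(criteria=bullet_list, source=source, summary=summary)


def parse_rating(judge_output: str, low: int = 1, high: int = 4) -> Optional[int]:
    """Return the last 'Rating: N' found in the answer, or None if missing or out of range."""
    matches = re.findall(r"Rating:\s*(\d+)", judge_output)
    if not matches:
        return None
    rating = int(matches[-1])
    return rating if low <= rating <= high else None


def judge(source: str, summary: str, criteria: List[str],
          call_llm: Callable[[str], str]) -> Optional[int]:
    """Ask the judge model (any callable mapping prompt -> text) for a rating."""
    return parse_rating(call_llm(build_judge_prompt(source, summary, criteria)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "The summary covers the main point but omits one detail.\nRating: 3"


if __name__ == "__main__":
    score = judge(
        source="The cat sat on the mat all afternoon, ignoring the dog.",
        summary="A cat sat on a mat.",
        criteria=["Is the summary faithful to the source?", "Is it concise?"],
        call_llm=fake_llm,
    )
    print(score)
```

Asking for the reasoning before the rating, and parsing a fixed `Rating:` line, keeps the output easy to extract and lets you discard answers that do not follow the format.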
The Cookbook is a great collection of notebooks demonstrating recipes (hence the name "cookbook") for common LLM use cases. I recommend you go take a look!
➡️ All cookbooks: https://huggingface.co/learn/cookbook/index
Thank you @MariaK for your support!