lytang committed on
Commit
58ead4c
1 Parent(s): 88ade80

Update README.md

  1. README.md +5 -0
README.md CHANGED
@@ -27,6 +27,11 @@ We also have other two MiniCheck model variants:
 
 
 ### Model Performance
+
+ <p align="center">
+ <img src="./cost-vs-bacc.png" width="360">
+ </p>
+
 The performance of these models is evaluated on our new collected benchmark (unseen by our models during training), [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact),
 from 10 recent human annotated datasets on fact-checking and grounding LLM generations. MiniCheck-DeBERTa-v3-Large outperform all
 exisiting specialized fact-checkers with a similar scale by a large margin but is 2% worse than our best model MiniCheck-Flan-T5-Large, which