---
license: cc-by-nc-4.0
---

# **TrueTeacher**

This is a **Factual Consistency Evaluation** model, introduced in the [TrueTeacher paper (Gekhman et al., 2023)](https://arxiv.org/pdf/2305.11171.pdf).

## Model Details

The model is optimized for evaluating factual consistency in **summarization**.

It is the main model from the paper (see "T5-11B w. ANLI + TrueTeacher full" in Table 1). It is based on **T5-11B** [(Raffel et al., 2020)](https://jmlr.org/papers/volume21/20-074/20-074.pdf), fine-tuned on a mixture of the following datasets:

- TrueTeacher ([Gekhman et al., 2023](https://arxiv.org/pdf/2305.11171.pdf))
- ANLI ([Nie et al., 2020](https://aclanthology.org/2020.acl-main.441.pdf))

The input format for the model is: `premise: GROUNDING_DOCUMENT hypothesis: HYPOTHESIS_SUMMARY`.

To accommodate the input length of common summarization datasets, we recommend setting **max_length** to **2048**.

The model predicts a binary label ('1' - Factually Consistent, '0' - Factually Inconsistent).

## Evaluation results

This model achieves the following ROC AUC results on the summarization subset of the [TRUE benchmark (Honovich et al., 2022)](https://arxiv.org/pdf/2204.04991.pdf):

| **MNBM** | **QAGS-X** | **FRANK** | **SummEval** | **QAGS-C** | **Average** |
|----------|------------|-----------|--------------|------------|-------------|
| 78.1     | 89.4       | 93.6      | 88.5         | 89.4       | 87.8        |

## Usage examples

#### classification

```python
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

premise = 'the sun is shining'
for hypothesis, expected in [('the sun is out in the sky', '1'),
                             ('the cat is shiny', '0')]:
    # Build the input in the "premise: ... hypothesis: ..." format.
    input_ids = tokenizer(
        f'premise: {premise} hypothesis: {hypothesis}',
        return_tensors='pt',
        truncation=True,
        max_length=2048).input_ids
    # The generated text is the predicted label: '1' or '0'.
    outputs = model.generate(input_ids)
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f'premise: {premise}')
    print(f'hypothesis: {hypothesis}')
    print(f'result: {result} (expected: {expected})\n')
```

#### scoring

```python
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer
import torch

model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

premise = 'the sun is shining'
for hypothesis, expected in [('the sun is out in the sky', '>> 0.5'),
                             ('the cat is shiny', '<< 0.5')]:
    input_ids = tokenizer(
        f'premise: {premise} hypothesis: {hypothesis}',
        return_tensors='pt',
        truncation=True,
        max_length=2048).input_ids
    # Start decoding from the pad token and run a single decoder step.
    decoder_input_ids = torch.tensor([[tokenizer.pad_token_id]])
    outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
    logits = outputs.logits
    # Distribution over the vocabulary for the first generated token.
    probs = torch.softmax(logits[0], dim=-1)
    # The probability assigned to the '1' token is the consistency score.
    one_token_id = tokenizer('1').input_ids[0]
    entailment_prob = probs[0, one_token_id].item()
    print(f'premise: {premise}')
    print(f'hypothesis: {hypothesis}')
    print(f'score: {entailment_prob:.3f} (expected: {expected})\n')
```

## Citation

If you use this model for a research publication, please cite the TrueTeacher paper (using the BibTeX entry below), as well as the ANLI and T5 papers mentioned above.

```
@misc{gekhman2023trueteacher,
      title={TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models},
      author={Zorik Gekhman and Jonathan Herzig and Roee Aharoni and Chen Elkind and Idan Szpektor},
      year={2023},
      eprint={2305.11171},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
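
As a complement to the usage examples, the following minimal sketch wraps the input format and the label convention from Model Details in two small helpers. The names `build_input` and `score_to_label` are hypothetical and are not part of the released model or of `transformers`. The 0.5 threshold is an assumption for illustration only; a threshold tuned on your own data may work better.

```python
def build_input(premise: str, hypothesis: str) -> str:
    """Format a (grounding document, summary) pair as
    'premise: GROUNDING_DOCUMENT hypothesis: HYPOTHESIS_SUMMARY'."""
    return f'premise: {premise} hypothesis: {hypothesis}'


def score_to_label(score: float, threshold: float = 0.5) -> str:
    """Map a consistency score in [0, 1] to the model's label convention.

    Returns '1' (factually consistent) or '0' (factually inconsistent).
    The default threshold of 0.5 is an assumption, not a recommendation
    from the model authors.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f'score must be in [0, 1], got {score}')
    return '1' if score >= threshold else '0'


if __name__ == '__main__':
    text = build_input('the sun is shining', 'the sun is out in the sky')
    print(text)
    print(score_to_label(0.93))
```

The string returned by `build_input` can be passed to the tokenizer in place of the inline f-string used in the examples, and `score_to_label` can be applied to the probability computed in the scoring example.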