This is a Flan-T5-based model, first pre-trained on the VG (Visual Genome) scene graph parsing dataset and then fine-tuned on the FACTUAL scene graph parsing dataset. See https://github.com/zhuang-li/FACTUAL/tree/main for model details.
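
As a rough illustration of how a checkpoint like this might be used, here is a minimal sketch built on the standard `transformers` seq2seq classes. Several things in it are assumptions rather than facts from this card: the `PROMPT_PREFIX` string, the `( subject , predicate , object )` output format, and the helper names `generate_scene_graph` and `parse_scene_graph` are illustrative only, and `model_id` must be supplied by you. Check the FACTUAL repository for the exact prompt and output conventions.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# ASSUMPTION: task prefix prepended to the input sentence; verify against the FACTUAL repo.
PROMPT_PREFIX = "Generate Scene Graph: "


def parse_scene_graph(text: str) -> List[Triple]:
    """Split a string such as '( a , b , c ) , ( d , e , f )' into triples.

    ASSUMPTION: the model emits parenthesised, comma-separated triples.
    Groups that do not contain exactly three non-empty fields are skipped.
    """
    triples: List[Triple] = []
    for chunk in re.findall(r"\(([^()]*)\)", text):
        parts = [p.strip() for p in chunk.split(",")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def generate_scene_graph(sentence: str, model_id: str, max_new_tokens: int = 128) -> str:
    """Run the seq2seq model on one sentence and return the decoded text.

    `model_id` is a Hugging Face Hub id or local path chosen by the caller;
    it is not specified in this card.
    """
    # Imported lazily so the parsing helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(PROMPT_PREFIX + sentence, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

To try it, pass the id of this checkpoint as `model_id` to `generate_scene_graph`, then feed the returned string to `parse_scene_graph` to get a list of triples.
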
If you use the model, please cite:
@inproceedings{li-etal-2023-factual,
title = "{FACTUAL}: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing",
author = "Li, Zhuang and
Chai, Yuyang and
Zhuo, Terry Yue and
Qu, Lizhen and
Haffari, Gholamreza and
Li, Fei and
Ji, Donghong and
Tran, Quan Hung",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-acl.398",
pages = "6377--6390",
}