Model Card for FAPM (Functional Annotation of Proteins using Multi-Modal Models)
FAPM is adapted from BLIP-2: a Q-Former is placed between the protein sequence modality and the natural language modality to perform protein captioning. The protein sequence is encoded by a pretrained ESM2 model, and Mistral-7B-v0.2 decodes the natural language protein descriptions.
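The role of the Q-Former in this pipeline can be illustrated with a toy sketch. The code below is not the FAPM implementation: it uses random vectors in place of ESM2 residue embeddings, a single attention head with no learned projections, and made-up sizes (`DIM`, `NUM_QUERIES`). It only shows the core idea borrowed from BLIP-2, namely that a fixed set of learned query vectors cross-attends over a variable-length sequence and yields a fixed number of tokens that a language model could consume.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, residue_embeddings):
    """Single-head cross-attention: every query attends over all residues.

    Simplification (assumption): keys and values are the raw residue
    embeddings, with no learned projection matrices.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in residue_embeddings
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, residue_embeddings))
            for d in range(dim)
        ])
    return outputs


def random_matrix(rows, cols, rng):
    """Stand-in for real embeddings: rows x cols Gaussian values."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM = 8          # toy embedding width (hypothetical, not FAPM's real size)
NUM_QUERIES = 4  # toy number of learned queries (hypothetical)

learned_queries = random_matrix(NUM_QUERIES, DIM, rng)
short_protein = random_matrix(12, DIM, rng)   # pretend encoder output, 12 residues
long_protein = random_matrix(300, DIM, rng)   # pretend encoder output, 300 residues

short_tokens = cross_attend(learned_queries, short_protein)
long_tokens = cross_attend(learned_queries, long_protein)
print(len(short_tokens), len(long_tokens))
```

Both proteins produce `NUM_QUERIES` output vectors regardless of sequence length, which is what lets a decoder receive a fixed-size prefix for proteins of any length.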
Model Details
Model Description
Assigning accurate property labels to proteins, such as functional terms and catalytic activity, is challenging, especially for proteins without homologs and for tail labels with few known examples. Unlike previous methods that focus mainly on protein sequence features, we use a pretrained large language model to capture the semantic meaning of protein labels. Specifically, we introduce FAPM, a contrastive multi-modal model that links natural language with protein sequence language. It combines a pretrained protein sequence model with a pretrained large language model to generate labels, such as Gene Ontology (GO) functional terms and catalytic activity predictions, in natural language. Our results show that FAPM captures protein properties well, outperforming models based solely on protein sequences or structures. It achieves state-of-the-art performance on public benchmarks and on in-house experimentally annotated phage proteins, which often have few known homologs. FAPM can also incorporate extra text prompts, such as taxonomy information, which improves both its predictive performance and its explainability. This approach offers a promising alternative to current methods that rely on multiple sequence alignment for protein annotation. An online demo is available at https://huggingface.co/spaces/wenkai/FAPM_demo.
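To make the word "contrastive" in the description concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective between protein and text embeddings. This is a generic formulation, assumed for illustration only; the card does not specify FAPM's exact loss, and the function names, the two-dimensional toy vectors, and the `temperature` value are all hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax_at(logits, index):
    """Log-probability that softmax(logits) assigns to position `index`."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return logits[index] - log_sum


def contrastive_loss(protein_vecs, text_vecs, temperature=0.07):
    """Symmetric InfoNCE: the i-th protein should match the i-th text.

    `temperature` is an illustrative default, not a value taken from FAPM.
    """
    proteins = [normalize(v) for v in protein_vecs]
    texts = [normalize(v) for v in text_vecs]
    n = len(proteins)
    # sim[i][j]: scaled cosine similarity between protein i and text j
    sim = [
        [sum(a * b for a, b in zip(p, t)) / temperature for t in texts]
        for p in proteins
    ]
    protein_to_text = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    text_to_protein = -sum(
        log_softmax_at([sim[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return (protein_to_text + text_to_protein) / 2


protein_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(protein_batch, [[0.9, 0.1], [0.1, 0.9]])
shuffled = contrastive_loss(protein_batch, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)
```

The loss is low when each protein embedding is closest to its own description and high when the pairing is shuffled, which is the signal that pulls the two modalities into a shared space.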
Model Sources
Citation
BibTeX:
@article {Xiang2024.05.07.593067,
  author = {Xiang, Wenkai and Xiong, Zhaoping and Huan, Chen and Xiong, Jiacheng and Zhang, Wei and Fu, Zunyun and Zheng, Mingyue and Liu, Bing and Shi, Qian},
  title = {FAPM: Functional Annotation of Proteins using Multi-Modal Models Beyond Structural Modeling},
  elocation-id = {2024.05.07.593067},
  year = {2024},
  doi = {10.1101/2024.05.07.593067},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2024/07/03/2024.05.07.593067},
  eprint = {https://www.biorxiv.org/content/early/2024/07/03/2024.05.07.593067.full.pdf},
  journal = {bioRxiv}
}
Model Card Authors
- Wenkai Xiang
- Zhaoping Xiong