Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models
Abstract
We describe the accurate prediction of ligand-protein interaction (LPI) affinities, also known as drug-target interactions (DTI), with instruction fine-tuned pretrained generative small language models (SLMs). We achieved accurate predictions for a range of affinity values associated with ligand-protein interactions on out-of-sample data in a zero-shot setting. Only the SMILES string of the ligand and the amino acid sequence of the protein were used as the model inputs. Our results demonstrate a clear improvement over machine learning (ML) and free-energy perturbation (FEP+) based methods in accurately predicting a range of ligand-protein interaction affinities, which can be leveraged to further accelerate drug discovery campaigns against challenging therapeutic targets.
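The abstract states that the model sees only the ligand's SMILES string and the protein's amino acid sequence. As a minimal sketch of how such a pair might be packaged for instruction fine-tuning, the Python below builds one record. The prompt wording, the field names (`instruction`, `input`, `output`), the helper `build_record`, and the example sequence and affinity value are all hypothetical placeholders for illustration; they are not taken from the paper.

```python
import json

# Hypothetical prompt template (assumption): the paper's actual wording is not given here.
INSTRUCTION = "Predict the {affinity_type} of the ligand for the protein."


def build_record(smiles, sequence, affinity_type, affinity_value=None):
    """Format one ligand-protein pair as an instruction-tuning record.

    Only the SMILES string and the amino acid sequence go into the model
    input, mirroring the setup described in the abstract.
    """
    record = {
        "instruction": INSTRUCTION.format(affinity_type=affinity_type),
        "input": f"SMILES: {smiles}\nProtein sequence: {sequence}",
    }
    # The target is attached for training records and omitted at inference time.
    if affinity_value is not None:
        record["output"] = f"{affinity_value:.2f}"
    return record


# Illustrative values only: an aspirin SMILES, a made-up peptide fragment,
# and a made-up affinity value. None of these are data from the paper.
example = build_record("CC(=O)Oc1ccccc1C(=O)O", "MKTAYIAKQR", "pIC50", 6.5)
print(json.dumps(example, indent=2))
```

Leaving `affinity_value` unset yields a prompt-only record, which is the shape one would pass to a fine-tuned model when asking for a prediction on an unseen pair.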
Community
This paper describes the use of fine-tuned small language models to accurately predict biological interactions, which is useful for prioritizing small molecules for progression in drug discovery campaigns. The generality of the method is shown to improve as the fine-tuning data set size increases. The work also highlights the importance and potential business impact of generative models for prioritizing activities and workstreams outside the traditional text-generation/NLP space.
Hi @BFauber, congrats on this work! Are you planning on sharing any artifacts on the Hub (e.g. you could upload models and link them to this paper page)?