Volume 84, Issue 1 pp. 54-71
ORIGINAL ARTICLE

In silico analysis of nonsynonymous single-nucleotide polymorphisms (nsSNPs) of the SMPX gene

Md. Arifuzzaman
College of Pharmacy, Yeungnam University, Gyeongbuk, Republic of Korea

Sarmistha Mitra
Plasma Bioscience Research Center, Plasma-Bio Display, Kwangwoon University, Seoul, Republic of Korea

Raju Das
Department of Biochemistry and Biotechnology, University of Science & Technology Chittagong, Chittagong, Bangladesh

Amir Hamza
Department of Biochemistry, Hallym University, Gangwon, Republic of Korea

Nurul Absar
Department of Biochemistry and Biotechnology, University of Science & Technology Chittagong, Chittagong, Bangladesh

Raju Dash (Corresponding Author)
Department of Anatomy, Dongguk University Graduate School of Medicine, Gyeongju, Republic of Korea

Correspondence: Raju Dash, Dongguk University Graduate School of Medicine, Gyeongju 38066, Republic of Korea.
Email: [email protected]

First published: 03 October 2019
Citations: 21

Abstract

Mutations in the SMPX gene can disrupt the normal activity of the SMPX protein, which is involved in the hearing process. Recent reports have shown a link between nonsynonymous single-nucleotide polymorphisms (nsSNPs) in SMPX and hearing loss; classifying the deleterious SNPs in SMPX is therefore a demanding but necessary task before designing a more extensive population study. In this study, damaging nsSNPs of SMPX from the dbSNP database were identified by using 13 bioinformatics tools. Initially, the impact of nsSNPs in the SMPX gene was evaluated through different in silico predictors, and the deleterious convergent changes were analyzed by energy-minimization-guided residual network analysis. In addition, the pathogenic effects of mutations on SMPX-mediated protein–protein interactions were characterized by structural modeling and binding energy calculations. A total of four mutations (N19D, A29T, K54N, and S71L), all located in highly conserved regions, were found to be highly deleterious by all the tools. Furthermore, all four mutants showed structural alterations, and the communities of amino acids in the mutant proteins were readily changed compared to the wild-type. Among them, A29T (rs772775896) was revealed as the most damaging nsSNP; it caused significant structural deviation of the SMPX protein and, as a result, reduced the binding affinity to other functional partners. These findings provide computational insights into the deleterious role of nsSNPs in SMPX, which may help guide confirmatory wet-lab analysis.

1 INTRODUCTION

X-linked nonsyndromic hearing impairment is generally rare, accounting for ∼1%–5% of cases and ∼5% of cases of prelingual male deafness (Petersen, Wang, & Willems, 2008). Six nonsyndromic hearing loss loci have been identified and, among them, SMPX (DFNX4, OMIM 300066) is located on chromosome X at p22.1 and contains five exons, of which only the first and fifth are noncoding (Patzak, Zhuchenko, Lee, & Wehnert, 1999). The SMPX gene encodes a protein of 88 amino acids and was first cloned from muscle (Patzak et al., 1999). The gene is largely conserved across mammalian species (Palmer et al., 2001). The protein contains no known functional domains and is involved primarily in inner ear development through interactions with insulin-like growth factor 1 (Cediel, Riquelme, Contreras, Díaz, & Varela-Nieto, 2006; Palmer et al., 2001), integrins (α8β1), and Rac1 (Evans & Müller, 2000; Grimsley-Myers, Sipe, Géléoc, & Lu, 2009). Furthermore, SMPX is associated with the cytoskeleton, maintains stereocilia that are exposed to physical forces (Manor & Kachar, 2008), and protects the cells of the cochlea from the mechanical stress exerted during hearing (Huebner et al., 2011). However, mutations in the SMPX gene have been reported to cause hearing impairment that, with delayed detection and intervention, can progress to a more advanced form of hearing loss (Deng et al., 2018; Gao et al., 2018; Schraders et al., 2011; Weegerink et al., 2011). Moreover, such mutations can cause significant damage to the inner ear cells and lead to progressive hearing loss (Abdelfatah et al., 2013). Several studies have already examined the association between SMPX mutations and X-linked hearing loss in various families (Deng et al., 2018; Gao et al., 2018; Reardon et al., 1991; Tyson et al., 1996). According to one report on the effect of mutation on the SMPX protein, the mutation yields a transcript with a premature stop codon, resulting in nonsense-mediated messenger RNA decay and, ultimately, loss of function of the SMPX protein (Huebner et al., 2011).

Nonsynonymous single-nucleotide polymorphisms (nsSNPs) are reported to account for 50% of inherited diseases (Doniger et al., 2008; Radivojac et al., 2010; Ramensky, Bork, & Sunyaev, 2002; Stenson et al., 2003). Previous studies showed that 60% of Mendelian diseases are caused by SNPs (Botstein & Risch, 2003; Cummings et al., 2017; Eilbeck, Quinlan, & Yandell, 2017), which can disrupt the normal function of a protein and perturb its structure by changing its stability, folding pattern, and ligand-binding site (Doss et al., 2012; Khan et al., 2017). The effects of nsSNPs on SMPX protein structure and function remain elusive; therefore, in the present study, we analyzed the deleterious effects of nsSNPs in the SMPX gene by using various computational databases and bioinformatics tools.

2 MATERIALS AND METHODS

2.1 Data retrieval

From the SNP database of the National Center for Biotechnology Information (dbSNP) (http://www.ncbi.nlm.nih.gov/snp), the corresponding datasets of the SMPX gene were retrieved following their rsIDs (Sherry et al., 2001) in January 2018.

2.2 Identification of damaging SNPs

For the isolation of deleterious SNPs and to analyze their impact on protein structure and function, 11 major and widely accepted computational tools were used. The aim of this study is to consider only deleterious SNPs rather than neutral ones.

SIFT (sorting intolerant from tolerant) (http://sift.jcvi.org) is a web-based tool that distinguishes damaging SNPs from tolerated SNPs on the basis of sequence homology (Kumar, Henikoff, & Ng, 2009; Ng & Henikoff, 2003, 2006). The prediction score covers a range of values, where a score of ≤0.05 is considered deleterious and a score of >0.05 is considered tolerated (Ng & Henikoff, 2001, 2002). SIFT can search the SWISS-PROT, SWISS-PROT/TrEMBL, or NCBI nonredundant protein databases (Bairoch & Apweiler, 2000; Wheeler et al., 2002).

PolyPhen-2 (Polymorphism Phenotyping v2) (http://genetics.bwh.harvard.edu) is another tool extensively used to predict the functional impact of SNPs on protein structure and function (Adzhubei et al., 2010; Itan et al., 2016). The input was the protein sequence in FASTA format together with the position of the substitution and the native and substituted amino acids; the tool uses a Bayesian classifier for prediction (Adzhubei, Jordan, & Sunyaev, 2013). Two prediction models are available: HumDiv and HumVar (Ramensky et al., 2002). HumDiv mainly identifies the less-damaging SNPs, whereas HumVar classifies the SNPs with an extreme phenotypic effect on the basis of the position-specific independent counts (PSIC) score (Sunyaev et al., 1999). The probabilistic score ranges from 0 (neutral) to 1 (deleterious), and functional significance is categorized as benign (0.00–0.14), possibly damaging (0.15–0.84), or probably damaging (0.85–1) (Gemovic, Perovic, Glisic, & Veljkovic, 2013).
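The score bands quoted above can be written as a small lookup. This is only an illustrative sketch: the function name `polyphen2_category` is hypothetical, and assigning scores that fall between two quoted bands (for example 0.145) to the lower band is an assumption, since the text does not say how such values are handled.

```python
def polyphen2_category(score: float) -> str:
    """Map a PolyPhen-2 probability to the category bands quoted in the text.

    Assumption: values between two quoted bands (e.g. 0.14 < score < 0.15)
    are placed in the lower band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen-2 probabilities lie in [0, 1]")
    if score >= 0.85:
        return "probably damaging"
    if score >= 0.15:
        return "possibly damaging"
    return "benign"


# Example with a HumDiv score of the size reported for SMPX variants
print(polyphen2_category(0.996))
```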

PROVEAN (http://provean.jcvi.org/index.php) was used to predict changes in the biological function of a protein caused by an amino acid substitution (Choi & Chan, 2015). The tool is based on sequence clustering and an alignment-based score (Choi, 2012). To generate the final PROVEAN score, BLAST hits with more than 75% global sequence identity were clustered together, the top 30 such clusters formed the supporting sequence set, and the scores were averaged within and across clusters (Choi, Sims, Murphy, Miller, & Chan, 2012). A score of ≤−2.5 was considered damaging and a score higher than −2.5 was considered neutral (Goswami, 2015).
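The within-and-across-cluster averaging can be illustrated as follows. This is a minimal sketch rather than the PROVEAN implementation: it assumes the per-sequence delta alignment scores have already been computed and grouped into clusters ordered by rank, and the names `provean_score` and `provean_call`, as well as the numbers in the example, are hypothetical.

```python
from statistics import mean


def provean_score(cluster_deltas, max_clusters=30):
    """Average delta scores within each cluster, then across clusters.

    cluster_deltas: list of clusters, each a list of per-sequence delta
    scores (assumed precomputed and ordered so the top clusters come first).
    """
    top = cluster_deltas[:max_clusters]
    return mean(mean(cluster) for cluster in top)


def provean_call(score, cutoff=-2.5):
    """Apply the -2.5 cutoff: at or below is deleterious, above is neutral."""
    return "deleterious" if score <= cutoff else "neutral"


# Hypothetical delta scores for two clusters of supporting sequences
score = provean_score([[-4.0, -2.0], [-6.0]])
print(score, provean_call(score))
```

Averaging within clusters first keeps a large family of near-identical homologs from dominating the final score.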

PredictSNP (https://loschmidt.chemi.muni.cz/predictsnp) is a more recent tool that is robust at distinguishing damaging SNPs from neutral SNPs. PredictSNP combines the output of several other bioinformatics tools into a final consensus result, which is reliable and convenient (Bendl et al., 2014). The tools integrated in this case were MAPP (Stone & Sidow, 2005), nsSNPAnalyzer (Bao, Zhou, & Cui, 2005), PANTHER (Mi, Guo, Kejariwal, & Thomas, 2006; Thomas et al., 2003), PhD-SNP (Capriotti, Calabrese, & Casadio, 2006), PolyPhen-1, PolyPhen-2 (Adzhubei et al., 2010), SIFT (Ng & Henikoff, 2003), and SNAP (Bromberg & Rost, 2007). To be classed as damaging, an SNP had to be predicted as such by at least three to four of the tools used in PredictSNP. The results were provided together with annotations extracted from the Protein Mutant Database and the UniProt Database (Bendl et al., 2014). Predicted values in the interval [−1, 0] are considered neutral, whereas values in the interval (0, +1] are considered deleterious.

I-Mutant 3.0 (http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi) is a support vector machine (SVM)-based tool that predicts the protein stability change caused by a single-point substitution (Capriotti, Fariselli, & Casadio, 2005b). The tool accepts either the structure or the sequence of the protein as input; we used sequence-based input for the analysis. It assigns the protein stability change to one of three classes: neutral mutation (−0.5 ≤ DDG ≤ 0.5 kcal/mol), large decrease (DDG < −0.5 kcal/mol), and large increase (DDG > 0.5 kcal/mol) (Capriotti et al., 2006). The output reports the predicted free energy change (DDG), which is calculated as the unfolding Gibbs free energy of the mutated protein minus the unfolding Gibbs free energy of the native protein (Capriotti et al., 2006; Capriotti, Fariselli, Calabrese, & Casadio, 2005a).
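As a worked illustration of the three stability classes, the sketch below computes DDG and applies the ±0.5 kcal/mol cutoffs. It assumes DDG is taken as the unfolding free energy of the mutant minus that of the native protein; the function names and the free energy values in the first call are hypothetical.

```python
def ddg(dg_unfold_mutant: float, dg_unfold_native: float) -> float:
    """DDG in kcal/mol, assumed to be mutant minus native unfolding free energy."""
    return dg_unfold_mutant - dg_unfold_native


def stability_class(ddg_value: float) -> str:
    """Ternary classification with the +/-0.5 kcal/mol cutoffs from the text."""
    if ddg_value < -0.5:
        return "large decrease"
    if ddg_value > 0.5:
        return "large increase"
    return "neutral"


# Hypothetical unfolding free energies (kcal/mol)
print(stability_class(ddg(3.0, 4.0)))
# A predicted DDG of -0.91 kcal/mol falls in the "large decrease" class
print(stability_class(-0.91))
```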

The impact of nsSNPs on protein stability was further predicted by the mutation cut-off scanning matrix (mCSM) (http://biosig.unimelb.edu.au/mcsm/stability) server, which calculates the impact of an amino acid substitution on the basis of atomic distance patterns. Three-dimensional structures predicted by homology modeling were provided as input, and destabilizing variants were identified from the output score of the mCSM server (Pires, Ascher, & Blundell, 2013).

Mutation assessor (http://mutationassessor.org/r3) is an online tool that assesses the functional effect of a substitution on the basis of the evolutionary conservation of the affected amino acid; it was validated using disease-associated variants from Online Mendelian Inheritance in Man (OMIM) and a database of polymorphisms (Reva, Antipin, & Sander, 2011). The protein ID was submitted to the server with the default configuration, and the functional impact of each substitution was classified as neutral, low, medium, or high.

PhD-SNP (Predictor of human Deleterious Single-Nucleotide Polymorphisms) (http://snps.biofold.org/phd-snp/phd-snp.html) is an SVM classifier optimized to assess, from protein sequence information, whether a nonsynonymous single-point mutation is a disease-associated or a neutral polymorphism (Capriotti et al., 2006). Its input features, such as the sequence environment of the mutation and the profile at the mutated site, are derived by running BLAST (Altschul et al., 1997) against the UniRef90 database (Suzek, Huang, McGarvey, Mazumder, & Wu, 2007). PhD-SNP yields an output score (range 0–1) for each mutation, which reflects the likelihood that the nsSNP is associated with disease. The method uses 0.5 as the threshold below which nsSNPs are expected to be disease-associated (Capriotti, Altman, & Bromberg, 2013a).

SNPs&GO (http://snps.biofold.org/snps-and-go/snps-and-go.html) uses molecular and functional information from the Gene Ontology (GO) database to calculate the deleterious effect of nsSNPs and reports a probability value as output. A probability greater than 0.5 indicates that an nsSNP is disease-causing (Capriotti et al., 2013b).

VEST (variant effect scoring tool) (http://www.cravat.us/CRAVAT) is a machine learning method for assessing the likelihood that a missense mutation impairs a protein's function. VEST uses a statistical hypothesis testing framework to assign P-values to its predictions. These P-values can be aggregated across disease exomes at the gene level to evaluate genes for possible involvement in disease. The principal distinction of the VEST method is that its training set and prediction methodology are explicitly designed for Mendelian studies (Carter, Douville, Stenson, Cooper, & Karchin, 2013).

2.3 Characterization of protein stability changed by mutations

MUpro (http://mupro.proteomics.ics.uci.edu) is a set of machine learning programs that predict how a single-site amino acid mutation affects protein stability (Cheng, Randall, & Baldi, 2006). MUpro does not require a tertiary protein structure; a protein sequence in FASTA format is sufficient. It uses SVMs and neural networks to predict the sign of the energy change caused by a mutation, together with a confidence score between −1 and 1. A score <0 means that the mutation decreases the protein stability; the smaller the score, the more confident the prediction. Conversely, a score >0 means that the mutation increases the protein stability; the larger the score, the more confident the prediction.

I-Mutant 2.0 (Seq) (http://biofold.org/folding) is an SVM-based web server for the automatic prediction of protein stability changes upon single-site mutations. The tool was trained on a dataset derived from ProTherm (Bava, Gromiha, Uedaira, Kitajima, & Sarai, 2004), which is presently the most comprehensive database of experimental data on protein mutations. The server can evaluate the stability change upon a single-site mutation starting from either the protein structure or the protein sequence. In both cases, the user can predict the stability change for all possible mutations of a particular residue or for one specific mutation only. For either prediction, I-Mutant 2.0 reports the sign (increase +, decrease −) of the free energy change (DDG) or its value upon mutation (Capriotti et al., 2005b).

iStable (http://predictor.nchu.edu.tw/istable/indexSeq.php) uses an SVM to predict protein stability changes upon single amino acid residue mutations (Chen, Lin, & Chu, 2013). Structure and sequence can both be used as input. In this study, we used sequence-based input to analyze the stability change. In the construction of iStable, five web-based prediction tools were chosen as element predictors: I-Mutant 2.0 (Capriotti et al., 2005a), MUpro (Cheng et al., 2006), AUTO-MUTE (Masso & Vaisman, 2008), PoPMuSiC 2.0 (Dehouck et al., 2009), and CUPSAT (Parthiban, Gromiha, & Schomburg, 2006). We then obtained the output result as decreased stability with a confidence score.

2.4 Analysis of mutation-induced structural impact

2.4.1 Conservancy and mechanisms analysis

The ConSurf server (consurf.tau.ac.il) is a bioinformatics tool for analyzing the evolutionary conservation of positions in a protein, DNA, or RNA sequence by using the phylogenetic relationships among homologous sequences. The phylogenetic tree was generated, and the evolutionary conservation analysis was performed, with the ConSurf server (Celniker et al., 2013). A Bayesian method was used to calculate the conservation score from the protein sequence (Ashkenazy, Erez, Martz, Pupko, & Ben-Tal, 2010; Mayrose, Graur, Ben-Tal, & Pupko, 2004). A conservation score of 1–4 indicated a variable position, a score of 5–6 an intermediate position, and a score of 7–9 a conserved position (Jia et al., 2014).

MutPred (mutpred.mutdb.org) was utilized for the analysis of protein's structural and functional changes caused by amino acid substitutions (Li et al., 2009). It also predicted the molecular causes of deleterious mutations and separated the damaging mutations from neutral mutations. It was trained using the deleterious mutations from the Human Gene Mutation Database (Stenson et al., 2009), and the SIFT algorithm was used in this tool (Ng & Henikoff, 2003). The score range and prediction output can be: scores with g > 0.5 and p < 0.05 referred to as actionable hypotheses, scores with g > 0.75 and p < 0.05 referred to as confident hypotheses, and scores with g > 0.75 and p < 0.01 referred to as very confident hypotheses (Li et al., 2009).
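The three MutPred hypothesis levels combine a general score g with a property P-value p, and the tiers overlap, so the most stringent tier has to be tested first. The sketch below is illustrative only; the function name `mutpred_hypothesis` and the input values are hypothetical.

```python
def mutpred_hypothesis(g: float, p: float) -> str:
    """Return the strongest MutPred hypothesis tier met by (g, p).

    Tiers are checked from most to least stringent because each tier
    is a subset of the next.
    """
    if g > 0.75 and p < 0.01:
        return "very confident"
    if g > 0.75 and p < 0.05:
        return "confident"
    if g > 0.5 and p < 0.05:
        return "actionable"
    return "none"


# Hypothetical (g, p) pairs
for g, p in [(0.80, 0.002), (0.80, 0.03), (0.60, 0.03), (0.60, 0.20)]:
    print(g, p, mutpred_hypothesis(g, p))
```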

2.4.2 Protein modeling and refinement

The I-TASSER server is an online platform that implements the TASSER-based algorithms for protein structure prediction. It can automatically generate high-quality predictions of the 3D structure and biological function of protein molecules from their amino acid sequences (Yang et al., 2015). In this study, we used I-TASSER to model SMPX and then carried out the modeling of the mutant proteins. To validate the predicted model, RAMPAGE was used for Ramachandran plot analysis, which uses the “protein structure and model assessment tools” of the SWISS-MODEL workspace (Arifuzzaman et al., 2018). The best model was selected on the basis of the overall G-factor and the number of residues in the core, allowed, generously allowed, and disallowed regions. The model was then further validated with the ERRAT server (Colovos & Yeates, 1993). To understand the effect of the mutations, computational mutagenesis was performed by using the Mutate Residue script from Schrödinger (New York, NY). It was presumed that no local rearrangement takes place upon mutation (Dash et al., 2017), and the mutated systems were not minimized at this stage. Each model was then refined with the YASARA software by a 500 ps molecular dynamics (MD) simulation at a temperature of 298 K, pH 7.4, and a solvent density of 0.997. The YAMBER3 force field (Krieger, Darden, Nabuurs, Finkelstein, & Vriend, 2004) was used, and the remaining simulation parameters were kept at the defaults defined by the macro. Three repeated refinements were carried out for each mutation. The snapshot with the lowest force field energy was chosen to characterize the structural changes, including root mean square deviation (RMSD), solvent accessible surface area (SASA), and residue network analysis. The 3D structure of the protein was visualized with Maestro software (v9.9, Schrödinger) (Maestro, 2014).

2.4.3 Project HOPE analysis

HOPE (http://www.cmbi.umcn.nl/hope) was used to predict the structural effects of a mutation for further validation of MD refinement. It was aimed to visualize and understand the mutation of interest (Venselaar, te Beek, Kuipers, Hekkelman, & Vriend, 2010). HOPE collects structural information from a series of sources, including calculations on the 3D protein structure, sequence annotations in UniProt, and predictions from Reprof software. HOPE combines this information to analyze the effect of a certain mutation on protein structures.

2.4.4 Residual network analysis

Protein structure networks for the wild-type and mutant proteins were built from atomic correlation data derived from normal mode analysis (NMA), a commonly used time-independent simulation method that probes long-range and local conformational changes in protein structures (Bahar & Rader, 2005). Generally, the collective motions of the particles in an ensemble are characterized by normal modes, whose frequencies, together with the displacements of the particles, can be estimated from the Hessian matrix of second derivatives obtained with a force field or wave function method (Leach & Leach, 2001). In this study, a C-alpha force field was used, which employs a spring force constant that differentiates between nearest-neighbor pairs along the backbone and all other pairs. The force constant function was parameterized by fitting to a local minimum of a crambin model using the AMBER94 force field (Hinsen, Petrescu, Dellerue, Bellissent-Funel, & Kneller, 2000). Herein, the second-derivative matrix of the potential energy, V, was calculated. The normal mode vectors were obtained from the eigenvalue equation (Equation 1):
$$V A = \lambda A \tag{1}$$
The normal modes are described by the eigenvectors (A) and their eigenvalues (λ). The eigenvalues were connected to a conformer transition of the protein along the specified eigenvectors. The communities of residues were identified from the set of correlated residues by using the Girvan–Newman clustering algorithm (Girvan & Newman, 2002). A dynamic cross-correlation matrix (DCCM) was obtained from the NMA, and the network was constructed from the cross-correlation values Cij in the DCCM, which can be interpreted as an adjacency matrix, where the weight wij of the edge between nodes i and j was defined as (Equation 2):
$$w_{ij} = -\log\left(\left|C_{ij}\right|\right) \tag{2}$$
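A short sketch of how a cross-correlation matrix can be turned into a weighted edge list is given here. It assumes the edge weight is the negative logarithm of the absolute correlation, as is common in correlation network analysis, so that strongly correlated residue pairs are separated by short network distances. The function name `correlation_to_weights`, the `cutoff` parameter, and the toy matrix are hypothetical and are not taken from the study.

```python
import math


def correlation_to_weights(corr, cutoff=0.0):
    """Build weighted edges from a symmetric cross-correlation matrix.

    corr:   square list of lists with values in [-1, 1]
    cutoff: pairs with |C_ij| <= cutoff get no edge (hypothetical parameter)
    Returns {(i, j): w_ij} for i < j, with w_ij = -log(|C_ij|) (assumed form).
    """
    n = len(corr)
    weights = {}
    for i in range(n):
        for j in range(i + 1, n):
            c = abs(corr[i][j])
            if c > cutoff:
                weights[(i, j)] = -math.log(c)
    return weights


# Toy 3-residue correlation matrix (hypothetical values)
toy = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, -0.9],
    [0.0, -0.9, 1.0],
]
print(correlation_to_weights(toy))
```

Using the absolute value treats strongly anticorrelated pairs as tightly coupled, just like strongly correlated ones.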

In the constructed network, each node corresponds to a Cα atom and each edge is an information transfer probability (i.e., cross-correlation). Communities were identified using the edge “betweenness” approach, where the betweenness of an edge is defined as the number of shortest paths between pairs of nodes (amino acid residues) that pass through that edge. The community clustering and node-betweenness calculations were carried out using the dccm and cna functions in the Bio3D package (Grant, Rodrigues, ElSawy, McCammon, & Caves, 2006) in R (Ihaka & Gentleman, 1996).
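To make the edge-betweenness idea concrete, the following sketch implements one Girvan–Newman division on a small unweighted graph using only the Python standard library. It is a simplified illustration and not the Bio3D implementation used in the study: edge weights are ignored, and the function names and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an undirected, unweighted graph.

    adj: dict mapping node -> set of neighbouring nodes.
    """
    score = defaultdict(float)
    for s in adj:
        dist = {s: 0}
        sigma = defaultdict(float)   # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:                 # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):    # accumulate path counts back towards s
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # every node pair was visited from both ends, so halve the totals
    return {edge: b / 2.0 for edge, b in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    start = len(components(adj))
    while len(components(adj)) == start:
        eb = edge_betweenness(adj)
        if not eb:
            break
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy graph: two triangles joined by a single bridge edge (2, 3)
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman_split(toy))
```

In the toy graph every shortest path between the two triangles crosses the bridge, so the bridge has the highest betweenness and is removed first, leaving the two triangles as communities.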

2.5 Analysis of protein–protein interaction network and functional characterization

Protein–protein interaction analysis is essential for revealing how proteins interact in the cell and for assessing how one abnormal protein affects other proteins and contributes to disease. STRING (search tool for the retrieval of interacting genes/proteins) is a web-based server that was used to study the protein–protein interactions of SMPX (Szklarczyk et al., 2010, 2016). To identify SMPX interacting partners, a high confidence score of 0.700 was used to generate the networks. For functional characterization, the 3D structures of the interacting partners highlighted as nodes were collected from SWISS-MODEL (https://swissmodel.expasy.org) (Schwede, Kopp, Guex, & Peitsch, 2003), a repository of homology models. The InterPred web server (Mirabello, Wallner, & Bioinformatics, 2017) was then used to model the protein–protein interactions between SMPX and its interacting partners. The generated complexes were further mutated and refined with the YAMBER3 force field (Krieger et al., 2004), and the refined complexes were then submitted to PRODIGY (Xue, Rodrigues, Kastritis, Bonvin, & Vangone, 2016) to determine the binding energy (kcal/mol) and hot spots at an experimental temperature of 25°C.
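For readers who want to relate a binding energy in kcal/mol to a dissociation constant at 25°C, the standard thermodynamic relation ΔG = RT ln(Kd) can be applied as sketched below. This is a generic conversion and is not claimed to reproduce any PRODIGY output from this study; the function name and the example energies are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g: float, temp_c: float = 25.0) -> float:
    """Kd (molar) from a binding free energy in kcal/mol via dG = RT ln(Kd)."""
    temp_k = temp_c + 273.15
    return math.exp(delta_g / (R_KCAL * temp_k))


# Hypothetical binding energies: a less negative dG gives a larger Kd,
# that is, weaker binding
for dg in (-10.0, -8.0):
    print(dg, dissociation_constant(dg))
```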

2.6 Statistical analysis

SPSS v19 software was used for predicting the correlation among various bioinformatics tools, followed by t-test and single-factor ANOVA test at P < 0.0001 for the most significant combinations (AbdulAzeez & Borgio, 2016; Abdulazeez et al., 2019; Borgio, Al-Madan, & AbdulAzeez, 2016).

3 RESULTS

3.1 Dataset

A preliminary search of the NCBI dbSNP database for SMPX returned a total of 10,628 SNPs. Of these, 3,989 SNPs (37.5329% of the total) belong to Homo sapiens. Among the 3,989 human SNPs, 26 (0.6517%) are missense variants and three (0.075%) are nonsense variants. A further 3,691 SNPs (92.53%) are located in introns, 21 (0.53%) in 5′UTR regions, and 29 (0.727%) in 3′UTR regions, while 16 SNPs (0.40%) are coding synonymous and three (0.075%) are frame shift variants (Figure 1a). Only the missense SNPs were considered for further analysis.
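The shares above follow directly from the reported counts, as the short calculation below illustrates. The counts are those given in the text; the dictionary layout and the helper name `percent` are only illustrative.

```python
# SNP counts per class among the 3,989 human SMPX SNPs, as reported in the text
counts = {
    "missense": 26,
    "nonsense": 3,
    "intron": 3691,
    "5'UTR": 21,
    "3'UTR": 29,
    "coding synonymous": 16,
    "frame shift": 3,
}
TOTAL_ALL = 10628    # all SNPs returned for SMPX
TOTAL_HUMAN = 3989   # SNPs belonging to Homo sapiens


def percent(n: int, total: int) -> float:
    """Share of n in total, as a percentage."""
    return 100.0 * n / total


print(f"human share of all SNPs: {percent(TOTAL_HUMAN, TOTAL_ALL):.4f}%")
for name, n in counts.items():
    print(f"{name}: {n} SNPs, {percent(n, TOTAL_HUMAN):.3f}%")
```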

Figure 1. Distribution of single-nucleotide polymorphisms (SNPs) in the SMPX gene, including missense, nonsense, intron, 5′UTR and 3′UTR regions, synonymous, and frame shift (a). Number of predicted damaging nonsynonymous SNPs (nsSNPs) in SMPX by various state-of-the-art tools (b). The correlations among the deleterious predictions by various computational tools in the SMPX gene are represented in a surface chart (c) [Color figure can be viewed at wileyonlinelibrary.com]

3.2 Identification of the most deleterious nsSNPs from SMPX

The deleterious effects of nonsynonymous SNPs in the SMPX gene were initially predicted by combining 10 state-of-the-art tools, where an nsSNP was classed as deleterious on the basis of the predictive score from SIFT (= 0), PolyPhen-2 HumDiv (>0.9), PolyPhen-2 HumVar (>0.9), PROVEAN (<−2.5), PredictSNP (0, +1], I-Mutant 3.0 (<−0.5), mCSM (<0), Mutation assessor (>2), PhD-SNP (<0.5), SNPs&GO (>0.5), and VEST (<0.05). Figure 1b shows the total number of damaging nsSNPs reported by each tool, where PhD-SNP predicted the highest number of deleterious SNPs and SNPs&GO predicted none. The results were then correlated with each other, as shown in Figure 1c, where the darkest red regions mark positively correlated deleterious predictions for nsSNPs in the SMPX gene. For most of the combinations, the agreement between two state-of-the-art tools was significant at P < 0.0001 (Student's t-test). Among these nsSNPs, only four were identified as significantly deleterious (P = 2.30579E-70) by most of the computational tools, where rs772775896 (A29T) was predicted as the most damaging according to 10 algorithms (90.90%). In addition, rs759552778 (N19D) and rs200892029 (S71L) were considered highly deleterious by nine predictors (88.80%), while rs1016314772 (K54N) was predicted as deleterious by eight tools (77.70%). The detailed results are given in Tables S1 to S9. Finally, we considered these four most deleterious nsSNPs (Table 1) for further analysis of their effects on structural stability, sequence conservation, and function to identify the highly pathogenic variant.

Table 1. Cumulative prediction of damaging nsSNPs in SMPX
                                  PolyPhen-2
rsID          Substitution  SIFT  HumDiv  HumVar  PROVEAN  I-Mutant 3.0  PredictSNP  mCSM   SNPs&GO  PhD-SNP  VEST     Mutation assessor
rs759552778   N19D          0     0.996   0.986   −4.15    −0.91         0.3         −0.39  0.309    0.1      0.0456   2.05
rs772775896   A29T          0     1       0.996   −3.5     −0.63         0.4         −1.06  0.097    0.6      0.0153   2.01
rs1016314772  K54N          0.01  0.999   0.994   −4.31    −1.18         1           −1.09  0.063    0        0.0539   0
rs200892029   S71L          0     1       0.992   −5.83    −0.58         1           −0.18  0.075    1        0.00701  0
  • The four listed nsSNPs are predicted as damaging or deleterious in common by the SIFT, PolyPhen-2 HumDiv, PolyPhen-2 HumVar, PROVEAN, I-Mutant 3.0, PredictSNP, mCSM, and VEST tools. Cutoffs for a deleterious prediction: SIFT (= 0), PolyPhen-2 HumDiv (>0.9), PolyPhen-2 HumVar (>0.9), PROVEAN (<−2.5), PredictSNP (0, +1], I-Mutant 3.0 (<−0.5), mCSM (<0), SNPs&GO (>0.5), PhD-SNP (<0.5), VEST (<0.05), and Mutation assessor (>2).
  • a Highly pathogenic SNP, which was agreed to by most of the predictors, including SIFT, PolyPhen-2 HumDiv, PolyPhen-2 HumVar, PROVEAN, I-Mutant 3.0, PredictSNP, mCSM, VEST, and Mutation assessor.

3.3 Analysis of the effect of deleterious nsSNPs on protein stability

To validate the prediction of I-Mutant 3.0, we performed an additional three analyses by using MUpro, I-Mutant 2.0, and iStable. All four mutations from previous analyses showed decreasing stability by providing negative DDG values with a high confidence score. Only the N19D mutation, in case of I-Mutant 2.0, however, was predicted to increase the stability. The data of those analyses can be found in Table 2.

Table 2. Validation result of protein stability change by using MUpro, I-Mutant 2.0 (Seq) and iStable
                            MUpro                        I-Mutant 2.0 (Seq) iStable
rsID          Substitution  Delta G  Stability           DDG    Prediction  Confidence score  Prediction
rs759552778   N19D          −0.94    Decrease stability  −0.04  Increase    0.51              Decrease
rs772775896   A29T          −0.72    Decrease stability  −0.49  Decrease    0.72              Decrease
rs1016314772  K54N          −0.69    Decrease stability  −0.82  Decrease    0.82              Decrease
rs200892029   S71L          −0.23    Decrease stability  −0.18  Decrease    0.67              Decrease

3.4 Analysis of structural effects on SMPX protein induced by mutations

3.4.1 Evolutionary conservancy and mechanisms analysis

Finally, evolutionary conservation analysis was performed with the ConSurf server, which calculated the phylogenetic tree of the homologous sequences by using the Bayesian method. According to the ConSurf results, the residues affected by S71L, N19D, A29T, and K54N were highly conserved (Figure S1). In addition, MutPred was used to analyze the molecular mechanisms by which the SNPs decrease protein stability. The SNPs predicted to be deleterious by the previous tools (Table 1) were used to identify the mechanistic reason for the change in SMPX protein stability caused by those variants. Here, S71L was predicted to decrease protein stability through loss of loop (P = 0.0512) and gain of helix (P = 0.062), which was reported as a very confident hypothesis. Another substitution, N19D, caused loss of MoRF binding (P = 0.0441) and decreased protein stability, which was also a very confident hypothesis. A29T likewise showed a very confident hypothesis, with gain of phosphorylation at A29 (P = 0.002). The last SNP, K54N, showed loss of ubiquitination at K54 (P = 0.0011), loss of methylation at K54 (P = 0.0017), and loss of glycosylation at K54 (P = 0.0284), which was a very confident hypothesis.

3.4.2 Model building and refinement for the analysis of structural effects of mutation

I-TASSER generated a large ensemble of structural conformations called decoys to select for final models. I-TASSER used the SPICKER program to cluster all the decoys based on pairwise structure similarity, and reported up to five models that correspond to the five largest structure clusters (Roy, Kucukural, & Zhang, 2010; Zhang, 2008; Zhang et al., 2016). The confidence of each model is quantitatively measured by C-score that is calculated based on the significance of threading template alignments and the convergence parameters of the structure assembly simulations. C-score is typically in the range of (−5, 2), where a C-score of a higher value signifies a model with a higher confidence and vice versa. TM-score and RMSD are estimated based on C-score and protein length following the correlation observed between these qualities (Wang, Virtanen, Xue, & Zhang, 2017; Zhang, 2008). The cluster size ranked the top three models. In this study, we selected the first model with the higher C-score of −4.11, which ranges between the cutoff criteria. The TM-score of the model was 0.28 ± 0.09 and the estimated TM-score was 13.1 ± 4.2 Å. The top threading templates that were used to generate the models were 5yzmB, 5yfpE, and 6gmhM, which showed greater similarity with the normalized Z-score of 1.0, 0.99, and 0.99, respectively, where Z-scores greater than 1 indicate higher quality. The predicted protein model was further validated by using RAMPAGE and ERRAT servers. RAMPAGE provided the quality of the protein model by Ramachandran plot analysis. According to RAMPAGE, the model showed 73.3% residues in the favored region while 22.1% in the allowed region. Only 4.7% residues were in the outlier region (Figure S2a). ERRAT produced the overall quality factor of the protein model of 93.75 (Figure S2b), while for the best quality of the model the value must be over 80%. Thus, the predicted model is more than good quality and very close to the best quality protein model. 
The model was then used to build the mutant structures of the protein. Figure 2a shows the wild-type protein model with the substitution sites highlighted. Further, a 500 ps MD refinement and energy minimization were performed to examine the RMSD and SASA of the wild-type and mutant proteins. The SASA analysis (Figure 2c; Table 3) showed that the wild-type and mutant proteins had similar per-residue fluctuations, except for N19D and S71L, which fluctuated most at residue positions ∼30–40. This result indicates that mutation causes instability of the SMPX protein. The RMSD analysis of the wild-type and mutant proteins showed significant deviation in their structural stability. Compared to the wild-type protein, A29T and S71L showed higher deviation (Figure 2b; Table 3). N19D and K54N also deviated from the native structure. Overall, all the mutations led to structural drift of the protein, which was further validated by energy refinement. The total energies of the wild-type and mutant proteins are shown in Table 3.
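As a reminder of what the RMSD values compared here measure, the following is a minimal sketch of the calculation for two sets of paired atomic coordinates. It assumes that the structures are already superimposed and that atoms are matched by index; the function name `rmsd` and the toy coordinates are illustrative assumptions, not the refinement pipeline used in this study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumes the structures are already superimposed and that atoms are
    paired by index; no fitting or alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical three-atom toy structures (not SMPX coordinates).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(0.0, 2.0, 0.0), (1.5, 2.0, 0.0), (3.0, 2.0, 0.0)]
print(rmsd(reference, shifted))
```

In practice, an optimal superposition step would normally precede this calculation; it is omitted here to keep the sketch short.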

Figure 2. (a) Representation of the most deleterious mutations in the 3D protein structure of SMPX. Mutation-induced structural changes in SMPX: the upper panel shows (b) the RMSD-superimposed view of the SMPX protein in the wild-type and mutant states, while the bottom panels (c) describe the per-residue SASA of the SMPX protein calculated with VMD software. (d) Residue-wise network centralities for the wild-type and mutant SMPX protein are also shown. In all cases, green circles and lines indicate wild-type, red indicates N19D, blue indicates A29T, orange indicates K54N, and violet indicates S71L [Color figure can be viewed at wileyonlinelibrary.com]
Table 3. Total energy, RMSD, SASA, and HOPE analysis results of wild-type and mutant proteins

| Type | Total energy (kcal/mol) | RMSD (Å) | SASA (Å²) | HOPE prediction |
|---|---|---|---|---|
| Wild | −22725.30 ± 132.6 | 0.5 ± 0.001 | 6032.88 ± 1.58 | No change |
| N19D | −23088.66 ± 230.8 | 2.234 ± 0.003 | 6429.11 ± 2.56 | The wild-type residue charge was neutral, and the mutant residue charge was negative |
| A29T | −23088.15 ± 220.56 | 5.534 ± 0.002 | 6204.27 ± 2.40 | The mutant residue is more hydrophobic than the wild-type residue |
| K54N | −23354.86 ± 310.87 | 2.428 ± 0.001 | 6383.97 ± 3.99 | The mutant residue is smaller than wild-type residue |
| S71L | −23259.91 ± 590.87 | 3.515 ± 0.004 | 6448.58 ± 2.90 | The mutant residue is more hydrophobic than the wild-type residue |
  • a Compared to the nonrefined initial predicted model.
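The comparison in Table 3 can be made explicit with a short sketch that ranks the mutants by RMSD and reports their SASA difference from the wild-type. The mean values are copied from Table 3 and the ± spreads are ignored; the helper names `rank_by` and `delta_vs_reference` are hypothetical and serve only as illustration.

```python
# Mean values copied from Table 3; the ± spreads are ignored in this sketch.
TABLE3 = {
    "Wild": {"energy": -22725.30, "rmsd": 0.5, "sasa": 6032.88},
    "N19D": {"energy": -23088.66, "rmsd": 2.234, "sasa": 6429.11},
    "A29T": {"energy": -23088.15, "rmsd": 5.534, "sasa": 6204.27},
    "K54N": {"energy": -23354.86, "rmsd": 2.428, "sasa": 6383.97},
    "S71L": {"energy": -23259.91, "rmsd": 3.515, "sasa": 6448.58},
}


def rank_by(table, key, reference="Wild"):
    """Return mutant names sorted from the largest to the smallest value of `key`."""
    mutants = [name for name in table if name != reference]
    return sorted(mutants, key=lambda name: table[name][key], reverse=True)


def delta_vs_reference(table, key, reference="Wild"):
    """Return the difference (mutant minus reference) of `key` for every mutant."""
    ref_value = table[reference][key]
    return {
        name: row[key] - ref_value
        for name, row in table.items()
        if name != reference
    }


print("RMSD ranking:", rank_by(TABLE3, "rmsd"))
for name, delta in delta_vs_reference(TABLE3, "sasa").items():
    print(f"{name}: SASA difference vs wild-type = {delta:+.2f} Å²")
```

With these values, A29T has the largest RMSD, followed by S71L, K54N, and N19D, and every mutant has a larger SASA than the wild-type.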

3.4.3 Residual network analysis

The structure of dynamic communities in a protein can change as a result of mutation (Hinsen et al., 2000). Indeed, highly destabilizing mutations tend to change the community structure of a protein more radically than less destabilizing ones (Mishra & Jernigan, 2018; Nielsen et al., 2017). The residual network analysis was carried out with Bio3D and is presented in Figure 3. Community analysis was applied to cluster these networks into highly intracorrelated structural regions, where the size of a community is the number of amino acids that have a high degree of correlated motion (depicted by the size of the circle), while the thickness of the edges linking the communities denotes the extent of correlation. Figure 3 shows that the wild-type protein has five main consistently correlated protein sectors (or community groups), which are loosely interconnected with each other. However, the network pattern changed as a result of the mutations, and the major protein sectors were no longer consistent with each other. In particular, among the major clusters, the first (blue in Figure 3) and the third (dark gray in Figure 3) were reduced in all mutants. The K54N substitution caused the formation of the highest number of clusters in the SMPX protein, while A29T maintained a cluster number similar to the wild-type; however, they showed an overall looser coupling. This analysis further supports that the mutations increase the flexibility of SMPX, which will influence the functional properties of the protein. To understand the functional importance of all residues of SMPX in both the wild-type and mutant states, the betweenness centrality of each node was calculated from all correlation networks. Interestingly, the major region, residues 45 to 60, showed high betweenness in wild-type SMPX (Figure 2d), while the mutants showed decreased betweenness in this region, particularly the N19D, A29T, and K54N substitutions. 
Furthermore, K54N and S71L showed large betweenness for residues 10 to 40. Because high betweenness reveals locations that tend to be important for controlling interdomain communication in a protein (Brown et al., 2017), this result supports a strong deleterious effect of the mutations on the SMPX protein.
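To illustrate what betweenness centrality measures, the following sketch applies Brandes' algorithm to a small unweighted, undirected toy network. This is a simplification under stated assumptions: the analysis in this study was performed with Bio3D on correlation networks, whereas the node labels, the graph, and the function name `betweenness_centrality` below are hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adjacency` maps each node to an iterable of its neighbours. The result is
    the raw (unnormalised) betweenness, counting each unordered pair once.
    """
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        # Breadth-first search counting shortest paths from the source.
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in adjacency}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {node: value / 2.0 for node, value in centrality.items()}


# Hypothetical five-node chain R1-R2-R3-R4-R5 (not the SMPX network).
toy_network = {
    "R1": ["R2"],
    "R2": ["R1", "R3"],
    "R3": ["R2", "R4"],
    "R4": ["R3", "R5"],
    "R5": ["R4"],
}
print(betweenness_centrality(toy_network))
```

In this toy chain, the central node lies on the largest number of shortest paths, which is the sense in which high-betweenness residues are taken to mediate communication between regions.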

Figure 3. Community networks depicted for (a) wild-type, (b) N19D, (c) A29T, (d) K54N, and (e) S71L, with colored circles whose relative radius describes the number of amino acids present in a particular community. The left panel represents the communities with highly intraconnected residues in a simplified network, while the right panel shows the full residue network. The strength of intercommunity coupling is indicated by the widths of the linking lines, all of which are colored gray [Color figure can be viewed at wileyonlinelibrary.com]

3.4.4 Conformational difference analysis by HOPE

How the mutations give rise to conformational differences remained unclear. To address this, we conducted a HOPE analysis. The HOPE results revealed that the four mutations change amino acid properties in ways that destabilize the SMPX protein. Each amino acid has its own specific size, charge, and hydrophobicity value, and the original wild-type residue and the newly introduced mutant residue often differ in these properties. Table 3 shows the effect of the amino acid changes caused by the four mutations of the SMPX protein.

3.5 Protein–protein interaction and functional characterization

To explore the regulatory mechanisms affected by an abnormal SMPX protein, it is necessary to study its association with interacting partner proteins. Therefore, the protein–protein interaction network from STRING was used, which revealed a number of closely interacting proteins, as shown in Figure 4. STRING analysis revealed that SMPX interacts with TRAPPC2 and TRAPPC2P1 and is thus connected to the large family of TRAPP proteins, known as the TRAPP complex. The TRAPP complex is mainly responsible for collagen biosynthesis and transport (Cutrona, Morgan, & Simpson, 2017). Mutation in TRAPPC2 was reported to cause spondyloepiphyseal dysplasia tarda as a result of the loss of protein–protein interactions, which increases the risk of hearing loss (Chen & Chen, 2014; Gedeon et al., 2001; Mohamoud et al., 2018; Tiller & Hannig, 2015). Furthermore, SMPX interacts with LDB3, which localizes in cochlear and utricular hair cells during ear development and maintains interactions with MYOZ2, CMYA5, ACTN2, and MYOM2, which are actin-associated proteins (Scheffer, Shen, Corey, & Chen, 2015). Unconventional myosins serve in intracellular movements. Their highly divergent tails are presumed to bind to membranous compartments, which would be moved relative to actin filaments, as required for the arrangement of stereocilia in mature hair bundles (Belguith et al., 2009; Liburd et al., 2001; Shearer et al., 2009). Among the other interacting proteins of SMPX, integrin β1-binding protein (melusin) 2 (ITGB1BP2) may, like SMPX, play a role during the maturation and/or organization of muscle cells and may help in inner ear development (Brancaccio et al., 1999). SMPX also interacts with CEACAM16 (carcinoembryonic antigen-related cell-adhesion molecule 16), which is required for proper hearing and may play a role in maintaining the integrity of the tectorial membrane (Kammerer et al., 2012; Wang et al., 2015). 
Because these proteins have a direct association with ear development and hearing mechanisms, mutations in SMPX could disrupt the functional interactions with them and contribute to the development of hearing loss. Therefore, we further considered four functional partners, namely TRAPPC2, ITGB1BP2, CEACAM16, and LDB3, for detailed functional studies (Figure 5a). The results show that the four mutations can disrupt protein–protein interactions as a result of the amino acid change (Figure 5b). The binding energies of the wild-type and the mutants to the interacting partners were calculated, and significant variation in binding energy was observed for all mutations compared to the wild-type. Among all the mutations, A29T decreased the binding energy of SMPX to all partner proteins, with a deviation of >10% in all cases. To gain more detailed insight into the deleterious effect of the mutations on protein–protein interactions, hot spots, or physical contact sites, in the protein–protein complexes were identified and are tabulated in Table S10. Interestingly, residue A29 was part of the binding site of SMPX and participated in complex formation with TRAPPC2, ITGB1BP2, and CEACAM16. In contrast, residues K54 and S71 were found as hot spots only in the SMPX–LDB3 complex, although no significant variation in binding energy was seen in the case of S71L. Residue N19 was present in the CEACAM16 binding site, and its mutation, N19D, reduced the binding energy by 20.22% compared to the control. These results further indicate the pathological nature of the identified nsSNPs in SMPX, of which A29T is the most deleterious.
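The >10% criterion used to mark notable changes in binding energy can be sketched as follows. The energies, mutant labels, and function names in this example are hypothetical placeholders, not values from this study, and the sign convention (more negative energy means stronger binding) is an assumption of the sketch.

```python
def percent_deviation(wild_energy, mutant_energy):
    """Percentage deviation of a mutant binding energy from the wild-type value."""
    if wild_energy == 0:
        raise ValueError("wild-type energy must be non-zero")
    return 100.0 * (mutant_energy - wild_energy) / abs(wild_energy)


def flag_deviations(wild_energy, mutant_energies, threshold=10.0):
    """Return the mutants whose absolute deviation exceeds `threshold` percent."""
    flagged = {}
    for name, energy in mutant_energies.items():
        deviation = percent_deviation(wild_energy, energy)
        if abs(deviation) > threshold:
            flagged[name] = deviation
    return flagged


# Hypothetical energies in kcal/mol; assumed convention: more negative = stronger binding.
wild_type_energy = -50.0
mutant_energies = {"mutant_1": -47.0, "mutant_2": -40.0, "mutant_3": -56.0}

for name, deviation in flag_deviations(wild_type_energy, mutant_energies).items():
    print(f"{name}: {deviation:+.1f}% relative to wild-type")
```

Under the assumed sign convention, a positive deviation corresponds to weaker binding than in the wild-type complex.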

Figure 4. Protein–protein interaction network analysis of SMPX. This result shows the interacting partners of the SMPX protein as nodes and edges. The network was generated by STRING with a high confidence level (0.700). The highlighted regions indicate the large cluster with functional proteins [Color figure can be viewed at wileyonlinelibrary.com]
Figure 5. (a) Interaction between wild-type and mutated SMPX and the functional partners TRAPPC2, ITGB1BP2, CEACAM16, and LDB3. In the TRAPPC2 panel, a light pink cartoon ribbon structure represents wild-type and mutated SMPX in every case, whereas in the other panels (ITGB1BP2, CEACAM16, and LDB3), wild-type and mutated SMPX are rendered as a light blue cartoon model. The hot spots are highlighted as dark colored bars. (b) Binding free energy of the protein–protein complexes containing wild-type and mutated SMPX. Stars: +10% variation in energy compared to wild-type [Color figure can be viewed at wileyonlinelibrary.com]

4 DISCUSSION

In silico identification of SNPs has recently become popular for uncovering mutations that impair protein structure and function (Agrahari et al., 2019b; Agrahari, George, Siva, Magesh, & Zayed, 2019a; Dash, Junaid, Mitra, Arifuzzaman, & Hosen, 2019). The functional and structural effects of the nsSNPs of the SMPX gene have not been examined at the atomic level to date. To address this issue, the present study used various in silico tools to screen deleterious nsSNPs from the natural variants. Various sequence-based approaches were applied because they are advantageous over structure-based approaches. Structure-based methods are limited to proteins with a known 3D structure, whereas sequence-based approaches can be applied to proteins with unknown 3D structures (Doss & Rajith, 2012; Marín-Martín, Soler-Rivas, Martín-Hernández, & Rodriguez-Casado, 2014). A combination of multiple predictors has given better predictions in many recent reports classifying deleterious nsSNPs in SLX4/FANCP (Landwehr et al., 2011), MACC1 (Muendlein et al., 2014), NY-BR-1 (Kosaloglu et al., 2016), BARD1 (Alshatwi, Hasan, Syed, Shafi, & Grace, 2012), MBL2 (Kalia, Sharma, Kaur, Kamboj, & Singh, 2016), BCL11A (Abdulazeez et al., 2019), HBA1 (AbdulAzeez & Borgio, 2016), AHSP (Borgio et al., 2016), MTHFR (Karimian & Hosseinzadeh Colagar, 2018), MKRN3 (Neocleous et al., 2016), and PALB2 (Phuah et al., 2013). From the dbSNP database of NCBI, we retrieved 26 missense and three nonsense SNPs for SMPX. The retrieved SNPs were subsequently subjected to multiple SNP predictors to enhance the accuracy of the effect predictions (Brown & Bishop, 2017; Wu & Jiang, 2013). 
Usually, a minimum of four or five of these tools should be considered to increase the consensus on the effect of SNPs; here, 11 different computational algorithms were initially employed to identify the most pathogenic nsSNPs of SMPX (Brown & Bishop, 2017). Only four SNPs, N19D, A29T, K54N, and S71L, were predicted as deleterious by all of these tools and were therefore regarded as the most deleterious; the cumulative scoring described A29T as highly pathogenic. All four SNPs were also consistently found to be deleterious by the MUpro, I-Mutant 2.0, and iStable servers. Furthermore, SNPs at evolutionarily conserved positions are more important than those at nonconserved positions, because conserved residues are structurally and functionally important for the protein (Ashkenazy et al., 2010). For this reason, the ConSurf server was used for evolutionary conservation analysis. This analysis showed that N19, A29, K54, and S71 were highly conserved in SMPX. It should be noted that residues located in conserved regions are known to play major roles in biological processes, including protein–protein interactions (Arshad, Bhatti, & John, 2018), and nsSNPs in these regions are regarded as highly damaging (Doniger et al., 2008; Miller & Kumar, 2001). The above analysis thus suggests that these nsSNPs might cause maximum damage to the SMPX protein by affecting its stability. However, the 3D structure of the protein plays an important role in understanding the overall effects of SNPs on protein function. Thus, protein modeling was performed with the I-TASSER server, followed by refinement of the model, to examine the effect of the mutations on the protein. The RMSD, SASA, and energy profiles show that the mutations change the stability of the protein; the impact was especially high for the A29T substitution. 
The residual network analysis revealed that interdomain communication within the protein was changed by the amino acid substitutions, which may influence binding to other macromolecules. Accordingly, mutation-induced functional changes were also investigated by protein–protein interaction modeling and binding energy calculations. In agreement with the findings from the different bioinformatics and structural analyses, the A29T substitution in SMPX also produced highly deleterious effects on interactions with functional partners by reducing binding energies. Residues present in the hot regions at the protein–protein interface influence the stability of the interaction as well as its cooperative properties (Chen, Willick, Ruckel, & Floriano, 2015; Tuncbag, Gursoy, Nussinov, & Keskin, 2011). Among them, residue A29 participated directly in the hot-spot binding region, and the A29T substitution induced structural changes, thus affecting the interaction of SMPX with the TRAPPC2, ITGB1BP2, and CEACAM16 proteins. Although A29 was not present at the interacting surface of the LDB3–SMPX complex, the reduction of the binding energy to 38.55% is consistent with the results of the RMSD analysis and supports the deleterious effect of the structural changes in SMPX induced by A29T. Taken together, these results suggest that the mutation impaired the association of SMPX with other functional proteins, which can contribute to the loss of cytoskeletal integrity of the inner ear muscle.
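The consensus step described in this Discussion, in which a variant is retained only when the predictors agree, can be sketched as a simple vote. The tool names, variant labels, and calls below are hypothetical placeholders rather than results from this study, and `consensus_deleterious` is an illustrative helper, not part of any of the servers used.

```python
def consensus_deleterious(predictions, min_fraction=1.0):
    """Select variants called deleterious by at least `min_fraction` of the tools.

    `predictions` maps each variant to a dict of {tool_name: bool}, where True
    means the tool called the variant deleterious. The default of 1.0 keeps only
    variants on which every tool agrees.
    """
    selected = []
    for variant, calls in predictions.items():
        if not calls:
            continue  # no predictions available for this variant
        fraction = sum(calls.values()) / len(calls)
        if fraction >= min_fraction:
            selected.append(variant)
    return selected


# Hypothetical calls from three placeholder tools for three placeholder variants.
example_predictions = {
    "variant_1": {"tool_A": True, "tool_B": True, "tool_C": True},
    "variant_2": {"tool_A": True, "tool_B": False, "tool_C": True},
    "variant_3": {"tool_A": False, "tool_B": False, "tool_C": False},
}

print(consensus_deleterious(example_predictions))
print(consensus_deleterious(example_predictions, min_fraction=0.5))
```

Requiring full agreement is the strictest choice; lowering `min_fraction` trades specificity for sensitivity, which may be preferable when the predictors differ strongly in their underlying methods.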

5 CONCLUSION

Our study critically reviews the deleterious and neutral SNPs of the SMPX gene. Using consensus prediction from 11 different deleterious-SNP predictors, four pathogenic nsSNPs (S71L, N19D, A29T, and K54N) were initially identified, of which A29T caused significant structural and residual changes in the SMPX protein, thereby reducing the binding energies with functional partners, including TRAPPC2, ITGB1BP2, LDB3, and CEACAM16. The deleterious effect of rs772775896 (A29T) in SMPX can be a cause of complete protein–protein interaction failure and may contribute to hearing loss. Therefore, future experiments can be designed by considering these in silico data to analyze their biological context.

CONFLICTS OF INTEREST

Authors state no conflict of interest. All authors have read the journal's publication ethics and publication malpractice statement available at the journal's website and hereby confirm that they comply with all its parts applicable to the present scientific work.

AUTHOR CONTRIBUTIONS

R. Dash and N.A. conceived the idea and designed the experimental work. M.A., R. Das, S.M., and A.H. carried out the experiment. M.A., S.M. and R. Das analyzed the data and wrote the first draft of the manuscript. R. Dash made critical revisions. M.A., A.H., and N.A. reviewed the final manuscript. All authors revised and approved the final manuscript.
