Volume 2024, Issue 1 6893109
Research Article
Open Access

In Silico Identification and Functional Impact of Deleterious Nonsynonymous Single-Nucleotide Polymorphisms (nsSNPs) in Type 2 Diabetes–Associated Genes in South Asian Populations

Md. Hafizur Rahman

Department of Agro Product Processing Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh, just.edu.bd

Department of Quality Control and Safety Management, Faculty of Food Sciences and Safety, Khulna Agricultural University, Khulna, 9100, Bangladesh

Md. Numan Islam

Department of Food Science and Technology, University of Nebraska Lincoln, St. Lincoln, Nebraska, USA, unl.edu

Md. Golam Rabby

Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh, just.edu.bd

Salina Shaheen Parul

Department of Biochemistry, Khulna Medical College, Khulna, 9000, Bangladesh

Md. Mahmudul Hasan

Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh, just.edu.bd

Mrityunjoy Biswas

Corresponding Author

Department of Agro Product Processing Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh, just.edu.bd

First published: 19 December 2024
Academic Editor: Xiaoye Jin

Abstract

This study explores the impact of nonsynonymous single-nucleotide polymorphisms (nsSNPs) on type 2 diabetes (T2D). The nsSNPs are genetic variations that alter amino acids within proteins, affecting protein structure and function. This study investigated seven candidate genes associated with T2D pathogenesis, identified from genome-wide association study (GWAS) Catalog datasets. Subsequently, six mutation-prediction tools were employed to identify the most harmful nsSNPs within these candidate genes. Further analysis involved evaluating protein evolutionary conservation using the ConSurf server and assessing protein stability with I-Mutant and MUpro. Functional and structural effects were predicted using MutPred2, Project HOPE, and FoldAmyloid. We obtained 42 of the most deleterious nsSNPs from the identified candidate genes. Among these, 38 are located at highly conserved residues with a conservation score of 7–9. Furthermore, 20 of these conserved nsSNPs are predicted to decrease protein stability, with 18 of them classified as pathogenic mutations. These mutations can either reduce or increase residue size and can alter the charge and hydrophobic characteristics of the affected proteins. In addition, eight mutants from four genes were identified in amyloidogenic regions, suggesting a potential link to protein aggregation. These findings provide insights into the physicochemical properties and structural changes associated with these deleterious nsSNPs and offer a basis for future research. Understanding these variants through large-scale studies may pave the way for developing therapeutic interventions targeting genetic variations, ultimately improving our understanding of T2D pathogenesis and treatment.

1. Introduction

Type 2 diabetes (T2D) is a chronic, heterogeneous group of multifaceted metabolic disorders. It is marked by elevated blood sugar levels due to reduced production and inadequate function of insulin originating from pancreatic β-cells [1]. T2D appears to have a higher prevalence among South Asian populations (SAPs), including Bangladeshis, Sri Lankans, Indians, and Bhutanese. By the year 2030, an estimated 120.9 million individuals in South Asia are anticipated to be affected by T2D [2]. T2D is associated with macro- and microvascular complications: cardio- and cerebrovascular diseases are macrovascular, whereas neuropathy, retinopathy, and nephropathy are microvascular [3]. The etiology of T2D is influenced by several modifiable (physical inactivity, body mass index, and diet) and nonmodifiable (age, ethnicity, genetic predisposition, family history, and comorbid diseases) risk factors [4]. T2D pathogenesis is caused by both environmental and genetic factors [5].

Genetic risk factors have a significant impact on the prognosis of human disorders [6]. Understanding disease pathogenesis and devising a better treatment strategy requires a thorough understanding of the genetic background and responsive gene identification of the disease [7]. Genetic research has substantial potential for predicting a person’s disease risk, investigating disease molecular pathways, and selecting medical therapy based on individual biology. Modern high-throughput genotyping technology, such as genome-wide association studies (GWASs), is a powerful genetic research tool for identifying the etiology and characterizing the functions of genetically complex diseases. The GWAS establishes links between DNA sequence variation and a disease or trait of biomedical significance [8]. The GWASs among T2D datasets revealed an unprecedented opportunity to determine T2D pathogenesis heritability. It identifies population-specific genetic influences and allelic heterogeneity for T2D [9]. It has revealed numerous possible single-nucleotide polymorphisms (SNPs) that significantly contribute to the genetic foundation of T2D [10].

SNPs are typically the predominant type of genetic variation, holding significant importance in characterizing the structure of the human genome. They have emerged as a primary area of interest for comprehending the genetic underpinnings of diseases [11]. SNPs situated at different positions have been strongly associated with increased susceptibility to various human diseases. As a result, researchers in population genetics have placed a major emphasis on pinpointing and thoroughly understanding the functional aspects of these SNPs [12]. Understanding an individual’s SNP genotype has the potential to establish a foundation for determining their susceptibility to diseases and identifying the most suitable treatment path [13]. SNPs are commonly classified into synonymous and nonsynonymous SNPs (nsSNPs) [14]. Notably, nsSNPs, also known as missense variants, are important because a single alteration of a DNA base within the coding region triggers a substitution of an amino acid residue in the protein, leading to the diversification of protein functions within the human body [15]. The nsSNPs represent the predominant and most impactful category of genetic variations. They frequently correlate with hereditary modifications in gene regulation, amino acid constituents, binding factors for transcription, the stability of mRNA transcription, and the dependability of cellular and tissue structures [16]. Similarly, nsSNPs exert a direct influence on the functional roles of proteins in the transmission of signals for hormonal, visual, and interconnected stimuli [17].

Akhter et al. performed a comprehensive in silico bioinformatics analysis of the CCR6 gene, identifying deleterious SNPs linked to rheumatoid arthritis, psoriasis, lupus nephritis, systemic sclerosis, and other autoimmune diseases [18]. Another in silico study reported numerous nsSNPs in the POT1 gene associated with various human diseases, including cancer [19]. Halder et al. used computational approaches to investigate SNPs in CFL1, finding mutations linked to neurodegenerative diseases, neuronal migration disorders, and neural tube closure defects [11].

This study aimed to perform a comprehensive in silico analysis of nsSNPs in T2D-associated genes identified in SAPs. Considering the abovementioned facts, we conducted extensive screening for the most deleterious nsSNPs in the candidate genes to identify possible pathogenic SNPs. Thus, the computational identification and functional characterization of deleterious nsSNPs in T2D pathogenesis would be a significant advancement for T2D research in SAPs.

2. Methodology

2.1. Identification of Genes Associated With T2D Pathogenesis

The publicly available GWAS Catalog database (https://www.ebi.ac.uk/gwas/) was screened comprehensively to identify candidate genes associated with T2D pathogenesis in SAPs. The search keyword "T2D" was used, and a genome-wide significance of p < 5 × 10^−30 was taken as the cutoff for the most significant associations [20].
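
The cutoff described above can be illustrated with a small sketch. It assumes the associations are already available as gene-to-p-value pairs (here, the p values reported in Table 1 of this article); the function name `passes_threshold` is hypothetical, and no GWAS Catalog API is used.

```python
# Minimal sketch of the p-value cutoff; not the authors' actual workflow.
P_THRESHOLD = 5e-30  # cutoff stated in the text

def passes_threshold(p_value, threshold=P_THRESHOLD):
    """Return True when an association p value is below the cutoff."""
    return p_value < threshold

# p values as reported in Table 1 of this article
table1_p_values = {
    "CDKAL1": 7e-87,
    "HHEX": 5e-40,
    "HNF1B": 5e-35,
    "IGF2BP2": 4e-44,
    "PAX4": 1e-74,
    "SLC30A8": 1e-30,
    "TCF7L2": 3e-35,
}

# Genes whose reported p value clears the cutoff
selected = sorted(g for g, p in table1_p_values.items() if passes_threshold(p))
print(selected)
```

All seven genes in Table 1 clear this cutoff, whereas an association at the conventional 5 × 10^−8 level would not.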

2.1.1. Gene Ontology (GO) Analysis

GO enrichment analysis was performed in the publicly available ShinyGO v0.75 database (http://bioinformatics.sdstate.edu/go75/). ShinyGO v0.75 predicts the three GO categories, including biological processes, cellular components, and molecular functions of identified candidate genes. The 10 most significantly enriched pathways were automatically generated based on the adjusted p value criterion (FDR) of 0.05 [21].

2.2. KEGG Pathway Enrichment Analysis

The KEGG pathway enrichment analysis was performed using the publicly accessible WebGestalt (WEB-based Gene SeT AnaLysis Toolkit) database, with a false discovery rate (FDR) threshold of 0.05. The over-representation analysis (ORA) method was selected for the pathway enrichment analysis in WebGestalt, where Homo sapiens and the gene symbol were selected as the reference genome and the gene ID type, respectively [22].
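
As an illustration of what an over-representation analysis computes, the sketch below uses the common hypergeometric formulation: an enrichment ratio (observed overlap divided by the overlap expected by chance) and an upper-tail p value. The function names and the example numbers are hypothetical, and the text does not state that WebGestalt computes its values in exactly this way.

```python
from math import comb

def enrichment_ratio(k, n, K, N):
    """Observed overlap k divided by the overlap expected by chance.

    n: size of the input gene list, K: genes in the pathway,
    N: genes in the reference set.
    """
    expected = n * K / N
    return k / expected

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway."""
    total = comb(N, n)
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total

# Hypothetical numbers for illustration only (not taken from the article)
ratio = enrichment_ratio(k=2, n=2, K=5, N=10)
p_value = hypergeom_upper_tail(k=2, n=2, K=5, N=10)
print(ratio, round(p_value, 4))
```

In practice the resulting p values would then be adjusted for multiple testing before the FDR threshold is applied.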

2.3. Data Retrieval and SNP Characterization

FASTA sequences of the identified genes (CDKAL1, HHEX, HNF1B, IGF2BP2, PAX4, SLC30A8, and TCF7L2) were obtained from the UniProt database (https://www.uniprot.org/) with the following UniProt IDs (Q5VV42, Q03014, P35680, Q9Y6M1, O43316, Q8IWU4, and Q9NQB0, respectively). Furthermore, dbSNP (https://www.ncbi.nlm.nih.gov/snp/) of the National Center for Biotechnology Information (NCBI) database was used for SNP characterization. From the dbSNP database, the details of nsSNPs were identified, including SNP ID, position, protein accession number, and residue change for the identified responsible genes [15].

2.4. Prediction of Most Deleterious nsSNPs Associated With T2D

The retrieved nsSNPs were investigated using six different in silico SNP prediction tools to identify the most deleterious nsSNPs.

The Sorting Intolerant from Tolerant (SIFT) server (https://sift.bii.astar.edu.sg/www/SIFT_dbSNP.html) uses the physicochemical properties of orthologous and paralogous proteins to predict the effect of amino acid substitutions. A substitution is categorized as deleterious if the probability score is < 0.05 and as tolerated if the probability score is ≥ 0.05. The SNPs predicted as deleterious by SIFT were carried forward to the next screening step.

The Polymorphism Phenotyping v2 (PolyPhen-2) server (http://genetics.bwh.harvard.edu/pph2/) predicts the probability that a substitution damages the protein. Position-specific independent count (PSIC) scores are computed using both structural and sequence properties of SNP variations. The functional and structural consequences of the SNP variant are revealed by the differences in PSIC scores, which also predict the probability of damage. The probability scores are divided into benign (0.00–0.45), possibly damaging (0.45–0.95), and probably damaging (0.95–1) categories. The SNPs identified as possibly and probably damaging were chosen for further screening.
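
The score cutoffs quoted for SIFT and PolyPhen-2 can be written as two small classifiers. This is a minimal sketch using only the thresholds stated in the text; the function names are hypothetical, and the handling of scores that fall exactly on 0.45 or 0.95 is an assumption, because the stated ranges share their endpoints.

```python
def sift_category(score):
    """SIFT rule from the text: < 0.05 is deleterious, >= 0.05 is tolerated."""
    return "deleterious" if score < 0.05 else "tolerated"

def polyphen_category(score):
    """PolyPhen-2 ranges from the text.

    Assumption: a score exactly on a boundary (0.45 or 0.95) is assigned
    to the higher category, since the text lists overlapping endpoints.
    """
    if score < 0.45:
        return "benign"
    if score < 0.95:
        return "possibly damaging"
    return "probably damaging"

def passes_first_two_screens(sift_score, polyphen_score):
    """Keep a variant only if SIFT calls it deleterious and PolyPhen-2 does not call it benign."""
    return (sift_category(sift_score) == "deleterious"
            and polyphen_category(polyphen_score) != "benign")

print(sift_category(0.001), polyphen_category(1.0))
```

A variant with a SIFT score of 0.2, for example, would be dropped at the first step regardless of its PolyPhen-2 score.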

The PANTHER server (http://pantherdb.org/tools/csnpScoreForm.jsp) relies on evolutionary preservation time to predict deleterious SNPs. This server estimates the probability of the SNP variation based on the HMM algorithm. The SNPs annotated as probably damaging were selected for further analysis.

Then, the PhD-SNP server (https://snps.biofold.org/phd-snp/phd-snp.html) was used to identify disease-causing SNPs. The SNPs predicted to be linked with disease were considered the most damaging and were carried forward to the next screening step.

The SNAP (https://snps.biofold.org/meta-snp/) and Meta-SNP (https://snps.biofold.org/meta-snp/) servers were also used for the prediction of disease-causing SNPs.

Finally, the most deleterious SNPs were manually selected after screening with the six in silico prediction tools, and these nsSNPs were carried forward to the downstream analyses [13, 23, 24].
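
One way to read the six-tool screening is as a consensus filter: a variant is retained only if every tool flags it. The sketch below illustrates that reading under stated assumptions. The label vocabulary, the function name `flagged_by_all`, and the two example variants are hypothetical and are not taken from the article or from any tool's output format.

```python
# Tool names follow the text; the set of "damaging" labels is an assumption.
TOOLS = ("SIFT", "PolyPhen-2", "PANTHER", "PhD-SNP", "SNAP", "Meta-SNP")
DAMAGING = {"deleterious", "possibly damaging", "probably damaging",
            "damaging", "disease"}

def flagged_by_all(calls):
    """Return True only if every one of the six tools gave a damaging label."""
    return all(calls.get(tool, "").lower() in DAMAGING for tool in TOOLS)

# Hypothetical variants for illustration only
variants = {
    "variant_A": {
        "SIFT": "deleterious", "PolyPhen-2": "probably damaging",
        "PANTHER": "damaging", "PhD-SNP": "disease",
        "SNAP": "disease", "Meta-SNP": "disease",
    },
    "variant_B": {
        "SIFT": "deleterious", "PolyPhen-2": "possibly damaging",
        "PANTHER": "damaging", "PhD-SNP": "disease",
        "SNAP": "neutral", "Meta-SNP": "disease",
    },
}

shortlist = [name for name, calls in variants.items() if flagged_by_all(calls)]
print(shortlist)
```

Requiring agreement across all tools trades sensitivity for specificity, which fits the stated goal of isolating only the most deleterious nsSNPs.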

2.5. Identification of Protein Evolutionary Conservation due to Diabetes-Associated SNPs

The ConSurf web server (https://consurf.tau.ac.il/) was used to conduct the protein evolutionary conservation analysis. This bioinformatics tool is frequently used to evaluate the evolutionary conservation of amino acids in macromolecules and to identify regions that are crucial for structural and functional properties. The server calculates conservation scores, which range from 1 to 9, from a multiple sequence alignment of the protein with 50 homologous sequences. In addition, it applies the empirical Bayesian technique to maintain highly accurate predictions. The conservation scores were divided into three categories: less conserved (scores 1–4), moderately conserved (scores 5–6), and highly conserved (scores 7–9). The nsSNPs at highly conserved positions were chosen for further investigation [25].
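
The three conservation categories map directly onto the 1 to 9 score scale. A minimal sketch of that mapping follows; the function names are hypothetical and the example positions and grades are invented for illustration.

```python
def conservation_category(grade):
    """Map a ConSurf-style grade (1 to 9) to the categories used in the text."""
    if not 1 <= grade <= 9:
        raise ValueError("grade must be between 1 and 9")
    if grade <= 4:
        return "less conserved"
    if grade <= 6:
        return "moderately conserved"
    return "highly conserved"

def keep_highly_conserved(grades_by_variant):
    """Return the variants whose residue falls in the highly conserved category."""
    return [variant for variant, grade in grades_by_variant.items()
            if conservation_category(grade) == "highly conserved"]

# Hypothetical grades for illustration only
example = {"variant_A": 9, "variant_B": 5, "variant_C": 7}
print(keep_highly_conserved(example))
```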

2.6. Exploration of Protein Stability Changes due to Diabetes-Associated SNPs

Protein stability changes were investigated using the publicly available I-Mutant (https://folding.biofold.org/cgi-bin/i-mutant2.0) and MUpro (https://mupro.proteomics.ics.uci.edu/) servers. I-Mutant is used to evaluate the effects of nsSNPs on protein stability. This server predicts the free energy change value (delta delta G, DDG), whose sign indicates whether stability will increase or decrease. A DDG value less than 0 indicated decreased protein stability, and a value greater than 0 indicated increased stability.

In addition, MUpro uses two machine-learning methods (support vector machines and neural networks) to predict the effect of amino acid mutations on protein stability. Both methods report a score from −1 to 1 that reflects the reliability of the prediction [26, 27].
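
The sign rule for DDG can be sketched as follows. The text states the rule for I-Mutant; applying the same sign convention to the MUpro DDG value, and treating a value of exactly 0 as "unchanged", are assumptions made here for illustration. The function names are hypothetical, and the example values are those listed for R397S (CDKAL1) in Table 4.

```python
def stability_effect(ddg):
    """Classify a DDG value by its sign, following the rule in the text."""
    if ddg < 0:
        return "decrease"
    if ddg > 0:
        return "increase"
    return "unchanged"  # assumption: the text does not define DDG == 0

def both_predict_decrease(imutant_ddg, mupro_ddg):
    """True when both DDG values indicate reduced stability.

    Assumption: the same sign convention is applied to both tools.
    """
    return (stability_effect(imutant_ddg) == "decrease"
            and stability_effect(mupro_ddg) == "decrease")

# DDG values listed for R397S (CDKAL1) in Table 4
print(both_predict_decrease(-2.7, -0.80175806))
```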

2.7. Identification of Molecular Effects on Structural and Functional Properties of Proteins

MutPred v2.0 (http://mutpred.mutdb.org/) is a publicly available bioinformatics tool that was used to classify amino acid substitutions as disease-associated or neutral. The classification is based on three types of attributes: the structure and dynamics of proteins, protein evolutionary conservation, and mutation-induced changes at the molecular and atomic levels. Multiple sequence alignments and levels of sequence conservation were used to annotate the mutational impacts on the protein. An output score less than 0.5 was regarded as benign, and a score greater than 0.5 was regarded as pathogenic [28].

2.8. Prediction of Mutational Impacts of High-Risk nsSNPs on Protein Structure

Project HOPE (https://www3.cmbi.umcn.nl/hope/) is a bioinformatics tool for a mutant analysis that was used to investigate the molecular and structural implications of high-risk nsSNPs on the protein structure. This server predicts structural changes in proteins based on point mutations in a protein sequence. Seven identified gene sequences and associated nsSNPs were submitted to HOPE. HOPE predicts 3D structural information and explains functional changes in the mutated protein [29].

2.9. Prediction of Amyloidogenic Regions

The publicly available FoldAmyloid server (http://bioinfo.protres.ru/fold-amyloid/) was used to predict the amyloidogenic regions that are responsible for the aggregation of protein sequences. Missense variants can lead to the development of amyloid aggregates, which in turn disrupt the functioning of the protein. The aggregation of misfolded proteins is responsible for a range of diseases [30].

3. Results

3.1. Genes Associated With T2D Pathogenesis

We have selected eight datasets (GCST001809, GCST001759, GCST001213, GCST010553, GCST90093109, GCST90134620, GCST90132186, and GCST90245849) from the GWAS catalog that revealed association with T2D pathogenesis in the SAPs. After analyzing these datasets, we have identified seven (CDKAL1, HHEX, HNF1B, IGF2BP2, PAX4, SLC30A8, and TCF7L2) genes that are significantly associated with T2D pathogenesis in SAP. The identified candidate genes were taken for further downstream exploration (Table 1).

Table 1. Identification of genes associated with T2D.
Gene ID | UniProt ID | Location | p value | RAF
CDKAL1 | Q5VV42 | 6p22.3 | 7 × 10^−87 | 0.5769
HHEX | Q03014 | 10q23.33 | 5 × 10^−40 | 0.8696
HNF1B | P35680 | 17q12 | 5 × 10^−35 | 0.6742
IGF2BP2 | Q9Y6M1 | 3q27.2 | 4 × 10^−44 | 0.6606
PAX4 | O43316 | 7q32.1 | 1 × 10^−74 | 0.9085
SLC30A8 | Q8IWU4 | 8q24.11 | 1 × 10^−30 | 0.5931
TCF7L2 | Q9NQB0 | 10q25.2-q25.3 | 3 × 10^−35 | 0.3

3.1.1. GO

GO serves as a structural framework for categorizing the functions of genes into biological processes, molecular functions, and cellular components. Subsequently, the 10 most noteworthy GO terms were chosen (Figure 1). The leading biological process involved the control of transcription through the localization of transcription factors and the biosynthesis of heparan sulfate proteoglycan. Other biological processes pertained to the promotion of peptide secretion, reactions to glucose and hexose, and the management of peptide secretion. In addition, they involved RNA localization, the inhibition of transcription by RNA polymerase II, and the suppression of macromolecule biosynthesis.

Figure 1. Gene ontology analysis: (a) biological process, (b) molecular function, and (c) cellular component.

Likewise, the most predominant molecular function was the ability to act as a DNA-binding transcription repressor and the role of RNA polymerase II. In addition, it exhibited other molecular functions, including binding to DNA sequences in the cis-regulatory region of RNA polymerase II, functioning as a DNA-binding transcription factor, binding to nucleic acids in transcription regulatory regions, and interacting with double-stranded DNA. Moreover, considering the evaluation of cellular components, the primary cellular entity was the complex formed by catenin and beta-catenin-TCF7L2, as well as the complex involving beta-catenin-TCF. Other relevant cellular components included the rough endoplasmic reticulum, complexes between proteins and DNA, PML bodies, complexes governing RNA polymerase II transcription, membranes of transport vesicles, chromatin structures, and chromosomes.

3.1.2. KEGG Pathway Enrichment

The KEGG pathway enrichment analysis was conducted with an FDR threshold of 0.05. This analysis revealed that the pathogenic genes identified in this study exert influence over multiple metabolic pathways. Specifically, this study focused on the top 10 KEGG pathways that exhibited the strongest regulation by these pathogenic genes, as illustrated in Supporting Figure 1. Among these pathways, the most prominently enriched was maturity-onset diabetes of the young, which was also the only pathway with an FDR below 0.05 (Table 2). In addition, this investigation highlighted the involvement of these pathogenic genes in other pathways such as thyroid cancer, Vibrio cholerae infection, endometrial cancer, basal cell carcinoma, acute myeloid leukemia, adherens junction, arrhythmogenic right ventricular cardiomyopathy (ARVC), gastric acid secretion, and colorectal cancer (Table 2).

Table 2. KEGG pathway enrichment analysis.
Pathway ID | Pathway name | Ratio of enrichment | p value | FDR
hsa04950 | Maturity onset diabetes of the young | 167.79 | 4.04E−07 | 1.32E−04
hsa05216 | Thyroid cancer | 39.303 | 2.52E−02 | 1
hsa05110 | Vibrio cholerae infection | 29.084 | 3.39E−02 | 1
hsa05213 | Endometrial cancer | 25.072 | 0.039264 | 1
hsa05217 | Basal cell carcinoma | 23.083 | 0.04259 | 1
hsa05221 | Acute myeloid leukemia | 22.033 | 0.044581 | 1
hsa04520 | Adherens junction | 20.197 | 0.048554 | 1
hsa05412 | Arrhythmogenic right ventricular cardiomyopathy (ARVC) | 20.197 | 0.048554 | 1
hsa04971 | Gastric acid secretion | 19.389 | 0.050535 | 1
hsa05210 | Colorectal cancer | 16.909 | 0.057772 | 1

3.2. SNPs Characterization

Across the seven identified candidate genes, this study retrieved a total of 3,760 missense SNPs (based on database records up to June 2023). Specifically, 254,662 SNPs were found for the CDKAL1 gene, of which 524 were characterized as missense variants.

For the HHEX gene, 3,314 SNPs were documented, of which 226 were categorized as missense variants. Similarly, the HNF1B gene had 23,888 SNPs, with 516 of them classified as missense variants. For IGF2BP2, 70,322 SNPs were identified, of which 464 were missense variants. For the PAX4 gene, 4,311 SNPs were documented, and 380 were missense variants. For the SLC30A8 gene, a total of 82,339 SNPs were recorded, of which 336 were categorized as missense variants. Finally, the TCF7L2 gene had 86,472 SNPs, of which 1,314 were missense variants.

This study focused on the nonsynonymous coding SNPs, specifically missense variations. This targeted approach aimed to decipher the impact of these variations on the phenotypic expression of the corresponding proteins. Consequently, the set of 3,760 missense SNPs became the focus of the subsequent in-depth analysis.

3.3. Prediction of Most Deleterious nsSNPs Associated With T2D

A comprehensive assessment of the 3,760 identified missense SNPs from seven genes (CDKAL1, HHEX, HNF1B, IGF2BP2, PAX4, SLC30A8, and TCF7L2) was conducted using missense prediction tools. In the initial SIFT analysis, 512 of the 3,760 missense SNPs were predicted to be deleterious. Specifically, 47 missense SNPs were identified as deleterious for the CDKAL1 gene, four for the HHEX gene, 62 for the HNF1B gene, 43 for the IGF2BP2 gene, 118 for the PAX4 gene, 90 for the SLC30A8 gene, and 148 for the TCF7L2 gene. The SIFT-predicted deleterious SNPs were then submitted to PolyPhen-2. Among the 512 missense SNPs, 107 were categorized as probably or possibly damaging. Specifically, there were 13 for CDKAL1, one for HHEX, 15 for HNF1B, nine for IGF2BP2, 27 for PAX4, 21 for SLC30A8, and 21 for TCF7L2. Subsequently, a combined investigation of these 107 probably and possibly damaging SNPs using four further in silico SNP prediction tools (PANTHER, PhD-SNP, SNAP, and Meta-SNP) identified 42 SNPs as the most deleterious, disease-causing variants. The distribution of these 42 disease-causing SNPs among genes was as follows: nine in CDKAL1, one in HHEX, six in HNF1B, three in IGF2BP2, 11 in PAX4, seven in SLC30A8, and five in TCF7L2. These 42 SNPs were chosen for further analysis and are listed in Table 3.
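
The per-gene counts reported at each screening stage can be tallied to confirm that they add up to the stated totals. The sketch below is only a bookkeeping check using the numbers given in the paragraph above; the variable names are hypothetical.

```python
GENES = ("CDKAL1", "HHEX", "HNF1B", "IGF2BP2", "PAX4", "SLC30A8", "TCF7L2")

# Per-gene counts as reported in the text, in the order of GENES
stage_counts = {
    "SIFT deleterious": (47, 4, 62, 43, 118, 90, 148),
    "PolyPhen-2 damaging": (13, 1, 15, 9, 27, 21, 21),
    "Final shortlist": (9, 1, 6, 3, 11, 7, 5),
}

# Sum the per-gene counts for each stage
stage_totals = {stage: sum(counts) for stage, counts in stage_counts.items()}

for stage, total in stage_totals.items():
    per_gene = ", ".join(f"{g}={c}" for g, c in zip(GENES, stage_counts[stage]))
    print(f"{stage}: {total} ({per_gene})")
```

The three totals come to 512, 107, and 42, matching the figures stated for the SIFT, PolyPhen-2, and final screening steps.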

Table 3. Identification of most deleterious nsSNPs by prediction software.
Gene | RS ID | Amino acid change | SIFT | Score | PolyPhen | Score | PANTHER | Score | PhD-SNP | Score | SNAP | Score | Meta-SNP | Score
CDKAL1 | rs4710963 | R397S | DE | 0.001 | PD | 1 | D | 0.886 | D | 0.833 | D | 0.71 | D | 0.769
CDKAL1 | rs112984088 | C221R | DE | 0 | PD | 1 | D | 0.998 | D | 0.931 | D | 0.91 | D | 0.815
CDKAL1 | rs143106927 | D79V | DE | 0 | PD | 1 | D | 0.924 | D | 0.902 | D | 0.81 | D | 0.885
CDKAL1 | rs150177925 | Y297C | DE | 0.002 | PD | 0.987 | D | 0.961 | D | 0.669 | D | 0.605 | D | 0.801
CDKAL1 | rs200195852 | T70A | DE | 0.003 | PD | 1 | D | 0.809 | D | 0.618 | D | 0.715 | D | 0.719
CDKAL1 | rs202056788 | N76S | DE | 0.036 | PD | 1 | D | 0.84 | D | 0.753 | D | 0.75 | D | 0.771
CDKAL1 | rs368733380 | P197L | DE | 0.001 | PD | 0.997 | D | 0.534 | D | 0.784 | D | 0.625 | D | 0.651
CDKAL1 | rs374932945 | Y459C | DE | 0.001 | PD | 1 | D | 0.952 | D | 0.712 | D | 0.775 | D | 0.881
CDKAL1 | rs377386894 | G72C | DE | 0 | PD | 1 | D | 0.989 | D | 0.908 | D | 0.8 | D | 0.905
HHEX | rs17851141 | A171T | DE | 0.003 | PD | 1 | D | 0.814 | D | 0.664 | D | 0.73 | D | 0.724
HNF1B | rs374126219 | G144S | DE | 0.029 | PD | 1 | D | 0.776 | D | 0.687 | D | 0.63 | D | 0.671
HNF1B | rs121918674 | S148W | DE | 0 | PD | 1 | D | 0.955 | D | 0.759 | D | 0.79 | D | 0.819
HNF1B | rs121918675 | R165H | DE | 0 | PD | 1 | D | 0.847 | D | 0.697 | D | 0.765 | D | 0.775
HNF1B | rs193922490 | W171R | DE | 0 | PD | 1 | D | 0.895 | D | 0.82 | D | 0.78 | D | 0.814
HNF1B | rs193922491 | R235W | DE | 0.001 | PD | 1 | D | 0.952 | D | 0.542 | D | 0.83 | D | 0.75
HNF1B | rs371467412 | G408R | DE | 0.001 | PD | 1 | D | 0.666 | D | 0.679 | D | 0.74 | D | 0.711
IGF2BP2 | rs6787209 | T126P | DE | 0.014 | PD | 1 | D | 0.559 | D | 0.829 | D | 0.675 | D | 0.705
IGF2BP2 | rs113792141 | S311P | DE | 0 | PD | 1 | D | 0.649 | D | 0.774 | D | 0.695 | D | 0.694
IGF2BP2 | rs143252812 | Y154C | DE | 0.005 | PD | 1 | D | 0.862 | D | 0.702 | D | 0.725 | D | 0.755
PAX4 | rs35155575 | R45W | DE | 0 | PD | 1 | D | 0.986 | D | 0.928 | D | 0.825 | D | 0.932
PAX4 | rs115887120 | R39Q | DE | 0.001 | PD | 1 | D | 0.925 | D | 0.902 | D | 0.805 | D | 0.899
PAX4 | rs121917718 | R172W | DE | 0 | PD | 1 | D | 0.86 | D | 0.855 | D | 0.77 | D | 0.849
PAX4 | rs147279315 | R45Q | DE | 0.037 | PD | 1 | D | 0.917 | D | 0.879 | D | 0.795 | D | 0.843
PAX4 | rs149708455 | R20W | DE | 0 | PD | 1 | D | 0.981 | D | 0.886 | D | 0.775 | D | 0.908
PAX4 | rs369459316 | R226C | DE | 0 | PD | 1 | D | 0.984 | D | 0.842 | D | 0.755 | D | 0.867
PAX4 | rs370095957 | R60H | DE | 0.001 | PD | 1 | D | 0.955 | D | 0.926 | D | 0.705 | D | 0.842
PAX4 | rs145468905 | G65D | DE | 0 | PD | 1 | D | 0.946 | D | 0.959 | D | 0.735 | D | 0.831
PAX4 | rs372497946 | R227W | DE | 0 | PD | 1 | D | 0.86 | D | 0.874 | D | 0.73 | D | 0.835
PAX4 | rs373939873 | R63C | DE | 0 | PD | 1 | D | 0.788 | D | 0.792 | D | 0.615 | D | 0.792
PAX4 | rs375391009 | I43V | DE | 0.005 | PD | 0.902 | D | 0.599 | D | 0.717 | D | 0.635 | D | 0.682
SLC30A8 | rs73317647 | R165C | DE | 0 | PD | 1 | D | 0.994 | D | 0.891 | D | 0.73 | D | 0.867
SLC30A8 | rs139489847 | G296R | DE | 0.05 | PD | 1 | D | 0.903 | D | 0.82 | D | 0.685 | D | 0.754
SLC30A8 | rs140404252 | L74R | DE | 0 | PD | 1 | D | 0.979 | D | 0.9 | D | 0.785 | D | 0.917
SLC30A8 | rs145677283 | R165H | DE | 0.001 | PD | 1 | D | 0.937 | D | 0.851 | D | 0.58 | D | 0.805
SLC30A8 | rs201697165 | S182C | DE | 0.001 | PD | 1 | D | 0.976 | D | 0.762 | D | 0.63 | D | 0.858
SLC30A8 | rs369783320 | D248N | DE | 0.005 | PD | 1 | D | 0.98 | D | 0.888 | D | 0.78 | D | 0.892
SLC30A8 | rs371902065 | Y244C | DE | 0.001 | PD | 0.938 | D | 0.647 | D | | D | 0.59 | D | 0.77
TCF7L2 | rs13458 | P178Q | DE | 0.002 | PD | 1 | D | 0.528 | D | 0.583 | D | 0.615 | D | 0.647
TCF7L2 | rs3197486 | P202H | DE | 0.001 | PD | 1 | D | 0.933 | D | 0.825 | D | 0.505 | D | 0.762
TCF7L2 | rs148523217 | P247T | DE | 0.021 | PD | 0.987 | D | 0.788 | D | 0.694 | D | 0.715 | D | 0.54
TCF7L2 | rs184454375 | E53K | DE | 0.002 | PD | 0.999 | D | 0.764 | D | 0.793 | D | 0.68 | D | 0.613
TCF7L2 | rs188153157 | D10G | DE | 0.019 | PD | 0.999 | D | 0.523 | D | 0.758 | D | 0.73 | D | 0.723
Abbreviations: D, damaging; DE, deleterious; PD, probably damaging.

3.3.1. Protein Evolutionary Conservation

The ConSurf online server was employed to conduct a comprehensive conservation analysis of the sequences of the candidate proteins (CDKAL1, HHEX, HNF1B, IGF2BP2, PAX4, SLC30A8, and TCF7L2) (Supporting Figure 2). Notably, mutations occurring within the conserved regions of a protein tend to be more detrimental to its function than those found in nonconserved regions. The ConSurf analysis revealed that 38 of the 42 missense SNPs lie in conserved regions, with scores ranging from seven to nine. Among these 38 missense SNPs, nine were identified within the conserved regions of CDKAL1, one in HHEX, six in HNF1B, one in IGF2BP2, 11 in PAX4, six in SLC30A8, and four in TCF7L2 proteins. Because these 38 residues are highly conserved, substitutions at these positions have substantial potential to cause severe functional disruption. Consequently, these residues were carried forward to the subsequent analyses.

3.3.2. Protein Stability Change

The stability changes of the candidate proteins were assessed using the I-Mutant and MUpro online tools. These tools were utilized to evaluate the impact of 38 missense SNPs that had been previously identified as deleterious SNPs in the analysis (Table 4).

Table 4. Prediction of protein stability change using I-Mutant and MUpro.
Gene ID | Uploaded variation | Amino acid change | I-Mutant 2.0 DDG | Prediction | MUpro DDG | Prediction | MUpro SVM | Prediction | MUpro NN | Prediction
CDKAL1 | rs4710963 | R397S | −2.7 | Decrease | −0.80175806 | Decrease | −0.59896618 | Decrease | −0.9972 | Decrease
CDKAL1 | rs200195852 | T70A | −1.74 | Decrease | −0.82093965 | Decrease | −0.96288568 | Decrease | −0.9909 | Decrease
CDKAL1 | rs368733380 | P197L | −0.82 | Decrease | −0.97089976 | Decrease | −1 | Decrease | −0.9921 | Decrease
CDKAL1 | rs374932945 | Y459C | −0.08 | Decrease | −0.60590127 | Decrease | −0.4927 | Decrease | −0.50852 | Decrease
HHEX | rs17851141 | A171T | −0.19 | Decrease | −1.4345152 | Decrease | −0.368285 | Decrease | −0.87441 | Decrease
HNF1B | rs374126219 | G144S | −0.87 | Decrease | −0.63145956 | Decrease | −0.74875032 | Decrease | −0.710697 | Decrease
HNF1B | rs193922490 | W171R | −1.4 | Decrease | −1.5512041 | Decrease | −0.65903684 | Decrease | −0.915320 | Decrease
IGF2BP2 | rs113792141 | S311P | −1.52 | Decrease | −0.947939 | Decrease | −0.4430988 | Decrease | −0.59909 | Decrease
PAX4 | rs115887120 | R39Q | −1.44 | Decrease | −0.9943113 | Decrease | −0.4038763 | Decrease | −0.99978 | Decrease
PAX4 | rs147279315 | R45Q | −1.97 | Decrease | −0.7247240 | Decrease | −0.4754671 | Decrease | −0.69038 | Decrease
PAX4 | rs369459316 | R226C | −0.6 | Decrease | −0.7006551 | Decrease | −0.252720 | Decrease | −0.67869 | Decrease
PAX4 | rs370095957 | R60H | −1.05 | Decrease | −1.1087813 | Decrease | −1 | Decrease | −0.99987 | Decrease
PAX4 | rs375391009 | I43V | −1.21 | Decrease | −0.4845391 | Decrease | −0.781356 | Decrease | −0.702366 | Decrease
SLC30A8 | rs73317647 | R165C | −1.72 | Decrease | −0.725402 | Decrease | −0.76906 | Decrease | −0.87403 | Decrease
SLC30A8 | rs140404252 | L74R | −0.91 | Decrease | −1.204816 | Decrease | −1 | Decrease | −0.88912 | Decrease
SLC30A8 | rs145677283 | R165H | −2.03 | Decrease | −1.021872 | Decrease | −0.813296 | Decrease | −0.96817 | Decrease
SLC30A8 | rs369783320 | D248N | −0.4 | Decrease | −0.9050234 | Decrease | −1 | Decrease | −0.908598 | Decrease
TCF7L2 | rs3197486 | P202H | −1.95 | Decrease | −0.86616684 | Decrease | −0.85439785 | Decrease | −0.716442 | Decrease
TCF7L2 | rs148523217 | P247T | −2.76 | Decrease | −1.0408255 | Decrease | −1 | Decrease | −0.86035 | Decrease
TCF7L2 | rs188153157 | D10G | −0.13 | Decrease | −1.5126538 | Decrease | −0.6594547 | Decrease | −0.851565 | Decrease

Upon a thorough exploration of the effects of these 38 missense SNPs on the protein structure, it was ascertained that 18 of them led to an increase in protein stability. This increase was observed when the DDG value exhibited a positive value or was greater than zero. Conversely, the remaining 20 missense SNPs were found to decrease protein stability, corresponding to cases where the DDG displayed a negative value or less than zero.

Among the 20 missense SNPs associated with reduced protein stability, a distribution was observed across different proteins. Specifically, there were four SNPs located in CDKAL1, one in HHEX, two in HNF1B, one in IGF2BP2, five in PAX4, four in SLC30A8, and three in TCF7L2 proteins (Table 4).

3.4. Molecular Effects on Structural and Functional Properties of Proteins

The functional and structural alterations of the candidate genes were predicted using the MutPred 2.0 tool (Table 5). The 20 SNPs identified as the most deleterious in the prior analysis were submitted to assess their functional and structural characteristics. Of these 20 mutations, 18 were classified as pathogenic, while the remaining two were deemed benign. Of these 18 pathogenic mutations, four were located in CDKAL1, one in HHEX, two in HNF1B, one in IGF2BP2, five in PAX4, three in SLC30A8, and two in TCF7L2. These findings indicate that the identified pathogenic nsSNPs might exert a significant influence on both the structural and functional characteristics of the proteins (Table 5).

Table 5. Identification of molecular effects on structural and functional properties.

| Gene | Variant ID | Mutation | Score | Remarks | Molecular mechanisms (with p value) |
|---|---|---|---|---|---|
| CDKAL1 | rs4710963 | R397S | 0.879 | Pathogenic | Altered ordered interface (p = 4.5e-03), loss of catalytic site at R397 (p = 7.9e-04), loss of relative solvent accessibility (p = 0.04), loss of allosteric site at R397 (p = 0.05), altered metal binding (p = 0.01), altered transmembrane protein (p = 0.01), and gain of sulfation at Y395 (p = 0.04). |
| CDKAL1 | rs200195852 | T70A | 0.902 | Pathogenic | Loss of catalytic site at H75 (p = 3.6e-03), altered ordered interface (p = 0.01), altered transmembrane protein (p = 1.4e-03), gain of allosteric site at W71 (p = 0.03), altered metal binding (p = 0.04), and gain of disulfide linkage at C73 (p = 0.03). |
| CDKAL1 | rs368733380 | P197L | 0.916 | Pathogenic | Altered disordered interface (p = 0.04), altered DNA binding (p = 0.02), and altered transmembrane protein (p = 0.04). |
| CDKAL1 | rs374932945 | Y459C | 0.899 | Pathogenic | Altered ordered interface (p = 4.3e-03), altered disordered interface (p = 0.01), altered transmembrane protein (p = 2.5e-04), and altered metal binding (p = 0.04). |
| HHEX | rs17851141 | A171T | 0.699 | Pathogenic | Altered disordered interface (p = 0.02), altered coiled-coil (p = 0.01), and loss of acetylation at K172 (p = 0.03). |
| HNF1B | rs374126219 | G144S | 0.69 | Pathogenic | Altered transmembrane protein (p = 3.3e-03) and loss of N-linked glycosylation at N146 (p = 0.04). |
| HNF1B | rs193922490 | W171R | 0.923 | Pathogenic | Altered ordered interface (p = 2.5e-03), altered disordered interface (p = 5.4e-03), loss of helix (p = 0.02), and gain of phosphorylation at Y169 (p = 0.02). |
| IGF2BP2 | rs113792141 | S311P | 0.867 | Pathogenic | Gain of loop (p = 0.05), altered metal binding (p = 0.03), and altered DNA binding (p = 0.03). |
| PAX4 | rs115887120 | R39Q | 0.849 | Pathogenic | Loss of allosteric site at R39 (p = 0.2e-04), altered metal binding (p = 0.03), gain of strand (p = 0.04), altered DNA binding (p = 0.01), and gain of catalytic site at R39 (p = 0.03). |
| PAX4 | rs147279315 | R45Q | 0.795 | Pathogenic | Gain of strand (p = 0.04), altered metal binding (p = 0.04), loss of allosteric site at P40 (p = 0.02), and altered DNA binding (p = 0.04). |
| PAX4 | rs369459316 | R226C | 0.685 | Pathogenic | Altered disordered interface (p = 1.9e-03) and loss of pyrrolidone carboxylic acid (p = 0.04). |
| PAX4 | rs370095957 | R60H | 0.856 | Pathogenic | Altered ordered interface (p = 0.05), loss of helix (p = 0.04), gain of allosteric site at T64 (p = 0.03), and altered DNA binding (p = 0.03). |
| PAX4 | rs375391009 | I43V | 0.636 | Pathogenic | Gain of allosteric site at M38 (p = 4.4e-03), loss of strand (p = 0.04), altered metal binding (p = 0.03), altered DNA binding (p = 0.01), and gain of catalytic site at R39 (p = 0.03). |
| SLC30A8 | rs73317647 | R165C | 0.701 | Pathogenic | Altered transmembrane protein (p = 0.0e+00), altered ordered interface (p = 0.03), loss of allosteric site at R165 (p = 0.03), and altered metal binding (p = 0.04). |
| SLC30A8 | rs140404252 | L74R | 0.85 | Pathogenic | Gain of helix (p = 0.05) and gain of strand (p = 0.05). |
| SLC30A8 | rs369783320 | D248N | 0.788 | Pathogenic | Altered ordered interface (p = 0.04), altered transmembrane protein (p = 5.2e-04), gain of relative solvent accessibility (p = 0.03), and altered metal binding (p = 0.02). |
| TCF7L2 | rs148523217 | P247T | 0.549 | Pathogenic | Loss of intrinsic disorder (p = 0.04), gain of O-linked glycosylation at S246 (p = 0.04), and loss of sulfation at Y242 (p = 0.02). |
| TCF7L2 | rs188153157 | D10G | 0.599 | Pathogenic | Loss of loop (p = 8.0e-03), gain of B-factor (p = 0.02), altered metal binding (p = 0.03), and loss of proteolytic cleavage at D10 (p = 0.02). |
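As a cross-check on the per-gene counts quoted in the text, the rows of Table 5 can be tallied programmatically. The sketch below transcribes gene, substitution, and score from the table; the helper `parse_substitution` and the variable names are hypothetical. The 0.5 cut-off in the last line is the general pathogenicity threshold commonly cited for MutPred2 and is an assumption here, since the text does not state which threshold was applied.

```python
import re
from collections import Counter

# (gene, substitution, MutPred2 score) transcribed from Table 5.
TABLE5 = [
    ("CDKAL1", "R397S", 0.879), ("CDKAL1", "T70A", 0.902),
    ("CDKAL1", "P197L", 0.916), ("CDKAL1", "Y459C", 0.899),
    ("HHEX", "A171T", 0.699),
    ("HNF1B", "G144S", 0.69), ("HNF1B", "W171R", 0.923),
    ("IGF2BP2", "S311P", 0.867),
    ("PAX4", "R39Q", 0.849), ("PAX4", "R45Q", 0.795),
    ("PAX4", "R226C", 0.685), ("PAX4", "R60H", 0.856),
    ("PAX4", "I43V", 0.636),
    ("SLC30A8", "R165C", 0.701), ("SLC30A8", "L74R", 0.85),
    ("SLC30A8", "D248N", 0.788),
    ("TCF7L2", "P247T", 0.549), ("TCF7L2", "D10G", 0.599),
]

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code):
    """Split a code such as 'R397S' into (wild type, position, mutant)."""
    match = SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


per_gene = Counter(gene for gene, _, _ in TABLE5)
lowest = min(TABLE5, key=lambda row: row[2])
highest = max(TABLE5, key=lambda row: row[2])
# Assumed cut-off of 0.5; the text does not state the threshold that was used.
all_above_cutoff = all(score > 0.5 for _, _, score in TABLE5)
print(dict(per_gene), lowest, highest, all_above_cutoff)
```

The tally reproduces the distribution given in the text (four, one, two, one, five, three, and two mutations across the seven genes), and every tabulated score lies above the assumed cut-off.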

3.5. Prediction of Mutational Impacts of High-Risk nsSNPs on Protein Structure

The structural consequences of the amino acid substitutions in the native proteins were predicted with the Project HOPE server. Three-dimensional models of the mutated proteins were generated and used to examine how each substitution affects the structural characteristics of the native protein (Figure 2). The analysis also revealed differences in physicochemical properties between wild-type and mutant amino acids, as detailed in Table 6. All identified nsSNPs were predicted to change residue size, with the exception of two mutations, D248N in SLC30A8 and P247T in TCF7L2, for which no size difference was reported. Of the 18 mutations, 10 (R397S, T70A, Y459C, R39Q, R45Q, R226C, R60H, I43V, R165C, and D10G) introduced a residue smaller than the wild type, whereas six (L74R, S311P, W171R, G144S, A171T, and P197L) introduced a larger one. With respect to charge, nine mutations (R397S, T70A, R39Q, R45Q, R226C, R60H, R165C, D248N, and D10G) changed the residue to neutral and two (W171R and L74R) changed it to positive, while no charge difference was reported for the remaining seven (P197L, Y459C, A171T, G144S, S311P, I43V, and P247T). Seven mutations (R397S, T70A, Y459C, S311P, R226C, R165C, and D10G) increased hydrophobicity, whereas four (W171R, A171T, L74R, and P247T) decreased it relative to the wild-type residue. The effect on hydrophobicity of the remaining seven mutations (P197L, G144S, R39Q, R45Q, R60H, I43V, and D248N) remained undetermined (Table 6).

Figure 2. Prediction of 3D model structure. The violet color on the ribbon diagram represents the site of mutation. Green and red colors indicate native and mutated amino acids, respectively.
Table 6. Prediction of mutational impacts of high-risk nsSNPs on protein structure.

| Gene | Mutation | Wild-type size | Wild-type charge | Wild-type hydrophobicity | Mutant size | Mutant charge | Mutant hydrophobicity |
|---|---|---|---|---|---|---|---|
| CDKAL1 | R397S | Larger | Positive | Less hydrophobic | Smaller | Neutral | More hydrophobic |
| CDKAL1 | T70A | Larger | Positive | Less hydrophobic | Smaller | Neutral | More hydrophobic |
| CDKAL1 | P197L | Smaller | - | - | Larger | - | - |
| CDKAL1 | Y459C | Larger | - | Less hydrophobic | Smaller | - | More hydrophobic |
| HHEX | A171T | Smaller | - | More hydrophobic | Larger | - | Less hydrophobic |
| HNF1B | G144S | Smaller | - | - | Larger | - | - |
| HNF1B | W171R | Smaller | Neutral | More hydrophobic | Larger | Positive | Less hydrophobic |
| IGF2BP2 | S311P | Smaller | - | Less hydrophobic | Larger | - | More hydrophobic |
| PAX4 | R39Q | Larger | Positive | - | Smaller | Neutral | - |
| PAX4 | R45Q | Larger | Positive | - | Smaller | Neutral | - |
| PAX4 | R226C | Larger | Positive | Less hydrophobic | Smaller | Neutral | More hydrophobic |
| PAX4 | R60H | Larger | Positive | - | Smaller | Neutral | - |
| PAX4 | I43V | Larger | - | - | Smaller | - | - |
| SLC30A8 | R165C | Larger | Positive | Less hydrophobic | Smaller | Neutral | More hydrophobic |
| SLC30A8 | L74R | Smaller | Neutral | More hydrophobic | Larger | Positive | Less hydrophobic |
| SLC30A8 | D248N | - | Negative | - | - | Neutral | - |
| TCF7L2 | P247T | - | - | More hydrophobic | - | - | Less hydrophobic |
| TCF7L2 | D10G | Larger | Negative | Less hydrophobic | Smaller | Neutral | More hydrophobic |
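The size, charge, and hydrophobicity groupings reported for the 18 mutations should each account for every mutation exactly once. A minimal consistency check, with the groups transcribed from the text and a hypothetical helper `is_partition`, could look like this:

```python
ALL_MUTATIONS = {
    "R397S", "T70A", "P197L", "Y459C", "A171T", "G144S", "W171R", "S311P",
    "R39Q", "R45Q", "R226C", "R60H", "I43V", "R165C", "L74R", "D248N",
    "P247T", "D10G",
}

SIZE = {
    "smaller": ["R397S", "T70A", "Y459C", "R39Q", "R45Q", "R226C", "R60H",
                "I43V", "R165C", "D10G"],
    "larger": ["L74R", "S311P", "W171R", "G144S", "A171T", "P197L"],
    "not reported": ["D248N", "P247T"],
}
CHARGE = {
    "to neutral": ["R397S", "T70A", "R39Q", "R45Q", "R226C", "R60H",
                   "R165C", "D248N", "D10G"],
    "to positive": ["W171R", "L74R"],
    "not reported": ["P197L", "Y459C", "A171T", "G144S", "S311P", "I43V",
                     "P247T"],
}
HYDROPHOBICITY = {
    "increased": ["R397S", "T70A", "Y459C", "S311P", "R226C", "R165C", "D10G"],
    "decreased": ["W171R", "A171T", "L74R", "P247T"],
    "not reported": ["P197L", "G144S", "R39Q", "R45Q", "R60H", "I43V", "D248N"],
}


def is_partition(groups, universe):
    """True if every member of `universe` appears in exactly one group."""
    members = [m for group in groups.values() for m in group]
    return len(members) == len(set(members)) and set(members) == universe


checks = {
    name: is_partition(groups, ALL_MUTATIONS)
    for name, groups in (("size", SIZE), ("charge", CHARGE),
                         ("hydrophobicity", HYDROPHOBICITY))
}
print(checks)
```

All three groupings pass, so the counts of 10/6/2, 9/2/7, and 7/4/7 are internally consistent with the list of 18 mutations.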

3.5.1. Prediction of Amyloidogenic Regions

The amyloidogenic regions of the candidate proteins predicted by the FoldAmyloid server are presented in Supporting Figure 3. The aggregation analysis showed that eight mutants from four genes lie in amyloidogenic regions: T70A and Y459C in CDKAL1; W171R in HNF1B; R45Q, R226C, and R60H in PAX4; and R165C and L74R in SLC30A8 (Supporting Table 1). This suggests that these mutations may interfere with helix formation and promote the aggregation of beta-sheets.

4. Discussion

Genetic factors play a critical role in T2D pathogenesis. Bioinformatics analysis of T2D-associated genes and SNPs is an integral part of understanding the underlying causes of the disease and identifying specific treatments [31, 32]. More specifically, identifying disease-causing genes and SNPs, anticipating their functional effects, and considering population-specific variants can lead to more specialized, tailored, and effective prevention strategies and treatments for T2D [33]. This study identified seven candidate genes (CDKAL1, HHEX, HNF1B, IGF2BP2, PAX4, SLC30A8, and TCF7L2) from eight GWAS catalog datasets that were associated with T2D pathogenesis in SAP (Table 1). GWAS investigations among East Asians and Europeans validated that genes including SLC30A8, KCNQ1, CDC123, HNF1B, KCNJ11, TCF7L2, CDKAL1, CDKN2A/2B, PPARG, HHEX, IGF2BP2, GLIS3, JAZF1, WFS1, and MTNR1B showed associations with T2D and diabetes-related traits [34]. A subsequent GWAS in the Chinese population demonstrated that the PPARG, KCNJ11, CDKAL1, CDKN2A-CDKN2B, IDE-KIF11-HHEX, IGF2BP2, and SLC30A8 genes were significantly associated with T2D pathogenesis [35]. Furthermore, another T2D-focused GWAS revealed that the TCF7L2, MTNR1B, SLC30A8, CDKAL1, IGF2BP2, CDC123, KCNJ11, FTO, HHEX, HNF1B, THADA, JAZF1, CAMK1D, WFS1, and TSPAN8 loci contribute to susceptibility to the disease [36]. The findings of the current study are consistent with these prior GWAS investigations, supporting the potential pathogenic role of the identified candidate genes in T2D within the SAP. Gene Ontology (GO) is a framework for annotating genes and gene products to describe their biological characteristics in terms of biological processes, cellular components, and molecular functions [37]. The biological process terms of the GO analysis revealed that our identified genes are mostly linked with the localization of transcription factors and the biosynthesis of heparan sulfate proteoglycan.
The molecular function terms of the GO analysis revealed that our targeted genes are mostly linked with DNA-binding transcription repressor activity and with the role of RNA polymerase II. In terms of the cellular component, our targeted genes are mostly linked with the catenin and beta-catenin–TCF7L2 complexes, as well as the beta-catenin–TCF complex (Figure 1). Previous research has shown that the localization of transcription factors and RNA polymerase II transcription factor binding are fundamental processes for maintaining cell growth and gene transcription [38]. Heparan sulfate proteoglycans are glycoproteins at the cell surface and in the extracellular matrix that act as receptors and coreceptors, with profound effects on growth factor action, cell adhesion, and tissue architecture [39]. Glucagon-like peptide-1 (GLP-1) is a potent stimulator of glucose-dependent insulin secretion. Administration of GLP-1 to patients with T2D normalizes both fasting and postprandial glycemia, not only through stimulation of insulin release but also through concomitant inhibition of glucagon secretion and gastric motility and, possibly, enhancement of insulin sensitivity [40]. Sequence-specific DNA-binding transcription factors interpret genetic regulatory information, such as that encoded in transcriptional enhancers and promoters [41]. Chromatin structure plays a significant role in regulating transcription in eukaryotic organisms because it determines the accessibility of various regions of DNA to interacting binding factors [42]. In addition, the absence of β-catenin reduces chromatin-binding capabilities [43]. Overall, the biological process, cellular component, and molecular function results of the GO analysis are consistent with previous findings (Figure 1).
Hence, the known functions of the identified candidate genes in humans indicate how they may contribute to T2D pathogenesis by modulating biological, cellular, and molecular functions. The cyclin-dependent kinase 5 regulatory subunit-associated protein 1-like 1 (CDKAL1) gene suppresses the action of the CDK5-p35–associated beta cell complex, and its variants reduce the HOMA-β level, which indicates beta cell dysfunction [44]. The hematopoietically expressed homeobox (HHEX) gene is essential for embryonic development, and mutations of HHEX are linked to diseases such as diabetes and pancreatic disorders [45]. Heterozygous mutations in the hepatocyte nuclear factor 1β (HNF1B) gene in humans lead to a multisystem disorder that includes pancreatic hypoplasia and diabetes mellitus [46]. The modified insulin-like growth factor 2 mRNA-binding protein 2 (IGF2BP2) gene contributes to the development and progression of several metabolic diseases and cancers, including diabetes, obesity, and fatty liver [47]. Mutations in the paired box 4 (PAX4) gene cause maturity-onset diabetes of the young (MODY9), increase susceptibility to T2D, and promote T1D [48]. Variants in the solute carrier family 30 member 8 (SLC30A8) and transcription factor 7-like 2 (TCF7L2) genes are associated with an increased risk of T2D [49, 50].

Pathway enrichment analysis is a crucial phase in interpreting critical gene functions and biological processes using high-throughput data [51]. A previous KEGG pathway enrichment analysis found that the maturity-onset diabetes of the young and T2D mellitus pathways were enriched among the genes identified in a T2D GWAS [36]. The KEGG pathway enrichment analysis of this study revealed that the maturity-onset diabetes of the young pathway is the one most closely related to our targeted genes (Table 2, Supporting Figure 1).

SNPs are widely recognized as prevalent risk factors linked to numerous complex diseases. When SNPs occur within a protein-coding region, they can adversely affect the protein’s function and structure. Several methodologies have been employed to investigate the detrimental effects of nsSNPs in various diseases [11]. This study focused on identifying the most damaging nsSNP variants within the identified candidate genes and their contributions to T2D pathogenesis in SAP. A stepwise filtration process using various disease-associated bioinformatics tools was employed to distinguish harmful nsSNP variants from those that are tolerated or neutral. A comprehensive search of the dbSNP database initially identified a total of 3760 nsSNPs. Subsequently, six functional prediction tools (SIFT, PolyPhen-2, PROVEAN, PhD-SNP, SNAP, and Meta-SNP) were used to assess whether these nsSNPs had a harmful or benign impact. This analysis led to the selection of 42 nsSNPs as the most damaging ones potentially contributing to T2D pathogenesis in the SAP (Table 3). Furthermore, several structural analysis tools (ConSurf, I-Mutant 3.0, MUpro, MutPred2, and Project HOPE) were employed to identify mutations linked to the disease. A total of 18 distinct mutations (R397S, T70A, P197L, Y459C, A171T, G144S, W171R, S311P, R39Q, R45Q, R226C, R60H, I43V, R165C, L74R, D248N, P247T, and D10G) were identified as likely to be harmful, deleterious, or disease-causing based on the combined results of the bioinformatics analyses.
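The stepwise filtration can be illustrated with a short sketch. Two points are assumptions rather than statements from the text: the rule that a variant is kept only when all six functional tools flag it as deleterious (the exact consensus rule is not given here), and the per-variant calls, which are placeholders. The function name `is_consensus_deleterious` is hypothetical. The stage counts in `FUNNEL` are the ones reported in this study.

```python
TOOLS = ("SIFT", "PolyPhen-2", "PROVEAN", "PhD-SNP", "SNAP", "Meta-SNP")


def is_consensus_deleterious(calls, min_votes=len(TOOLS)):
    """Keep a variant when at least `min_votes` tools call it deleterious.

    Assumption: the default requires agreement of all six tools; the text
    does not state the exact rule.
    """
    votes = sum(bool(calls.get(tool, False)) for tool in TOOLS)
    return votes >= min_votes


# Placeholder calls for illustration only (True = predicted deleterious).
example_calls = {
    "snp_1": dict.fromkeys(TOOLS, True),
    "snp_2": {**dict.fromkeys(TOOLS, True), "SNAP": False},
}
kept = [name for name, calls in example_calls.items()
        if is_consensus_deleterious(calls)]

# Variant counts reported in the study at each filtration stage.
FUNNEL = [
    ("nsSNPs retrieved from dbSNP", 3760),
    ("flagged by functional tools", 42),
    ("in conserved residues (ConSurf)", 38),
    ("destabilizing (I-Mutant, MUpro)", 20),
    ("pathogenic (MutPred2)", 18),
]
total = FUNNEL[0][1]
retained_percent = {label: round(100 * count / total, 2) for label, count in FUNNEL}
print(kept, retained_percent)
```

Expressed this way, roughly 1.1% of the initial nsSNPs pass the functional filter and under 0.5% remain after the final pathogenicity step.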

Highly conserved regions are essential for maintaining the proper function and stable structure of any macromolecule. Furthermore, the functional regions of proteins tend to be conserved, and these regions are linked to various functions, including catalytic activity, interactions, and binding [52]. The nsSNPs located within conserved regions are considered the most deleterious [18]. The evolutionary conservation analysis of the current study revealed that 38 mutations were located in conserved regions. Among them, 23 mutations affected residues that are highly conserved, exposed, and predicted to have a functional role (Supporting Figure 2). These mutations are C221R and G72C in the CDKAL1 protein; G144S, S148W, R235W, and G408R in the HNF1B protein; S311P in the IGF2BP2 protein; R45W, R39Q, R172W, R45Q, R20W, R226C, R60H, G65D, R227W, and R63C in the PAX4 protein; R165C and R165H in the SLC30A8 protein; and P202H, P247T, E53K, and D10G in the TCF7L2 protein. Hence, in light of the previous findings, our identified mutations are likely to be functionally relevant to the pathogenesis of T2D in SAP.

Previous studies have shown that decreased protein stability leads to misfolding and degradation [12, 53]. Furthermore, several investigations have demonstrated that decreased protein stability increases protein breakdown, aggregation, and misfolding [54]. In addition, amyloidogenic regions are associated with various diseases, including diabetes, neurodegenerative disorders, and prion diseases [55]. The protein stability analysis in the current study showed that the substitutions R397S, T70A, P197L, Y459C, A171T, G144S, W171R, S311P, R39Q, R45Q, R226C, R60H, I43V, R165C, L74R, R165H, D248N, P202H, P247T, and D10G are destabilizing and can affect the protein’s structure and function (Table 4). Moreover, eight of these mutants (T70A, Y459C, W171R, R45Q, R226C, R60H, R165C, and L74R) were also observed in the aggregation profile, suggesting that these mutations may disrupt the formation of alpha-helices and promote the aggregation of beta-sheets (Supporting Figure 3 and Supporting Table 1). The aggregation of proteins is linked to numerous diseases, including diabetes.
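The relationship between the destabilizing, pathogenic, and amyloidogenic sets named in this section can be checked with simple set operations. The sketch below uses the mutation lists exactly as they appear in the text; the variable names are illustrative.

```python
# 20 destabilizing substitutions (protein stability analysis).
DESTABILIZING = {
    "R397S", "T70A", "P197L", "Y459C", "A171T", "G144S", "W171R", "S311P",
    "R39Q", "R45Q", "R226C", "R60H", "I43V", "R165C", "L74R", "R165H",
    "D248N", "P202H", "P247T", "D10G",
}
# 18 substitutions classified as pathogenic.
PATHOGENIC = {
    "R397S", "T70A", "P197L", "Y459C", "A171T", "G144S", "W171R", "S311P",
    "R39Q", "R45Q", "R226C", "R60H", "I43V", "R165C", "L74R", "D248N",
    "P247T", "D10G",
}
# 8 substitutions located in amyloidogenic regions.
AMYLOIDOGENIC = {"T70A", "Y459C", "W171R", "R45Q", "R226C", "R60H", "R165C", "L74R"}

not_pathogenic = sorted(DESTABILIZING - PATHOGENIC)
amyloid_within_destabilizing = AMYLOIDOGENIC <= DESTABILIZING
amyloid_within_pathogenic = AMYLOIDOGENIC <= PATHOGENIC
print(not_pathogenic, amyloid_within_destabilizing, amyloid_within_pathogenic)
```

The set difference shows that two destabilizing substitutions, P202H and R165H, are absent from the pathogenic list, and that all eight amyloidogenic-region mutants belong to both the destabilizing and the pathogenic sets.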

Previous research has suggested that mutations can change the size, charge, and hydrophobic properties of amino acid residues. These changes can potentially disrupt the structure and interactions of the protein [56]. In this study, the Project HOPE server results provided important information about the possible effects of the selected missense SNPs of the candidate genes on protein structure. The polymorphisms (rs4710963, rs200195852, rs368733380, rs374932945, rs17851141, rs374126219, rs193922490, rs113792141, rs115887120, rs147279315, rs369459316, rs370095957, rs375391009, rs73317647, rs140404252, rs369783320, rs148523217, and rs188153157) result in the R397S, T70A, P197L, Y459C, A171T, G144S, W171R, S311P, R39Q, R45Q, R226C, R60H, I43V, R165C, L74R, D248N, P247T, and D10G amino acid substitutions, respectively (Figure 2 and Table 6). More precisely, differences in size and hydrophobicity between the wild-type and mutant residues can affect their interactions with membrane lipids, particularly through hydrophobic interactions [55]. The substituted amino acids have different physicochemical properties that may disrupt the targeted protein structures. In seven mutations (R397S, T70A, Y459C, S311P, R226C, R165C, and D10G), the mutant residue was more hydrophobic than the wild-type residue, which might cause the loss of hydrogen bonds with other molecules and may disrupt correct protein folding. In contrast, in the W171R, A171T, L74R, and P247T mutations, the wild-type residue was more hydrophobic than the mutant residue, which may result in a loss of hydrophobic interactions with other molecules on the surface of the protein (Figure 2, Table 6). Furthermore, the mutant residues in L74R, S311P, W171R, G144S, A171T, and P197L were larger, and those in R397S, T70A, Y459C, R39Q, R45Q, R226C, R60H, I43V, R165C, and D10G were smaller, than the corresponding wild-type residues (Figure 2, Table 6).
Another study suggested that loss of the charge of a wild-type residue can abolish its interactions with other residues [24]. This study observed that the mutations had different effects on the charge of specific amino acids. For instance, in the R397S, T70A, R39Q, R45Q, R226C, R60H, R165C, D248N, and D10G mutations, the residue changed to a neutral charge relative to the wild type. Conversely, the W171R and L74R mutations introduced a positive charge (Figure 2 and Table 6). Hence, the identified mutations may disrupt the structure and interactions of the proteins.

5. Conclusion

SNPs represent the most prevalent genetic variations known to be associated with multiple human diseases. Identifying and understanding the impact of SNPs on diverse diseases can shed light on disease susceptibility and contribute to more efficient therapeutic approaches. This study investigated the structural and functional effects of 3760 nsSNPs from seven genes (CDKAL1, HHEX, HNF1B, IGF2BP2, PAX4, SLC30A8, and TCF7L2) associated with T2D pathogenesis in SAP through comprehensive in silico bioinformatics approaches. Identifying and characterizing nsSNPs implicated in human diseases by experimental means poses significant challenges to biologists. Therefore, we turned to computational algorithms and tools to identify potentially harmful nsSNPs that could disrupt the protein’s structure–function relationship. The computational strategy adopted in this study identified 42 deleterious, disease-associated SNPs that may contribute to T2D pathogenesis in SAP. In the protein evolutionary conservation analysis, 38 of these SNPs were located at conserved residues and were therefore considered the most likely to be damaging. The analysis of mutational impacts on the structural and functional properties of the proteins classified 18 mutations as pathogenic. Among these, seven mutations increased hydrophobicity, while four decreased it. Notably, the mutants T70A, Y459C, W171R, R45Q, R226C, R60H, R165C, and L74R were located in amyloidogenic regions, suggesting involvement in beta-sheet aggregation, a phenomenon closely associated with diabetes. Hence, the insights obtained from this computational study might support experimental studies and help establish the significance of the identified mutations for drug discovery and the development of precision medications.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Md. Hafizur Rahman and Mrityunjoy Biswas designed this study. Md. Numan Islam and Md. Golam Rabby performed comprehensive analyses. Md. Hafizur Rahman, Md. Numan Islam, and Salina Shaheen Parul wrote this manuscript. Mrityunjoy Biswas and Md. Mahmudul Hasan revised the manuscript. All authors read and approved the final manuscript.

Funding

This research work has been conducted with the financial support of the ICT division, Ministry of Posts, Telecommunications, and Information Technology, ICT Tower, Agargaon, Sher-e-Bangla Nagar, Dhaka-1207, Dhaka, Bangladesh.

Supporting Information

Supporting Table 1: prediction of amyloidogenic regions.

Supporting Figure 1: KEGG pathway enrichment analysis.

Supporting Figure 2: identification of protein evolutionary conservation.

Supporting Figure 3: prediction of amyloidogenic regions.

Data Availability Statement

The data that support the results of this study are available in databases described in the manuscript and from the corresponding authors upon request.
