Volume 15, Issue 4 pp. 410-422
Review
Open Access

Perspectives on genetic studies of type 2 diabetes from the genome-wide association studies era to precision medicine

Minako Imamura

Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan

Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Nishihara-Cho, Japan

Shiro Maeda

Corresponding Author

Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan

Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Nishihara-Cho, Japan

Correspondence

Shiro Maeda

Tel.: +81-988951204

Fax: +81-988951436

E-mail address: [email protected]

First published: 23 January 2024

Abstract

Genome-wide association studies (GWAS) have facilitated a substantial and rapid increase in the number of confirmed genetic susceptibility variants for complex diseases. Approximately 700 variants predisposing individuals to type 2 diabetes had been identified through GWAS as of 2023. From 2018 to 2022, hundreds of type 2 diabetes susceptibility loci with smaller effect sizes were identified through large-scale GWAS with sample sizes of 200,000 to >1 million. The clinical translation of genetic information for type 2 diabetes includes the development of novel therapeutics and risk prediction. Although drug discovery based on loci identified in GWAS remains challenging owing to the difficulty of functional annotation, global efforts have been made to identify novel biological mechanisms and therapeutic targets by applying multi-omics approaches or searching for disease-associated coding variants in isolated founder populations. Polygenic risk scores (PRSs), comprising up to millions of associated variants, can identify individuals whose disease risk is higher than that of the general population. In populations of European descent, PRSs constructed from base GWAS data with a sample size of approximately 450,000 have predicted disease onset well. However, European GWAS-derived PRSs have limited predictive performance in non-European populations. The predictive accuracy of a PRS largely depends on the sample size of the base GWAS data. To establish a universal PRS that is also applicable to small, isolated ethnic populations, multi-ethnic GWAS meta-analyses have been used as base GWAS data, together with cross-population polygenic prediction methods.

INTRODUCTION

Approximately 537 million people worldwide are affected by diabetes mellitus, and this number is predicted to increase to 643 million by 2030 and to 783 million by 2045¹. A total of 90% of global diabetes cases are type 2 diabetes, which is characterized by insulin resistance in peripheral tissues and dysregulated insulin secretion from pancreatic β-cells. Although the current increase in the prevalence of type 2 diabetes is driven mainly by lifestyle changes, complex genetic determinants are considered to contribute to the inherent susceptibility to this disease²⁻⁵. Until the beginning of the 21st century, linkage analyses and candidate gene approaches were the primary methods used to link genotypes to specific phenotypes. However, these approaches are not suitable for identifying variants with smaller effects on disease susceptibility. The first round of genome-wide association studies (GWAS) for type 2 diabetes was carried out by European and US groups in 2007, and they reported robust susceptibility loci for type 2 diabetes in Europeans⁶⁻¹⁰. Subsequently, the discovery of type 2 diabetes susceptibility loci has advanced vigorously through the combination of individual GWAS datasets to increase statistical power. Global efforts have also been made to apply the achievements of these genetic studies in clinical practice through the development of novel therapeutics or the establishment of precision medicine, although these remain challenging areas.

This review summarizes recent advances in the genetics of type 2 diabetes, and discusses the perspective of future investigations for understanding the genetic architecture of type 2 diabetes and its clinical applications.

GENOME-WIDE ASSOCIATION STUDIES

The GWAS is a powerful, biologically agnostic method for detecting genetic variations that predispose individuals to diseases. In these studies, the entire genomes of individuals with and without the disorder of interest (i.e., cases and controls) are screened for a large number of single-nucleotide variants/polymorphisms (SNVs/SNPs). Hundreds of thousands to millions of SNPs can be genotyped for a single individual using high-throughput SNP genotyping technology. The number of SNPs available for analysis can be increased efficiently and cost-effectively by genotype imputation, a computational method for estimating an individual's genotypes at SNPs that were not directly genotyped. Using directly determined genotypes and publicly available or in-house population-matched linkage disequilibrium data, imputed genotypes for >10 million SNPs can be obtained for each individual.

The finding that a particular SNP allele (or genotype) is present at a higher frequency in cases than in controls suggests that the SNP is associated with the disease. In GWAS¹¹,¹², the threshold for statistical significance is set at P = 5 × 10⁻⁸, based on the Bonferroni correction for multiple testing, assuming that hundreds of thousands to 1 million independent association tests are carried out in a GWAS. This stringent threshold corresponds to the standard P-value threshold of 0.05 divided by 1 million, and it reduces the number of false positive SNP associations. Even with such a strict statistical threshold, it is preferable that the findings are replicated in an independent dataset to verify the association of a SNP locus with the phenotype of interest¹¹,¹².
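As a minimal sketch of the reasoning above, the following Python snippet derives the significance threshold from the Bonferroni correction and applies an allelic chi-square test to a single SNP. The allele counts are hypothetical and the function name is illustrative; actual GWAS usually rely on regression models with covariates, which are not shown here.

```python
import math

def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic 2x2 chi-square test (1 df) and odds ratio from allele counts."""
    n = case_risk + case_other + ctrl_risk + ctrl_other
    row_case = case_risk + case_other
    row_ctrl = ctrl_risk + ctrl_other
    col_risk = case_risk + ctrl_risk
    col_other = case_other + ctrl_other
    # Pearson chi-square for a 2x2 table, without continuity correction
    chi2 = n * (case_risk * ctrl_other - case_other * ctrl_risk) ** 2 / (
        row_case * row_ctrl * col_risk * col_other
    )
    # Upper tail of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    return odds_ratio, chi2, p_value

# Bonferroni correction: 0.05 divided by ~1 million independent tests
ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000
GWAS_THRESHOLD = ALPHA / N_INDEPENDENT_TESTS

# Hypothetical allele counts: 10,000 cases and 10,000 controls (2 alleles each)
odds_ratio, chi2, p_value = allelic_association(6600, 13400, 6000, 14000)
significant = p_value < GWAS_THRESHOLD
print(f"OR={odds_ratio:.3f}, chi2={chi2:.2f}, P={p_value:.2e}, significant={significant}")
```

In this invented example, a risk allele frequency of 33% in cases versus 30% in controls passes the genome-wide threshold only because each group contains 10,000 individuals.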

The ‘common disease, common variant’ hypothesis argues that the major contributors to genetic susceptibility to common diseases are genetic variations with appreciable frequency in the population, but relatively small-to-modest effects on disease susceptibility. Under this hypothesis, a large number of variants with small effects (odds ratio [OR] ~1.5) act additively or multiplicatively on disease susceptibility. Association data from several case–control collections can be combined by meta-analysis, which increases the overall sample size and thereby enables the identification of SNPs with smaller effect sizes¹³,¹⁴.
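The gain in power from combining studies can be illustrated with a fixed-effect, inverse-variance-weighted meta-analysis. This is one common approach and is used here only as an assumption; the text does not specify which meta-analysis model the cited studies applied. The effect sizes and standard errors below are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted pooling of per-study log odds ratios."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided P-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value

# Hypothetical log odds ratios (OR ~1.08) from three case-control collections
study_betas = [0.07, 0.08, 0.08]
study_ses = [0.02, 0.02, 0.02]

single_study_p = [fixed_effect_meta([b], [s])[2] for b, s in zip(study_betas, study_ses)]
pooled_beta, pooled_se, pooled_p = fixed_effect_meta(study_betas, study_ses)
print("per-study P:", [f"{p:.1e}" for p in single_study_p])
print(f"pooled OR={math.exp(pooled_beta):.3f}, P={pooled_p:.1e}")
```

None of the three hypothetical studies reaches P < 5 × 10⁻⁸ on its own, whereas the pooled estimate does, because the pooled standard error shrinks as the weights accumulate.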

TYPE 2 DIABETES GWAS IN EUROPEAN POPULATIONS

In 2006, five SNPs and a tetranucleotide repeat polymorphism (DG10S478) within TCF7L2 were reported to show a strong association with type 2 diabetes in three independent cohorts (Icelandic, Danish and US)¹⁵ through a classical linkage and positional cloning approach. The association of TCF7L2 with type 2 diabetes was well replicated not only in Europeans, but also in other ethnic groups¹⁶⁻²¹, including the Japanese²²,²³. Interestingly, the risk allele frequencies at TCF7L2 were considerably lower in East Asian populations than in Europeans (~40% in Europeans vs ~5% in East Asians); as a result, the overall contribution of TCF7L2 variants to the prevalence of type 2 diabetes is likely smaller in East Asians.
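Why a lower risk allele frequency translates into a smaller population-level contribution can be sketched with Levin's population attributable fraction. This is a deliberately crude illustration and not a calculation taken from the cited studies: it treats the risk allele frequency as the exposure prevalence, uses the odds ratio as a stand-in for the relative risk, and assumes a hypothetical per-allele OR of 1.4 in both populations.

```python
def population_attributable_fraction(exposure_frequency, relative_risk):
    """Levin's formula: PAF = p(RR - 1) / (1 + p(RR - 1))."""
    excess = exposure_frequency * (relative_risk - 1.0)
    return excess / (1.0 + excess)

ASSUMED_OR = 1.4  # hypothetical effect size, identical in both populations

# Risk allele frequencies quoted in the text (~40% vs ~5%)
paf_european = population_attributable_fraction(0.40, ASSUMED_OR)
paf_east_asian = population_attributable_fraction(0.05, ASSUMED_OR)
print(f"European: {paf_european:.1%}, East Asian: {paf_east_asian:.1%}")
```

Under these assumptions, the same per-allele effect accounts for a several-fold larger share of cases in the population where the allele is common.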

In 2007, the first GWAS for type 2 diabetes was carried out in a French population comprising 661 cases and 614 controls, covering 392,935 SNP loci. This study identified novel association signals at SLC30A8, HHEX, LOC387761 and EXT2, and validated the association at TCF7L2 previously identified through linkage analysis⁶. Shortly after the French GWAS, an Icelandic study group identified CDKAL1⁷, and three European collaborating groups identified CDKAL1, IGF2BP2 and CDKN2A/B⁸⁻¹⁰. These novel loci, except for LOC387761 and EXT2, and two previously reported variants (PPARG P12A and KCNJ11 E23K), were confirmed by multiple replication studies in European and non-European populations. Thus, the first round of European GWAS confirmed eight type 2 diabetes susceptibility loci (TCF7L2, SLC30A8, HHEX, CDKAL1, IGF2BP2, CDKN2A/B, PPARG and KCNJ11) across different ethnic groups⁶⁻¹⁰. In addition to these eight loci, an association between variants in FTO and type 2 diabetes was identified, although the effect of FTO variants on susceptibility to type 2 diabetes was mostly mediated by an increase in bodyweight²⁴.

During the first round of type 2 diabetes GWAS, each GWAS was carried out with a sample size of thousands for discovery analysis, and novel susceptibility loci with modest effect sizes (OR 1.2–1.4)⁶⁻¹⁰,²⁴⁻²⁶ were identified. After the first round of GWAS, meta-analyses combining individual GWAS data were carried out to efficiently increase the sample size to identify variants with weaker effects²⁷⁻³³. The European type 2 diabetes loci identified in the first round of GWAS and the subsequent GWAS meta-analyses by 2017 are summarized in Table 1. In 2018, Mahajan et al. reported the results of a GWAS meta-analysis of 898,130 individuals of European descent (9% type 2 diabetes cases)³⁴, which was the largest European type 2 diabetes GWAS as of November 2023. Using these data, 245 loci, including 135 that were newly implicated in the predisposition to type 2 diabetes (P < 5 × 10⁻⁸), were identified.

Table 1. Summary of type 2 diabetes susceptibility loci identified by European type 2 diabetes GWAS (P < 5 × 10⁻⁸)

| Year | Type 2 diabetes loci identified by GWAS (P < 5 × 10⁻⁸) | Reference |
| --- | --- | --- |
| 2007–2008 | TCF7L2, SLC30A8, HHEX, CDKN2A/B, IGF2BP2, CDKAL1, HNF1B, WFS1, PPARG, KCNJ11, FTO | 6-10, 24-26 |
| 2008 | JAZF1, CDC123-CAMK1D, TSPAN8-LGR5, THADA, ADAMTS9, NOTCH2 | 27 |
| 2009 | IRS1, MTNR1B | 28, 29 |
| 2010 | BCL11A, ZBED3, KLF14, TP53INP1, CHCHD9, KCNQ1 (intron 11), CENTD2, HMGA2, HNF1A, ZFAND6, PRC1, DUSP9, ADCY5, DGKB/TMEM195, PROX1, GCKR, GCK | 30, 31 |
| 2012 | ZMIZ1, KLHDC5, TLE1, ANKRD55, CLIP2, MC4R, BCAR1 | 32 |
| 2017 | ACSL1, HLA-DQA1, SLC35D3, MNX1, ABO, PLEKHA1, HSD17B12, MAP3K11, NRXN3, CMIP, ZZEF1, GLP2R, GIP | 33 |
| 2018 | The largest European type 2 diabetes GWAS identified 135 additional novel loci | 34 |

  • GWAS, genome-wide association studies.

JAPANESE AND EAST ASIAN TYPE 2 DIABETES GWAS

Cumulative evidence suggests that East Asians are more prone to type 2 diabetes than Europeans with the same body mass index or waist circumference, which raises the possibility that Asians are more genetically susceptible to insulin resistance or diabetes than Europeans35, 36. As the majority of susceptibility loci for type 2 diabetes have been identified in European GWAS, non-European-population-specific loci that have not been captured in European studies might exist. Therefore, it is necessary to carry out GWAS in non-European populations, such as Asian populations, including the Japanese population.

In 2008, two independent Japanese GWAS simultaneously identified the KCNQ1 locus as a type 2 diabetes susceptibility locus in Japanese individuals37, 38. This was the first non-European GWAS to establish a type 2 diabetes susceptibility locus. Subsequent replication studies carried out in different ethnic groups showed that SNPs located in intron 15 of KCNQ1 had the strongest effect on susceptibility to type 2 diabetes in several East Asian populations39-42. The association of the KCNQ1 locus with type 2 diabetes was replicated in European populations, but the minor allele frequencies were considerably lower in Europeans than in East Asians (~7% in Europeans vs ~40% in East Asians). Thus, in contrast to TCF7L2, the contribution of KCNQ1 variants to the prevalence of type 2 diabetes is relatively small in European populations. As the KCNQ1 locus was not captured in the first round of European studies, this finding emphasizes the importance of carrying out GWAS in different ethnic groups. Although the two Japanese GWAS successfully identified the KCNQ1 locus, these studies had limited sample sizes at the initial stage of the genome-wide scan: 187–194 type 2 diabetes cases versus 752–1,558 controls37, 38.

In 2010, a Japanese GWAS with a larger sample size (4,470 type 2 diabetes cases and 3,071 controls) discovered two additional type 2 diabetes susceptibility loci: UBE2E2 and C2CD4A-C2CD4B43. Associations between these loci and type 2 diabetes were confirmed in an East Asian replication study43 and large-scale European GWAS afterwards32, suggesting that GWAS for type 2 diabetes using non-European populations are useful for identifying both ethnicity-specific loci and common susceptibility loci among different ethnic groups.

An effort to increase the sample size in East Asian populations by the Asian Genetic Epidemiology Network consortium identified eight additional novel loci, GLIS3, PEPD, FITM2-R3HDML-HNF4A, KCNK16, MAEA, GCC1-PAX4, PSMD6 and ZFAND3, through a genome-wide scan comprising a substantial sample size (6,952 cases and 11,865 controls), followed by replication testing (stage 2 in silico replication analysis: 5,843 cases and 4,574 controls; de novo replication analysis: 12,284 cases and 13,172 controls)⁴⁴. A Japanese GWAS further identified the ANK1 locus in 2012 using the same discovery set (4,470 cases and 3,071 controls) as a previous Japanese GWAS⁴³, but with an increase in the number of variants (~2 million) by genotype imputation⁴⁵, and MIR129-LEP, GPSM1 and SLC16A11-SLC16A13 in 2014 with an increase in the sample size (5,976 cases and 20,829 controls) and number of variants (~6.2 million)⁴⁶. A Japanese GWAS meta-analysis comprising a larger genome-wide scan (15,463 cases and 26,183 controls) followed by replication testing (7,936 cases and 5,539 controls) identified seven additional type 2 diabetes susceptibility loci (CCDC85A, FAM60A, DMRTA1, ASB3, ATP8B2, MIR4686 and INAFM2)⁴⁷. In 2019, a GWAS meta-analysis consisting of 36,614 cases and 155,150 controls of Japanese ancestry was carried out, which is the largest type 2 diabetes GWAS in the Japanese population as of 2023⁴⁸. Using these data, 88 loci were identified as type 2 diabetes-associated (P < 5 × 10⁻⁸), and 28 of these were novel. In 2020, a GWAS meta-analysis of type 2 diabetes in the East Asian population, including Japanese individuals, was carried out using 77,418 individuals with type 2 diabetes and 356,122 controls⁴⁹. This meta-analysis, the largest for type 2 diabetes in East Asian individuals, identified 183 type 2 diabetes loci (P < 5 × 10⁻⁸), including 61 novel loci. Type 2 diabetes loci identified in Japanese and East Asian GWAS are summarized in Table 2.

Table 2. Summary of type 2 diabetes susceptibility loci identified by Japanese and East Asian type 2 diabetes genome-wide association studies (P < 5 × 10⁻⁸)

| Ancestry | Discovery: type 2 diabetes | Discovery: control | Replication: type 2 diabetes | Replication: control | Novel loci identified (P < 5 × 10⁻⁸) | Year | Authors | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Japanese | 194 | 1,558 | 4,924 | 2,618 | KCNQ1 | 2008 | Unoki et al. | 37 |
| Japanese | 187 | 752 | 1,424 | 1,424 | KCNQ1 | 2008 | Yasuda et al. | 38 |
| Japanese | 4,470 | 3,071 | 6,508 | 5,443 | UBE2E2, C2CD4A-C2CD4B | 2010 | Yamauchi et al. | 43 |
| Japanese | 4,470 | 3,071 | 7,605 | 3,534 | ANK1 | 2012 | Imamura et al. | 45 |
| Japanese | 5,976 | 20,829 | 24,416 | 13,985 | MIR129-LEP, GPSM1, SLC16A13 | 2014 | Hara et al. | 46 |
| Japanese | 15,463 | 26,183 | 7,936 | 5,539 | CCDC85A, FAM60A, DMRTA1, ASB3, ATP8B2, MIR4686, INAFM2 | 2016 | Imamura et al. | 47 |
| Japanese | 36,614 | 155,150 | N/A | N/A | USP48, FANCL-BCL11A, SCTR, EPC2, DGKD, CASR-CD86, GNPDA2-GABRG1, GRSF1-MOB1B, AGPAT9-NKX6-1, PARP8-EMB, ITGA1, FGFR4-ZNF346, NUS1, ETV1-ARL4A, AUTS2, C10orf11-ZNF503, WDR11-FGFR2, SLC1A2, ETS1, KSR2, DLEU7-KCNRG, GP2, ZNF257, NFATC2, CNKSR2, TIMM17B-PCSK1N, SPIN2A-FAAH2, IL13RA1 | 2019 | Suzuki et al. | 48 |
| East Asian | 6,952 | 11,865 | 18,127 | 17,746 | GLIS3, PEPD, FITM2-R3HDML-HNF4A, KCNK16, MAEA, GCC1-PAX4, PSMD6, ZFAND3 | 2012 | Cho et al. | 44 |
| East Asian | 77,418 | 356,122 | N/A | N/A | VWA5B1, MAST2, PGM1, TSEN15, MDM4, SIX3, IKZF2, ZBTB20, TFRC, RANBP3L, PCSK1, REPS1, HIVEP2, ZNF713, STEAP1, CALCR, GRM8-PAX4, ASAH1, ZNF703, FGFR1, KCNB2, GDAP1, TRIB1, EFR3A, DMRT2, PTCH1, ABCA1, PTF1A, ARID5B, JMJD1C, ARHGAP19, BBIP1, BDNF, FAIM2, ALDH2, RBM19, FGF9, NYNRIN, LRRC74A, DLK1/MEG3/miRNA cluster, TRAF3, HERC2, MYO5C, RGMA, IGF1R, PKD1L3, ZFHX3, SUMO2, ZNF799, ZNRF3, WNT7B, MYOM3/SRSF10, TSN, GRB10, NID2 | 2020 | Spracklen et al. | 49 |

  • Sample size of replication study to reach genome-wide significant association with type 2 diabetes using Japanese or East Asian populations.

Contrary to expectations at the beginning of the GWAS era, a transethnic comparison of large-scale GWAS data showed substantially shared type 2 diabetes susceptibility between European and East Asian populations⁴⁸,⁴⁹. Suzuki et al. showed that the majority (77%) of the lead variants of Japanese type 2 diabetes loci were common (minor allele frequency [MAF] >0.05) in both Japanese and European populations, and that their effect sizes were strongly correlated (Pearson's r = 0.83, P = 8.7 × 10⁻⁵¹) and directionally consistent (94%) between these two populations, indicating that the majority of genetic susceptibility loci are shared between individuals of Japanese and European ancestry³⁴,⁴⁸. A similar observation was reported in a large-scale East Asian type 2 diabetes GWAS⁴⁹, in which the effect sizes of variants significantly associated with type 2 diabetes in individuals with East Asian and European ancestries were strongly correlated (r = 0.87). However, 8.4% of type 2 diabetes susceptibility variants identified in the East Asian GWAS had significant heterogeneity of effect between the East Asian and European populations, with most of them being common (MAF ≥1%) or low-frequency (1% > MAF ≥ 0.1%) alleles in East Asians, but rare in Europeans (MAF < 0.1%). For example, for rs3765467 in GLP-1R (p.Arg131Gln), identified as a type 2 diabetes locus in a Japanese GWAS⁴⁸, the minor allele Gln, a protective allele for type 2 diabetes, is common in Japanese (MAF = 0.18) and East Asian (MAF = 0.23) populations, but rare in Europeans (MAF = 0.001). Given that GLP-1R encodes a receptor for glucagon-like peptide-1, a target of commonly used therapeutic drugs for type 2 diabetes, p.Arg131Gln is not only a marker of type 2 diabetes risk, but might also be a marker for clinical response to GLP-1R agonists in Japanese and East Asian patients⁴⁸.
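The two summary measures used in these transethnic comparisons, the Pearson correlation of effect sizes and the proportion of directionally consistent variants, can be computed as in the following sketch. The per-variant log odds ratios are invented for illustration and do not come from the cited studies.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def directional_consistency(x, y):
    """Fraction of variants whose effects have the same sign in both populations."""
    same_sign = sum(1 for a, b in zip(x, y) if a * b > 0)
    return same_sign / len(x)

# Hypothetical log odds ratios for the same six lead variants
beta_japanese = [0.20, 0.12, 0.08, 0.05, 0.15, 0.03]
beta_european = [0.18, 0.10, 0.09, 0.04, 0.11, -0.01]

r = pearson_r(beta_japanese, beta_european)
consistency = directional_consistency(beta_japanese, beta_european)
print(f"r={r:.2f}, directionally consistent={consistency:.0%}")
```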

Transethnic comparisons of the molecular pathways identified from the results of population-specific GWAS have highlighted the shared and heterogeneous effects of a series of pathways on type 2 diabetes. The analyses identified shared pathways, such as maturity-onset diabetes of the young, β-cell development, prostate cancer and G1 phase; however, several pathways, such as NOTCH signaling and insulin secretion, showed stronger associations in Japanese populations than in European populations⁴⁸.

MULTI-ANCESTRY TYPE 2 DIABETES GWAS META-ANALYSIS

In 2014, motivated by the consistency of common variant associations observed across different populations50, 51, a multi-ancestry GWAS meta-analysis of >110,000 individuals, which combined GWAS data from multiple ethnic groups, including European, East Asian, South Asian and Mexican/Mexican American populations, identified seven new loci for type 2 diabetes susceptibility52.

From 2020 onward, two large-scale multi-ethnic type 2 diabetes GWAS were reported, in which the sample size was further expanded to >1 million. In 2020, a multi-ancestry meta-analysis of 228,499 cases and 1,178,783 controls, including those of European (79.1%), African American (4.0%), Hispanic (1.5%), and South Asian and East Asian (15.4%) ancestries, was published¹³. Using these data, 568 associations (P < 5 × 10⁻⁸), including 318 previously unreported loci, were identified. In 2022, another multi-ancestry GWAS meta-analysis of type 2 diabetes involving 180,834 cases and 1,159,055 controls (51.1% European, 28.4% East Asian, 8.3% South Asian, 6.6% African and 5.6% Hispanic) identified 277 loci (P < 5 × 10⁻⁸), including 11 novel loci¹⁴. The multi-ethnic GWAS meta-analyses reported by 2022 are summarized in Table 3.

Table 3. Summary of multi-ancestry type 2 diabetes genome-wide association studies (P < 5 × 10⁻⁸)

| Year | Discovery: total | Discovery: case | Discovery: control | Replication: case | Replication: control | Ancestry in the discovery stage (%) | No. loci identified: total | No. loci identified: novel | Authors | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2014 | 110,452 | 26,488 | 83,964 | 21,491 | 55,647 | European 62.5; South Asian 18.1; East Asian 17; Mexican and Mexican American 2.4 | 29 | 7 | Mahajan et al. | 52 |
| 2020 | 1,407,282 | 228,499 | 1,178,783 | N/A | N/A | European 79.1; South Asian and East Asian 15.4; African American 4; Hispanic 1.5 | 568 | 318 | Vujkovic et al. | 13 |
| 2022 | 1,339,889 | 180,834 | 1,159,055 | N/A | N/A | European 51.1; East Asian 28.4; South Asian 8.3; African American 6.6; Hispanic 5.6 | 277 | 11 | Mahajan et al. | 14 |

  • Replication study was carried out in European populations.

Taking these results together, >700 genetic loci have been identified as predisposing to type 2 diabetes through GWASs as of November 2023. In addition, a multi-ancestry type 2 diabetes GWAS in 2.5 million individuals including 428,452 type 2 diabetes cases, comprising European, East Asian, African American, Hispanic and South Asian populations, has been carried out53.

Recently identified novel type 2 diabetes risk loci have smaller effects (OR <1.05 per allele) on disease risk than previously discovered type 2 diabetes risk loci (OR 1.1–1.4 per allele), showing that increased sample size enhances the statistical power to detect association signals with smaller effects. Genome-wide chip heritability analysis of the largest meta-analysis¹³ showed that the analyzed variants explained 19% of type 2 diabetes risk on a liability scale. Given that the heritability of type 2 diabetes has been estimated to be 30–70%⁵, approximately half of type 2 diabetes heritability remains unexplained.
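The arithmetic behind the "approximately half" estimate can be made explicit. The sketch below only uses the figures quoted above (19% explained; total heritability of 30–70%); the midpoint of 50% is an assumption added for illustration.

```python
# Values quoted in the text
explained_h2 = 0.19            # liability-scale variance explained by GWAS variants
total_h2_range = (0.30, 0.70)  # estimated heritability of type 2 diabetes

def unexplained_fraction(explained, total):
    """Share of the total heritability not yet accounted for."""
    return 1.0 - explained / total

low_total, high_total = total_h2_range
unexplained_if_low = unexplained_fraction(explained_h2, low_total)    # about 0.37
unexplained_if_high = unexplained_fraction(explained_h2, high_total)  # about 0.73
unexplained_if_mid = unexplained_fraction(explained_h2, 0.50)         # 0.62, assumed midpoint

print(f"unexplained share: {unexplained_if_low:.0%} to {unexplained_if_high:.0%}")
```

Depending on which heritability estimate is taken, roughly one-third to three-quarters of the heritability is still unaccounted for, which brackets the "approximately half" stated above.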

TRANSLATION OF TYPE 2 DIABETES GENETICS INTO CLINICAL PRACTICE: THE POSSIBILITY OF IDENTIFYING NOVEL BIOLOGICAL MECHANISMS AND THERAPEUTIC TARGETS

Because the GWAS is a biologically agnostic method for detecting genetic variations that predispose individuals to a disease, the results might contribute to the identification of novel biological mechanisms and the subsequent discovery of novel therapeutic targets for type 2 diabetes (Figure 1). Hundreds of type 2 diabetes susceptibility loci have been discovered through GWAS, but translating the results into clinical practice, particularly for the development of new drugs, has been lagging behind. One major obstacle is that it is unexpectedly difficult to specify causal variants and genes within individual disease-susceptibility loci. Disease-associated variants are usually annotated by genes in close proximity; however, the proteins encoded by these genes might not necessarily play a causative role in the development of type 2 diabetes in humans. In fact, for most of the identified type 2 diabetes susceptibility loci, the causal variants and molecular mechanisms of diabetes risk remain unknown. Furthermore, most risk variants are found in the intronic or non-coding regions of genes, and are more likely to affect the regulation of transcription rather than gene function. Thus, obtaining novel biological insights that might uncover disease pathogenesis and guide drug discovery from GWAS-derived genetic information is challenging.

Figure 1. Clinical implications of type 2 diabetes genome-wide association studies.

Rare variants are often located in protein-coding sequences, and might be directly implicated in disease pathogenesis. Therefore, studying rare coding variants has the potential to identify novel therapeutic targets. The association of rare coding variants with diseases is barely captured by GWAS in large, non-isolated populations, whereas isolated founder populations are a powerful resource for the discovery of causal variants that might be common in the isolate, but rare and difficult to capture in other populations⁵⁴,⁵⁵. Large-scale array-based genotyping and subsequent targeted exome sequencing in a small, historically isolated Greenlandic population led to the identification of a strong association between p.Arg684Ter in TBC1D4 and type 2 diabetes risk⁵⁶. This variant is common (allele frequency 17%) in the Greenlandic population, but vanishingly rare in other global populations⁵⁶. TBC1D4 encodes AS160, a protein kinase B substrate of 160 kDa. Homozygous carriers of the variant have markedly higher concentrations of plasma glucose (β = 3.8 mmol/L) and serum insulin (β = 165 pmol/L) 2 h after an oral glucose load, and a 10-fold increased risk of type 2 diabetes⁵⁶. In Mexican and Mexican American populations, a risk haplotype carrying four amino acid substitutions in SLC16A11 is common (30%), but rare (<2%) in Europeans, and it has been found to be associated with a modest increase in the prevalence of type 2 diabetes⁵⁷. Furthermore, a whole-exome-sequencing-based study identified a rare missense variant that has a high impact on type 2 diabetes (OR ~5)⁵⁸. This variant is located in HNF1A, a gene in which damaging variants are known to cause maturity-onset diabetes of the young subtype 3 (refs 59-62). The variant was observed at an allele frequency of 2.1% in type 2 diabetes cases and 0.36% in control individuals in a Mexican population⁵⁸. In contrast, only two carriers were found among the 32,990 non-Finnish Europeans sequenced as part of the Exome Aggregation Consortium⁶³. Thus, genetic studies using isolated populations might be useful for obtaining important information that is either highly significant for the studied population or provides a better understanding of the pathophysiology of diabetes, and might eventually lead to new drug targets.

To identify biological candidates for causal genes at established type 2 diabetes risk loci, we applied systematic candidate gene prioritization using an in silico pipeline originally developed by Okada et al.⁶⁴, which draws on various publicly available bioinformatics resources. The pipeline comprised: (1) functional annotation, (2) cis-acting expression quantitative trait loci (cis-eQTL) data, (3) a protein–protein interaction (PPI) network, (4) genetic overlap with monogenic diabetes, (5) knockout mouse phenotypes, and (6) PubMed text mining (Figure 2)⁴⁷. Using this pipeline, which integrates disease-associated variants with diverse genomic and biological datasets, we narrowed down the potential causal genes from 286 genes located within 90 type 2 diabetes susceptibility loci to 40 biologically plausible type 2 diabetes genes. Subsequent drug-target searches identified seven genes (PPARG, KCNJ11, ABCC8, GCK, KIF11, GSK3B and JUN) as potential drug targets for type 2 diabetes treatment⁴⁷. Notably, PPARG, KCNJ11 and ABCC8 are well-known targets of approved type 2 diabetes treatments, and GCK activators have been approved for type 2 diabetes treatment in China⁶⁵, suggesting that the in silico pipeline is capable of detecting novel drug targets for the treatment of type 2 diabetes. Inhibitors of KIF11, GSK3B and JUN have been primarily developed for the treatment of cancers (KIF11, GSK3B) or rheumatoid arthritis (JUN), and these compounds might also be potential treatments for type 2 diabetes⁴⁷.
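The scoring idea behind such a pipeline, counting how many of the six criteria each candidate gene satisfies and then intersecting the prioritized genes with known drug targets, can be sketched as follows. The gene names, evidence flags, drug-target set and score cutoff are all hypothetical placeholders, and the PPI expansion step is omitted; this is not the published implementation.

```python
CRITERIA = (
    "functional_annotation",
    "cis_eqtl",
    "ppi_network",
    "monogenic_diabetes_overlap",
    "knockout_mouse_phenotype",
    "pubmed_text_mining",
)

def prioritization_score(evidence):
    """Number of prioritization criteria satisfied by one gene."""
    return sum(1 for criterion in CRITERIA if evidence.get(criterion, False))

def select_biological_genes(candidates, min_score):
    """Keep genes whose score reaches the (assumed) cutoff."""
    scores = {gene: prioritization_score(ev) for gene, ev in candidates.items()}
    return {gene: score for gene, score in scores.items() if score >= min_score}

# Hypothetical candidate genes located within susceptibility loci
candidates = {
    "GENE_A": {"cis_eqtl": True, "ppi_network": True, "knockout_mouse_phenotype": True},
    "GENE_B": {"pubmed_text_mining": True},
    "GENE_C": {},
}
hypothetical_drug_targets = {"GENE_A", "GENE_X"}

biological_genes = select_biological_genes(candidates, min_score=2)  # cutoff is an assumption
drug_target_overlap = sorted(set(biological_genes) & hypothetical_drug_targets)
print(biological_genes, drug_target_overlap)
```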

Figure 2. Discovery of potential drug targets for the treatment of type 2 diabetes (T2D). The upper half shows the strategy for the drug target search based on genome-wide association study-derived genetic information. Biological type 2 diabetes risk genes were selected from among the potential type 2 diabetes risk genes located in any of the established type 2 diabetes risk loci, using a scoring system that sums the number of prioritization criteria (1)–(6) satisfied. Novel type 2 diabetes therapeutic targets were discovered by searching for overlap between the drug target genes and the biological type 2 diabetes risk genes plus genes whose products are in direct protein–protein interaction (PPI) with the biological type 2 diabetes risk gene products. The lower half shows representative connections between type 2 diabetes risk single-nucleotide polymorphisms (SNPs; blue), type 2 diabetes biological genes (gray), drug target genes (yellow) and targeted drugs. *Approved compounds for type 2 diabetes treatment. **Compounds under clinical trial for the treatment of diseases other than type 2 diabetes. Modified from Imamura M et al., 2016 (ref. 47).

TRANSLATION OF TYPE 2 DIABETES GENETICS INTO CLINICAL PRACTICE: THE POSSIBILITY OF DISEASE PREDICTION AND PREVENTION

Another major anticipated clinical use of genetic information is to predict the risk of developing future diseases (Figure 1). Indeed, previous investigations have suggested that lifestyle interventions attenuated genetic risk defined by carrying variants associated with type 2 diabetes66, 67, which are good examples of the clinical usefulness of pre-symptomatic genetic testing to allow the detection of high-risk individuals and subsequent precision healthcare, such as lifestyle interventions or health checkups. Type 2 diabetes, as well as other common diseases, is a polygenic/complex disorder comprising numerous common variants with modest effect sizes; therefore, multiple type 2 diabetes variants need to be taken into account to predict the genetic risk for individuals.

The genetic risk score (GRS) is calculated by summing the effects of type 2 diabetes risk alleles to generate an aggregate estimate of genetic risk. In the initial stages, GRSs were constructed using established type 2 diabetes susceptibility variants with genome-wide significance (P < 5 × 10⁻⁸), mainly identified through European GWAS, and occasionally weighted to reflect their respective effect sizes. In 2013, we examined the utility of a GRS based on 49 established type 2 diabetes loci (GRS-49), most of which were originally identified by European GWAS, in a Japanese population⁶⁸. GRS-49, which ranged from 34 to 69 in this population, was significantly associated with type 2 diabetes risk, and individuals with a GRS ≥60 (5.7% of the population examined) were 9.81-fold more likely to have type 2 diabetes than those with a GRS <46 (4.2% of the population examined)⁶⁸. These results suggest that, even though the impact of each type 2 diabetes susceptibility locus was very small, the accumulation of genetic information was useful for detecting a high-risk group for the disease in a population. However, the area under the receiver operating characteristic curve for the GRS was 0.624, and adding the GRS to clinical factors (age, sex and body mass index) increased the area under the curve by only 0.03, although the incremental effect was statistically significant⁶⁸. The performance of genetic prediction models using the GRS has been evaluated in >30 studies, including Asian and European case–control study sets and prospective cohorts⁶⁹. The results were consistent among these studies, including our study⁶⁸; the areas under the receiver operating characteristic curve for genetic information alone were 0.579–0.641, and the incremental predictive performance for type 2 diabetes over established markers was statistically significant, but limited⁶⁹. Thus, initial GRSs had only limited success, providing insufficient risk stratification for clinical utility, which suggests that much larger GWAS and improved algorithms are required.
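The GRS calculation described above can be sketched in a few lines. The unweighted score counts risk alleles (0, 1 or 2 per locus, so a 49-locus score can range from 0 to 98), whereas the weighted score multiplies each count by the natural log of the locus odds ratio. The genotypes and odds ratios below are hypothetical.

```python
import math

def _check_counts(risk_allele_counts):
    # Each locus carries 0, 1 or 2 copies of the risk allele
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("risk allele counts must be 0, 1 or 2")

def unweighted_grs(risk_allele_counts):
    """Simple count of risk alleles across loci."""
    _check_counts(risk_allele_counts)
    return sum(risk_allele_counts)

def weighted_grs(risk_allele_counts, odds_ratios):
    """Risk allele counts weighted by the log odds ratio of each locus."""
    _check_counts(risk_allele_counts)
    return sum(c * math.log(o) for c, o in zip(risk_allele_counts, odds_ratios))

# Hypothetical individual genotyped at five loci
counts = [2, 1, 0, 1, 2]
per_allele_or = [1.40, 1.15, 1.10, 1.20, 1.08]

print(unweighted_grs(counts), round(weighted_grs(counts, per_allele_or), 3))
```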

Currently, polygenic risk scores (PRSs) are the mainstream scoring method for disease risk prediction. PRSs are calculated as the weighted count of numerous risk variants, derived from the individual-level genotype data or summary statistics of a single large-scale GWAS, without applying the stringent genome-wide significance threshold (P < 5 × 10⁻⁸). Although restricting the score to variants that achieve genome-wide significance ensures the exclusion of false positive risk alleles, such a stringent threshold ignores many other variants with true associations that have escaped detection at genome-wide significance owing to limited sample sizes. The small cumulative effects of numerous truly associated variants are expected to contribute to the overall score and improve its statistical power.
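The effect of relaxing the P-value threshold can be illustrated with a simplified thresholding score. This sketch assumes the variants have already been pruned for linkage disequilibrium, which is not shown; the variant identifiers, effect sizes and P-values are invented.

```python
def polygenic_score(genotypes, summary_stats, p_threshold):
    """Sum of effect-allele counts weighted by GWAS effect sizes.

    genotypes: variant id -> effect allele count (0, 1 or 2)
    summary_stats: variant id -> (beta, p_value) from the base GWAS
    Only variants with p_value below p_threshold contribute.
    """
    score = 0.0
    n_variants_used = 0
    for variant, (beta, p_value) in summary_stats.items():
        if p_value < p_threshold and variant in genotypes:
            score += beta * genotypes[variant]
            n_variants_used += 1
    return score, n_variants_used

# Hypothetical base GWAS summary statistics and one individual's genotypes
summary_stats = {
    "var1": (0.30, 1e-20),
    "var2": (0.05, 3e-6),
    "var3": (0.02, 0.04),
    "var4": (-0.01, 0.60),
}
genotypes = {"var1": 1, "var2": 2, "var3": 2, "var4": 1}

grs_like = polygenic_score(genotypes, summary_stats, p_threshold=5e-8)
prs_like = polygenic_score(genotypes, summary_stats, p_threshold=0.1)
print(grs_like, prs_like)
```

With the genome-wide threshold only one variant contributes, whereas the relaxed threshold also admits the two sub-threshold variants, which is the distinction between a conventional GRS and a PRS drawn in the text.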

A direct comparison of the predictive accuracy of a conventional GRS (based on 199 genome-wide significant SNPs) and a PRS (built from ~170,000 SNPs), using type 2 diabetes GWAS data as the training set (n = 455,313) and the UK Biobank as the test set (n = 13,480 cases and 311,390 controls), showed that the accuracy of risk estimation was greater for the PRS (area under the receiver operating characteristic curve: GRS = 0.62, PRS = 0.66), although the gain in performance was modest (Table 4)70. The observation by Meigs, who compared three genetic scores for the risk of coronary artery disease with 50, ~50,000 and ~6.6 million variants, also suggested that adding variants would not alter the basic predictive capacity of the score, although a few more extremely high-risk individuals would be identified71. In 2018, Khera et al. developed a PRS constructed with ~7 million variants based on the results of a large-scale European type 2 diabetes GWAS (n = ~160,000) (Table 4)33, 72. The individuals in the top 3.5% of the PRS distribution had a threefold greater risk of type 2 diabetes than the rest of the population (96.5%) in the UK Biobank, indicating that the PRS can now identify a larger fraction of the population with disease susceptibility than that identified by rare monogenic mutations conferring a comparable disease risk72.
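The area under the receiver operating characteristic curve used in these comparisons can be read as the probability that a randomly chosen case has a higher score than a randomly chosen control. The following sketch computes it by direct pairwise comparison; the function name `roc_auc` and the scores are hypothetical, and the quadratic loop is chosen for clarity rather than for use on biobank-scale data.

```python
def roc_auc(case_scores, control_scores):
    """Fraction of case-control pairs in which the case scores higher.

    Ties count as half a win, so a score with no discriminative
    ability gives a value of 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Invented risk scores for four cases and five controls.
cases = [0.9, 0.7, 0.6, 0.4]
controls = [0.8, 0.5, 0.3, 0.2, 0.1]

auc = roc_auc(cases, controls)
print(auc)
```

Here the cases outrank the controls in 16 of the 20 possible pairs. On this scale, the reported difference between 0.62 and 0.66 corresponds to a change of four percentage points in the share of correctly ordered case-control pairs.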

Table 4. Comparison of risk predicting scores for type 2 diabetes
Study
Khera et al., 201872 Mahajan et al., 201834
Udler et al., 201970
Base GWAS data (training set)
No. cases 26,676 55,005
No. controls 132,532 400,308
Reference Scott et al., 201733 Mahajan et al., 201834
Risk score model
No. SNPs in risk score 6,917,436 171,249 199
Methods LDpred Pruning and thresholding Conventional GRS
P-value threshold 0.1 5 × 10⁻⁸
Tuning parameter ρ = 0.01
Testing dataset
No. cases 5,853 13,480
No. controls 288,978 311,390
Reference UK Biobank UK Biobank
ROC_AUC in testing data set
No covariate 0.64 0.66 0.62
Adjusted for age and sex 0.73 0.73 0.72
OR of top 5% bin vs remainder population in testing dataset
No covariate 2.75 2.75
Adjusted for age and sex 4.52
Type 2 diabetes prevalence in top 2.5% bin of risk score in testing data set
No covariate 12% 10%
Adjusted for age and sex 20% 18%
  • Modified from Udler et al., 201970.
  • AUC, area under the curve; OR, odds ratio; ROC, receiver operating characteristic curve; SNPs, single-nucleotide polymorphisms.
  • P-value threshold is the threshold used to select variants from the discovery genome-wide association studies (GWAS) for inclusion in the risk score.
  • For the LDpred algorithm, the tuning parameter ρ reflects the proportion of variants assumed to be causal for the disease.
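The unadjusted "top bin versus remainder" quantities reported in Table 4 can be illustrated as follows. This is a sketch on invented data: the function name `top_bin_summary` is hypothetical, the 20% bin is used only because the toy sample is small, and the covariate-adjusted rows of the table would require a regression model that is not shown here.

```python
def top_bin_summary(scores, is_case, top_fraction):
    """Odds ratio of the top score bin vs the remainder, and top-bin prevalence."""
    n_top = max(1, round(len(scores) * top_fraction))
    # Rank individuals from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:n_top])

    cases_top = sum(1 for i in top if is_case[i])
    controls_top = len(top) - cases_top
    cases_rest = sum(1 for i in range(len(scores)) if i not in top and is_case[i])
    controls_rest = len(scores) - len(top) - cases_rest

    if controls_top == 0 or cases_rest == 0:
        raise ValueError("odds ratio is undefined when a cell is empty")
    odds_ratio = (cases_top * controls_rest) / (controls_top * cases_rest)
    prevalence_top = cases_top / len(top)
    return odds_ratio, prevalence_top

# Invented cohort of 20 individuals with scores 20, 19, ..., 1.
scores = list(range(20, 0, -1))
# Invented case status: six cases, two of them among the four highest scores.
is_case = [s in {20, 18, 16, 12, 8, 4} for s in scores]

odds_ratio, prevalence_top = top_bin_summary(scores, is_case, top_fraction=0.20)
print(odds_ratio, prevalence_top)
```

In this example, the top bin contains two cases and two controls and the remainder contains four cases and 12 controls, which gives an odds ratio of (2 × 12)/(2 × 4) = 3.0 and a top-bin prevalence of 50%.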

The predictive accuracy of a PRS largely depends on the sample size and study population of the base GWAS data, for which summary statistics are utilized as ‘a training set’. The sample size needed to identify SNPs that can explain 80% of GWAS heritability is estimated to be approximately between 500,000 and 1 million for most common adult-onset chronic diseases73. However, it was also reported that the marginal utility of additional samples can be quite small after the size of a GWAS reaches 100,000–200,000 subjects74. Therefore, a sample size of at least 100,000 is required for the base GWAS set to ensure the prediction accuracy of the type 2 diabetes PRS.

As mentioned above, most genetic studies to date, including type 2 diabetes GWAS, have been undertaken mainly in European populations, and the sample size of the largest Japanese type 2 diabetes GWAS is approximately one-fifth of that of the largest European GWAS34, 48. Symmetrical comparisons between PRSs using European (UK Biobank; UKBB) or Japanese (Biobank Japan; BBJ) GWAS as training sets have shown that a PRS based on a European GWAS predicted type 2 diabetes risk in the Japanese population less accurately than a PRS based on a Japanese GWAS, and vice versa75. Notably, among the diseases, anthropometric traits and blood panel traits examined, the lower accuracy of the European GWAS-based PRS in the Japanese population was most evident for type 2 diabetes and body mass index75. This observation suggests that the currently available PRSs based on large-scale European GWAS or transethnic GWAS, which consist mainly of European participants, might be less useful for non-European populations, especially for metabolic traits, such as type 2 diabetes and obesity. The single most important step toward parity in PRS accuracy across all ethnic groups is to increase the diversity of participants included and analyzed in genetic studies, which would improve the utility of the PRS for all ethnic groups. In addition, ongoing methodological developments in cross-population polygenic prediction, which jointly model GWAS summary statistics from multiple populations, might help considerably76-78.

FUTURE PERSPECTIVES

Currently, >700 type 2 diabetes susceptibility loci have been identified79. In addition, a multi-ancestry type 2 diabetes GWAS in 2.5 million individuals has been carried out (Table 5)53, which is expected to uncover hundreds of additional signals. Although this is excellent progress, it is also recognized that the information obtained from GWAS is still insufficient for clinical applications.

Table 5. A list of representative large-scale genetic cohort studies for type 2 diabetes (effective sample size >5,000)

| Study acronym | Study name | PMID | Country of origin | Ancestry group | No. cases | No. controls |
| --- | --- | --- | --- | --- | --- | --- |
| Multi-ethnic | | | | | | |
| BIOME | BioMe Biobank | 21573225 | USA | AFR | 2,123 | 4,342 |
| | | | | HIS | 3,299 | 6,238 |
| | | | | EAS | 108 | 710 |
| | | | | EUR | 860 | 7,218 |
| | | | | SAS | 195 | 404 |
| BIOVU | BioVU Biobank | 18500243 | USA | AFR | 3,269 | 6,494 |
| | | | | EUR | 15,458 | 27,328 |
| GERA | Resource for Genetic Epidemiology on Adult Health and Aging | 27189021 | USA | AFR | 444 | 1,391 |
| | | | | EUR | 6,961 | 13,922 |
| MGB | Mass General Brigham Biobank | 26784234 | USA | AFR | 432 | 1,650 |
| | | | | HIS | 299 | 1,919 |
| | | | | EUR | 2,530 | 31,769 |
| MGI | Michigan Genomics Initiative | N/A | USA | AFR | 779 | 2,205 |
| | | | | EUR | 7,426 | 33,436 |
| MVP | VA Million Veteran Program | N/A | USA | AFR | 27,680 | 61,819 |
| | | | | HIS | 11,877 | 29,606 |
| | | | | EAS | 1,813 | 5,681 |
| | | | | EUR | 90,107 | 247,358 |
| | | | | SAS | 97 | 411 |
| PMBB | Penn Medicine Biobank | N/A | USA | AFR | 3,374 | 7,235 |
| | | | | EUR | 5,136 | 24,058 |
| UKBB | UK Biobank | 25826379 | UK | EUR | 19,119 | 423,698 |
| | | | | SAS | 1,404 | 6,849 |
| WHI | Women's Health Initiative | 9492970 | USA | AFR | 2,481 | 5,771 |
| | | | | HIS | 282 | 3,205 |
| European | | | | | | |
| DECODE | deCODE Genetics | 24464100 | Iceland | EUR | 11,448 | 278,375 |
| EPIC-INTERACT | European Prospective Investigation into Cancer and Nutrition | 21717116 | Various | EUR | 9,308 | 11,523 |
| ESTBB | Estonian Biobank | 24518929 | Estonia | EUR | 12,208 | 183,863 |
| FINNGEN | FinnGen Study | N/A | Finland | EUR | 37,002 | 215,160 |
| GODARTS | Genetics of Diabetes and Audit Research in Tayside Scotland | 9329309 | UK | EUR | 2,993 | 2,641 |
| UCPH | Danish type 2 diabetes case–control study | 23160641 | Denmark | EUR | 5,220 | 18,556 |
| East Asian | | | | | | |
| BBJ | Biobank Japan | 34594039 | Japan | EAS | 45,383 | 132,032 |
| CKB | China Kadoorie Biobank | 16131516 | China | EAS | 10,741 | 85,289 |
| KBA | Korean Biobank Array from the Korean Genome and Epidemiology Study (KoGES) Consortium | 28938752, 30718733 | Korea | EAS | 9,578 | 84,113 |
| South Asian | | | | | | |
| LOLIPOP | London Life Sciences Prospective Population | 18454146 | UK | SAS | 2,716 | 6,027 |
| PROMIS | Pakistan Risk of Myocardial Infarction Study | 19404752 | Pakistan | SAS | 6,993 | 10,304 |
| Hispanic | | | | | | |
| HCHS/SOL | Hispanic Community Health Study/Study of Latinos | 28254843 | USA | HIS | 2,460 | 5,185 |
| SIGMA | Slim Initiative in Genomic Medicine in the Americas | 24390345 | USA | HIS | 6,664 | 6,970 |
| African | | | | | | |
| WFSM | Wake Forest School of Medicine | 22238593, 15642484 | USA | AFR | 3,926 | 2,672 |
  • Genetic cohort studies that contributed to the latest multi-ethnic GWAS for type 2 diabetes (Suzuki et al.53) with effective sample sizes >5,000 are listed. AFR, African; EAS, East Asian; EUR, European; HIS, Hispanic; SAS, South Asian.

Efforts to fill in the missing heritability data through further expanded analyses need to be continued. Additional type 2 diabetes loci will be discovered mostly by continued increases in the size and diversity of common-variant GWAS, and incorporating new populations and resources for genotype imputation. Collaborative networks and centralized analysis plans of GWAS consortia play key roles in the organization of these efforts. Furthermore, data and analyses must be made available through public portals. In addition, it is important for the biomedical community to ensure that all ethnic groups have access to genetic risk prediction of comparable quality, which requires undertaking or expanding GWAS in non-European ethnic groups.

Another focus of ongoing research is the detailed functional characterization of the identified type 2 diabetes susceptibility variants. For common diseases, GWAS have provided valuable opportunities for drug discovery; nevertheless, drug discovery based on GWAS data remains challenging because of the difficulty of functional annotation. More active laboratory-based functional validation is required to identify causal transcripts and understand the molecular mechanisms at GWAS loci.

Suitable clinical implementation of the PRS is now an area of active research across many disease areas, including type 2 diabetes80, 81. Given that germline genomic sequences do not change throughout life, the results of a single genetic examination remain relevant throughout an individual's lifetime. In addition, PRSs can be calculated simultaneously for many diseases based on data from a single existing genotyping array, whereas identifying rare monogenic mutations requires the sequencing of specific genes. It has been reported that, in the Diabetes Prevention Program, homozygous carriers of the TCF7L2 risk allele had an 80% increased risk of developing diabetes in the absence of intervention, whereas this effect was abolished in individuals who were randomized to the lifestyle intervention arm66. This is a good example of the clinical usefulness of genetic testing for detecting high-risk individuals in whom physicians should intervene aggressively. One study reported that genetic testing would be an incentive to change health behaviors; 71% of the 152 healthy individuals interviewed stated that a ‘high risk’ result from genetic testing would inspire them to adopt healthy lifestyle changes82. In contrast, a randomized, controlled trial showed that genetic screening for type 2 diabetes, coupled with genetic counseling, did not change health behavior or type 2 diabetes outcomes83, 84. This suggests that genetic information alone appears insufficient as a means to reduce the risk of type 2 diabetes, even in very-high-risk individuals. To maximize the value of a PRS in type 2 diabetes prevention, the results of genetic testing might need to include an easy-to-understand interpretation of the PRS, and sustained lifestyle interventions based on individual genetic risks should be provided by healthcare professionals.

CONCLUSIONS

GWAS have produced significant breakthroughs in the field of common disease genetics; however, it has been challenging to translate GWAS findings into clinical practice to improve the care of patients with diabetes. The focus of ongoing research efforts should include improving the disease prediction accuracy of PRSs, detailed functional characterization of the identified type 2 diabetes susceptibility variants and the search for missing heritability. Finally, it should be emphasized that genetic scientists should collaborate with clinicians and healthcare professionals to successfully utilize genetic information to improve the prediction of disease onset and inform care decisions for patients, and to facilitate the implementation of precision medicine.

ACKNOWLEDGMENT

This work is supported by a Grant from the Okinawa prefecture for promoting collaborative research of innovation and eco system.

DISCLOSURE

The authors declare no conflict of interest.
