GWAS advancements to investigate disease associations and biological mechanisms
Abstract
Genome-wide association studies (GWAS) have been instrumental in elucidating the genetic architecture of various traits and diseases. Despite the success of GWAS, inherent limitations such as identifying rare and ultra-rare variants, the potential for spurious associations and pinpointing causative agents can undermine diagnostic capabilities. This review provides an overview of GWAS and highlights recent advances in genetics that employ a range of methodologies, including whole-genome sequencing (WGS), Mendelian randomisation (MR), the Pangenome's high-quality Telomere-to-Telomere (T2T)-CHM13 panel and the Human BioMolecular Atlas Program (HuBMAP), as potential enablers of current and future GWAS research. The state of the literature demonstrates the capabilities of these techniques to enhance the statistical power of GWAS. WGS, with its comprehensive approach, captures the entire genome, surpassing the capabilities of the traditional GWAS technique focused on predefined single nucleotide polymorphism sites. The Pangenome's T2T-CHM13 panel, with its holistic approach, aids in the analysis of regions with high sequence identity, such as segmental duplications. MR has advanced causative inference, improving clinical diagnostics and facilitating definitive conclusions. Furthermore, spatial biology techniques such as HuBMAP enable 3D molecular mapping of tissues at single-cell resolution, offering insights into pathology of complex traits. This study aimed to elucidate and advocate for the increased application of these technologies, highlighting their potential to shape the future of GWAS research.
1 INTRODUCTION
Genome-wide association studies (GWAS) are a pioneering approach in genetics that employs various statistical frameworks to identify genetic variations associated with specific phenotypes. They leverage genetic data from patients and healthy individuals to compare variations in their DNA with differences in the trait or disease. By analysing millions of genetic markers spread across the genome of these individuals, GWAS pinpoint regions of their DNA that may contribute to the development or progression of the disease.1, 2
Since their advent in the early 2000s, when the first seminal study examining the complexities of myocardial infarction surfaced,3 more than 6000 publications related to GWAS have been produced. These studies have driven significant strides in the field of genomics, leading GWAS to be recognised as transformative in understanding the behaviour and biology of DNA.4 Through their comprehensive approach, including gene sequencing, rigorous statistical analysis and validation studies, GWAS have successfully improved understanding of gene‒disease‒environmental associations.1 Furthermore, they have been leveraged to predict the heritability of complex traits,5 reconstruct population histories,6, 7 improve the field of forensics,8 perform DNA fingerprinting of embryos9 and, most significantly, contribute to a deeper comprehension of the genetic architecture of diseases. However, like many groundbreaking endeavours, GWAS have various limitations, including missing heritability, population specificity, complex traits, limited functional understanding, ethical considerations, sample size requirements, multiple testing issues, spurious associations and identification of causative agents.10 Furthermore, ongoing challenges, such as the need to diversify GWAS datasets, refine polygenic risk score (PRS) methodology, unravel the biological mechanisms underlying GWAS loci, integrate rare variant data and investigate sex-specific associations, underscore the continued efforts required to translate genetic discoveries into clinical applications for personalised risk prediction, prevention and treatment of multiple traits and diseases.10
By conducting a comprehensive review, this paper aims to inform the future of GWAS research by exploring multiple techniques with established success in elucidating the genetic underpinnings of complex traits and diseases. We focus on sequencing methodologies, specifically whole-genome sequencing (WGS); epidemiological approaches such as Mendelian randomisation (MR) and its accompanying techniques; high-quality genome assemblies such as the Pangenome's Telomere-to-Telomere (T2T)-CHM13; and the Human BioMolecular Atlas Program (HuBMAP). WGS, a prominent technology in genetics, addresses the issue of ‘missing heritability’ in GWAS by reading the complete DNA sequence rather than a predefined subset of sites. MR, on the other hand, is a post-processing approach capable of clarifying the role of identified variants, thereby minimising spurious associations and revealing actual causative agents (Figure 1).

Pangenome's T2T-CHM13 high-quality phased genome assembly was also examined. Unlike the reference assemblies (GRCh38, 1000 Genomes Project [1KG], GRCh37, etc.) used in GWAS, it considers diverse haploid genomes and provides complete species genetic diversity, which enables the analysis of largely inaccessible complex genomic regions such as segmental duplications (SDs). SDs have been extensively overlooked in the past due to their complexities and high-sequence identity.11 Finally, we explored how spatial biology technology, such as HuBMAP, represents an ambitious initiative to create spatial maps of the human body at the cellular level.12
2 GWAS AND WHOLE-GENOME SEQUENCING
To examine potential gene‒disease associations that may occur in the genome, the first step is to choose an appropriate sequencing technology. Current sequencing methodologies available on the market include Sanger sequencing, known as first-generation sequencing13; next-generation sequencing technologies, encompassing both second-generation short-read (e.g., Illumina and Ion Torrent) and third-generation long-read (e.g., Oxford Nanopore and PacBio SMRT) platforms14; exome sequencing, which focuses on sequencing the protein-coding regions of the genome15; single nucleotide polymorphism (SNP) arrays16; and WGS.17-19 Among the plethora of techniques, the most commonly used, especially in large-scale genetic studies and GWAS, is the SNP array.20, 21 These arrays are particularly valuable for genotyping SNPs across the genome in a cost-effective and high-throughput manner.22 They have been pivotal in identifying genetic variations associated with diseases, traits and population diversity. However, SNP arrays have inherent limitations, such as limited resolution, introduction of ascertainment bias, restriction to genotyping only predefined SNP sites, inability to capture structural variants (SVs), genotype imputation challenges and limited information on functional impact.20, 22, 23 All these factors contribute to the missing heritability observed in gene‒disease associations.20 WGS has emerged as a transformative solution to these challenges (Table 1).17, 24
Aspect | WGS | SNP array |
---|---|---|
Genetic variability | Captures the entire spectrum of genetic variations, including SNPs, indels, structural variants and rare mutations | Primarily focuses on predefined SNP sites, suitable for common genetic variants |
Resolution | Offers high resolution, identifying millions of SNPs, structural variants and rare mutations | Provides relatively lower resolution compared to WGS, targeting a limited number of SNPs |
Ascertainment bias | Reduces ascertainment bias as it captures dispersed nucleotide alterations and rare variants | Prone to ascertainment bias due to limited SNP coverage, potentially missing rare variants |
Data completeness/coverage | Provides data for the entire genome, encompassing ∼2.8 × 10⁹ nucleotide positions | Covers ∼2.5 × 10⁶ nucleotide positions in the autosomal genome |
Discoverability | Highly effective in detecting rare and de novo mutations contributing to rare genetic disorders | Less effective in capturing rare variants, often underrepresented in the data |
Cost and scalability | Typically, more expensive due to comprehensive sequencing | Cost-effective and scalable for large populations due to targeted SNP genotyping |
Impact on heritability | Addresses missing heritability observed in GWAS | Limited impact on missing heritability |
Functional information | Provides functional insights into genetic variations | Limited information on functional impact |
Research flexibility | Enables diverse genomic analyses | Primarily used for genotyping purposes |
Contribution to Pangenome's analysis | Supports Pangenome's analysis by capturing genetic diversity beyond linear reference genomes | Typically, not used in Pangenome's analysis, as it focuses on specific SNPs |
- Note: This table outlines the comparative strengths and limitations of WGS and SNP arrays in genetic research. WGS offers comprehensive coverage of the genome, high resolution for identifying variants and insights into functional genomics. In contrast, SNP arrays are cost-effective for large-scale genotyping studies but have limited resolution and coverage. The choice between these technologies depends on the research objectives, budget and desired level of genetic detail.
WGS is a comprehensive genetic analysis technique that involves sequencing the entire DNA content of an organism, providing detailed information about its complete genome and capturing the full spectrum of genetic variations, including SNPs, insertions and deletions (indels) and SVs.25 This technology provides data for the entire length of the genome, which is approximately 2.8 billion nucleotide positions, unlike the SNP array, which covers roughly 2.5 million nucleotide positions in the autosomal genome.26 Its use in genetic research significantly advances the understanding of multiple disease traits while also transforming the landscape of precision medicine.27, 28 Its main historical disadvantage was its exorbitant cost, which limited its broader application. From the 1990s to the early 2000s, the cost of sequencing a single human genome often ranged from hundreds of millions to billions of dollars. The Human Genome Project, completed in 2003, cost approximately $2.7 billion.29 Advances in genomics have since reduced the cost of sequencing a genome to approximately $600. This dramatic cost reduction democratises access to genomic data, enabling large-scale initiatives such as the UK Biobank, the All of Us Research Program30, 31 and Genomics England's 100 000 Genomes Project32, 33 to undertake WGS on tens of thousands to millions of participants.29 Notably, the UK Biobank stands out as the most prominent and influential due to its large-scale, longitudinal design and extensive dataset. It also holds the potential to revolutionise precision healthcare research and personalised medicine, given its vast repository of genetic and health data from over 500 000 individuals.10, 34 Moreover, the field has sought to broaden its user base by offering different coverage levels. The choice of coverage depends on the specific goals and requirements of the project in question.
For cost-effective screening, population genetics studies, epidemiological research, initial or pilot studies and instances with resource constraints, low-coverage WGS (LCWGS) can be a pragmatic choice. In contrast, projects demanding precision, such as clinical diagnostics, identification of rare or ultra-rare variants, study of complex genomic regions or research involving pharmacogenomics, benefit from the depth and accuracy offered by high-coverage WGS (HCWGS).26, 35 Regardless of the choice of coverage, LCWGS and HCWGS will contribute to enhanced DNA reading and analysis when compared to other genotyping and sequencing technologies, such as targeted sequencing or SNP arrays.26 This is attributed to their capacity to reveal the ‘missing heritability’ by compensating for gaps in coverage that naturally stem from other techniques.24
In a study of the efficacy of WGS, Ceballos et al.26 compared commonly used strategies for detecting variants, using LCWGS for the examination. LCWGS identified about 6.3 times as many heterozygous SNPs as the SNP array: while the SNP array identified roughly 405 000 SNPs, LCWGS uncovered approximately 2.55 million SNPs in the same study, highlighting its superior sensitivity in capturing genetic variations.26 HCWGS in turn outperforms LCWGS, as expected from the difference in coverage. A recent large-scale study32 leveraging data from the 1000 Genomes Project (1KG) cohort revealed the substantial impact of HCWGS on variant discovery and precision when compared to LCWGS. The LCWGS data at 7.4× depth of coverage provided roughly 84.7M single nucleotide variants (SNVs), 3.6M indels and 68K SVs, whereas HCWGS at 30× provided more SNVs (approximately 111M), indels (14.4M) and SVs (173K). When validated against long-read sequencing, the HCWGS calls were found to be consistent. HCWGS also revealed more mutations in genes per genome than LCWGS and was more precise, with a lower false discovery rate (0.1% for SNVs and 1.1% for indels) than LCWGS (0.6% for SNVs and 12.4% for indels).32 These figures collectively showcase the versatility of WGS as a transformative tool whose capabilities surpass those of conventional genotyping methods.
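The fold differences implied by the counts quoted above can be checked with a few lines of arithmetic. The following is a minimal Python sketch; the variable and function names are illustrative assumptions, and the only inputs are the rounded figures reported in the text, so the ratios are approximate.

```python
# Rounded figures quoted in the text (refs 26 and 32).
# All names below are illustrative, not taken from the cited studies.
snp_array_het_snps = 405_000    # heterozygous SNPs found by the SNP array
lcwgs_het_snps = 2_550_000      # heterozygous SNPs found by LCWGS


def fold_change(new: float, baseline: float) -> float:
    """Return how many times larger `new` is than `baseline`."""
    return new / baseline


array_to_lcwgs = fold_change(lcwgs_het_snps, snp_array_het_snps)

# Variant counts in millions at 7.4x (LCWGS) and 30x (HCWGS) coverage.
lcwgs_counts = {"SNVs": 84.7, "indels": 3.6, "SVs": 0.068}
hcwgs_counts = {"SNVs": 111.0, "indels": 14.4, "SVs": 0.173}

hc_vs_lc = {
    kind: fold_change(hcwgs_counts[kind], lcwgs_counts[kind])
    for kind in lcwgs_counts
}

if __name__ == "__main__":
    print(f"LCWGS vs SNP array (heterozygous SNPs): {array_to_lcwgs:.1f}x")
    for kind, ratio in hc_vs_lc.items():
        print(f"HCWGS vs LCWGS ({kind}): {ratio:.1f}x")
```

On these rounded inputs, the first ratio reproduces the roughly 6.3-fold figure, and the gain from higher coverage is largest for indels (about fourfold) and smallest for SNVs.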
In the context of GWAS and WGS, it is essential to acknowledge the diverse sources of DNA used for analysis, as they can significantly impact the interpretation of results and the identification of disease associations. The type of DNA source, whether it is blood, tissue, cancer samples, mitochondrial DNA (mtDNA) or other omic data, presents unique challenges and considerations in genetic analysis. Blood and tissue DNA are commonly utilised in population-based studies for GWAS and WGS. These samples offer insights into germline genetic variations and are instrumental in understanding the genetic basis of complex traits and diseases.36 However, variations in tissue-specific gene expression and epigenetic modifications can influence the interpretation of genetic associations, necessitating careful consideration in study design and analysis.37 In cancer genomics, the analysis of tumour DNA poses additional challenges due to somatic mutations, tumour heterogeneity and clonal evolution.38 Although GWAS and WGS have provided valuable insights into the genetic drivers of cancer, distinguishing between germline and somatic mutations is crucial for identifying potential therapeutic targets and understanding disease progression.39 Additionally, mtDNA merits special attention due to its separate genetic inheritance pattern and unique role in cellular metabolism. Analysis of mtDNA can reveal insights into mitochondrial diseases, maternal inheritance patterns and mitochondrial dysfunction underlying various health conditions, including neurodegenerative disorders and metabolic syndromes.40 Each DNA source presents its own set of analytical challenges, including sample quality, purity and representativeness of the biological system under study. Furthermore, factors such as sample size and diversity across ancestral backgrounds must be carefully considered to ensure robust and generalisable genetic findings. 
Addressing these complexities requires tailored methodological approaches, such as optimised sequencing techniques, data integration strategies and statistical models that account for the specific characteristics of each DNA source. Emerging omic technologies, such as epigenomics, transcriptomics, proteomics and metabolomics, offer complementary layers of information to traditional genetic analysis.41, 42
The current lack of highly effective treatments for diseases is attributed to a limited understanding of their pathogenesis. Past GWAS aimed at understanding disease progression have mainly used SNP chips,43 or imputation techniques based on reference panels such as HapMap, 1KG or Haplotype Reference Consortium (HRC),44 all of which have been sparingly insightful in revealing novel associations with functional significance. However, more studies are now incorporating WGS, as it has become cheaper and more accessible. A study comparing methods for measuring mtDNA copy number (mtDNA-CN) across 4500 participants found that WGS outperformed traditional methods such as quantitative real-time polymerase chain reaction, which is the current gold standard for measuring mtDNA-CN. WGS-derived mtDNA-CN showed stronger associations with known correlates of mitochondrial function and aging-related diseases, underscoring its precision and sensitivity in uncovering intricate genetic associations.44 It also offers insights into disease mechanisms and suggests the need for more accurate mtDNA-CN assessment methods.44 A more recent study employing WGS in atrial fibrillation (AF) investigation successfully identified a recessive frameshift deletion in the MYL4 gene, a novel association that was linked to early-onset AF. 
Several mutations in the ABCB4 gene that contribute to an increased risk of liver diseases were also discovered.45 The identification of a frameshift deletion in MYL4 suggested a potential disruption in cardiac muscle function contributing to the pathogenesis of AF, which involves complex interactions between genetic and environmental factors, especially in early-onset cases.45 These factors can include abnormalities in the autonomic nervous system, calcium handling abnormalities, structural remodelling and electrical remodelling,46 all of which can exacerbate the risk of disruption to the normal cardiac rhythm and potentially result in comorbidities, such as heart failure (HF) and stroke.47 Another study utilising WGS data from individuals with Alzheimer's disease and cognitively normal controls identified two novel rare variants associated with the disease: a missense variant in OR51G1 (p.R272H) and a stop-gain variant in MLKL (p.Q48X).48 The discovery of the p.R272H variant suggested a potential role for OR51G1 in Alzheimer's disease pathogenesis, possibly through altered olfactory signalling or neuronal function. MLKL is a key regulator of necroptosis, a form of programmed cell death implicated in neurodegenerative diseases. The identification of the p.Q48X variant provides further insights into the molecular mechanisms underlying Alzheimer's disease and highlights MLKL as a potential therapeutic target.
In addition, a recent study aimed at understanding genetic loci associated with inflammatory biomarkers leveraged WGS and discovered 18 novel associations that eluded detection with genotyped or imputed SNPs, suggesting the added value of WGS in uncovering previously unidentified genetic loci.17 In neurodevelopmental disorder studies, WGS identified rare and de novo variants underlying conditions such as autism49, 50 and intellectual disabilities.51 Cancer genomics has also benefitted from WGS, as it enabled comprehensive exploration of somatic mutations.27, 52 When used with integrated haplotype-resolved panels, WGS further improves imputation accuracy across the allele frequency spectrum and facilitates the detection of rare and low-frequency variant associations.53-56 These studies collectively showcase that WGS offers superior sensitivity in capturing genetic variations compared to traditional methods, reveals novel disease associations with functional significance, and provides deeper insights into disease mechanisms across various complex diseases and conditions. Deductively, due to the holistic and unbiased approach of WGS, its integration is set to expand the scope of future GWAS and enable a more nuanced understanding of the genetic basis of complex traits and diseases.
3 GWAS AND MENDELIAN RANDOMISATION
MR is a method in genetic epidemiology that utilises genetic variants as instrumental variables to assess the causal relationship between an exposure (e.g., a risk factor or intervention) and an outcome in observational studies.57 This analysis typically employs statistical techniques such as two-stage least squares regression,58, 59 inverse-variance weighting60 and others.61-63 MR's quasi-experimental approach leverages the random assortment of genetic alleles during gamete formation, in line with Mendel's laws of inheritance, to mimic a randomised controlled trial and infer causation.57, 64 This method is particularly advantageous over traditional observational studies, which are often susceptible to issues such as reverse causation and confounding, thereby providing a more robust and less biased assessment of causation.65, 66 GWAS, on the other hand, have been instrumental in identifying genetic variants associated with various traits and diseases. These studies have uncovered hundreds of common variants whose allele frequencies statistically correlate with certain phenotypes. However, the mere identification of these associations does not confirm the biological relevance of these variants to disease pathogenesis or their clinical utility in prognosis or treatment.67 Although GWAS are crucial for highlighting potential genetic factors linked to diseases, most identified variants require further investigation to understand their role in disease causality.
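To make the inverse-variance weighting mentioned above concrete, the following Python sketch combines per-variant Wald ratios (the variant-outcome effect divided by the variant-exposure effect) into a single fixed-effect estimate. It is a simplified illustration under stated assumptions: summary statistics from independent variants, first-order standard errors that ignore uncertainty in the exposure effects, and no pleiotropy. The function names and toy effect sizes are hypothetical and do not come from any cited study.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> Tuple[float, float]:
    """Per-variant causal estimate and its first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted average of Wald ratios.

    Each ratio is weighted by the inverse of its first-order variance,
    i.e. beta_exposure**2 / se_outcome**2.
    """
    weights = [bx ** 2 / so ** 2 for bx, so in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return estimate, se


if __name__ == "__main__":
    # Toy summary statistics for three hypothetical instruments.
    bx = [0.10, 0.20, 0.05]    # variant-exposure effects
    by = [0.05, 0.10, 0.025]   # variant-outcome effects
    so = [0.01, 0.02, 0.01]    # standard errors of variant-outcome effects
    est, se = ivw_estimate(bx, by, so)
    print(f"IVW causal estimate: {est:.3f} (SE {se:.3f})")
```

In this toy input every variant implies the same ratio, so the pooled estimate equals that ratio; with real data, heterogeneity between the per-variant ratios is one signal that the instrument assumptions may be violated.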
Therefore, while GWAS lay the groundwork by identifying genetic variants associated with diseases, MR is needed to assess the causality of these associations. By using the genetic variants identified through GWAS as instrumental variables, MR can provide insight into whether these associations reflect causal relationships. This sequential approach underscores the importance of both methods in moving from association to causation, and the need for deeper scrutiny of the role of common variants as causative agents in disease. Recent studies leveraging MR showcase its power in revealing underlying factors that increase disease risk. Within the context of AF, MR studies have highlighted a spectrum of risk factors, ranging from obesity68 and obstructive sleep apnoea (OSA)69 to a 1-standard-deviation increase in childhood body mass index,68 birth weight68 and height,70 all of which indicate an elevated risk of AF.71 Similarly, when leveraged to understand the aetiology of HF, MR reveals a spectrum of risk factors, including higher diastolic and systolic blood pressures, elevated triglyceride levels, genetic predisposition to AF and coronary artery disease, as significant contributors to heightened HF risk. These examples emphasise MR's value in illuminating genotype‒phenotype connections, as they provide a nuanced understanding of the causal relationships underlying cardiovascular diseases (CVDs), which can be used for personalised treatment strategies. Notably, MR's utility stretches beyond causal inference in genotype‒phenotype connections. It also extends to unveiling shared pathophysiology between two diseases.
Examples include depression and type 2 diabetes,72 rheumatoid arthritis and myocardial infarction,73 and AF and HF.74 It is estimated that nearly two-thirds of people living with AF, irrespective of its aetiology, will subsequently develop HF as their condition progresses, whereas AF arises in only one-third of those with pre-existing HF.75, 76
MR is pivotal in unraveling underlying risk factors and elucidating disease aetiology. However, it has inherent limitations, including assumptions about instrument validity and the potential for pleiotropy when assessing causality. Fortunately, various MR methods, such as CAUSE61 and MRMix,77 explicitly address horizontal pleiotropy. These methods help control for both exposure-independent paths and unobserved exposure-outcome confounding. MR techniques such as block jackknife resampling,63 pseudoreplication, omnigenic MR78 and Bayesian modelling79 also address challenges such as sample overlap, winner's curse, omnigenic architectures and reverse causation, respectively, that can arise in studies. Furthermore, these strategies, combined with sensitivity analysis, enhance the reliability of MR findings in large-scale genomic studies.57, 80 A more consistent application of MR in post-GWAS analysis could therefore substantially strengthen clinical diagnostics by informing on the biological relevance of disease-associated variants or genes, causal inference, disease susceptibility and prevention strategies.
4 SEGMENTAL DUPLICATIONS ELUCIDATED IN GWAS THROUGH PANGENOME'S T2T-CHM13 PANEL
Understanding associations or spectrum of variants in genomic studies is crucial for informed decisions in clinical settings. However, challenges, such as identifying or accurately genotyping variants due to their inherent complexities, can affect this goal.81 As a result, genetic variations such as SVs, including indels, translocations, copy number variations and duplications, have been relatively understudied in GWAS.81 Previous explorations aimed at being inclusive of SVs, specifically SDs to understand their role in diseases have been ineffectual. This is because these regions are plagued with inherent complexity and high sequence identity.82-84 One example of such complexity is evident in the interplay between the cytokine gene CCL3L1 located within the q-arm of chromosome 17 and the chemokine receptor CCR5 located on the short p-arm of chromosome 3. The CCR5 is known for its role in HIV infection. It serves as a coreceptor along with CD4 for the entry of the most common strains of HIV-1, the virus that causes AIDS, into host cells. On the other hand, the CCL3L1 gene plays a significant role in the body's defense against HIV-1. Due to SDs, individuals carry varying numbers of the CCL3L1 gene, ranging from zero to six copies. This gene produces a substance that blocks HIV-1 from entering cells by interfering with the CCR5 coreceptor, effectively acting as a barrier against the virus. The number of CCL3L1 copies an individual possesses directly influences their susceptibility to HIV-1; more copies correlate to better protection, highlighting the significant impact of genetic variability on disease resistance and progression.83 Clearly, the intricate interplay between these genes emphasises not only the importance of the chemokine system but also the importance of studying and understanding the implications of SDs in genetic research.
SDs are regions of the genome that have been duplicated, resulting in multiple copies of the same genetic material with over 90% sequence identity and over 1 kb in length.84 However, these ‘homologous regions’ are difficult to analyse, as they result from duplications rather than retrotransposition. Despite their importance in evolution and as risk factors for genomic rearrangements leading to disorders due to unequal crossing over in meiosis, they have been extensively overlooked, since early genome sequencing methods incorrectly assembled certain genomic regions with high sequence similarity.83, 84 This has led to the exclusion of high-identity SDs from subsequent analyses. Although the human reference genome (GRCh38) provides a roadmap for SDs, more than 50% of the remaining gaps consist of more complex SD regions.85, 86 Recent advancements in the field of Pangenome's analysis, involving high-quality phased genome assemblies such as those from the T2T consortium, have further resolved the issue of analysing these complex regions.84, 85 The Pangenome's concept demonstrates a promising strategy to surmount the limitations of GWAS stemming from the traditional reliance on a singular linear reference genome, such as GRCh38 assembly, NCBI Build 37 (hg19), African Reference Genome (AR1), and reference panels such as those from the 1KG and the HRC.87, 88 This approach involves assembling a collection of individual haploid genome sequences that represent the genetic diversity of a population or species,88 encompassing both the core genome, which is shared among all individuals, and the accessory genome, which is specific to certain individuals or subgroups.88 This facilitates the identification of novel genetic associations.89 Pangenome's analysis has significantly lowered false mapping rates, reduced allelic bias and enhanced the accuracy of gene expression analysis compared to conventional linear reference pipelines such as STAR90 or VG MPMAP.91 By capturing a broader range of genetic variations, including rare variants and SVs, the Pangenome enables the inclusion of tens of thousands of additional alleles in future GWAS89 and the illumination of the implications of SDs in disease development and progression.
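The working definition of an SD given above (over 90% sequence identity and over 1 kb in length) can be expressed as a simple filter over pairwise alignments between genomic intervals. The Python sketch below is illustrative only: the `PairwiseAlignment` record and its fields are hypothetical, and the check applies just the two thresholds stated in the text, without the additional filtering that a real annotation workflow would need.

```python
from dataclasses import dataclass

MIN_IDENTITY = 0.90     # "over 90% sequence identity"
MIN_LENGTH_BP = 1_000   # "over 1 kb in length"


@dataclass(frozen=True)
class PairwiseAlignment:
    """Hypothetical record of an alignment between two genomic intervals."""
    aligned_length_bp: int
    matching_bases: int

    @property
    def identity(self) -> float:
        """Fraction of aligned positions that match (0.0 if empty)."""
        if self.aligned_length_bp == 0:
            return 0.0
        return self.matching_bases / self.aligned_length_bp


def is_segmental_duplication(aln: PairwiseAlignment) -> bool:
    """Apply the two thresholds from the definition used in the text."""
    return aln.aligned_length_bp > MIN_LENGTH_BP and aln.identity > MIN_IDENTITY


if __name__ == "__main__":
    candidates = [
        PairwiseAlignment(aligned_length_bp=5_000, matching_bases=4_900),
        PairwiseAlignment(aligned_length_bp=800, matching_bases=790),
        PairwiseAlignment(aligned_length_bp=5_000, matching_bases=4_000),
    ]
    for aln in candidates:
        print(aln.aligned_length_bp, f"{aln.identity:.2f}",
              is_segmental_duplication(aln))
```

The same thresholds also hint at why these regions are hard to genotype: a short read drawn from one copy of a high-identity duplication may align almost equally well to the other copy.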
Intertwined with the Pangenome, the first complete human genome (T2T-CHM13) provides a comprehensive reference panel that enhances the Pangenome's inclusivity and accuracy in gene identification and improves understanding of human genomic variation.84 By leveraging this novel panel, coupled with multiple statistical analyses, researchers can compare regions with unique genetic information to regions with duplications.84 Compared with reference assemblies widely used in GWAS, such as GRCh38, T2T-CHM13 provides a contiguous representation of SD regions. T2T-CHM13 revealed 81 million base pairs of previously unresolved or structurally variable SDs, increasing the genome-wide estimate of SD content from 5.4% to 7.0%.85 The holistic approach of the T2T-CHM13 panel allows it to resolve SD regions and to illuminate other genomic regions, such as acrocentric, pericentric and telomeric regions.85 These findings underscore the T2T-CHM13 panel as an invaluable asset for genomic studies, specifically GWAS, offering a more comprehensive and accurate representation of the human genome's complexity than previously used reference panels.
5 HuBMAP—A SPATIAL BIOLOGY INITIATIVE
HuBMAP is an NIH-funded program that aims to accelerate the understanding of human biology by developing a framework for creating a three-dimensional map of the human body at the cellular level.92 It is part of the broader field of spatial biology, which studies the organisation and interactions of cells and molecules in tissues and organs to understand biological processes in health and disease.92 By creating high-resolution spatial maps of tissues, researchers can gain new insights into tissue heterogeneity, cell–cell communication and tissue microenvironments. HuBMAP is pioneering new techniques to map intact sections of cardiovascular tissues visually and computationally, providing unique insights into the pathophysiology of CVDs.93, 94 One major focus has been the development of multiplex imaging methods that enable simultaneous visualisation of dozens of proteins across entire tissue sections. By applying these techniques to map proteins within samples of myocardium, blood vessels and lymphatics from both animal models and human patients, HuBMAP can reveal molecular heterogeneity at the single-cell level associated with disease states.94 Specifically, this high-dimensional spatial mapping has allowed researchers to interpret the arrangement of, and interactions between, cardiac cell subtypes based on their protein signatures and three-dimensional microanatomical locations within the tissue.93 Additionally, HuBMAP has pioneered computational approaches to model cell–cell communication networks underlying cardiac conduction from spatial mapping data.93 These tissue modelling techniques can be used to identify the pathological remodelling that underlies conditions such as cardiomyopathy. The spatial molecular maps and computational tissue models generated by HuBMAP's imaging and modelling methods are valuable resources: integrated with large-scale genomic data, they give insights into the tissue-specific genomics of diseases that complement genetic findings.94
6 FUTURE OF GWAS—GENOME-DRIVEN APPLICATIONS
GWAS have profoundly influenced the landscape of genome-driven medicine, spearheading precision medicine initiatives by uncovering genetic variants linked to disease susceptibility, drug response and treatment outcomes.95 These findings have translated directly into clinical applications, particularly in pharmacogenomics, where genetic insights guide personalised drug selection and dosing strategies. Notable examples include the identification of CYP2C9 and VKORC1 variants influencing warfarin metabolism and sensitivity, leading to tailored dosing regimens, and the discovery of CYP2C19 variants affecting clopidogrel metabolism, guiding antiplatelet therapy choices.96, 97 In oncology, GWAS-driven insights have fuelled targeted therapies such as epidermal growth factor receptor (EGFR) inhibitors for lung cancer patients with specific mutations and PARP inhibitors for breast and ovarian cancers linked to BRCA mutations, demonstrating the transformative impact of GWAS on precision oncology.98-101
By integrating tools such as WGS, MR and the Pangenome's T2T-CHM13 panel, as well as multi-omics data, GWAS will unveil intricate disease mechanisms more comprehensively.102 This holistic approach promises to refine diagnosis and treatment selection, paving the way for more precise and personalised healthcare strategies. As technologies evolve and datasets grow, the applications of genome-driven medicine will continue to expand, fostering a new era of tailored therapies and preventive interventions based on individual genetic profiles. This future, however, hinges on addressing the complexities posed by genetic data from diverse ancestral backgrounds.103 As GWAS augment sample sizes for greater statistical robustness and inclusivity, the challenge of population stratification looms large.104 Fortunately, advanced methodologies such as principal component analysis, propensity scores and ancestry-informative markers can be deployed to estimate ancestry accurately and account for population structure.105 This strategic approach supports the reliability and generalisability of GWAS findings across heterogeneous populations. As the scope of genetic studies broadens and diversifies, effective management of population stratification will be pivotal in fortifying the validity and clinical relevance of GWAS outcomes, ultimately advancing the prospects of personalised healthcare. The convergence of GWAS with the emerging technologies identified and proposed here holds great promise for reshaping the landscape of genomic medicine, driving innovation towards optimised patient care and disease management.
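The principal component adjustment for population stratification mentioned above can be illustrated with a minimal sketch. It is not drawn from any of the cited studies: the data are simulated, the function names (top_pcs, snp_effect) are hypothetical, and plain least squares stands in for the mixed-model or logistic frameworks used in practice. Two populations differ in both allele frequency and mean phenotype, and no SNP is truly causal, so any association in the unadjusted model reflects ancestry alone.

```python
import numpy as np

def top_pcs(genotypes, k=2):
    """Top-k principal components (samples x k) of a samples x SNPs matrix."""
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)                 # centre each SNP
    sd = g.std(axis=0)
    keep = sd > 0                          # drop monomorphic SNPs
    g = g[:, keep] / sd[keep]              # scale each SNP to unit variance
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :k] * s[:k]

def snp_effect(snp, phenotype, covariates=None):
    """Least-squares slope of phenotype on one SNP, with optional covariates."""
    n = len(phenotype)
    cols = [np.ones(n), np.asarray(snp, dtype=float)]
    if covariates is not None:
        cols.append(np.asarray(covariates, dtype=float))
    x = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(x, np.asarray(phenotype, dtype=float), rcond=None)
    return beta[1]                         # coefficient on the SNP

# Simulated cohort (an assumption for illustration only).
rng = np.random.default_rng(0)
n_per_pop, n_snps = 500, 300
pop = np.repeat([0, 1], n_per_pop)
freq = np.where(pop[:, None] == 0, 0.2, 0.6)       # allele frequency by population
genotypes = rng.binomial(2, freq, size=(2 * n_per_pop, n_snps))
# Phenotype depends on population membership only, not on any SNP.
phenotype = 2.0 * pop + rng.normal(0.0, 1.0, size=2 * n_per_pop)

naive = snp_effect(genotypes[:, 0], phenotype)
adjusted = snp_effect(genotypes[:, 0], phenotype, top_pcs(genotypes, k=2))
print(f"unadjusted effect: {naive:.3f}, PC-adjusted effect: {adjusted:.3f}")
```

In this setting the unadjusted estimate is inflated by ancestry, while including the leading components as covariates pulls it towards zero. Real analyses typically use more components and linkage-pruned variants, and rely on dedicated software rather than a hand-written regression.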
7 CONCLUSION
From rare traits to more prevalent ones, GWAS provide remarkable biological insights that go beyond what conventional methods reveal about genetic interplay. By utilising WGS and complete, high-quality human genome panels such as T2T-CHM13, we can better understand the genetic underpinnings of multiple traits and diseases. In addition, initiatives such as HuBMAP, which creates high-resolution spatial maps of human tissues through imaging and spatial transcriptomics, offer further insights into cellular heterogeneity and communication. Moreover, MR strengthens the determination of risk factors for diseases, which helps to draw definitive conclusions in clinical diagnosis and make better-informed decisions in healthcare. In essence, these proposed strategies allow for the identification of previously inaccessible variants with functional consequences in gene-rich portions of the genome, revolutionising GWAS research. We believe that widespread adoption of these initiatives, and of the others discussed in this paper, will revitalise GWAS and strengthen their statistical power, subsequently enhancing disease understanding and clinical diagnosis.
AUTHOR CONTRIBUTIONS
Oluwaferanmi Omidiran and Aashna Patel conducted the research reported and wrote the first draft of the manuscript. Oluwaferanmi Omidiran revised it. Sarah Usman, Ishani Mhatre, Habiba Abdelhalim, William DeGroat, Rishabh Narayanan, Kritika Singh, and Dinesh Mendhe supported this study. Zeeshan Ahmed led and guided the study.
ACKNOWLEDGEMENTS
We appreciate the great support of the Department of Medicine, Robert Wood Johnson Medical School; Rutgers Institute for Health, Health Care Policy, and Aging Research; and Rutgers Health, at Rutgers, The State University of New Jersey. Research reported in this publication was supported in part by the National Institute on Aging of the National Institutes of Health under award number R33AG068931. The funding sources had no role in the design, collection, analysis, interpretation of the results or the decision to submit the manuscript.
CONFLICT OF INTEREST STATEMENT
The authors declare they have no competing financial or non-financial interests.
CONSENT FOR PUBLICATION
Not applicable.
ETHICS STATEMENT
Not applicable.
Open Research
DATA AVAILABILITY STATEMENT
Not applicable.