Volume 2, Issue 3 e47
REVIEW ARTICLE
Open Access

Big data and single-cell sequencing in acute myeloid leukemia research

Yuxuan Zou

Center for Hematology and Immunology, Cancer Center, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China

Contribution: Conceptualization (supporting), Writing - original draft (equal), Writing - review & editing (supporting)

Huiyuan Zhang

Corresponding Author

Center for Hematology and Immunology, Cancer Center, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China

Correspondence Hongbo Hu and Huiyuan Zhang, Center for Hematology and Immunology, Cancer Center, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China.

Email: [email protected] and [email protected]

Contribution: Conceptualization (equal), Funding acquisition (equal), Writing - review & editing (equal)

Hongbo Hu

Corresponding Author

Center for Hematology and Immunology, Cancer Center, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China

Chongqing International Institute for Immunology, Chongqing, China

Contribution: Conceptualization (equal), Funding acquisition (equal), Writing - review & editing (equal)

First published: 28 August 2023

Abstract

The advancement of diverse technologies has led to a substantial increase in valuable biomedical data, particularly in the field of acute myeloid leukemia (AML). Effective utilization of this wealth of data is crucial for attaining a comprehensive and in-depth understanding of AML, thereby facilitating optimal diagnosis, treatment, and prognosis. Among the various approaches to data acquisition, single-cell sequencing has emerged as an impressive tool. Developments in single-cell sequencing methods have empowered researchers to analyze genome, transcriptome, proteome, and epigenome data at the single-cell level. Single-cell sequencing also offers a means to uncover fine-grained information, providing unique prognostic insights and aiding in the identification of therapeutic targets. Furthermore, it enhances our understanding of AML heterogeneity, clonal evolution, and resistance mechanisms, ultimately leading to the development of better treatment strategies. In this review, we present an overview of AML and of single-cell sequencing technologies, explore their potential contributions to different aspects of AML research, and provide some information about resources and data processing.

Graphical Abstract

Utilizing large amounts of diverse data from various sources can deepen our understanding of acute myeloid leukemia and support different aspects of research and clinical application to benefit patients. Discoveries in different fields can also promote one another, either in one direction or mutually. Among the various data sources, single-cell sequencing is an emerging and powerful option.

1 INTRODUCTION

Acute myeloid leukemia (AML) is a highly aggressive malignant disease originating from hematopoietic stem cells, characterized by the proliferation of abnormal myeloid blasts and the accumulation of immature progenitors in bone marrow, peripheral blood, and other tissues.1-4 AML is the most common type of acute leukemia in adults,5 accounting for 80% of all cases, with a discouraging 5-year survival rate of approximately 29%.2 Elderly AML patients have an even worse prognosis, with less than 15% long-term survival.6 Various factors contribute to the poor outcomes of AML patients, including a higher incidence of secondary or therapy-related AML, the presence of high-risk cytogenetic and molecular genetic features, and the presence of comorbidities that limit more intensive therapeutic interventions, such as allogeneic stem cell transplantation.7 Although new treatments have been introduced and there have been notable advancements in the long-term outcomes of patients in recent decades, AML still has a poor prognosis overall.2, 8 While young patients respond to therapy and achieve complete remission after induction chemotherapy,2, 8, 9 older patients exhibit distinct clinical courses, with the majority experiencing a relapse.2, 10 This discrepancy can be attributed to the significant heterogeneity of AML, which encompasses cytogenetic and molecular abnormalities along with individual factors such as physical conditions and comorbidities.11

The advent of big data has brought forth challenges to traditional data processing methodologies, primarily due to the massive scale of data, the need for rapid processing, data variety, and the requirement for high accuracy.12 In the biomedicine field, big data predominantly refer to genetic and omics data, although other data types, such as electronic health records, longitudinal data for predictive analysis as well as deep phenotyping, and multimedia clinical data, also have characteristics of big data.12 Modeling biological phenomena is often complicated and computationally intensive.12 To derive meaningful insights from the exponentially growing volume of data, it becomes imperative to employ technologies capable of efficiently and accurately processing large databases from a big data perspective.12 Such data could enable clinicians to simulate and model potential outcomes, offer improved treatments, and prevent patients from receiving ineffective treatments. Significant advancements will be made by accumulating and utilizing biological data to enhance our understanding of pathophysiological processes.

Indeed, the sheer magnitude of available data presents the challenge of distilling meaningful information that bears clinical benefits.13 Nonetheless, profound molecular insights offer the potential to shift from traditional approaches to tumor diagnosis and therapy toward precision oncology, which involves the amalgamation of genetic and genomic information for estimating prognosis and directing treatment choices.14 This paradigm shift has been explored in investigations focused on AML. To address the challenges posed by AML's heterogeneity and enhance patient outcomes, comprehensive insights into the disease are essential. The effective utilization of diverse data sources holds promise for advancing our understanding of AML, facilitating the development of comprehensive strategies to combat this devastating disease. This review summarizes recent studies investigating the use of single-cell sequencing (sc-seq) in AML, encompassing its impact on diagnosis, treatment, prognosis, and our understanding of AML itself. The objective is to present a comprehensive overview of the pivotal roles played by big data and sc-seq in advancing AML research, while also addressing their limitations, key challenges, and potential implications. The findings presented in this review highlight the significant contributions of these technologies toward enhancing our understanding of AML heterogeneity, identifying prognostic markers, uncovering therapeutic targets, and formulating more personalized treatment strategies. Moreover, the review sheds light on the challenges associated with handling large-scale data in AML research and discusses potential avenues for future exploration.

2 ACUTE MYELOID LEUKEMIA

AML represents a complex and heterogeneous group of diseases characterized by the accumulation of abnormal genetic alterations. These alterations could involve whole or parts of chromosomes, leading to copy number variants or fusion gene products, various insertions and deletions, and single-nucleotide variants.15 The molecular and genetic heterogeneity of AML contributes to its inconsistent responses to chemotherapy and allogeneic blood or marrow transplantation.16-18 Even among patients assigned to the same risk group based on their cytogenetic or molecular characteristics, outcomes can vary widely.17, 19 Therefore, the stratification of AML patients and the availability of accurate treatment options are important topics of investigation.20

Recent advancements in molecular profiling have led to improvements in the stratification of AML patients, enhancing our ability to tailor treatment approaches.17, 21-23 Furthermore, the identification of biomarkers for early diagnosis, targeted therapy, prognosis prediction, identification of patients with poor prognosis, and recognition of oncogenic genes is essential.4 Understanding the pathogenesis of AML is crucial for enhancing existing therapeutic strategies and developing new ones. Key questions that need to be addressed include why certain patients develop resistance to treatment and why patients with similar clinical presentations exhibit variable responses to drugs.1 Achieving a better comprehension of the clonal composition and heterogeneity of AML can ultimately lead to more precise management strategies.

2.1 Heterogeneity

Cellular heterogeneity is a common feature of oncogenesis, particularly evident in AML. The presence of heterogeneous genetic mutations and epigenetic alterations in leukemia cells poses challenges for clinical diagnostics and treatment, leading to inconsistent therapeutic outcomes for patients.24 The high rate of relapse following therapy, often leading to mortality, underscores the importance of considering relapse states in AML prognosis and disease progression.25 A deeper understanding of AML heterogeneity holds the potential to improve patient prognosis and facilitate the early adoption of novel therapeutic approaches.26

The development of sc-seq, along with recent research, has contributed to the characterization of AML, revealing the disease's complexity and genetic heterogeneity.25 Convergent evolution is indicated by the presence of both homogeneity and heterogeneity within AML populations, as shown by genes such as FMS-like tyrosine kinase 3 (FLT3) and nucleophosmin 1 (NPM1).27 The determination of epigenetic heterogeneity in AML samples is challenging due to the dynamic nature of epigenetic modifications. Detection limits of sequencing further complicate the study of epigenetic heterogeneity, alongside the high-level temporal and environmental phenotypic plasticity observed in AML epigenetic alterations.28, 29

Epigenetic heterogeneity in AML also exists in differential chromatin landscapes and gene expression profiles. By comparing two AML subtypes with distinct translocations, namely t(8;21) and t(3;21), in which the binding domain of RUNX1 is fused to the ETO regulator and the EVI1 regulator, respectively, Loke et al.30 discovered that each subtype displays a unique epigenetic landscape and gene expression profile. Each subtype has a different transcriptional network, which may explain the different clinical outcomes of the two subtypes.30, 31 The epigenetic modifications in AML are heritable and considerably heterogeneous, which leads to clonal evolution and increases the probability of therapy resistance and subsequent relapse.25 Compared with other types of cancer, AML demonstrates a relatively low level of genetic heterogeneity,32 implying that the primary focus should be directed toward epigenetic heterogeneity, although genetic heterogeneity cannot be ignored.28, 29 A comprehensive understanding that considers both genetic and epigenetic heterogeneity is necessary for characterizing the pathophysiology of AML.

2.2 Leukemia stem cells (LSCs)

The abnormal proliferation of leukemia cells in AML patients causes dysfunction of hematopoiesis, along with anemia, thrombocytopenia, and leukopenia.3, 24, 33 Within the leukemic cell population, LSCs are a rare and therapy-resistant subset believed to play a crucial role in disease progression and relapse.2, 26, 34, 35 The presence of LSCs poses significant challenges due to their capacity to evade chemotherapy treatments and immune surveillance, contributing to disease relapse after therapy, which remains a major cause of mortality. The identification of human LSCs was initially achieved by John Dick and colleagues34 in 1994 using an in vivo experimental model. Research suggests that human CD34+ leukemic cells have the ability to repopulate the bone marrow of severe combined immunodeficient mice, while CD34− leukemic blasts were not observed to be leukemogenic.34, 36 These CD34+ cells, designated as LSCs, are responsible for the initiation and maintenance of leukemia. They exhibit an enhanced ability to selectively evade chemotherapy treatments37 and immune surveillance,38 ultimately resulting in relapse after therapy, which is thought to be a major contributor to mortality. The heterogeneity observed in AML, characterized by genetic and epigenetic changes, extends to surface markers expressed on AML cells and their LSCs, posing challenges for immunologically targeting LSCs.2, 39-41 Nonetheless, the development of novel targeted therapies focusing on LSCs holds promise for reducing the risk of relapse, improving the prognosis of AML patients, and prolonging their survival.

2.3 Unraveling the heterogeneity of AML: insights from molecular characterization and mouse models

Recent advancements in molecular characterization technologies have propelled our understanding of the intricate heterogeneity of AML.1 Extensive investigations of genetic, epigenetic, transcriptomic, and phenotypic profiles of AML samples have illuminated the profound complexity and diversity of this disease, surpassing previous notions.1 Additionally, growing interest surrounds the exploration of proteomic and metabolomic signatures in AML and their interplay with distinct genomic profiles.42-44 Notably, within a single AML sample, subpopulations may exhibit genetic, phenotypic, and epigenetic variations, collectively contributing to the initiation, progression, and evolution of AML.1

Alongside studies on human AML samples, mouse models have emerged as valuable tools for validating the impact of diverse genetic and epigenetic abnormalities on AML development and progression, as well as investigating clonal evolution.1 Serial transplantation in mouse leukemia models has successfully recapitulated key features observed in human AML.1 Therefore, future research on clonal evolution and heterogeneity in AML should emphasize comprehensive molecular characterization of patient samples and utilization of mouse leukemia models.1 Well-characterized mouse models not only deepen our understanding of the underlying molecular pathogenesis of AML but also serve as essential preclinical platforms for evaluating novel drugs and treatment modalities.1

3 SINGLE-CELL SEQUENCING

3.1 Overview of sc-seq

Traditional sequencing methods analyze entire tissues, representing the expression of individual cells by their average values, which conceals the characteristics of each cell and tissue heterogeneity.45 This limitation is overcome by the sc-seq technology, which enables the sequencing of individual cell genomes or transcriptomes to obtain genomic, transcriptomic, or multi-omics information, offering a comprehensive understanding of cell population disparities as well as cellular evolutionary relationships.46 The field of sc-seq has witnessed significant advancements, allowing for high-resolution characterization of individual cells.
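The averaging effect described above can be made concrete with a small numerical sketch. The expression values below are invented for illustration and are not taken from any study; the point is only that two cell populations with very different compositions can produce the same bulk average.

```python
from statistics import mean

# Hypothetical expression of one gene (arbitrary units) in two samples of 10 cells.
# Sample A: every cell expresses the gene at a moderate level.
sample_a = [5.0] * 10
# Sample B: one rare cell expresses it strongly, the other nine not at all.
sample_b = [50.0] + [0.0] * 9


def bulk_signal(cells):
    """Bulk sequencing reports, roughly, the average over all cells."""
    return mean(cells)


def fraction_expressing(cells, threshold=0.0):
    """Single-cell data additionally show how many cells carry the signal."""
    return sum(1 for value in cells if value > threshold) / len(cells)


for name, cells in (("A", sample_a), ("B", sample_b)):
    print(name, bulk_signal(cells), fraction_expressing(cells))
```

Both samples have the same bulk value, while the per-cell view separates a uniformly expressing population from one driven by a rare subpopulation.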

In 2009, the initial single-cell mRNA sequencing experiment was carried out, followed by the first single-cell DNA sequencing experiment in 2011.47, 48 The sc-seq technology encompasses various dimensions of cellular characterization at the single-cell level, including genomics, transcriptomics, epigenomics, proteomics, and multi-omics. As a result, it surpasses previous sequencing technologies in uncovering the complexity of the tumor microenvironment (TME), heterogeneity, cell expression, function, composition, interaction, cell lineage tracking, and mechanisms of drug resistance.45, 49 It has revolutionized our understanding of cells, providing insights into transcriptional, genomic, epigenomic, and metabolic features of individual cells, which allows for unbiased profiling of cells within neoplastic lesions.50 Furthermore, it provides molecular insights into variations in different aspects.51, 52

The capacity for sc-seq analysis has been dramatically enhanced due to the improvements in single-cell isolation, sequencing, library preparation, and algorithms.50 Single-cell transcriptomics sequencing (scRNA-seq) technologies generally follow a series of procedures, including single-cell isolation, RNA extraction, reverse transcription, preamplification, and detection.53 Overcoming the challenge of accurately identifying sequencing results at the single-cell level, single-cell barcoding technology based on plate micro-reaction systems and combinatorial indices has greatly increased the throughput of single-cell analysis.50 Additionally, the cost of sc-seq has been reduced, thanks to combinatorial index-based barcoding techniques, which allow for the recognition of cells by adding cellular barcodes in multiple rounds without isolating individual cells.54 As a result, the cost reduction facilitates the combination of other technologies with sc-seq, which has significantly enhanced the efficiency of single-cell detection.46
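A rough calculation helps show why adding barcodes over multiple rounds can label many cells without isolating them. The sketch below assumes that each round assigns one of a fixed number of well barcodes uniformly at random; the plate size (96 wells), the number of rounds (3), and the cell count are illustrative assumptions, not parameters of any specific protocol.

```python
def barcode_space(wells_per_round, rounds):
    """Number of distinct barcode combinations after several pooling rounds."""
    return wells_per_round ** rounds


def expected_collision_fraction(n_cells, n_barcodes):
    """Probability that a given cell shares its full barcode combination
    with at least one other cell, assuming uniform random assignment."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


# Illustrative assumption: 3 rounds of barcoding in 96-well plates.
space = barcode_space(96, 3)
print(space)
print(expected_collision_fraction(10_000, space))
```

Under these assumptions the barcode space grows exponentially with the number of rounds, so the fraction of cells that end up with an ambiguous label stays small as long as the number of cells is much lower than the number of combinations.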

The sc-seq technology enables the assessment of variations in gene expression during the course of tumor progression. Comprehensive disclosure of genomic alterations, clonal architecture, and metabolic dynamics during tumorigenesis has been made possible through integrative sc-seq of adjacent normal tissues and adenomas at different stages in patients, which furnishes valuable insights into the inhibition of tumor progression.55-60 Consequently, it significantly propels our comprehension regarding the stratification of cancer diagnostics, identification of biomarkers, implementation of precise therapies, and prediction of prognoses.61-66 Moreover, it offers opportunities to identify novel diagnostic markers or therapeutic targets, thereby improving disease diagnosis and treatment.46

The sc-seq technology finds utility in various scenarios, including developmental and cancer research, as well as in the creation of human cell atlases for cell type identification and the exploration of intercellular relationships.46, 67, 68 Furthermore, it offers unprecedented opportunities to investigate the functional states of individual tumor cells. The integration of clinical pathological information and sc-seq data can lead to the discovery of potential therapeutic targets or cell types, as well as markers related to diagnosis and prognosis.69 It has been widely used for detecting differentially expressed genes during cancer progression. Additionally, sc-seq allows for the examination of specific transcriptional activities or highly transcribed neoantigens in patients treated with immune checkpoint inhibitors, facilitating the design of treatment strategies.50 By using scRNA-seq to explore the heterogeneity of immune-related genes, it is feasible to devise a combination of therapies that aim at different tumor neoantigens to improve clinical efficacy.50 It also holds promise in profiling specific tissues and deciphering mechanisms for patients who exhibit resistance to immunotherapy.

Despite its current performance and potential, sc-seq has limitations that cannot be ignored. Eukaryotic cells do not transcribe at a consistent basal rate, so transcriptional profiles cannot be fully deciphered by sequencing at a single time point.70, 71 Additionally, most sequencing methods are designed for 3′ or 5′ reads and are insensitive to low-abundance transcripts, leading to partial information loss.72 Moreover, the association of genotype and phenotype cannot be accomplished with scRNA-seq alone, necessitating the development of high-throughput, economical multi-omics tools to map the overall tumor tissue landscape.50 Batch effects may arise from different working platforms, processing procedures, and analyses performed on different dates, particularly when analyzing data from different sequencing studies. Furthermore, it is challenging to distinguish differences caused by individual heterogeneity, highlighting the need for larger cohorts and cautious application of existing findings in clinical trials.72 The variations among patients and the utilization of distinct platforms in different trials limit the reliability of the results. These limitations can be addressed by improving sc-seq technology and integrating it with other emerging technologies, thereby facilitating the analysis of multi-omic information at a single-cell resolution.50 Despite these limitations, sc-seq has the potential to revolutionize cancer research, enhancing the precision of molecular cancer diagnosis, biomarker discovery, and therapeutic intervention.
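The insensitivity to low-abundance transcripts can be illustrated with a simple probabilistic sketch. It assumes that every transcript molecule in a cell is captured independently with the same probability; both this model and the 10% capture efficiency used below are illustrative assumptions rather than measured properties of any platform.

```python
def detection_probability(copies, capture_efficiency):
    """Probability that at least one of `copies` molecules of a transcript
    is captured, assuming independent capture of each molecule."""
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must be between 0 and 1")
    return 1.0 - (1.0 - capture_efficiency) ** copies


# Hypothetical capture efficiency of 10% per molecule.
for copies in (1, 10, 100):
    print(copies, round(detection_probability(copies, 0.1), 4))
```

Under this model a transcript present at a single copy per cell is missed in most cells, whereas a transcript present at about a hundred copies is almost always detected, which is one way to read the partial information loss mentioned above.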

3.2 Single-cell sequencing for genomes (scDNA-seq)

The scDNA-seq technique enables the exploration of DNA at the single-cell level,73 ensuring that genomic signals present in only a few cells are not overlooked. Similar to how the voices of individuals can get drowned in a large group, genomic signals in single or small numbers of cells may go undetected if single-cell genomes are not analyzed.73 To obtain high-quality scDNA-seq data, four key technical challenges need to be addressed, namely the precise physical isolation of single cells, efficient amplification of isolated cells to acquire sufficient material, accurate identification of target variants through genome querying, and meticulous interpretation of data while considering biases and errors introduced during the previous steps. For single-cell isolation, various methods can be employed, each with different levels of accuracy, throughput, reproducibility, and usability.74, 75

Whole-genome amplification (WGA) approaches, such as degenerate oligonucleotide-primed PCR,76, 77 multiple displacement amplification,78, 79 multiple annealing and looping-based amplification cycles, and PicoPLEX,80, 81 can be utilized to amplify DNA before sequencing. Microfluidic devices have shown promise in reducing contamination during single-cell WGA,82 and a novel two-step microfluidic droplet program has been proposed to realize efficient and large-scale parallel barcoding based on single-cell PCR.83 The choice of genomic interrogation, such as whole-genome sequencing, whole-exome sequencing, or targeted sequencing, depends on the study's objectives. These approaches allow for tracing the mutational history of driver genes and obtaining cellular-level mutation information, including deleterious and pathogenic mutations, in both coding and noncoding regions of interest.84 However, the high cost of scDNA-seq currently limits its application for high-dimensional analysis. A more cost-effective approach involves performing bulk sequencing initially, followed by scDNA-seq specifically targeting the desired mutations or variations.

3.3 Single-cell sequencing for transcriptomes (scRNA-seq)

Since the first experiment in 2009, scRNA-seq has emerged as a powerful tool that provides comprehensive transcriptome information at the single-cell level, enabling the identification of mutations in transcribed coding regions of genes.24 Thus, it has been extensively employed to explore the transcriptomes of individual cells and is particularly valuable for temporal lineage tracing and spatial transcriptome analysis.15 This technology allows for the investigation of the environment and dynamic interactions within individual cells, which are crucial factors influencing tumor heterogeneity and stress reactions.50 In the realm of cancer research, scRNA-seq and its derivatives play a significant role in diverse aspects, including identifying rare subpopulations, cancer stem cells, circulating tumor cells, and novel markers, as well as investigating the microenvironment, heterogeneity, and evolution patterns.85-92 They also help unravel mechanisms related to tumor initiation, progression, metastasis, evolution, recurrence, and treatment resistance.93

The workflow of scRNA-seq involves several fundamental steps, including single-cell isolation, RNA molecule capture, reverse transcription, cDNA amplification, library preparation, sequencing, and data analysis, as shown in Figure 1.83 The isolation of single cells from tissue is the initial and critical step, typically achieved through enzymatic digestion or mechanical dissociation.94-96 In the case of frozen tissue, single-nucleus RNA sequencing (snRNA-seq) is preferred due to the intact nuclear membrane and simpler preparation compared to a single-cell suspension.97 snRNA-seq provides more reliable transcriptome data as genetic material in the nucleus is more stable and less prone to artificial stress responses or transcriptional bias.98

FIGURE 1. A general workflow of single-cell RNA sequencing experiments. For single-cell RNA sequencing, the workflow generally includes the following steps: (1) single-cell dissociation and single-cell isolation; (2) RNA molecule release and capture after cell lysis; (3) reverse transcription of RNA into complementary DNA (cDNA) with barcodes and unique molecular identifiers (UMIs); (4) cDNA amplification by PCR or by in vitro transcription; (5) preparation and pooling of sequencing libraries; (6) sequencing of the libraries using next-generation sequencing (NGS); (7) using bioinformatic tools to assess, analyze, and visualize the data. FACS, fluorescence-activated cell sorting; LCM, laser capture microdissection; MACS, magnetic-activated cell sorting.
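To illustrate how the barcodes and UMIs introduced in step (3) are used in step (7), the sketch below collapses reads into a per-cell, per-gene count table by counting unique UMIs, so that PCR duplicates from step (4) are not counted twice. The read tuples and barcode sequences are invented, and real pipelines also correct sequencing errors in barcodes and UMIs, which this minimal version does not attempt.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into {cell_barcode: {gene: number_of_unique_UMIs}}.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples. Reads that
    share cell barcode, gene, and UMI are treated as PCR duplicates of one
    original molecule and counted once.
    """
    umis_seen = defaultdict(set)
    for cell, umi, gene in reads:
        umis_seen[(cell, gene)].add(umi)

    counts = defaultdict(dict)
    for (cell, gene), umis in umis_seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


# Hypothetical aligned reads: (cell barcode, UMI, gene).
reads = [
    ("AAAC", "TTGCA", "FLT3"),
    ("AAAC", "TTGCA", "FLT3"),  # duplicate of the read above
    ("AAAC", "GGATC", "FLT3"),
    ("AAAC", "CCTAG", "NPM1"),
    ("CGTA", "TTGCA", "NPM1"),
]
matrix = count_umis(reads)
print(matrix)
```

The resulting table is the cell-by-gene count matrix on which downstream steps such as normalization, clustering, and visualization typically operate.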

Different methods for single-cell isolation have advantages and disadvantages, necessitating careful selection based on specific research purposes. In certain cases, scRNA-seq can be combined with other techniques to enhance its capabilities. For example, the combination of scRNA-seq with CRISPR screening enables high-throughput functional profiling of regulatory mechanisms and heterogeneous cell populations.99 Additionally, the integration of scRNA-seq with CRISPRi contributes to the investigation of regulatory elements and their interrelationship with genes.100 Moreover, the utilization of snRNA-seq with microfluidic technology offers an economical, highly sensitive, and efficient cell classification method, which holds promise for human cell mapping projects.97 Finally, the integration of single-nucleus sequencing and single-cell transposome hypersensitive site sequencing provides a high-throughput platform for the simultaneous detection of nuclear transcripts and epigenetic traits, enabling comprehensive analysis of gene expression and regulation in cryopreserved human tissue samples.101

4 LEVERAGING SC-SEQ DATA FOR AML STUDY

Despite significant progress in understanding the molecular pathology of AML in recent decades, the overall and relapse-free survival rates remain low among AML patients.1 AML research has focused on identifying genetically heterogeneous tumor cell populations.83 Accumulating evidence indicates that AML usually has a highly complicated clonal architecture and that individual leukemias are composed of genetically, phenotypically, and epigenetically distinct clones, which have both shared and divergent somatic mutations and are continually evolving.27, 32, 102, 103 This further increases the complexity of the disease. The situation is relatively uncommon and sometimes undetectable by conventional methods. The sc-seq technique can provide microscopic insights and bring a wealth of data. Traditional sequencing can only obtain the average of a large number of cells and cannot analyze a small number of cells, so information on cellular heterogeneity is lost.46 In contrast, sc-seq confers significantly greater benefits in terms of detecting heterogeneity among individual cells,104 distinguishing a small number of cells, and delineating cell maps.46 Sc-seq techniques have greatly changed the study of malignant diseases and advanced the understanding of cancer biology, which encompasses areas such as clonal evolution, transformation, adaptive selection, and treatment resistance of leukemic cells.105 Undoubtedly, standard bulk sequencing has led to remarkable progress in cell population characterization and tumor therapy.83 However, it often fails to recognize rare alleles or to definitively determine the co-occurrence of mutations within the same cells, which makes single-cell resolution critical.83 The sc-seq technique provides a means to dissect intratumoral genetic and epigenetic heterogeneity at single-cell resolution and then identify clones that accumulate resistance factors associated with chemo/immunotherapy, which ultimately affect prognosis and treatment outcomes.106 Given that AML exhibits molecular heterogeneity, single-cell techniques offer a potent means to gain critical insights into leukemia initiation, evolution, and recurrence.107 Sc-seq techniques also provide information on the genetic landscape,25 subclonal structures, regulatory networks, gene expression, and proteomic profile.50 Dissecting cellular heterogeneity is a key application of scDNA-seq/scRNA-seq.83 It investigates the genomic and transcriptomic profiles of distinct cell subpopulations and evaluates the similarities and differences that cannot be detected via bulk DNA and RNA sequencing.83
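The point about mutation co-occurrence can be sketched with a toy example. The two samples below are invented: in both, half of the cells carry an FLT3 mutation and half carry an NPM1 mutation, so a bulk measurement of mutated fractions would look the same, yet the per-cell genotypes differ completely. Zygosity, allele dropout, and genotyping errors are ignored in this simplified sketch.

```python
from itertools import combinations


def mutated_fraction(cells, mutation):
    """Fraction of cells carrying a mutation (what a bulk assay summarizes)."""
    return sum(1 for genotype in cells if mutation in genotype) / len(cells)


def cooccurrence_table(cells, mutations):
    """Per-pair counts of cells carrying both, one, or neither mutation."""
    table = {}
    for a, b in combinations(mutations, 2):
        table[(a, b)] = {
            "both": sum(1 for g in cells if a in g and b in g),
            "only_first": sum(1 for g in cells if a in g and b not in g),
            "only_second": sum(1 for g in cells if b in g and a not in g),
            "neither": sum(1 for g in cells if a not in g and b not in g),
        }
    return table


# Hypothetical single-cell genotypes (set of mutated genes per cell).
sample_x = [{"FLT3", "NPM1"}] * 5 + [set()] * 5   # mutations in the same cells
sample_y = [{"FLT3"}] * 5 + [{"NPM1"}] * 5        # mutations in separate clones

for name, cells in (("X", sample_x), ("Y", sample_y)):
    print(name, mutated_fraction(cells, "FLT3"), mutated_fraction(cells, "NPM1"))
    print(name, cooccurrence_table(cells, ["FLT3", "NPM1"]))
```

Only the per-cell table distinguishes a single double-mutant clone from two mutually exclusive clones, which is the kind of information that averaged bulk data cannot resolve.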

4.1 Studies on the pathogenesis of AML

With scRNA-seq, van Galen et al.88 identified primitive AML cells as prognostic markers. These cells display abnormal transcriptional programs, co-express genes associated with stemness as well as myeloid lineage initiation, and were found to be enriched for genetic abnormalities such as FLT3-ITD mutations. Furthermore, the primitive nature of these cells potentially exacerbates their oncogenic potential. Chen et al.4 indicated that HOXA3-10 genes may serve as possible therapeutic targets and prognostic markers of AML, since the upregulation of these genes may contribute to the initiation of AML and their expression may serve as an indicator of adverse prognosis. Ye et al.108 investigated the relationship among fatty acid metabolism (FAM), the TME, and the prognosis of AML patients and performed a functional enrichment assay to assess the significance of FAM in the immunosurveillance against AML. 
Their scRNA-seq analysis revealed that the levels of FAM-related genes were elevated in the population rich in LSCs, and these genes were then utilized to develop a prognostic model capable of accurately predicting the outcome of AML patients and changes in the immunosurveillance based on TME.108 Besides, PLA2G4A was identified as a gene associated with the high expression of FAM in AML patients with unfavorable prognoses, and they demonstrated that pharmaceutical targeting of PLA2G4A increases NKG2DL expression in leukemia cells in vitro and inhibits FAM, thereby enhancing NK cell-mediated immunosurveillance in leukemia cells.108 Regarding pediatric AML patients, WT1 mutations and NUP98 rearrangement, which activates alterations of FLT3, contribute to poor prognosis.109 DNA methyltransferase 3 alpha (DNMT3A) mutations, which are associated with reduced sensitivity to anthracyclines and represent early clonal events in adult AML, are rare in childhood.110 It is anticipated that the utilization of sc-seq for the co-detection and relative quantification of tumor genetic signatures in blast cells will yield more precise and novel prognostic insights into patient stratification.111, 112 In addition, the potential importance of the chemokine receptor gene CCR1 and one of its ligands CCL23 was highlighted,113 and the resistant cell lines could be selectively targeted by inhibiting squalene synthase, providing a new and promising strategy to directly inhibit cholesterol synthesis in drug-resistant AML cells.114

4.2 Sc-seq studies on AML heterogeneity

Zhai et al.115 used scRNA-seq to explore the clonal evolution of paired diagnosis and relapse samples at the genetic and transcriptional levels and revealed pathways and genes underlying recurrence, suggesting alternative mechanisms of therapeutic resistance and AML recurrence. Clonal evolution in the stem cell compartment is nonlinear during the initiation of myelodysplastic syndromes and their progression to AML, generating dominant clones as well as subclones, with fewer clones detectable in the blast compartment.116 Sc-seq studies also revealed marked changes in the clonal structure of AML following the acquisition of mutations during disease evolution and in response to therapeutic pressure, indicating that the clonal structure of AML is highly dynamic and facilitates treatment evasion.117 Even in the setting of FLT3-targeted and isocitrate dehydrogenase-1 (IDH1)-targeted therapy with inhibitors that suppress the expansion of mutant clones, RAS pathway mutations play an important role in relapse in FLT3 and IDH1 mutant and nonmutant clones.117 With a high-throughput scDNA-seq platform, Morita et al.118 provided a detailed and comprehensive depiction of AML clonal architecture. 
In this study, data on mutation co-occurrence and mutual exclusivity at the cellular level validated the clonal relationships among AML driver mutations inferred from previous bulk-sequencing studies and suggested new findings, such as a previously mischaracterized relationship between TP53 and PPM1D.119, 120 Mutational history reconstruction based on single-cell data revealed linear and branched evolution patterns in AML, along with convergent evolution in some instances.118 Furthermore, xenotransplantation of various AML samples, including a case exhibiting convergent evolution, demonstrated the ability of multiple parallel subclones to initiate leukemia.118 Emerging single-cell multi-omics technologies enable simultaneous profiling of mutations and cell surface proteins in AML samples, making it possible to explore how genetic and phenotypic heterogeneity in AML may be linked and advancing our understanding of how mutation history supports phenotypic alterations during clonal evolution.118 Together, the data in this study show the clonal diversity and evolution patterns of AML and highlight their clinical relevance.
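To illustrate how co-occurrence and mutual exclusivity can be read from single-cell genotypes, the following minimal Python sketch groups cells by their mutation sets and lists mutation pairs that never share a cell. The genotype calls are invented toy data and the helper names (`clone_sizes`, `pair_table`, `mutually_exclusive_pairs`) are hypothetical; the sketch ignores allelic dropout and other technical errors that real scDNA-seq analyses must model, and it is not the method used by Morita et al.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-cell genotype calls: each cell is the set of mutations detected in it.
cells = [
    {"DNMT3A"}, {"DNMT3A"},
    {"DNMT3A", "NPM1"}, {"DNMT3A", "NPM1"},
    {"DNMT3A", "NPM1", "FLT3"},
    {"DNMT3A", "NPM1", "NRAS"}, {"DNMT3A", "NPM1", "NRAS"},
    set(),  # a cell with no detected mutation
]


def clone_sizes(cells):
    """Group cells with identical mutation sets into clones and count them."""
    return Counter(frozenset(c) for c in cells)


def pair_table(cells, a, b):
    """Return 2x2 cell counts for mutations a and b: (both, a only, b only, neither)."""
    both = sum(1 for c in cells if a in c and b in c)
    a_only = sum(1 for c in cells if a in c and b not in c)
    b_only = sum(1 for c in cells if b in c and a not in c)
    neither = len(cells) - both - a_only - b_only
    return both, a_only, b_only, neither


def mutually_exclusive_pairs(cells):
    """List mutation pairs that are never observed together in the same cell."""
    mutations = sorted(set().union(*cells))
    return [(a, b) for a, b in combinations(mutations, 2)
            if pair_table(cells, a, b)[0] == 0]


if __name__ == "__main__":
    for clone, size in clone_sizes(cells).most_common():
        print(sorted(clone), size)
    print(mutually_exclusive_pairs(cells))
```

In this toy data set, FLT3 and NRAS mutations never share a cell although both arise on a DNMT3A/NPM1 background, which is the pattern expected under branched evolution.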

4.3 Sc-seq studies on LSCs

Approximately 30%–40% of AML patients are refractory to initial therapy or die from recurrence, and the failure of induction chemotherapy in AML is primarily driven by resistant LSCs.121 The lack of general surface markers for identifying and isolating AML LSCs poses a significant challenge.121 Ongoing research therefore aims to discover novel markers that characterize LSCs and facilitate the development of anti-LSC therapies, and technological advances, ranging from high-throughput bulk sequencing to high-dimensional single-cell analysis, have been instrumental in unraveling the cellular hierarchies and dysregulated transcriptional networks in AML.121 The properties of LSCs and their interaction with the extrinsic bone marrow microenvironment create a favorable environment for leukemogenesis through the secretion of various cytokines, chemokines, and growth factors that protect LSCs against conventional chemotherapy.121 Intratumoral delivery approaches that focus on immune-mediated eradication by inducing microenvironmental alterations within the tumor while avoiding systemic toxicity are emerging, and selectively targeting LSCs and their protective bone marrow niche may be a promising therapeutic approach for AML.121 In addition, to improve remission and survival rates and reduce relapse in patients with AML, new anti-LSC therapies are being explored that address chemoresistance and immune escape, reduce toxicity, and allow sustained delivery. 
ScRNA-seq has been used to reveal distinct transcriptional profiles in LSCs and to investigate whether proliferation and self-renewal are independent functions in LSCs.122 CD69high LSCs were capable of self-renewal but poorly proliferative, whereas CD36high LSCs were more proliferative but could not transplant leukemia.122 These results suggest that self-renewal and proliferation reside in different subsets within the AML LSC compartment, and the self-renewal gene profile of LSCs was functionally validated at the single-cell level.123 Carsten et al.124 used scRNA-seq to demonstrate that LSCs upregulated CD70 in response to treatment with hypomethylating agents and that targeting CD70 with cusatuzumab could eliminate AML stem cells. Stetson et al.125 demonstrated the presence of RNA-based alterations in LSCs in diagnostic and relapse samples in a longitudinal study, showing that RNA clonal evolution is analogous to that of DNA during the progression of AML. Some common signaling networks, such as apoptosis, chemokine signaling, and metabolism, evolve during AML progression and become hallmarks of relapsed samples.125 This information can guide the development of effective therapies targeting LSCs to improve remission and survival rates and reduce relapse in AML patients.

5 IMPACTS OF DATA AND SC-SEQ ON AML TREATMENT AND PROGNOSIS

5.1 Therapy strategies in AML

Treatment efficacy and tolerability in AML deteriorate significantly with age,7 and traditional chemotherapy for AML can be highly toxic, often requiring prolonged hospitalization.126 Common treatment strategies are summarized in Table 1. It has been suggested that individualized treatment for patients over 60 years old could be based on physical status, cytogenetic or molecular mutations, and comorbidities rather than on age alone.127 The treatment of AML has advanced in recent years, with new targeted drugs such as midostaurin and gilteritinib targeting FLT3, as well as ivosidenib and enasidenib targeting mutant isocitrate dehydrogenase 1 and 2, respectively (Figure 2).128 The best responses might be seen when these agents are combined with conventional chemotherapy.129

Table 1. Therapy strategies for patients with acute myeloid leukemia.

| Population | Cases | Therapy strategy | Post-remission management | References |
| --- | --- | --- | --- | --- |
| Young adults (<60) | Common option | “7 + 3” regimen: cytarabine for 7 days in conjunction with short infusions of an anthracycline drug such as daunorubicin or idarubicin on each of the initial 3 days | Three or four cycles of high-dose cytarabine/autologous hematopoietic cell transplantation (HCT)/allogeneic HCT | [130, 131] |
| | With FLT3 gene mutation | Plus midostaurin/gilteritinib | | [132, 133] |
| | With CD33 protein | Plus gemtuzumab ozogamicin | | [134] |
| | With poor heart function | Another chemo drug (e.g., etoposide) instead of anthracyclines | | [135] |
| Medically unfit or older adults | Common options | Azacytidine, low-dose cytarabine (LoDAC), or decitabine; or LoDAC plus a targeted drug such as venetoclax, clofarabine, or glasdegib | Maintenance therapy instead of consolidation therapy: proper-dose hypomethylating agents | [136] |
| | With IDH1 gene mutation | Plus ivosidenib | | [128] |
| | With IDH2 gene mutation | Plus enasidenib | | [128] |
| | With FLT3 gene mutation | Plus midostaurin/gilteritinib | | [132, 133] |
| | With CD33 protein | Plus gemtuzumab ozogamicin | | [134] |
Figure 2. Strategies for targeted therapy in acute myeloid leukemia (AML). Various treatments have been proposed for AML. (A) Tyrosine kinase inhibitors, including the type 1 FLT3 inhibitors gilteritinib, crenolanib, and midostaurin; the type 2 FLT3 inhibitors sorafenib and quizartinib; the vascular endothelial growth factor receptor (VEGFR) inhibitor cediranib; the Janus kinase 2 (JAK2) inhibitor ruxolitinib; the mammalian target of rapamycin (mTOR) inhibitor everolimus; and the TEC family kinase inhibitor ibrutinib. (B) Tagraxofusp is a CD123-directed medication, and IMGN632 binds CD123 to deliver chemotherapy. In Mylotarg (gemtuzumab ozogamicin), gemtuzumab targets CD33 to deliver ozogamicin, and SGN-CD33A is another CD33-targeted antibody-drug conjugate. (C) Inhibitors of serine/threonine kinases and DNA methylation regulators: mitogen-activated protein kinases (MAPKs), which are serine/threonine kinases, have inhibitors such as vemurafenib. Mutations in IDH1, IDH2, and DNMT3A can lead to the deregulation of DNA methylation; ivosidenib, enasidenib, and DNMT3A-IN-1 act on them, respectively. (D) Common targets in AML treatment: aberrations in TP53 can cause transcriptional deregulation as well as impaired degradation, and elesclomol is its corresponding inhibitor. Mutations in the NPM1 gene are linked to aberrant cytoplasmic localization of NPM1 proteins. Selinexor is a potent inhibitor of XPO1, which curtails the leukemic activity of mutated NPM1 proteins. In addition, magrolimab is an inhibitor of the macrophage immune checkpoint CD47, which is highly expressed in leukemia stem cells and is associated with the FLT3-ITD mutation. Venetoclax inhibits the B-cell lymphoma 2 (BCL2) protein, thereby promoting apoptosis of cells that rely on BCL2. (E) For mutations of the epigenetic modifiers DOT1-like histone lysine methyltransferase (DOT1L) and enhancer of zeste homolog 2 (EZH2), pinometostat and DS-3201b serve as the respective inhibitors.

Moreover, recent studies have provided new information on potential AML therapeutic targets. Pathologically similar tumors frequently respond differently to identical drug regimens, underscoring the need for techniques that better match patients with optimal drugs.137 Genome-wide expression data and in vitro drug sensitivity data from cancer cell lines are accumulating, and they have enabled a data-driven approach that identifies markers by discovering robust statistical associations between genes and drugs.137 Through experiments, Lee et al.137 confirmed SMARCA4 as a molecular marker and driver of sensitivity to topoisomerase II inhibitors (mitoxantrone and etoposide). A mitoxantrone response predictor based on clinically available biological samples, such as the gene expression of leukemic blasts measured before treatment, has the potential to improve median survival among patients with high SMARCA4 expression while pointing to alternative therapeutic options for individuals with low SMARCA4 expression.137 In 2021, Simonetti et al.42 accurately differentiated AML patients and predicted changes in NAD and purine metabolism in NPM1/cohesin-mutated AML that suggest potential vulnerabilities worth exploring for treatment. They offered an overview of the crosstalk between metabolic pathways and between genomics and metabolomics in AML, thereby highlighting functional interactions and dependencies that could be harnessed for therapeutic purposes.42 Lin et al.138 confirmed that the glutamate-cysteine ligase catalytic subunit (GCLC) is essential for the growth, survival, clonogenicity, and leukemogenesis of AML cells but not for normal hematopoietic stem and progenitor cells (HSPCs), indicating that GCLC is a potential therapeutic target for AML. 
Köhnke et al.139 presented a surfaceome detection method that investigates the whole AML surfaceome directly from raw patient samples and integrates these data with gene expression and mutational burden data, allowing unbiased, genome-wide screening for target discovery in AML immunotherapy. In addition, the study of Ravasio et al.140 supported the use of LSD1i in combination therapy, and Tyner et al.141 revealed markers and mechanisms of drug sensitivity and resistance for future study. The rapid growth of such studies highlights the urgent need to pinpoint predictive biomarkers capable of identifying the most crucial targets in AML.
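The data-driven gene-drug association idea described above can be sketched in a few lines. The example below ranks genes by the Spearman correlation between their expression and a drug sensitivity readout across cell lines. All values, the gene labels (`GENE_A` to `GENE_C`), and the function names are hypothetical, and Spearman correlation is only one plausible association measure; it is not claimed to be the statistic used by Lee et al.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: expression of three genes and drug IC50 values in six cell lines.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "GENE_C": [3.0, 1.0, 2.0, 6.0, 4.0, 5.0],
}
ic50 = [9.0, 7.5, 6.0, 4.0, 2.5, 1.0]  # lower IC50 means higher sensitivity

# Rank genes by the strength of their association with drug response.
ranked = sorted(
    ((gene, spearman(values, ic50)) for gene, values in expression.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

if __name__ == "__main__":
    for gene, rho in ranked:
        print(f"{gene}: rho = {rho:.3f}")
```

A negative correlation here means that higher expression goes with a lower IC50, that is, with greater sensitivity; in practice such candidates would still need multiple-testing correction and experimental confirmation.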

When disease progression can be identified, differentiated, and even predicted, treatment decisions become more accurate. Nicora et al.142 developed decision support tools that allow for customized therapeutic interventions based on a precise stratification of patients' risk. Wang et al.143 demonstrated the feasibility of drug screening-guided treatment for children with high-risk AML. They established the first pediatric AML-specific drug response profile, discovered new therapeutic vulnerabilities through in-depth integrative analysis with genomic, transcriptomic, and medical data, and realized evidence-based functional precision medicine in children.143

5.2 Sc-seq for AML treatment

Single-cell transcriptomic and proteomic analyses have offered mechanistic insight into the molecular basis of chimeric antigen receptor (CAR)-armed cytokine-induced memory-like (CIML) NK cells.144 Dong et al.144 demonstrated the practicality and potential of equipping CIML NK cells with tumor-specific CARs and additional effector molecules for adoptive cell therapies. With scRNA-seq, Guo et al.145 identified several immune cell types present in AML patients and found that exhausted conventional T cells and immunosuppressive T cells can serve as targets of anti-CTLA4, anti-PD1, and anti-CD25 therapies. They also found extensive diversity of monocytes/macrophages and dendritic cells in the mature myeloid lineages, such that targeting a single subset or a small number of subsets does not appear to be effective for most AML patients.145

Wu et al.146 proposed CCNA1 and RAB37 as novel drug targets that are highly expressed solely in AML progenitor cell clusters. In the study of Malani et al.,147 a functional precision medicine tumor board (FPMTB) used diverse data, including scRNA-seq data, to make clinical treatment decisions. Integrating different data across a broad spectrum of patients will facilitate continuous improvement of the FPMTB recommendations, providing a framework for the personalized implementation of functional precision cancer medicine.147 A systematic data-driven strategy that integrates all available analytical data has the potential to enhance drug response predictions through continual refinement.147

5.3 Advances in AML prognosis studies

Despite the approval of new agents for various indications in AML since 2017, the 7 + 3 regimen remains the primary induction chemotherapy.148 However, owing to the heterogeneity of AML patients, there is significant interindividual variation in response to this standard therapy, with approximately 30% of patients not responding to the regimen.149 With scRNA-seq, it has become possible to investigate the interindividual and intraindividual complexity of AML. Access to pertinent molecular or genetic data has the potential to significantly improve prognostic assessment for AML. Using scRNA-seq, Zhang et al.150 found that abnormally expressed B7 family molecules affected the prognosis of AML patients, suggesting that they may serve as promising prognostic biomarkers and candidate therapeutic targets. With scRNA-seq, Jia et al.151 established a landscape of AML CD34+ cells and identified HSPC types based on lineage signature genes. By comparing sensitive AML patients with resistant ones, they found that CRIP1highLGALS1highS100Ashigh cell populations exhibiting the features of granulocyte–monocyte progenitors were associated with an adverse prognosis of AML. Moreover, two cell populations marked by CD34+CD52+ or CD34+CD74+DAP12+ were associated with a good response to induction therapy.151

Several studies have provided opportunities for prognosis prediction in patients with AML. Lu et al.152 identified various immune subtypes in AML and built a model with nine prognostic biomarkers to predict the prognosis of patients with diverse immune cell infiltration clusters. In the same study, scRNA-seq was used to uncover the differentiation trajectory of cells in the bone marrow microenvironment and the expression of prognostic immune genes, which provides a way to forecast the survival and prognosis of AML patients and may point to potential targets for immunotherapy.152 In addition, the prognostic model of Ding et al.20 used the expression of five immune genes (MIF, DEF6, OSM, MPO, and AVPR1B) to stratify non-M3 AML patients and predict their treatment outcome. With scRNA-seq, Dai et al.153 developed a prognostic model for adult AML based on cell type composition, termed the CTC score. Besides performing comparably to two previous gene expression-based prognostic scores, the 17-gene leukemic stem cell score154 and the AML prognostic score,155 the CTC score provided independent and additional prognostic information.153 The CTC score has the potential to help clinicians develop customized treatment plans. Likewise, the prognostic model of Ma et al.11 accurately predicted the overall and disease-free survival of adult AML patients and can inform treatment decisions using data that are easily accessible in daily clinical routine. Bai et al.156 investigated the prognostic value of the mRNA expression of MAP4K family members and found that specific expression patterns could forecast favorable overall survival in AML patients. Shouval et al.157 developed a prognostic model for estimating the probability of leukemia-free survival (LFS) after ASCT in patients with AML undergoing transplantation in first complete remission. 
Age and leukemia risk are predictors of long-term LFS, and they were integrated into a nomogram that can be used to estimate outcomes after transplantation.157 The nomogram can be used for patient counseling, risk stratification, statistical analysis, and the planning of potential interventions.157 Overall, the emerging technologies of big data and sc-seq are generating vast amounts of information on the molecular landscape of AML and advancing our understanding of the mechanisms underlying AML pathogenesis, diagnosis, and therapeutic responses, as summarized in Table 2.
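Many of the gene expression-based prognostic models discussed in this section reduce to a weighted sum of marker expression values that is then dichotomized into risk groups. The sketch below shows that general pattern only. The coefficients, gene labels, patient values, and the median cutoff are all hypothetical assumptions for illustration; they are not taken from any of the cited models, whose weights would come from a fitted regression such as a Cox or Lasso-Cox model.

```python
from statistics import median

# Hypothetical model weights (placeholders for fitted regression coefficients).
coefficients = {"GENE_1": 0.40, "GENE_2": -0.25, "GENE_3": 0.10}

# Hypothetical normalized expression values for four patients.
patients = {
    "P1": {"GENE_1": 2.0, "GENE_2": 1.0, "GENE_3": 0.0},
    "P2": {"GENE_1": 0.5, "GENE_2": 2.0, "GENE_3": 1.0},
    "P3": {"GENE_1": 1.0, "GENE_2": 0.0, "GENE_3": 3.0},
    "P4": {"GENE_1": 0.0, "GENE_2": 1.0, "GENE_3": 1.0},
}


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient times expression over model genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(patients, coefficients):
    """Score every patient and split the cohort at the median score."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}
    return scores, cutoff, groups


if __name__ == "__main__":
    scores, cutoff, groups = stratify(patients, coefficients)
    for pid in sorted(scores):
        print(pid, round(scores[pid], 2), groups[pid])
```

A median split is used here only because it is simple; published models often choose cutoffs by other criteria and validate them in independent cohorts.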

Table 2. Molecules/genes associated with the treatment or prognosis of acute myeloid leukemia.

| Molecules or genes | Roles | References |
| --- | --- | --- |
| Therapy-related | | |
| NAD and purine | They both can regulate immune function and cytokine release, and targeting NAD metabolism could restore the myeloid differentiation program in leukemic cells with NPM1/cohesin mutations. | [42] |
| CD70 | Leukemia stem cells upregulated CD70 in response to treatment with hypomethylating agents, and targeting CD70 with cusatuzumab could eliminate acute myeloid leukemia stem cells. | [124] |
| Glutamate-cysteine ligase catalytic subunit | It is essential for cell growth, survival, clonogenicity, and leukemogenesis of AML cells but not for normal HSPCs. | [138] |
| Tumor-specific CARs and other effector molecules | They can be used to arm CIML NK cells for cancer adoptive cell therapies. | [144] |
| Exhausted conventional T cells and immunosuppressive T cells | They are the targets of anti-CTLA4, anti-PD1, and anti-CD25 therapies. | [145] |
| CCNA1 and RAB37 | They are highly expressed in AML progenitor cell clusters rather than other tissues. | [146] |
| Prognosis-related | | |
| HOXA3-10 genes | Elevated HOXA3-10 gene expression may be linked to AML development and could potentially function as a marker to identify patients with unfavorable prognoses. | [4] |
| MIF, DEF6, OSM, MPO, AVPR1B | Their expression can be used in a prognostic model to stratify non-M3 AML patients and predict their treatment outcome. | [20] |
| NUP98, FLT3, and WT1 | In pediatric AML, NUP98 rearrangement, special activating alterations of FLT3, and WT1 mutations contribute to poor prognosis. | [109] |
| B7 family molecules | Abnormally expressed B7 family molecules can affect the prognosis of AML patients. | [150] |
| CRIP1highLGALS1highS100Ashigh | A cell population with CRIP1highLGALS1highS100Ashigh exhibiting the features of granulocyte-monocyte progenitors is associated with an adverse prognosis of AML. | [151] |
| CD34+CD52+ and CD34+CD74+DAP12+ | Cell populations characterized by CD34+CD52+ or CD34+CD74+DAP12+ are associated with favorable responses to induction therapy. | [151] |
| MAP4K3, MAP4K4, MAP4K5, and MAP4K1 | High expression of MAP4K3, MAP4K4, and MAP4K5 combined with low-level expression of MAP4K1 could be employed to predict favorable overall survival among patients with AML. | [156] |

6 PUBLIC RESOURCES FOR AML STUDIES

In AML research, clinical information can be obtained from hospitals11, 113, 142, 158, 159 or government agencies (e.g., the Banque de Cellules Leucémiques du Québec,113 the Acute Leukemia Working Party of the European Society for Blood and Marrow Transplantation157), while genetic data are more often obtained from public databases. For pharmaceutical research, there are also dedicated databases such as the Therapeutic Target Database (TTD),160 Resistant Cancer Cell Line (RCCL),114 and DGIdb.161 Data can also come from the institution with which the researchers are affiliated.146 The study of Papaemmanuil et al.149 deserves special attention among published studies because it has served as a data source for several subsequent studies. Some public databases related to AML are summarized in Table 3.

Table 3. Public databases and websites for AML research.
Database AML Sequence DNA RNA Protein Image Clinical Websites
Beat AML https://registry.opendata.aws/beataml
GTEx https://gtexportal.org/home/
GEO https://www.ncbi.nlm.nih.gov/geo/
dbGaP https://www.ncbi.nlm.nih.gov/gap/
UCSC Xena https://xena.ucsc.edu/
EGA https://ega-archive.org/
GDC https://gdc.cancer.gov/
TCIA https://www.cancerimagingarchive.net
TCGA https://www.cancer.gov/ccg/research/genome-sequencing/tcga
DepMap https://depmap.org/portal/home/#/
DisGeNET https://www.disgenet.org
BloodSpot https://servers.binf.ku.dk/bloodspot/
TTD https://db.idrblab.net/ttd/
DGIdb https://www.dgidb.org
Abbreviation: AML, acute myeloid leukemia.

Experimental results alone sometimes cannot directly answer many important and complex biological questions, and downstream bioinformatics analysis involving data integration plays a critical role in uncovering potential answers.162 Efficiently integrating large, heterogeneous biological datasets can support better and more comprehensive biological inferences across multiple fields. For omics data, mRNA, DNA, the epigenome, and proteins can each be analyzed in single cells, yielding a single type of single-cell omics data; the advent of single-cell multi-omics, however, brings more comprehensive coverage of biological features.163 Incorporating data from multiple aspects could enhance the accuracy of identifying cell clusters,164 cellular trajectories,165 and new cell subpopulations,166 as well as of lineage tracing.163 Multi-omics data facilitate the exploration of complicated interactions and linkages in tumors that are in different states and display distinct phenotypes, and they also provide detailed information on the sequential stages of tumorigenesis, encompassing initiation, progression, growth, immune evasion, metastasis, recurrence, and resistance to treatment.50 Furthermore, multi-omics could reveal novel diagnostic biomarkers and therapeutic targets, thereby leading to the adaptation of current approaches for diagnosis and treatment.167 Enhancing the precision and sensitivity of emerging techniques and computational analytics is imperative, and reducing costs is also needed for wider accessibility in the future.50 Clinical and demographic information is also needed to capture the patient's specific physical manifestations and to investigate external factors, as seen in the studies of Ma et al.,11 Nicora et al.,142 Wang et al.,143 and Shouval et al.157 The studies of Pölönen et al.,168 Malani et al.,147 Wang et al.,143 and Chen et al.4 showed that the inclusion of molecular profiles and drug profiling data in data integration is also a common demand in AML research. In addition, integrating gene expression data with other relevant information can also yield valuable discoveries.113, 139, 159, 169
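At its simplest, integrating clinical, molecular, and drug profiling data means aligning records on a shared sample identifier. The following sketch performs such an inner join on plain Python dictionaries. The table contents, field names, and the `integrate` helper are hypothetical, and the sketch assumes feature names are unique across tables; real integration also has to handle batch effects, missing values, and identifier mapping, which are not shown.

```python
def integrate(*modalities):
    """Inner-join several {sample_id: {feature: value}} tables on sample ID.

    Only samples present in every table are kept. Feature names are assumed
    to be unique across tables; later tables would overwrite earlier ones.
    """
    shared = set(modalities[0])
    for table in modalities[1:]:
        shared &= set(table)
    merged = {}
    for sample_id in sorted(shared):
        record = {}
        for table in modalities:
            record.update(table[sample_id])
        merged[sample_id] = record
    return merged


# Hypothetical per-patient tables from three data sources.
clinical = {"P1": {"age": 34}, "P2": {"age": 67}, "P3": {"age": 52}}
expression = {"P1": {"GENE_A": 5.2}, "P3": {"GENE_A": 1.1}}
drug_response = {
    "P1": {"drug_X_auc": 0.40},
    "P2": {"drug_X_auc": 0.75},
    "P3": {"drug_X_auc": 0.62},
}

merged = integrate(clinical, expression, drug_response)

if __name__ == "__main__":
    for sample_id, record in merged.items():
        print(sample_id, record)
```

Patient P2 is dropped because no expression profile is available, which illustrates how the usable cohort shrinks as more data types are required.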

After integration, various analysis tools can make the analysis easier and more convenient. Some AML studies used tools such as cBioPortal, Metascape, LinkedOmics,150 and GEPIA,11 while others required customized models and corresponding statistical or bioinformatics techniques to meet specific research needs. The Cox proportional hazards model is a very popular choice and has been implemented in many AML studies. Neural network models and the Lasso regression model are also commonly used and perform well in some situations.20, 170 The Markov model approach used in the study of Nicora et al.142 is well suited to describing patients' evolution across stages. Nonnegative matrix factorization and Bayesian models assist in unambiguously assessing the depth of heterogeneity in the TME.171 The R package Seurat is designed for scRNA-seq data; it enables users to perform quality control, identify and interpret sources of heterogeneity from single-cell transcriptomic measurements, and integrate multiple types of single-cell data.172 The common scRNA-seq data analysis workflow is illustrated in Figure 3. For comparisons, various statistical tests have been used. Fisher's exact test and the chi-square test can be used to compare categorical variables; Student's t-test, ANOVA, and the Wilcoxon rank-sum test are used for continuous data; and the log-rank test can be used to compare disease-free survival and overall survival distributions between groups.11
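As a concrete illustration of the last test mentioned above, the following sketch implements a two-group log-rank test using only the Python standard library. The function name `logrank` and the survival times are hypothetical toy values; in practice an established statistics package would normally be used, and this sketch is meant only to show what the test computes.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up times for each patient in the group
    events_* : 1 if the event (e.g., relapse or death) was observed, 0 if censored
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected events in A under the null
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Toy cohorts: group A relapses early, group B relapses late (months).
    chi2, p = logrank([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The statistic compares the number of events observed in one group with the number expected if both groups shared the same hazard, accumulated over all event times.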

Figure 3. A general workflow of single-cell RNA sequencing data analysis. The sequencing data are first processed and aligned to obtain a count matrix of all genes across all individual cells. Quality control and feature selection are then carried out as appropriate for the data set. After dimensionality reduction and clustering, the clusters can be identified and annotated based on the results of differential gene expression analysis. Moreover, trajectory inference can be performed and gene expression dynamics can be explored.
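The early steps of this workflow can be sketched on a toy count matrix as follows. The example covers only quality control, library-size normalization, and selection of highly variable genes, written in plain Python for transparency. The counts, thresholds, gene labels, and function names are hypothetical assumptions; dimensionality reduction, clustering, and trajectory inference are omitted and would normally be run with dedicated tools such as Seurat.

```python
import math
from statistics import pvariance

# Toy count matrix: one row per cell, one column per gene (hypothetical values).
genes = ["G1", "G2", "G3", "MT-X"]
counts = {
    "cell_1": [10, 0, 5, 1],
    "cell_2": [0, 8, 4, 0],
    "cell_3": [1, 0, 0, 9],  # dominated by mitochondrial reads
    "cell_4": [12, 1, 6, 1],
}


def quality_control(counts, genes, min_genes=2, max_mito_frac=0.2):
    """Keep cells with enough detected genes and a low mitochondrial fraction."""
    mito = [i for i, g in enumerate(genes) if g.startswith("MT-")]
    kept = {}
    for cell, row in counts.items():
        total = sum(row)
        detected = sum(1 for c in row if c > 0)
        mito_frac = sum(row[i] for i in mito) / total if total else 1.0
        if detected >= min_genes and mito_frac <= max_mito_frac:
            kept[cell] = row
    return kept


def normalize(counts, scale=10_000):
    """Scale each cell to a common library size, then apply log1p."""
    normalized = {}
    for cell, row in counts.items():
        total = sum(row)
        normalized[cell] = [math.log1p(c / total * scale) for c in row]
    return normalized


def top_variable_genes(normalized, genes, n_top=2):
    """Rank genes by variance of normalized expression across cells."""
    columns = list(zip(*normalized.values()))
    variances = {g: pvariance(col) for g, col in zip(genes, columns)}
    return sorted(variances, key=variances.get, reverse=True)[:n_top]


if __name__ == "__main__":
    kept = quality_control(counts, genes)
    normalized = normalize(kept)
    print(sorted(kept))
    print(top_variable_genes(normalized, genes))
```

The selected genes would then feed into dimensionality reduction and clustering, and the thresholds shown here would need to be tuned to each data set.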

7 CONCLUSION AND PERSPECTIVE

In conclusion, the use of big data and sc-seq in AML research presents significant opportunities for improving diagnosis, treatment, and patient outcomes, although several problems remain. Strong analytics performance will be required for real-time clinical questions (e.g., intensive care alerting) and for supporting the computational demands of research (e.g., omics data analysis). When embracing data, their expected heterogeneity and poor quality have to be overcome to ensure accuracy, and a paradigm shift towards data-driven analytics needs to be adopted.12 From a statistical standpoint, one problem is that the sample size is sometimes much smaller than the number of variables in genomic data. When using omics data, sequence alignment is difficult and may introduce additional noise, which propagates as more inferences are drawn from the aligned data. Beyond that, there are some issues specific to AML research. Registry-based data on diagnosis, treatment received, and clinical outcomes have notable limitations; in some registry data, such as the Surveillance, Epidemiology, and End Results (SEER)-Medicare database, up to 50% of AML diagnoses were underreported.173 Real-world data on the clinical outcomes of AML patients are limited. When such data are available, the observations and conclusions drawn from them may appear inconsistent with those derived from published clinical trial data.174 This situation needs to be improved to reduce its impact on AML studies that rely on such data.

Sc-seq techniques have had an enormous impact, offering unique opportunities to reveal the heterogeneity in tumors, identify rare cells, and follow the evolution of clones.83 Profiling intratumoral genetic and epigenetic heterogeneity at single-cell resolution has the potential to uncover clones that accumulate resistance factors against chemotherapy or immunotherapy, modulating prognosis and response to treatment.106 Revealing the unique features of single cells, including their proliferation, self-renewal, and resistance mechanisms, may help improve the identification of the targeted cluster.83 There is no doubt that, in the AML context, sc-seq experiments generate a significantly larger amount of biological information than other commonly used single-cell methods such as karyotyping, immunophenotyping, in situ hybridization, flow cytometry, and mass cytometry.107 Initial studies using scRNA-seq focused on distinguishing AML and its bone marrow microenvironment by exploring signatures of stemness, developmental hierarchies, and interactions between malignant and immune cells.121 The scDNA-seq and scRNA-seq techniques allow the analysis of profiles in diverse cell subpopulations that cannot be detected by bulk sequencing.83 This is particularly useful in studying AML, which is known for its significant heterogeneity. Ultimately, advances in sc-seq techniques will continue to contribute significantly to our understanding of AML biology and to improve patient outcomes.

The use of various biological data has become popular in AML research with the emergence and improvement of technologies.175, 176 These data provide valuable information on AML itself, its treatment, its prognosis, and so forth. Among the various data acquisition methods, the sc-seq technique is especially attractive and deserves special attention. It provides high-throughput data that can help researchers explore aspects of AML that cannot be reached by other means. However, insufficient coverage, access to full-length RNA sequences, RNA modifications, imperfect algorithms for mutation detection, data normalization, differential gene expression analysis, dimensionality reduction, and mutational heterogeneity of blast cells are still challenges to some extent.177 While there is still room for improvement, sc-seq has proven its uniqueness and value in many studies, which demonstrate that it has great potential for the future and deserves further optimization. There is still much to discover and explore, and the continued use and optimization of new technologies will move us closer to finding optimal solutions for AML.

AUTHOR CONTRIBUTIONS

Yuxuan Zou, Hongbo Hu, and Huiyuan Zhang conceived the review. Yuxuan Zou, Hongbo Hu, and Huiyuan Zhang drafted the manuscript. All authors read and approved the final manuscript.

ACKNOWLEDGMENTS

Figure 1 was produced at https://www.diagrams.net/. The normal-cell-2 icon, rna icon, arrow-right-short icon, and arrow-down icon by Servier (https://smart.servier.com/) are licensed under CC-BY 3.0 Unported (https://creativecommons.org/licenses/by/3.0/), and the sequence_histogram icon is licensed under CC0 (https://creativecommons.org/publicdomain/zero/1.0/). Other icons are open-access or were produced by ourselves. Figure 2 was produced at https://www.diagrams.net/. All icons are open-access or were produced by ourselves. Figure 3 was produced at https://www.diagrams.net/. The histogram icon is licensed under MIT (https://mit-license.org/). Other icons are open-access or were produced by ourselves. This study was supported by grants from the Chongqing International Institute for Immunology (2020YJC01), the Ministry of Science and Technology (the National Key Research and Development Program 2019YFA0110200) and National Natural Science Foundation of China (82025002, 32230036 and 31870881), the 1.3.5 Project of disciplines of excellence (ZYYC20012) and National Clinical Research Center for Geriatrics (Z20201001), West China Hospital, Research Project of Sichuan Provincial Health Commission (16PJ334).

    CONFLICT OF INTEREST STATEMENT

    The authors declare no conflict of interest.

    ETHICS STATEMENT

    This study did not involve human participants and/or animals or informed consent. Thus, ethical clearance is not applicable to this article.

    DATA AVAILABILITY STATEMENT

    No data sets were generated or analyzed during the current study. Thus, data sharing is not applicable to this article.
