Understand SLE heterogeneity in the era of omics, big data, and artificial intelligence
Abstract
Systemic lupus erythematosus (SLE) is a systemic autoimmune disease characterized by extraordinary heterogeneity, owing to its complex pathogenesis and diverse manifestations. Stratifying patients for therapy and prognosis is a major challenge in managing SLE. Conventional biomarkers for disease diagnosis and activity assessment provide very limited insight into immunological pathogenesis and therapeutic response. The advancement of “omics” technologies, including genomics, transcriptomics, proteomics, and metabolomics, offers an unprecedented opportunity to characterize the immunopathological landscape in individual patients with SLE. Indeed, genomic studies have revealed a subset of SLE patients carrying one or more functional single nucleotide polymorphisms (SNPs) underlying immune dysregulation, while transcriptomic studies have revealed subgroups of SLE patients showing distinct signatures for Type I interferon (TI-IFN) pathway activation or aberrant differentiation of B cells into plasma cells. This review summarizes results from the latest studies using omics technology to understand SLE heterogeneity. In addition, we propose that the application of artificial intelligence, such as the machine learning-based nonlinear dimensionality reduction method uniform manifold approximation and projection (UMAP), can further strengthen the analysis of omics big data. The combination of new technology and novel analysis pipelines can lead to breakthroughs in stratifying SLE patients for better monitoring of disease activity and more precise design of treatment regimens, not only for conventional immunosuppression but also for novel immunotherapies targeting B-cell activating factor (BAFF), TI-IFN, and interleukin 2 (IL-2).
1 SLE IS CHARACTERIZED BY EXTRAORDINARY CLINICAL DIVERSITY AND DRIVEN BY MULTIPLE PATHOGENIC PATHWAYS
Systemic lupus erythematosus (SLE, or lupus) is the archetypal multisystem autoimmune disease, affecting 20–150 people per 100,000 with a female predominance (7–9:1).1 It is characterized by extraordinary clinical diversity: patients develop different organ manifestations with variable disease activity (DA), and individuals show distinct responses to specific immunosuppressive or immunomodulatory therapies.2 Such clinical and biological diversity has led to the argument that the concept of SLE as a single disease may be fundamentally flawed.3 The heterogeneous nature of SLE is itself a reflection of its complex and diverse pathogenesis.
The pathogenesis of SLE involves hyperactivation of both the innate and adaptive arms of the immune system, which eventually leads to loss of immune tolerance to autoantigens.1, 4, 5 The impaired clearance of DNA or RNA from dead cells, including neutrophil extracellular traps (NETs), can stimulate plasmacytoid dendritic cells (pDCs) to produce type I interferon (TI-IFN), which in turn enhances antigen presentation and promotes both innate and adaptive responses.6, 7 In SLE, there is an expansion of activated B cells with aberrant differentiation into plasma cells that produce autoantibodies,8 which is supported by excessive B-helper function from CD4+ T cells.9 CD4+ T cells also show enhanced production of proinflammatory cytokines such as interleukin 17 (IL-17) and impaired function of the regulatory T (Treg) subset.10 CD8+ T cells in SLE also contribute to organ damage, and their activity has been shown to predict prognosis.11 The end-organ damage in SLE is mediated by proinflammatory innate and adaptive immune cells as well as by pathogenic antigen-autoantibody immune complexes that activate FcγR and complement pathways7 (Figure 1).

The complex and diverse pathways of SLE pathogenesis are clearly demonstrated by different mouse models of lupus, each driven by a distinct pathway of immune dysregulation. Impaired uptake of apoptotic cells in MFG-E8–/– mice causes splenomegaly, spontaneous formation of germinal centers, autoantibody production, and the development of glomerulonephritis.12 Mice with the D18N mutation in Trex1, a major cytoplasmic exonuclease that degrades dsDNA and ssDNA, exhibit lupus-like symptoms including systemic inflammation, lymphoid hyperplasia, vasculitis, and kidney disease.13 Overexpression of B-cell activating factor (BAFF) in mouse models results in SLE-like disease, including production of autoantibodies and the development of glomerulonephritis.8, 14, 15 Aberrant differentiation of follicular helper T (Tfh) cells leads to autoreactive germinal center B-cell formation and autoantibody production in Roquinsan/san mice.16-19 Given that a single form of immune dysregulation is sufficient to drive lupus-like disease in mice, the question is whether SLE patients could be divided into subgroups characterized by distinct immune signatures, which could help to tailor specific immunotherapies to patients.
2 CONVENTIONAL CRITERIA FOR SLE CLASSIFICATION
Given the heterogeneity of SLE, the current international consensus for diagnosis and classification of SLE is based on a series of clinical, biochemical, and immunological parameters.20 These criteria are designed to diagnose disease based on an additive score but do not predict disease progression, severity, or risk of relapse. Earlier criteria by the Systemic Lupus International Collaborating Clinics (SLICC) addressed shortcomings of the initial diagnostic criteria developed by the American College of Rheumatology (ACR), which omitted features such as mucocutaneous and neuropsychiatric disease and immunological parameters.20, 21 The SLICC criteria considered biopsy-proven lupus nephritis (LN) together with lupus antibody positivity (antinuclear antibodies [ANAs], anti-double-stranded DNA antibodies [anti-dsDNA], and anti-Smith antibodies [anti-Sm]) sufficient for the diagnosis of SLE.21 More recently, the European League Against Rheumatism (EULAR) and the ACR have developed classification criteria that encompass an extended list of clinical and immunological features, aiming for a more unbiased, patient-specific approach than SLICC.20, 21 The EULAR/ACR criteria require a positive screening ANA test and a weighted additive score of at least 10 from clinical and immunological features (Figure 2). Some data suggest that a EULAR/ACR baseline score >20 may indicate more severe disease at 5 years.23 However, others believe that 5 years is not sufficient to capture disease accrual, and this classification may miss early organ-specific disease.20-22 An unbiased patient-specific approach is required to improve early detection, disease prognosis, and patient-specific therapy.

A critical complication of SLE is LN. LN affects 29% of Caucasian and up to 80% of Asian patients and is associated with substantial adverse outcomes including mortality and progressive kidney disease.5 Up to 10% of patients develop kidney failure at 5 years.1, 5 The relapsing and remitting nature of LN, compounded by limited noninvasive biomarkers of DA (such as proteinuria and complement levels), makes it difficult to treat. Consequently, certain patients are left with low levels of DA while others are exposed to the adverse effects of potent immunosuppression.5 Immunosuppression currently used for LN has significant side effects, and patient-specific factors including tolerance of medications, cost, and compliance make treatment a further challenge.24
LN can be suspected in patients with biochemical changes in kidney function, onset of hematuria, proteinuria, and autoantibody positivity. However, definitive diagnosis still requires an invasive kidney biopsy, which carries a risk of complications (reported rates are 8.1%–15% for minor complications and 1.5%–6.6% for major complications).5, 25-27 LN is classified into six subtypes based on a broad spectrum of clinicopathologic features (Table 1).28 The histological grading of the disease provides insights into the activity and chronicity of kidney damage to guide the degree of immunosuppression (with Class I LN being early disease not requiring aggressive immunosuppression, and Class VI being advanced disease with significant fibrosis).5, 25 This classification is often insufficient for predicting prognosis and guiding therapy, given the variable response to treatment.5 Therapies found ineffective for LN in randomized controlled trials have nonetheless been shown to be effective in individual cases.29-31 There is an unmet need for further research into SLE and LN stratification by assessing the genetic and immunological changes in patients with the disease. A noninvasive precision medicine approach is required not only to mitigate biopsy-related risk but also to provide patient-specific therapy.
LN biopsy class | Features | Modified National Institutes of Health activity and chronicity index |
---|---|---|
LN class I | Minimal changes on light microscopy. Mesangial deposition of IC by IF and EM | Activity score: endocapillary hypercellularity (0–3); neutrophil infiltration (0–3); fibrinoid necrosis (0–3) ×2; hyaline deposits (0–3); cellular or fibrocellular crescents (0–3) ×2; interstitial inflammation (0–3). Total score 0–24. Grading: 0, absent; 1, <25% of glomeruli/cortex or interstitium affected; 2, 25–50% affected; 3, >50% affected |
LN class II | Mesangial proliferative LN (any degree of hypercellularity >3 mesangial cells per 3-μm-thick section) on LM. Isolated subepithelial or subendothelial deposits (IF/EM) | |
LN class III | Focal LN, <50% of glomeruli affected on LM. Active or inactive (endocapillary or extracapillary hypercellularity). | |
LN class IV | Diffuse LN, >50% of all glomeruli affected on LM. May have diffuse wire loops and subendothelial deposits | |
LN class V | Lupus membranous nephropathy, diffuse thickening of capillary walls on LM. Diffuse subepithelial deposits on EM. | Chronicity score: glomerulosclerosis (0–3); fibrous crescents (0–3); tubular atrophy (0–3); interstitial fibrosis (0–3). Total score 0–12 |
LN class VI | Advanced sclerosing glomerulonephritis, with >90% of glomeruli sclerosed on LM. | |
- Abbreviations: EM, electron microscopy; IC, immune complex; IF, immunofluorescence; LN, lupus nephritis; LM, light microscopy.
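The arithmetic behind the activity and chronicity indices in Table 1 can be made explicit with a short sketch. The function and variable names below are hypothetical and chosen only for illustration; the component lists, the double weighting of fibrinoid necrosis and cellular or fibrocellular crescents, and the 0–3 grading bands are taken from the table. It is an illustration of the scoring logic, not a clinical tool.

```python
# Illustrative sketch only: names are hypothetical; weights and ranges follow Table 1.
ACTIVITY_WEIGHTS = {
    "endocapillary_hypercellularity": 1,
    "neutrophil_infiltration": 1,
    "fibrinoid_necrosis": 2,                    # counted twice (x2 in Table 1)
    "hyaline_deposits": 1,
    "cellular_or_fibrocellular_crescents": 2,   # counted twice (x2 in Table 1)
    "interstitial_inflammation": 1,
}
CHRONICITY_WEIGHTS = {
    "glomerulosclerosis": 1,
    "fibrous_crescents": 1,
    "tubular_atrophy": 1,
    "interstitial_fibrosis": 1,
}


def grade_from_percent(percent_affected):
    """Map the percentage of tissue affected to the 0-3 grade listed in Table 1."""
    if not 0 <= percent_affected <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_affected == 0:
        return 0          # absent
    if percent_affected < 25:
        return 1          # <25% affected
    if percent_affected <= 50:
        return 2          # 25-50% affected
    return 3              # >50% affected


def _weighted_sum(grades, weights):
    """Sum weight * grade over all components, validating the input."""
    if set(grades) != set(weights):
        raise ValueError("grades must cover exactly: " + ", ".join(sorted(weights)))
    total = 0
    for name, weight in weights.items():
        grade = grades[name]
        if grade not in (0, 1, 2, 3):
            raise ValueError(f"{name}: grade must be 0, 1, 2, or 3")
        total += weight * grade
    return total


def activity_index(grades):
    """Activity score, range 0-24."""
    return _weighted_sum(grades, ACTIVITY_WEIGHTS)


def chronicity_index(grades):
    """Chronicity score, range 0-12."""
    return _weighted_sum(grades, CHRONICITY_WEIGHTS)


# Hypothetical biopsy grades, invented for illustration.
example_activity = {
    "endocapillary_hypercellularity": 2,
    "neutrophil_infiltration": 1,
    "fibrinoid_necrosis": 1,
    "hyaline_deposits": 0,
    "cellular_or_fibrocellular_crescents": 2,
    "interstitial_inflammation": 1,
}
print(activity_index(example_activity))
```

Keeping the weights in a dictionary makes the doubled components visible at a glance and lets the same helper serve both indices.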
3 THE APPLICATION OF “OMICS” TECHNOLOGIES AT THE FRONTLINE OF PRECISION MEDICINE FOR SLE
With the increasing understanding of SLE pathogenesis gained by immunologists and clinical scientists, new immunotherapies for SLE have been developed to suppress a key pathway of immune activation by blocking TI-IFN,32 to reduce excessive B-cell survival and activation by blocking BAFF31, 33 and to reinstate the balance between regulatory and effector T cells with low-dose IL-2.34-36 It is now more important than ever to classify SLE patients based on systemic immune signatures. Such a classification will not only help predict prognosis but also guide therapy for precision medicine to improve outcomes. Furthermore, identification of potential biomarkers and immunological pathways associated with a specific classification may provide new targets for future therapy development, for SLE overall or for specific SLE subgroups. By applying cutting-edge “omics” technologies and artificial intelligence-based analysis, a “systems immunology” approach has been pioneered in stratifying patients with SLE (Figure 3). We summarize representative studies of this kind using genomics, transcriptomics, proteomics, and metabolomics.

3.1 Genomics
Genetic factors are recognized as among the most potent risk factors underpinning the development of SLE. Twin concordance studies suggest that 25%–54% of genetically identical twins of SLE patients will develop SLE, compared with 2%–5% of genetically distinct twins.37 Symptom heterogeneity exists among these twins and is thought to be related in part to epigenetic modifications of the genome.38, 39 This recognition accelerated genetic studies in SLE patients, with almost 100 loci now associated with SLE through genome-wide association studies (GWAS).40-44 Although the loci identified differed between cohorts owing to distinct ethnicities and sample sizes, common major regulatory pathways and mechanisms in SLE pathogenesis have been elucidated by such studies.41, 45, 46 More recently, whole exome or genome sequencing has discovered rare or novel single nucleotide variants (SNVs) contributing to the development of SLE, identifying unique monogenic mutations that contribute to heterogeneous phenotypes in SLE patients46 (Table 2).
Studies | Patient criteria & Sample size | Profiling methods | Analytic methods | Major discoveries in stratification |
---|---|---|---|---|
Harley et al.41 | European female SLE, N = 720; HC, N = 2337. Extra data set: SLE, N = 1846; HC, N = 1825 | GWAS of SNPs | PCA and multiple logistic regression model | Multiple genetic variants implicated in SLE development in European women were identified. These included variants in complement-related genes, ITGAM, the IFN alpha-related gene IRF7, the DNA elongation factor gene KIAA1542, and the DNA splicing gene PXK. Enrichment of these SNVs is thought to contribute to SLE pathogenesis. |
Kozyrev et al.40 | SLE, N = 279; HC, N = 515 | Genotyping using 100 K Affymetrix array plus peripheral blood mononuclear cell (PBMC) RT-PCR | Logistic regression analysis and conditional multiple logistic regression analysis | Mutations in the B-cell scaffold protein with ankyrin repeats gene (BANK1) were associated with SLE pathogenesis. A nonsynonymous substitution and alternative splicing in the IP3R regions may increase the binding affinity of BANK1 and cause overactivation in SLE patients, leading to disease. |
Almlof et al.45 | SLE, N = 71; HC, N = 142 | Whole-genome sequencing (WGS) of parent-offspring trios | Random forest (RF) | RF scores allowed the prediction of rare causal variants in 22 monogenic genes contributing to SLE. These variants were thought to be inherited from a single parent, with RF scores grouping parents into high versus low risk based on WGS data. |
Jiang et al.42 | SLE, N = 69 (SLE1); replication cohort, N = 64 (SLE2); HC, N = 97 | WGS and WES with GWAS undertaken | ADMIXTURE algorithm | Participants were divided into three groups: HC, SLE1, and SLE2. Across a list of 76 GWAS SLE genes, the proportion of rare SNVs increased in the order HC < SLE1 < SLE2. A substantial number of SLE patients harbor multiple SNVs in BANK1 and/or BLK. SNVs found in BLK impair kinase function, which is related to suppression of IFN production. Loss-of-function variants in BANK1 impair scaffold protein function, leading to TI-IFN activity. These SNVs may account for disease development and IFN production in carriers with SLE. |
- Abbreviations: GWAS, genome-wide association studies; IFN, interferon; PCA, principal component analysis; RT-PCR, real-time polymerase chain reaction; SNV, single nucleotide variant; SLE, systemic lupus erythematosus.
One major advance from genomic studies is the recognition of common immune pathways implicated in disease. The associated loci include genes with functions in DNA/RNA metabolism, classical complement pathway regulation, IFN regulation, and T- and B-cell function. Indeed, the risk of developing SLE attributed to these genes tends to reflect the biological pathway involved. For example, genes involved in RNA or DNA metabolism often cause Mendelian SLE, whereas genes involved in IFN regulation increase risk by a relatively modest 20%–40%. Recent studies have identified mechanisms through which these loci contribute to SLE, with SNVs in GWAS-associated genes disrupting the regulation of TI-IFN and resulting in enhanced TI-IFN production.42 It is also increasingly apparent that many SNVs that contribute to SLE are under significant selection pressure, with enrichment of pathogenic variants among rare and novel SNVs. Monogenic mutations such as hereditary C1q deficiency have been implicated in SLE development, along with mutations in other complement proteins,45 and rare mutations in the TI-IFN-associated Toll-like receptor (TLR) pathway are enriched in SLE patients.47 Combining GWAS with whole exome sequencing has permitted greater insight into the association of these low-frequency mutations with SLE development (Table 2). While the heterogeneity of mutations across distinct arms of the immune system in SLE makes treatment challenging when patient-specific genes and pathways are unknown, it also represents an opportunity to individualize therapy where patient-specific signatures are known. Indeed, it has been suggested that certain GWAS loci are associated with specific transcriptomic signatures.48 With further machine learning analysis, genomics can be utilized as a tool to stratify patients with SLE and provide insights into disease pathogenesis.
3.2 Transcriptomics
Although there is a high genetic contribution to SLE, nongenetic risk factors are also important and can be better examined by transcriptomics. Transcriptional changes often precede clinical manifestations and may enable patient stratification.11, 48-50 These studies demonstrate that variations in IFN gene signatures (IGS) and the dysregulation of innate immune cells such as neutrophils, or of adaptive immune cells such as plasma cells or CD8+ T cells, might stratify patients into different groups (Table 3).
Studies | Patient criteria & Sample size | Profiling methods | Analytic methods | Major discoveries in stratification |
---|---|---|---|---|
Chiche et al.50 | SLE, N = 62 (157 samples collected); HC, N = 20; adult patients | Microarray-based transcriptomics of whole blood | Second generation of a modular framework | Assessment of IFN signatures revealed that 87% of SLE patient samples had upregulated IFN patterns. Upregulation not only of IFN alpha but also of IFN beta and gamma was seen in the SLE patients. Modular analysis of IFN signatures revealed three distinct modules with variable activation thresholds (M1.2 < M3.4 < M5.12), which were useful for stratifying patients. The IFN gene signature (IGS) varied within patients on longitudinal assessment. |
Banchereau et al.48 | Pediatric SLE, N = 158 (924 longitudinal blood samples profiled); pediatric HC, N = 48 | Microarray-based transcriptomics of whole blood | Unsupervised hierarchical clustering; modular analysis | By clustering of the interindividual SLEDAI correlation matrix, seven patient groups were identified (PG1–7). Each group had different immune signatures related to activity, comprising erythropoiesis, IFN, myeloid lineage/neutrophils, plasmablasts, and lymphoid lineage. PG1 and PG6: mainly erythropoiesis correlated with activity; these groups had the lowest rates of LN. PG2 and PG3: plasmablasts and/or lymphoid lineage correlated with activity. PG4 and PG5: IFN correlated with DA. PG7: myeloid and plasmablast signatures correlated with DA. The plasmablast signature was the most robust marker of disease activity, particularly for African American patients. Increased neutrophil transcripts were noted before and during disease flare. |
Figgett et al.49 | SLE, N = 161; HC, N = 57 | Transcriptomics: RNA-seq of whole blood | Unsupervised k-means clustering | Four groups in SLE patients. G1: similar to HC, low rate of anti-Ro positivity, and the highest rate of photosensitivity. G2: the highest rate of serositis, high expression of TLR7, high ISGs. G3: increased neutrophil signature, decreased PC signature, more flares than G1 and G2. G4: the highest disease activities and autoantibody titers, the highest rate of renal disorder and discoid rash, more flares than G1 and G2, high expression of BAFF, high ISGs. |
Hong et al.51 | SLE pregnant women, N = 92; SLE women undergoing assisted reproduction, N = 25; SLE nonpregnant, N = 20; HC pregnant, N = 43; HC nonpregnant, N = 34 | Microarray-based transcriptomics of whole blood | Hierarchical clustering to assess differentially expressed transcripts (DET); PCA; Q-Gen assessment of transcriptional patterns | Gene expression modules identified in SLE patients included M1.2 (IFN), M4.11 (plasma cells), M2.3 (erythropoiesis), M4.2 (inflammation), and M5.15 (neutrophils). SLE pregnancies with complications, e.g., pre-eclampsia (PET), had higher fold expression of the neutrophil signature than noncomplicated SLE pregnancies. Upregulation of IFN and plasma cell signatures was seen in fatal complications other than PET in SLE patients. |
Nehar-Belaid et al.52 | Cohort 1: children, SLE (N = 33) and HC (N = 11), PBMCs (n = ~276,000). Cohort 2: adults, SLE (N = 8) and HC (N = 6), PBMCs (n = ~82,000) | Transcriptomics: scRNA-seq of PBMCs | Hierarchical clustering on combined child and adult matrices | Four groups in SLE patients. G1: expansion of subclusters with a high signature of ISGs, lower disease activities than G2, MMF treatment in the majority. G2: the highest disease activities, expansion of subclusters with a high signature of ISGs, MMF treatment in the minority. G3 and G4: mixed with HC, expansion of “memory” CD4+ and CD8+ T-cell subclusters. |
Fava et al.53 | SLE, N = 30 | scRNA-seq and proteomics of kidney tissue and urine | PCA | Analysis showed an immune activation gradient in urinary proteomics, which was higher in proliferative disease. Kidney tissue analysis revealed that chemokine secretion in the urine was related to intrarenal chemokine production. IFN-γ was found to be the most abundant cytokine produced by infiltrating CD8+ T cells in the kidney tissue across all classes of LN. |
- Abbreviations: HC, healthy control; IGS, IFN gene signature; ISG, interferon-stimulated gene; LN, lupus nephritis; PCA, principal component analysis; SLEDAI, SLE disease activity index; SLE, systemic lupus erythematosus.
Current transcriptional and genomic data highlight a strong IGS elevation in SLE.48-50, 52, 54-56 Although these studies report discordant results on the relationship between IGS expression and DA, all data suggest that TI-IFN is involved in the pathogenesis of SLE. Whole exome sequencing (WES) has highlighted various SNVs in TI-IFN genes in SLE patients.42, 46 Further analysis of the pediatric population has shown that different cell types correlate with disease activity depending on the patient subgroup, although the relationship between subgroup stratification and conventional classification has not been investigated in detail.48 In that study, a subgroup analysis of disease activity based on response to drug therapy revealed that cyclophosphamide had a greater suppressive effect on DA modules than steroids, while steroids were more efficacious than Plaquenil in DA module suppression. Other studies have suggested that IGS is unreliable as a marker of activity because it is retained in monocytes during both quiescence and flare.57 Consistent with a link between transcriptomic signatures and disease, recent randomized controlled trials (RCTs) of anti-interferon therapy have shown greater efficacy in subgroups with pronounced TI-IFN signatures, suggesting that targeted anti-interferon therapy may be more efficacious for this subset of patients.27 Transcriptional gene panels that correlate with disease could therefore be considered as cost-effective and noninvasive biomarkers of SLE/LN activity.
Transcriptional profiling of both kidney tissue and urine samples may provide further insight into disease pathogenesis, severity, and response.53, 58-60 Der et al. undertook single-cell RNA-sequencing (scRNA-seq) of renal tubular cells and skin keratinocytes. They found nine major cell clusters, with differentially expressed gene (DEG) enrichment noted for TI-IFN signatures. TI-IFN response scores correlated with renal disease treatment response. Each subclass of LN itself showed differential expression of inflammatory pathways. Patients with Class III/IV LN had higher TI-IFN gene signatures than patients with Class V LN.58 Arazi et al. undertook a transcriptomic assessment of kidney tissue and urine samples. They found that immune cells in urine correlated highly with the infiltrating leukocytes seen in kidney tissue. This suggests that transcript assessment in both kidney tissue and urine samples may provide insights into disease response beyond traditional renal biopsies alone.60
3.3 Proteomics
High-resolution mass spectrometry can efficiently analyze protein abundance, modifications, and interactions. Proteomic analysis of kidney tissue and urine showed an immune activation gradient in urinary proteomics, which was higher in proliferative disease. Kidney tissue analysis revealed that chemokine secretion in the urine was related to intrarenal chemokine production within tubular cells, while transcriptomic analysis found IFN-γ to be the most abundant cytokine produced by infiltrating CD8+ T cells in the kidney tissue across all classes of LN.53 Proteomic analysis of neutrophil subsets has shown differential expression of interferon-regulated proteins within SLE patients.61 Bashant et al.61 demonstrated that low-density proinflammatory granulocytes (neutrophils) expressed an abundance of TI-IFN proteins compared with normal-density neutrophils within SLE patients. Cytoskeletal changes were noted in these low-density granulocytes and are thought to contribute to increased trafficking and adherence of these neutrophils to the microvasculature, causing tissue damage during disease flare.61 These studies suggest that proteomic analysis may provide additional information on cell regulation and immunoactivity in serum, urine, and kidney tissue samples from patients with SLE.
3.4 Metabolomics
Metabolomics comprehensively measures the repertoire of metabolites (or small molecules) present in cells, tissues, and body fluids. Serum/plasma and, to a lesser extent, urine are the body fluids most commonly used to profile patients' metabolomes, which have been applied to understand disease mechanisms, stratify patients, and predict prognosis or response to therapy.62 Although a metabolomic study designed to understand heterogeneity among SLE patients is yet to be performed, cohort studies analyzing SLE patients' sera have suggested metabolic alterations in these patients compared with healthy controls (HCs).63 Metabolomic profiling of SLE patients has demonstrated increased oxidative stress, fatty acid oxidation, and changes in glycolysis related to DA; these changes may precede clinical flares and are potential biomarkers for DA.63, 64 Pearl et al. analyzed peripheral blood lymphocytes (PBLs) from 36 SLE patients versus 42 HCs by liquid chromatography-tandem mass spectrometry (LC-MS/MS). The most profound changes they noted in SLE patients' PBLs versus HC samples were in the pentose phosphate pathway (PPP).65 Enhanced oxidative stress was thought to reduce cysteine levels, while cystine, kynurenine, and cytosine were all increased. Kynurenine accumulation was reversed by the administration of N-acetyl cysteine (NAC) versus placebo in SLE patients. In vitro analysis showed that kynurenine was an activator of the mammalian target of rapamycin (mTOR). This provides mechanistic insights into disease development in SLE via metabolic changes.36 Analysis of fecal metabolites by Zhang et al. also found elevated levels of fecal xanthurenic acid and kynurenic acid in SLE patients versus HCs. This suggests that metabolomic analysis may provide insights into disease mechanisms in SLE.
Yan et al. performed gas chromatography–mass spectrometry (GC-MS) in 30 SLE patients versus 29 HCs.66 They noted that l-valine was elevated at the time of SLE diagnosis, while l-tryptophan detection correlated with DA. More recently, Zhang et al. noted significant changes in glycerophospholipid metabolism in SLE patients' sera.67 l-Pyroglutamic acid levels were markedly increased in patients with active disease and correlated with disease activity; this was validated using the area under the receiver-operating characteristic (ROC) curve (AUC). These data suggest that disease-specific metabolites may be considered as noninvasive biomarkers for diagnosis and disease activity. However, further analysis is required to use key metabolite changes to understand the heterogeneity of SLE patients and the dynamics of the flare-remission cycles of the disease.
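To make the AUC validation mentioned above concrete, the following minimal sketch computes the area under the ROC curve for a single candidate biomarker. It uses the rank-based interpretation of AUC (the probability that a randomly chosen active-disease sample has a higher value than a randomly chosen inactive one, with ties counted as one half). The function name and the metabolite values are hypothetical and invented purely for illustration; they are not data from the cited studies.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve for one biomarker.

    scores: measured biomarker values, one per sample.
    labels: 1 for the positive class (e.g., active disease), 0 otherwise.
    Computed as the fraction of positive/negative pairs in which the
    positive sample scores higher; ties contribute one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical metabolite levels (arbitrary units), invented for illustration.
active = [5.1, 4.8, 3.9, 4.4]
inactive = [3.2, 4.0, 2.9, 3.5]
scores = active + inactive
labels = [1] * len(active) + [0] * len(inactive)
print(roc_auc(scores, labels))
```

An AUC of 0.5 corresponds to a marker with no discriminating power and 1.0 to perfect separation, which is why AUC is a convenient single-number summary when comparing candidate metabolites.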
4 IMPROVEMENT OF THE MINING OF OMICS DATA BY NEW ANALYTIC METHODS INCLUDING MACHINE LEARNING
A major challenge in the stratification of SLE patients based on omics data is how to interpret data that significantly exceed the scale of conventional clinical and immunological measures. Whereas clinical data are usually composed of up to dozens of variables, data generated by omics technologies can range from hundreds (such as metabolites) to tens of thousands (such as transcripts) of variables. The linear dimensionality reduction method principal component analysis (PCA) is a machine learning algorithm that has been widely used to handle data with high dimensionality.68 PCA projects the data onto the directions of greatest variance in feature space, thus compressing the data into a low-dimensional space while maintaining global information. The first principal component (PC1) corresponds to the direction with the largest variance in feature space, followed by the second principal component (PC2) with the second largest variance (Figure 4A). Based on the PCA embedding, clustering methods such as k-means and hierarchical clustering can be applied to generate clusters of samples, which can be further investigated for associations with patients' clinical and immunological features. PCA has been applied by several of the studies summarized above to stratify SLE patients.49
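The projection described above can be sketched in a few lines. The example below is a minimal illustration that uses only the Python standard library: it centers the data, builds the covariance matrix, and extracts PC1 and PC2 by power iteration with deflation. This is a teaching substitute for the optimized eigendecomposition or singular value decomposition used by established statistical packages. All function names and the toy expression matrix are hypothetical.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (features x features)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=500):
    """Leading eigenpair of a symmetric matrix by power iteration."""
    d = len(matrix)
    # Deterministic, non-uniform start vector; it is assumed not to be
    # exactly orthogonal to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


def pca(rows, n_components=2):
    """Return sample scores, component directions, and component variances."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        eigenvalue, v = top_eigenvector(cov)
        components.append(v)
        variances.append(eigenvalue)
        # Deflation: remove the direction just found so that the next
        # power iteration converges to the following component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return scores, components, variances


# Hypothetical toy matrix: 6 samples (rows) x 3 features (columns).
toy = [
    [2.1, 1.9, 0.2], [1.8, 2.2, 0.1], [2.0, 2.0, 0.3],
    [6.1, 5.8, 0.2], [5.9, 6.2, 0.1], [6.0, 6.1, 0.3],
]
toy_scores, toy_components, toy_variances = pca(toy)
for sample_scores in toy_scores:
    print([round(x, 3) for x in sample_scores])
```

The resulting two-dimensional scores are what a clustering method such as k-means would then take as input. Note that the sign of each component is arbitrary, so only relative positions of samples are meaningful.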

Although PCA retains most of the global variance in the data, it often fails to preserve local clustering structure, since local relationships between samples depend less on PC1 or PC2. Therefore, new methods including uniform manifold approximation and projection (UMAP) have been developed. UMAP draws on manifold learning and topological data analysis to preserve the topological structure of the data manifold.69 Specifically, UMAP first constructs a k-nearest neighbour (kNN) graph to approximate the manifold structure of the data, then embeds the kNN graph into two-dimensional space to visualize the data (Figure 4B). UMAP has been shown to better maintain local information as well as global structure in the analysis of scRNA-seq datasets.70
A recent study systematically compared UMAP with other mainstream methods for dimensionality reduction, including PCA, and demonstrated that UMAP was superior to PCA in clustering accuracy, preservation of neighbour information, and feature separation.70 In this study, a data set consisting of longitudinal transcriptome profiles of 65 SLE patients and 20 HCs was analyzed. UMAP, but not PCA, clearly separated SLE patients from HCs. Furthermore, the UMAP embedding revealed new clustering structures, that is, subgroups among SLE patients' samples. Such clustering structures were found to be associated with patient visit dates and with trends of disease improvement or deterioration.70 This exemplifies how new methods can effectively analyze omics data and strengthen the detection of heterogeneity among patients to inform disease diagnosis and therapy choices.
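The first step described above, building the kNN graph, can be illustrated with a short sketch. It covers only neighbour finding with Euclidean distance and brute-force search; the subsequent edge weighting and low-dimensional layout optimization that complete UMAP are not shown, and established implementations use approximate neighbour search to scale to large datasets. The function name and the toy points are hypothetical.

```python
import math


def knn_graph(points, k):
    """Brute-force k-nearest-neighbour graph with Euclidean distance.

    Returns a dict mapping each sample index to the indices of its k
    nearest neighbours, ordered from closest to farthest.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    graph = {}
    for i, p in enumerate(points):
        # Distance from sample i to every other sample; ties are broken
        # by sample index because tuples sort lexicographically.
        distances = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in distances[:k]]
    return graph


# Hypothetical toy samples forming two well-separated groups.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
for index, neighbours in knn_graph(samples, k=2).items():
    print(index, neighbours)
```

Because each sample is linked only to its closest neighbours, the graph captures local structure directly, which is the property that lets the subsequent embedding keep nearby samples together.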
5 FUTURE PERSPECTIVE
Several recent reviews have touched on how omics technologies can pave the way for precision medicine in SLE, including the promise that transcriptomic profiling integrated with machine learning may provide insight into SLE immune pathways, and the application of multi-omics approaches including genomic, immunophenotyping, proteomic, and transcriptomic assessments.38, 71-73 Building on other reviews that also suggest the application of omics technologies, we propose that the further integration of these “multi-omics” technologies using artificial intelligence, including machine learning, has the potential to identify noninvasive biomarkers of DA and novel prognostic markers for stratification, and to enable precision medicine that tailors the choice of targeted therapies and the duration of immunosuppression. Clinical translation of this approach into standardized bioassays may prove superior to current markers of DA in SLE/LN and deliver improved patient-specific outcomes.
ACKNOWLEDGMENTS
Figures were created by BioRender. Di Yu is supported by Bellberry-Viertel Senior Medical Research Fellowship and his research was carried out at the Translational Research Institute. The Translational Research Institute is supported by a grant from the Australian Government.
CONFLICT OF INTERESTS
The authors declare that there are no conflicts of interest.
AUTHOR CONTRIBUTIONS
Di Yu: Conceptualization; Writing; Supervision. Prianka Puri: Writing. Simon H. Jiang: Writing. Yang Yang: Writing. Fabienne Mackay: Supervision.