Multikingdom characterization of gut microbiota in patients with rheumatoid arthritis and rheumatoid arthritis-associated interstitial lung disease
Yida Xing, Yiping Liu, Shanshan Sha, and Yue Zhang contributed equally to this study.
Abstract
Rheumatoid arthritis-associated interstitial lung disease (RA-ILD) is a serious and common extra-articular disease manifestation. Patients with RA-ILD experience reduced bacterial diversity and gut bacteriome alterations. However, the gut mycobiome and virome in these patients have been largely neglected. In this study, we performed whole-metagenome shotgun sequencing on fecal samples from 30 patients with RA-ILD, 30 patients with RA-non-ILD, and 40 matched healthy controls. The gut bacteriome and mycobiome were explored using a reference-based approach, while the gut virome was profiled based on a nonredundant viral operational taxonomic unit (vOTU) catalog. The results revealed significant alterations in the gut microbiomes of both RA-ILD and RA-non-ILD groups compared with healthy controls. These alterations encompassed changes in the relative abundances of 351 bacterial species, 65 fungal species, and 4,367 vOTUs. Bacteria such as Bifidobacterium longum, Dorea formicigenerans, and Collinsella aerofaciens were enriched in both patient groups. Ruminococcus gnavus (RA-ILD), Gemmiger formicilis, and Ruminococcus bromii (RA-non-ILD) were uniquely enriched. Conversely, Faecalibacterium prausnitzii, Bacteroides spp., and Roseburia inulinivorans were depleted in both patient groups. Mycobiome analysis revealed depletion of certain fungi, including Saccharomyces cerevisiae and Candida albicans, in patients with RA compared with healthy subjects. Notably, gut virome alterations were characterized by an increase in Siphoviridae and a decrease in Myoviridae, Microviridae, and Autographiviridae in both patient groups. Moreover, multikingdom gut microbial signatures showed promise as diagnostic indicators for both RA-ILD and RA-non-ILD. Overall, this study provides comprehensive insights into the fecal virome, bacteriome, and mycobiome landscapes of RA-ILD and RA-non-ILD gut microbiota, thereby offering potential biomarkers for further mechanistic and clinical research.
1 INTRODUCTION
Rheumatoid arthritis (RA) is a systemic autoimmune disease characterized by chronic synovitis that ultimately results in the destruction and disability of joints and other organs. It affects approximately 0.5–1% of the population worldwide, and its prevalence is increasing with the aging population.1 RA-associated interstitial lung disease (RA-ILD) is a prevalent and significant extra-articular complication. High-resolution CT detects ILD in approximately 27–67% of patients with RA.2-4 In contrast to ILD associated with other autoimmune illnesses, usual interstitial pneumonia (UIP) or UIP-like patterns are the most prominent pathological features of RA-ILD.5, 6 Therefore, RA-ILD responds less to glucocorticoids and immunosuppressants and has a median survival of only 3.2–8.1 years.7, 8 In addition, RA-ILD has a notably lower survival rate than idiopathic pulmonary fibrosis.9-13
Although the exact cause remains unknown, genetic14, 15 and environmental factors16-18 are believed to affect RA-ILD development. Gut microbiota, one of the environmental factors, not only plays a crucial role in RA pathogenesis,19-21 but also significantly influences idiopathic pulmonary fibrosis progression.22, 23 The gut microbiota is a complex ecosystem involved in several physiological processes such as metabolism, absorption, and immunity,24-27 and helps maintain systemic immune balance in humans. Changes in gut microbiota composition and functioning, also known as dysbiosis, are closely associated with various autoimmune illnesses, including RA,28-30 systemic lupus erythematosus (SLE),31 and multiple sclerosis (MS).32 Individuals with RA typically exhibit lower gut bacterial diversity and an overabundance of proinflammatory species compared to healthy individuals. This suggests that the gut microbiota promotes inflammation and immune dysregulation in patients with RA, and correspondingly contributes to the occurrence of RA. However, previous studies have mainly focused on bacterial components in the gut microbiota, while overlooking the key role of viral and fungal components. Therefore, we hypothesized that adopting a multikingdom approach could provide novel insights into the gut microbiota's role in RA and RA-ILD pathogenesis.
In this study, we conducted whole-metagenome shotgun sequencing of fecal samples from 30 patients with RA-non-ILD, 30 patients with RA-ILD, and 40 matched healthy controls to investigate the gut microbiota composition in patients with RA-non-ILD and RA-ILD compared to that in controls. Examining the gut microbiota across multiple kingdoms will enhance our understanding of the intricate interactions between the gut microbiota and host immune system, which may shed light on potential therapeutic strategies for RA-non-ILD and RA-ILD, such as targeting the gut microbiota to modulate immune system homeostasis.
2 MATERIALS AND METHODS
2.1 Ethics statement
The study was approved by the Ethics Committee of the Second Affiliated Hospital of Dalian Medical University (approval number: 2022071) and was conducted in accordance with the principles of the Declaration of Helsinki and the International Council for Harmonization Guidelines for Good Clinical Practice. All the participants signed an informed consent form.
2.2 Subjects
This study included a total of 60 patients who were admitted to the Department of Rheumatology at the Second Affiliated Hospital of Dalian Medical University. Among them, 30 had RA-non-ILD and 30 were diagnosed with RA-ILD. All patients met the 2010 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria for RA33 and/or the revised criteria established by the ACR in 1987.34 Disease activity was evaluated using the Disease Activity Score of 28 joints (DAS28). ILD was diagnosed based on the criteria set by the American Thoracic Society/European Respiratory Society35 using high-resolution computed tomography interpreted by an experienced radiologist and rheumatologist. The following groups were excluded: (1) patients with malignancy, pyemia, cardiovascular, or metabolic disorders; (2) subjects with diarrheal symptoms; (3) patients who had received antifungal, antibiotic, or probiotic treatment within the preceding month; (4) subjects with excessive drinking habits; (5) subjects who had consumed sour milk within the preceding week.
Forty healthy subjects, matched to the patients with RA-non-ILD and RA-ILD for age, sex, and body mass index (BMI), were recruited as controls from the Medical Examination Center, Second Affiliated Hospital of Dalian Medical University. Healthy individuals did not have any of the following conditions: arthralgia, heart failure, renal failure, autoimmune diseases, or inflammatory disorders.
2.3 Sample collection
Fecal samples from each participant were placed on dry ice immediately after collection, transferred to the laboratory, aliquoted, and stored at −80°C until further analysis.
2.4 DNA extraction and whole-metagenome shotgun sequencing
Total DNA from fecal samples (170 mg per sample) was extracted using the Tiangen fecal DNA extraction kit (Tiangen) according to the manufacturer's instructions. DNA quality and quantity were assessed using 1% agarose gels and the Qubit® dsDNA Assay Kit on a Qubit® 2.0 Fluorometer (Life Technologies), respectively. Samples with an optical density ratio (260 nm/280 nm) between 1.8 and 2.0 and a DNA content of more than 1 μg were used to construct libraries, with 1 μg DNA per sample used as input material for DNA sample preparations. Sequencing libraries were generated using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB) following the manufacturer's recommendations, and index codes were added to attribute sequences to each sample. Briefly, the DNA samples were fragmented to approximately 350 bp by sonication. These DNA fragments were end-polished, A-tailed, ligated with full-length adapters for Illumina sequencing, and further amplified by polymerase chain reaction (PCR). Finally, the PCR products were purified (AMPure XP system), and libraries were analyzed for size distribution using an Agilent 2100 Bioanalyzer and quantified using real-time PCR. The index-coded samples were clustered using a cBot Cluster Generation System according to the manufacturer's instructions. After cluster generation, the prepared libraries were sequenced on the Illumina NovaSeq 6000 platform to generate paired-end reads.
Raw metagenomic sequencing reads from each sample were subjected to individual quality control processing using fastp.36 Low-quality bases (Q < 30) were trimmed from the read ends, and reads containing N, adapter contamination, or fewer than 90 bp were filtered out. The high-quality reads obtained were then aligned to the human reference genome (GRCh38) using Bowtie2,37 and reads identified as human were removed.
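The read-level filtering rule described above can be illustrated with a minimal Python sketch. It is a simplified stand-in, not fastp's actual implementation: it only trims bases below the quality threshold from both ends and then discards reads that contain N or are shorter than 90 bp. Adapter removal and human read removal are not shown, and the function name, parameter names, and example read are hypothetical.

```python
def trim_and_filter(seq, quals, min_q=30, min_len=90):
    """Trim low-quality ends, then drop reads that contain N or are too short.

    seq   -- read sequence as a string
    quals -- per-base Phred scores as a list of ints, same length as seq
    Returns (trimmed_seq, trimmed_quals), or None if the read is discarded.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    start, end = 0, len(seq)
    # Move inwards from the 5' end while the base quality is below threshold.
    while start < end and quals[start] < min_q:
        start += 1
    # Move inwards from the 3' end while the base quality is below threshold.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]
    # Discard reads that still contain an ambiguous base or are too short.
    if "N" in seq.upper() or len(seq) < min_len:
        return None
    return seq, quals


# Hypothetical 100-bp read: 5 low-quality bases at the 5' end, 3 at the 3' end.
read = "ACGT" * 25
phred = [12] * 5 + [36] * 92 + [12] * 3
kept = trim_and_filter(read, phred)
print("discarded" if kept is None else len(kept[0]))
```

The thresholds (Q < 30, 90 bp) are taken from the text; everything else is an illustrative assumption.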
2.5 Gut bacteriome, mycobiome, and virome profiling
2.5.1 Gut bacteriome
The gut prokaryotic composition (hereafter referred to as “gut bacteriome”) in fecal metagenomes of all samples was profiled using the MetaPhlAn4 algorithm.38 The relative abundances of prokaryotic species were calculated by normalizing each sample, and the relative abundances at the phylum and genus levels were obtained by aggregating all species abundances from the same taxa.
2.5.2 Gut mycobiome
To profile the gut mycobiome composition, we downloaded the available fungal genomes from the National Center for Biotechnology Information (NCBI) RefSeq database. Genome metadata was obtained from the NCBI BioSample database, and only fungal strains isolated or sourced from human feces and/or digestive tract specimens were included. We used 1503 gut fungal genomes corresponding to 106 species as references. High-quality nonhuman metagenome reads for each sample were then aligned with these fungal genome references to generate gut fungal profiles. Reads that were mapped to fungal rRNA/tRNA gene sequences were excluded. To avoid potential contamination from other gut microbes (e.g., bacteria, archaea, and viruses), the reads that were mapped to fungal genomes were aligned against (1) all bacterial, archaeal, or viral sequences extracted from the NCBI NT database and (2) 4644 prokaryotic genomes from the Unified Human Gastrointestinal Genome database.39 The contaminating reads thus identified were removed. The relative abundances of 106 fungal species were calculated by normalizing each sample, and the relative abundances at the family and genus levels were obtained by summing species abundances from the same taxa.
2.5.3 Gut virome
A gut virus catalog comprising over 67 000 nonredundant viral operational taxonomic units (vOTUs), referred to as the Chinese gut viral catalog (cnGVC),40 was constructed from more than 10 000 publicly available fecal metagenomes. We mapped the high-quality reads of all samples to the cnGVC database using Bowtie237 with a nucleotide similarity threshold of 95% (a phylogenetic threshold for viral “species-level” definition).41 The abundance profile of vOTUs in each fecal sample was generated by counting the reads mapped to each vOTU and dividing this count by the total number of mapped reads in that sample. The relative abundance profile at the viral family level was generated by aggregating the relative abundances of vOTUs assigned to the same family.
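The normalization and aggregation steps used for the three profiles can be sketched as follows. This is a minimal illustration under stated assumptions: the read counts, vOTU identifiers, and the vOTU-to-family mapping are hypothetical, and the sketch does not account for genome length or any other correction the authors may have applied.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Divide each feature's mapped-read count by the sample's total mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {feature: n / total for feature, n in counts.items()}


def aggregate_by_taxon(rel_abund, taxon_of):
    """Sum relative abundances of features assigned to the same higher-level taxon."""
    summed = defaultdict(float)
    for feature, abundance in rel_abund.items():
        # Features without an assignment are pooled as "unclassified".
        summed[taxon_of.get(feature, "unclassified")] += abundance
    return dict(summed)


# Hypothetical mapped-read counts for one sample.
counts = {"vOTU_1": 600, "vOTU_2": 300, "vOTU_3": 100}
family = {"vOTU_1": "Siphoviridae", "vOTU_2": "Siphoviridae", "vOTU_3": "Myoviridae"}

rel = relative_abundance(counts)
print(aggregate_by_taxon(rel, family))
```

The same two functions apply to bacterial or fungal species when the mapping points to genus, family, or phylum instead of viral family.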
2.6 Microbial diversity analysis
For each sample, gut microbial richness was evaluated using the observed number of species-level taxa (i.e., bacterial species, fungal species, and vOTUs), while Shannon's and Simpson's indices were used to estimate microbiome diversity. These indices were calculated using the vegan package in R, with a uniform number of reads (10 million) per sample.
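For reference, the three within-sample measures can be written out in a few lines of Python. This is a sketch rather than the authors' code: it assumes the natural-log Shannon index and the Gini-Simpson form (1 minus the sum of squared proportions), which, to our understanding, are the defaults of the vegan diversity function. The rarefaction to 10 million reads is not shown, and the example profile is hypothetical.

```python
import math


def richness(abundances):
    """Observed number of taxa with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)


def simpson(abundances):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)


# Hypothetical sample with four equally abundant taxa and one absent taxon.
profile = [25, 25, 25, 25, 0]
print(richness(profile), round(shannon(profile), 4), simpson(profile))
```

For a perfectly even community of S taxa, the Shannon index equals ln S and the Gini-Simpson index equals 1 − 1/S, which gives a quick sanity check on any implementation.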
2.7 Statistical analyses
Statistical analyses were performed using the R v4.0.1 platform. The vegan package42 was used to perform principal coordinate analysis (PCoA) based on Bray-Curtis distances. Permutational multivariate analysis of variance (PERMANOVA) was conducted using the adonis function of the vegan package, and the adonis p value was obtained from 1000 permutations. The Student's t-test and Wilcoxon rank-sum test were used to evaluate statistical differences in diversity and taxonomic levels between any two cohorts, respectively. Multiple-testing correction was carried out using q values generated by the Benjamini-Hochberg procedure. A p value of less than 0.05 for a single test, or a q value of less than 0.05 for multiple tests, was considered statistically significant. Furthermore, random forest models were trained with 1000 trees using the randomForest package. These models utilized the abundance profiles of differentially abundant bacteria, fungi, and vOTUs to distinguish between patients and controls.
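Two of the quantities named above are simple enough to state explicitly. The Python sketch below computes the Bray-Curtis dissimilarity between two abundance profiles and Benjamini-Hochberg adjusted p values (q values). It is an illustration only: the study used R (vegan and, presumably, a standard p-value adjustment routine), and the example profiles and p values here are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


def benjamini_hochberg(pvals):
    """Return BH-adjusted p values in the original order of pvals."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical relative-abundance profiles of two samples over three taxa.
print(bray_curtis([0.5, 0.5, 0.0], [0.5, 0.0, 0.5]))

# Hypothetical raw p values from four per-taxon tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A taxon would then be called differentially abundant when its adjusted value falls below the 0.05 threshold stated above.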
3 RESULTS
3.1 Study cohort
In this study, fecal samples were collected from 30 patients with RA-ILD, 30 patients with RA-non-ILD, and 40 healthy controls. The clinical characteristics of all the participants are summarized in Table 1. Both the RA-ILD and RA-non-ILD groups matched well with healthy individuals in terms of sex, age, and BMI. On average, patients with RA-ILD were older than patients with RA-non-ILD (Student's t-test, p = 0.003). Patients with RA-ILD had significantly elevated levels of rheumatoid factor (p = 0.004) and DAS28 (p = 0.032) compared with those with RA-non-ILD, suggesting higher disease activity. Regarding other clinical parameters, including disease duration, erythrocyte sedimentation rate, C-reactive protein levels, and anti-cyclic citrullinated peptide antibody levels, no significant differences were found between the RA-ILD and RA-non-ILD groups.
Characteristic | RA-non-ILD patients | RA-ILD patients | p Value (RA-ILD vs. RA-non-ILD) | Healthy controls | p Value (RA-non-ILD vs. HC) | p Value (RA-ILD vs. HC) |
---|---|---|---|---|---|---|
No. of individuals | 30 | 30 | | 40 | | |
Gender, F/M | 22/8 | 19/11 | 0.580 | 24/16 | 0.809 | 0.312 |
Age, years | 59.4 ± 10.7 | 67.1 ± 8.1 | 0.003 | 63.9 ± 11.0 | 0.855 | 0.085 |
BMI, kg/m² | 24.6 ± 3.4 | 23.8 ± 3.4 | 0.426 | 24.0 ± 3.9 | 0.695 | 0.221 |
Disease duration, months | 83 ± 111 | 111 ± 77 | 0.257 | | | |
ESR, mm/h | 29.1 ± 22.0 | 40.6 ± 23.4 | 0.056 | | | |
CRP, mg/L | 22.2 ± 22.4 | 32.2 ± 32.1 | 0.168 | | | |
RF, IU/mL | 132.9 ± 159.8 | 451.2 ± 540.4 | 0.004 | | | |
Anti-CCP, RU/mL | 121.0 ± 218.3 | 156.0 ± 136.9 | 0.462 | | | |
DAS28 score | 4.2 ± 1.3 | 4.9 ± 1.1 | 0.032 | | | |
- Note: Results are shown as mean ± SD. Fisher's exact test was used to calculate the p Values for sex, whereas Student's t-test was used for other parameters.
- Abbreviations: anti-CCP, anti-cyclic citrullinated peptide antibody; BMI, body mass index; CRP, C-reactive protein; ESR, erythrocyte sedimentation rate; HC, healthy control; RA-ILD, rheumatoid arthritis-associated interstitial lung disease; RF, rheumatoid factor.
3.2 Characterization of the gut bacteriome in RA-non-ILD and RA-ILD patients
To characterize the gut microbiota of patients with RA-non-ILD and RA-ILD, we generated a total of 528.0 Gbp of high-quality metagenomic data (5.3 ± 1.2 Gbp per sample) from all participants using whole-metagenome shotgun sequencing of their fecal samples. First, we described the gut prokaryotic composition in these metagenomic samples, and generated a gut prokaryotic profile that contained 1652 bacterial and archaeal taxa for further analyses, which included 17 phyla, 32 classes, 72 orders, 141 families, 388 genera, and 1002 species.
Rarefaction analysis revealed that gut bacteriome richness (estimated by the number of observed species) was approximately equal for the same number of samples among the three groups (Figure 1A). However, both the Shannon diversity and Simpson indices were significantly lower in the gut bacteriome of RA-non-ILD or RA-ILD groups than in healthy controls (Figure 1B). This suggests a reduced within-sample prokaryotic diversity under disease status. In contrast, the bacteriome diversity indices showed no differences between patients with RA-ILD and RA-non-ILD.

PCoA was conducted to further examine the differences in gut bacteriome among the three groups and showed a distinct separation between the patient groups and healthy controls (Figure 1C). Additionally, PERMANOVA revealed that the disease state explained 10.5% of the bacteriome variance (adonis p < 0.001), demonstrating considerable gut bacterial dysbiosis in patients with RA-ILD or RA alone. Conversely, PCoA and PERMANOVA did not demonstrate any significant difference between the bacteriomes of patients with RA-ILD and RA-non-ILD (effect size = 1.8%, adonis p = 0.368), implying that the variance between the two groups was relatively small.
Furthermore, pairwise comparison of phylogenetic composition profiles was performed among the three groups. At phylum level, the gut bacteriomes of all three groups were dominated by Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria, and Verrucomicrobia. Compared to healthy controls, Firmicutes and Verrucomicrobia had significantly higher abundances in patients with RA-ILD, but not in those with RA-non-ILD; Actinobacteria was enriched in both RA-ILD and RA-non-ILD groups, whereas Bacteroidetes and Proteobacteria were depleted in both patient groups (Figure 1D). At genus level, 146 genera showed significant differences in relative abundance between the patient groups and healthy controls (patients with RA-non-ILD vs. controls, n = 108; patients with RA-ILD vs. controls, n = 129). No genera significantly differed in abundance between the patient groups (Table S1). Compared to healthy controls, genera such as Ruminococcus, Collinsella, Gemmiger, and Dorea were markedly enriched in both RA-non-ILD and RA-ILD groups, whereas Bacteroides, Prevotella, Roseburia, Clostridium, Lachnoclostridium, Fusicatenibacter, Parabacteroides, and Megamonas were reduced (Figure 2A). Notably, Alistipes was significantly decreased in patients with RA-non-ILD, but not in RA-ILD, when compared to healthy subjects.

In addition, comparison of the gut bacteriome among the three groups at the species level identified 351 species with significant differences in relative abundance (patients with RA-non-ILD vs. controls, n = 259; patients with RA-ILD vs. controls, n = 312; RA-non-ILD vs. RA-ILD groups, n = 9; Table S2). Compared to healthy controls, the representative species enriched in both RA-non-ILD and RA-ILD included Bifidobacterium longum, Dorea formicigenerans, and Collinsella aerofaciens, whereas the representative RA-non-ILD/RA-ILD-depleted species included Faecalibacterium prausnitzii, Bacteroides plebeius, Bacteroides dorei, Fusicatenibacter saccharivorans, Bacteroides uniformis, Roseburia inulinivorans, and Bacteroides xylanisolvens (Figure 2B). Gemmiger formicilis and Ruminococcus bromii were significantly enriched in patients with RA-ILD, but not in RA-non-ILD, when compared to healthy subjects, while Ruminococcus gnavus was uniquely enriched in the RA-non-ILD group. Additionally, Collinsella tanakaei was markedly enriched in patients with RA-non-ILD compared to those with RA-ILD (Table S2).
3.3 Characterization of the gut mycobiome in patients with RA-non-ILD and RA-ILD
Next, we analyzed the gut fungal composition of all fecal samples using the fungal genomes available in the NCBI RefSeq database, as described in the Methods section. We profiled 106 fungal species representing 52 fungal genera and conducted a comparative analysis among all the participants. Rarefaction analysis showed that, at an equivalent sample size, healthy controls exhibited the highest richness in gut fungal species, followed by patients with RA-ILD, whereas patients with RA-non-ILD displayed the lowest richness (Figure 3A). When comparing within-sample fungal diversity, patients with RA-ILD had notably higher Shannon and Simpson indices than patients with RA-non-ILD and healthy controls. Conversely, no significant difference in diversity was observed between patients with RA-non-ILD and healthy controls (Figure 3B).

Consistent with the observations in the gut bacteriome, PCoA and PERMANOVA of the gut mycobiome also showed a remarkable distinction between the patient and control groups (Figure 3C; effect size = 10.6%, adonis p < 0.001). Similarly, the patient groups were also visibly separated in the PCoA plot, with a PERMANOVA effect size of 5.1% (adonis p = 0.006) between them. These findings suggest that both disease state and disease type or severity contribute to changes in the mycobiome.
At phylum level, the gut mycobiomes of all the samples were primarily composed of Ascomycota, Basidiomycota, and Mucoromycota. Of these, Mucoromycota was markedly higher in both patient groups than in healthy controls (Figure 3D). Notably, although Ascomycota did not show a significant difference among the three groups, its subphylum Pezizomycotina (filamentous fungi) was enriched in the patient groups, whereas its subphylum Saccharomycotina (yeasts) was reduced in these patients.
At genus level, 28 genera significantly differed in relative abundance among the three groups (patients with RA-non-ILD vs. controls, n = 22; patients with RA-ILD vs. controls, n = 16; RA-non-ILD vs. RA-ILD groups, n = 5; Table S3). Compared to healthy controls, genera such as Hortaea, Yarrowia, and Apophysomyces were markedly enriched in both patient groups, whereas the genera Saccharomyces and Candida were reduced in these groups (Figure 4A). In contrast, some genera did not show a consistent tendency in the patient groups when compared to healthy controls. For example, Malassezia was significantly more abundant in patients with RA-ILD than in healthy controls and patients with RA-non-ILD, and Clavispora and Pneumocystis were significantly less abundant in the RA-non-ILD group than in the RA-ILD group and healthy controls.

At species level, 65 differential species were identified among the three groups (patients with RA-non-ILD vs. controls, n = 56; patients with RA-ILD vs. controls, n = 32; RA-non-ILD vs. RA-ILD groups, n = 1; Table S4). Similar to the results at the genus level, several fungal species including Saccharomyces cerevisiae GCA_003276085, Candida albicans GCA_005890775, Phialophora verrucosa GCA_002099365, and Candida parapsilosis GCA_000182765 were depleted in both patient groups compared to healthy individuals, while Hortaea werneckii GCA_003704615, Yarrowia lipolytica GCA_900537225, and Apophysomyces trapeziformis GCA_000696975 were increased in the patient groups (Figure 4B). In addition, several fungi, such as Pneumocystis jirovecii GCA_001477535, Pichia kudriavzevii GCA_003054445, and Clavispora lusitaniae GCA_009498055 were uniquely reduced in RA-non-ILD group, but not in RA-ILD group, compared to healthy subjects (Table S4).
3.4 Characterization of gut virome in patients with RA-non-ILD and RA-ILD
To establish the gut viral composition, clean metagenomic reads of each sample were mapped to the cnGVC database, a nonredundant gut virus catalog constructed from the Chinese population. A total of 54 762 vOTUs were quantified using our data set of 100 samples. Among them, 56.5% (30 955/54 762) were assigned to known viral families, which spanned 26 viral families, for further analysis.
Rarefaction analysis showed that viral detection increased with the number of samples, and the accumulative curve did not saturate with the current sample size for each group (Figure 5A). Despite this, the observed number of viruses was remarkably reduced in the patient groups compared to healthy subjects. Likewise, both within-sample diversity indices (Shannon index and Simpson index) of the gut virome were significantly decreased in patients with RA-non-ILD and RA-ILD (Figure 5B).

PCoA analysis of the gut viral composition revealed a significant separation between the patients and healthy controls (Figure 5C), with a PERMANOVA effect size of 8.0% (adonis p < 0.001) in virome variances. Similar to the observations in the gut bacteriome, the overall difference in viromes between the patient groups was not significant (effect size = 1.9%, adonis p = 0.263).
The gut viral compositions of the three groups were compared at the family level to investigate their viral signatures. Siphoviridae and Myoviridae were the most dominant viral families in all samples. The relative abundance of Siphoviridae was markedly increased in patients with either RA-non-ILD or RA-ILD compared to healthy controls, whereas Myoviridae abundance was significantly decreased in patients with RA-ILD but not in patients with RA-non-ILD (Figure 5D,E and Table S5). Several other low-abundance families were also altered among different groups. For example, Microviridae and Autographiviridae were significantly reduced in the patient groups compared to healthy controls, whereas Retroviridae was uniquely reduced in patients with RA-non-ILD (Table S5).
When comparing patients with RA-non-ILD and healthy controls, we identified 4367 differential vOTUs, of which 908 were more abundant in patients with RA-non-ILD and 3459 in controls (Figure 6A and Table S6). In addition, we identified 5035 differential vOTUs when comparing individuals with RA-ILD and healthy controls, with 1251 being more abundant in patients with RA-ILD and 3784 in controls. The Venn plot illustrated the overlap between the RA-non-ILD-associated and RA-ILD-associated viral signatures (Figure 6B). Only 47.8% of RA-non-ILD-enriched and 34.7% of RA-ILD-enriched vOTUs were common between them. In contrast, a large proportion of RA-non-ILD-depleted vOTUs (73.1%) was also depleted in patients with RA-ILD (corresponding to 66.8% of the RA-ILD-depleted vOTUs). These results suggest that both disease subtypes lacked many of the same viruses, whereas the viruses that increased differed between them.
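The two percentages reported for each overlap come from dividing the same shared set by two different denominators. A minimal Python sketch of that calculation is given below; the vOTU identifiers are hypothetical placeholders and do not correspond to entries in Table S6.

```python
def overlap_percentages(set_a, set_b):
    """Return the shared count and its percentage of each input set."""
    shared = set_a & set_b
    pct_of_a = 100.0 * len(shared) / len(set_a)
    pct_of_b = 100.0 * len(shared) / len(set_b)
    return len(shared), pct_of_a, pct_of_b


# Hypothetical signatures: 4 vOTUs enriched in one subtype, 5 in the other.
enriched_a = {"v1", "v2", "v3", "v4"}
enriched_b = {"v3", "v4", "v5", "v6", "v7"}

n_shared, pct_a, pct_b = overlap_percentages(enriched_a, enriched_b)
print(n_shared, pct_a, pct_b)
```

Because the shared count is fixed, the larger signature always shows the smaller percentage, which is the pattern seen in the enriched and depleted comparisons above.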

The RA-non-ILD-enriched vOTUs comprised a large proportion of members belonging to Siphoviridae (50.0%) and unclassified viruses (45.0%), whereas the RA-non-ILD-depleted vOTUs were dominated by unclassified viruses (54.6%), followed by members of Siphoviridae (25.9%), Myoviridae (12.9%), and Quimbyviridae (4.3%) (Figure 6C and Table S6). Similarly, RA-ILD-enriched vOTUs mainly consisted of members of Siphoviridae (47.1%) and unclassified viruses (44.1%), whereas RA-ILD-depleted vOTUs were dominated by unclassified viruses (47.7%), followed by members of Siphoviridae (28.4%), Myoviridae (16.3%), and Quimbyviridae (3.8%) (Figure 6D). Moreover, the bacterial host assignments of the disease-associated vOTUs were compared. The results showed that the RA-non-ILD-enriched vOTUs contained a substantial proportion of viruses that were predicted to infect Collinsella (11.7%) and Eubacterium (6.4%), while the RA-ILD-enriched vOTUs encompassed several viruses that were predicted to infect Akkermansia (6.9%), Ruminococcus (5.9%), Eubacterium (5.8%), and Collinsella (5.0%) (Figure 6C,D). In contrast, the healthy subjects contained a remarkably higher proportion of Bacteroides phages (corresponding to 14.9% of RA-non-ILD-depleted vOTUs and 10.9% of RA-ILD-depleted vOTUs), which were rarely found among the RA-non-ILD-enriched or RA-ILD-enriched viruses.
Additionally, only nine vOTUs (three RA-non-ILD-enriched and six RA-ILD-enriched vOTUs) showed significant differences in relative abundance between patients with RA-non-ILD and RA-ILD (Figure 6A and Table S6), which was consistent with the aforementioned PERMANOVA analysis findings of their overall viral communities.
3.5 Classification of disease status based on multikingdom signatures
To test whether RA-non-ILD and RA-ILD statuses could be predicted using gut microbiota, we employed random forest models to discriminate patients with RA-non-ILD or RA-ILD from healthy individuals based on the microbial signature abundance profiles. The results demonstrated high performance in classifying patients with RA-non-ILD and healthy controls, with areas under the receiver operating characteristic curve (AUCs) of 0.992, 0.962, and 0.911 for models trained on gut bacterial, fungal, and viral signatures, respectively (Figure 7A). Similarly, the models trained using gut bacterial, fungal, and viral signatures achieved AUCs of 0.996, 0.983, and 0.894, respectively, in distinguishing patients with RA-ILD from controls (Figure 7B). These findings suggest that multikingdom microbial signatures are highly effective in distinguishing patients with RA-non-ILD or RA-ILD from healthy individuals.
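As a reminder of what the AUC measures, it equals the probability that a randomly chosen patient receives a higher model score than a randomly chosen control, with ties counted as one half. The Python sketch below computes it directly from that definition. The scores are hypothetical, and the sketch does not reproduce the study's random forest training or validation scheme, which is not detailed in the text.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores_pos -- model scores for patients (positive class)
    scores_neg -- model scores for controls (negative class)
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # patient ranked above control
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of disease for 3 patients and 4 controls.
patients = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
print(round(roc_auc(patients, controls), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of 0.894 to 0.996 in context.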

Finally, we evaluated the potential of the gut microbiota in classifying the RA-non-ILD and RA-ILD statuses. Because too few microbes differed significantly between patients with RA-non-ILD and RA-ILD, the random forest models were trained on the relative abundances of all microbes. This analysis indicated relatively low discrimination power between patients with RA-non-ILD and RA-ILD based on the gut bacteriome, mycobiome, and virome, with AUCs of 0.807, 0.797, and 0.732, respectively (Figure 7C). However, the model trained using all three microbiomes together exhibited considerably higher performance in classifying patients with RA-non-ILD and RA-ILD, with an AUC of 0.878 (Figure 7D). Furthermore, new random forest models trained using the most important microbes yielded an optimal AUC of 0.985 when using a subset of the 100 most important microbes (Figure 7E). Notably, several bacteria, such as RA-non-ILD-enriched Collinsella tanakaei, Desulfovibrio desulfuricans, and Collinsella intestinalis, and some viruses achieved the highest importance scores in RA-ILD classification models (Figure 7F), suggesting their potential central roles in RA-ILD stratification. Collectively, these findings highlight the diagnostic potential of multikingdom gut microbes in various disease subtypes.
4 DISCUSSION
In the present study, we compared and analyzed the gut microbiota compositions of 30 patients with RA-ILD, 30 patients with RA-non-ILD, and 40 healthy controls. Our findings showed distinct differences in the abundance of bacteria, fungi, and viruses among the three groups, which may further promote research into the mechanisms underlying RA and RA-ILD.
The gut bacteria that were depleted in patients with RA and RA-ILD primarily belonged to next-generation probiotics, which are known to generate short-chain fatty acids (SCFAs) and help manage metabolic diseases.43, 44 F. prausnitzii can produce butyrate and other SCFAs by fermenting dietary fibers, and induce a tolerogenic cytokine profile that may exert further anti-inflammatory effects.45 Certain Bacteroides species can release SCFAs, particularly acetate and propionate, which are crucial in maintaining intestinal homeostasis and immune system stability.46 Our analysis at the genus level indicated a significantly higher abundance of Ruminococcus in patients with RA-ILD compared to healthy individuals and patients with RA. Increased Ruminococcus abundance has been linked to enhanced expression of T-cell death-associated gene 8 (TDAG8), which exacerbates local mucosal inflammation and disease severity in RA mice.47 At the species level, R. gnavus was significantly enriched in patients with RA-non-ILD. R. gnavus is an obligate anaerobic bacterium that induces a proinflammatory effect in inflammatory bowel disease by producing dextran (inflammatory polysaccharide), which stimulates dendritic cells to secrete inflammatory cytokines (e.g., TNF-α) by activating toll-like receptor 4.48 Elevated abundance of R. gnavus has been associated with disease severity in ankylosing spondylitis49 and SLE.50 However, R. gnavus can also alleviate symptoms of atopic dermatitis by increasing the number of Treg cells and SCFAs in mice.51 Interestingly, we observed a notable rise in R. bromii abundance in patients with RA-ILD in this study. R. bromii exhibits probiotic properties by facilitating starch breakdown and producing metabolites and energy to support bacterial growth.52-54 Furthermore, R. bromii can generate acetate, which promotes butyrate synthesis and inhibits inflammatory responses by stimulating Treg cell development both in vivo and in vitro.55 Nevertheless, our current investigation indicated a notable increase in R. bromii abundance in individuals with RA and RA-ILD compared to healthy subjects. This may be attributed to strain variation, which can lead to physiological and functional differences in microbial-host interactions, thereby contributing to different host immune responses.56 Alternatively, the difference may arise from distinct pathophysiologies of RA and RA-ILD, although further investigations are warranted to ascertain the precise causes.
Unlike bacterial communities, fungal communities have rarely been studied in terms of composition and diversity.57 This study revealed a notable increase in both the quantity and variety of fungi in patients with RA-ILD compared to patients with RA-non-ILD and healthy individuals. In the gut mycobiome, Ascomycota, Basidiomycota, and Mucoromycota were the dominant phyla in all samples. Compared to healthy controls, patients with RA and RA-ILD had significantly higher Mucoromycota and Pezizomycotina abundance, while Saccharomycotina was reduced in these patients. Similarly, at the genus level, Hortaea, Yarrowia, and Apophysomyces were markedly enriched in patients with both RA and RA-ILD, while Saccharomyces and Candida were reduced in these patient groups. Candida species are opportunistic pathogenic fungi that colonize mucocutaneous surfaces and are controlled by Th17-dependent immunity.58 Th17 cells are particularly associated with protection against Candida infections, and individuals with RA produce low levels of Candida-specific IL-17A. These dysfunctional reactions have been linked to increased oral Candida colonization and decreased IL-17A-dependent antimicrobial peptide generation in the saliva.59 In addition, Malassezia was significantly enriched in patients with RA-ILD compared to both healthy controls and patients with RA, suggesting a possible role in ILD progression. Malassezia is the predominant fungal genus in the skin microbiota and is associated with various skin diseases, especially atopic dermatitis, which seriously affects patient well-being. It induces the release of IL-17 and related cytokines, which play key roles in coordinating antifungal immunity and promoting skin inflammation by regulating Th17 cell-associated immune responses.60 Th17 cells are mainly found at barrier surfaces such as the intestine, skin, and lungs and are important mediators of immune responses to extracellular bacteria and fungi.61 In the early stages of RA, Th17 cells may interact with immune and stromal cells in the synovial tissue, which may eventually result in long-lasting inflammation, permanent cartilage deterioration, and bone erosion.62 Therefore, we propose that Malassezia potentially contributes to both RA and RA-ILD development by modulating Th17-mediated immunological reactions. However, the exact mechanism requires further investigation. Moreover, the prevalence of Pneumocystis was notably higher in individuals with RA-ILD than in those with RA alone, likely due to the immunosuppressive effects of methotrexate or biological agents used in treating RA. These medications diminish the body's immune response, while pre-existing lung damage provides a favorable environment for Pneumocystis invasion.12
Our findings indicated a significant decrease in the number of viruses and an increase in the prevalence of Siphoviridae in patients with RA and RA-ILD compared to healthy individuals. However, Myoviridae abundance was significantly reduced in patients with RA-ILD alone, indicating potential differences in viral composition between RA and RA-ILD. Both the Myoviridae and Siphoviridae families possess elongated tail structures that function as organelles to deliver DNA to host targets, and these structures are closely related to R-type and F-type phage tail-like bacteriocins, respectively. These bacteriocins exhibit powerful bactericidal properties. However, the precise mechanisms by which they operate are not fully understood.63 Currently, our understanding of intestinal viruses and their role in autoimmune diseases remains limited.64 With respect to viral hosts, our study revealed that a substantial proportion of the vOTUs enriched in patients with RA without ILD were predicted to infect Collinsella. In contrast, the viruses predicted to infect Akkermansia and Ruminococcus were more abundant in patients with RA-ILD. A cross-sectional cohort study demonstrated an association between Collinsella and cumulative inflammatory burden in RA.65 These findings suggest that shifts in bacterial abundance may shape phage abundance, because these bacteria serve as hosts for the phages.
The gut harbors a complex ecological network in which fungi, bacteria, and viruses coexist. In disease states, this network may be disrupted and the interaction pattern in the gut altered, which may reflect their potential roles in RA and RA-ILD.66 However, no statistically significant differences were observed in the composition of intestinal bacteria, fungi, or viruses between patients with RA and those with RA-ILD in the classification model we built. The model's ability to differentiate between patients with RA and RA-ILD was relatively modest, with AUCs of 0.807, 0.797, and 0.732 for bacteria, fungi, and viruses, respectively. Nonetheless, a model encompassing all three kingdoms may still be valuable for classifying patients with RA and RA-ILD.
A major limitation of this study is the relatively small sample size, considering the high complexity and variability of the gut microbiota. Furthermore, this study did not consider the effects of medications used for RA and RA-ILD on the gut microbiota, nor did it consider other confounding factors such as dietary habits and lifestyle. To address these limitations, future research with larger sample sizes based on strict sampling designs is warranted. Additionally, although significant changes in the gut microbiota of patients with RA and RA-ILD have been described in detail, the causal relationship between these changes and disease status could not be explored. Further research is imperative to uncover the potential role of the significantly altered gut microbiota in the onset and development of RA and RA-ILD.
5 CONCLUSION
In this study, we used metagenomic sequencing of fecal samples to systematically characterize, for the first time, the composition of intestinal bacterial, fungal, and viral communities in patients with RA-non-ILD and RA-ILD. The findings indicated notable changes in the gut microbiomes of patients with RA, with or without ILD, compared to those of healthy subjects. Analysis performed to identify multikingdom signatures of the gut microbiota associated with RA-non-ILD and RA-ILD pointed to potential biomarkers for these disease states. Overall, our study provides evidence that may inform future work on the pathogenesis and clinical treatment of RA-non-ILD and RA-ILD.
AUTHOR CONTRIBUTIONS
Yida Xing, Xiaodan Kong, Xiaochi Ma, Qiulong Yan, and Yiping Liu contributed to the study conception and design. Yida Xing, Yiping Liu, Shanshan Sha, Qiulong Yan, and Yue Zhang drafted the manuscript. Yida Xing, Yuemeng Dou, Yiping Liu, Changyan Liu, Mingxi Xu, Lin Zhao, Shanshan Sha, and Jingdan Wang collected samples and information. Yue Zhang, Qiulong Yan, Yida Xing, Yiping Liu, Xiaochi Ma, Xiaodan Kong, Changyan Liu, Mingxi Xu, and Yan Wang performed data analysis and investigation. All authors revised the manuscript, contributed to the article, and approved the submitted version of the manuscript.
ACKNOWLEDGMENTS
This work was supported by grants from the National Natural Science Foundation of China (No. 82225048), Liaoning Province Key Clinical Specialized (Department of Rheumatology, the Second Affiliated Hospital of Dalian Medical University) Funds, Dalian Key Laboratory for Autoantibody Testing, Dalian Medical University Interdisciplinary Research Cooperation Project Team Funding (JCH22023017), and the Cultivating Scientific Research Project of the Second Hospital of Dalian Medical University (XJ2023001102).
CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.
ETHICS STATEMENT
This study was approved by the Ethics Committee of the Second Affiliated Hospital of Dalian Medical University (2022-071). All the participants provided informed consent to participate in the study.
Open Research
DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories under accession number PRJEB62405 (https://www.ebi.ac.uk/ena/browser/view/PRJEB62405).