Continuous evolution and emerging lineage of seasonal human coronaviruses: A multicenter surveillance study
Run-Ze Ye, Cheng Gong, Xiao-Ming Cui, and Jin-Yue Liu contributed equally to this study.
Abstract
The seasonal human coronaviruses (HCoVs) have zoonotic origins, cause repeated infections, and are transmitted globally. This study aimed to describe the epidemiological and evolutionary characteristics of HCoVs in patients with acute respiratory illness. We conducted multicenter surveillance at 36 sentinel hospitals in Beijing Metropolis, China, during 2016–2019. Patients with influenza-like illness (ILI) or severe acute respiratory infection (SARI) were enrolled, and their respiratory samples were screened for HCoVs by multiplex real-time reverse transcription-polymerase chain reaction assays. All positive samples underwent metatranscriptomic sequencing to obtain whole HCoV genomes for genetic and evolutionary analyses. In total, 321 of 15 677 patients with ILI or SARI were positive for HCoVs, an infection rate of 2.0% (95% confidence interval, 1.8%–2.3%). HCoV-229E, HCoV-NL63, HCoV-OC43, and HCoV-HKU1 accounted for 18.7%, 38.3%, 40.5%, and 2.5% of infections, respectively. Compared with ILI cases, SARI cases were significantly older, more likely to be caused by HCoV-229E or HCoV-OC43, and more often coinfected with other respiratory pathogens. A total of 179 full genome sequences of HCoVs were obtained from the 321 positive patients. Phylogenetic analyses revealed that HCoV-229E, HCoV-NL63, and HCoV-OC43 each continuously gave rise to novel lineages. The ratio of nonsynonymous to synonymous substitutions (dN/dS) of all key genes in each HCoV was less than one, indicating that all four HCoVs were under negative (purifying) selection. Multiple substitution modes were observed in the spike glycoprotein among the four HCoVs. Our findings highlight the importance of enhanced surveillance of HCoVs and suggest that more variants may emerge in the future.
1 INTRODUCTION
Four seasonal human coronaviruses (HCoVs), namely HCoV-229E, HCoV-OC43, HCoV-NL63, and HCoV-HKU1, the earliest of which were identified in the mid-1960s, have been endemic in human populations,1-4 regularly infecting and reinfecting humans because protective immunity is short-lived.5 The HCoVs are believed to have zoonotic origins. Bats are the most likely animal hosts of HCoV-229E and HCoV-NL63 (genus Alphacoronavirus),6, 7 although dromedary camels and alpacas might serve as intermediate hosts of HCoV-229E.8 The other two, HCoV-OC43 and HCoV-HKU1 (genus Betacoronavirus), most likely originated in rodents, and bovines might be intermediate hosts of HCoV-OC43.7, 9 In addition, three highly pathogenic coronaviruses (SARS-CoV, MERS-CoV, and SARS-CoV-2) have emerged in the past two decades. SARS-CoV (subgenus Sarbecovirus) and MERS-CoV (subgenus Merbecovirus), both in the genus Betacoronavirus, also have a bat origin, possibly with palm civets and camels, respectively, as intermediate hosts.9-12 The origin and intermediate hosts of the recently emerged SARS-CoV-2 remain unclear, although various SARS-CoV-2-related viruses have been reported in bats and pangolins.13, 14
Although SARS-CoV-2 is a newly emerged HCoV, it shares many ecological and epidemiological similarities with the seasonal HCoVs. First, as mentioned above, both SARS-CoV-2 and the seasonal HCoVs appear to have evolved from animal coronaviruses and spilled over to humans. Second, repeated infections likely occur with both the seasonal HCoVs5 and SARS-CoV-215 owing to short-lived immune protection. Finally, both SARS-CoV-2 and the seasonal HCoVs have been transmitted globally.16 Thus, understanding the genetic diversity and evolutionary characteristics of the seasonal HCoVs may offer insight into the possible evolutionary trajectories of SARS-CoV-2.
Previous studies have analyzed the evolutionary dynamics of HCoV-OC43 and HCoV-229E and identified adaptive evolution using available datasets.17-19 Here, we conducted multicenter surveillance that included all patients positive for any HCoV at 36 sentinel hospitals in Beijing Metropolis, China, from January 2016 to December 2019, and performed metatranscriptomic sequencing of the positive samples. We describe the epidemiological characteristics of patients infected with HCoVs. Then, based on the full-length genome sequences of the four HCoVs, we performed phylogenetic, selection pressure, and comparative genome analyses to characterize the genetic diversity and evolution of these HCoVs.
2 MATERIALS AND METHODS
2.1 Study population and case definition
Based on the Respiratory Pathogen Surveillance System (RPSS), multicenter surveillance of acute respiratory illness was conducted at 36 sentinel hospitals in Beijing Metropolis, China (Supporting Information: Figure 1), from January 2016 to December 2019. Outpatients and inpatients with acute respiratory illness were eligible for inclusion. We used the surveillance case definitions recommended by the World Health Organization. Briefly, influenza-like illness (ILI) was defined as “an acute respiratory illness with a measured temperature of ≥38°C and cough, with onset within the past 10 days,” and severe acute respiratory infection (SARI) as “an acute respiratory illness with a history of fever or measured fever of ≥38°C and cough, with onset within the past 10 days, requiring hospitalization.”20 Healthcare workers at the sentinel hospitals completed the questionnaire and, at the same time, collected one type of respiratory specimen (nasopharyngeal swab, sputum, or other lower respiratory specimen) from each patient according to the surveillance protocol. Data managers at the Beijing Center for Disease Prevention and Control (CDC), China, performed data quality assurance. Patients without specimens or with missing personal information or clinical records were excluded.
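The two WHO case definitions quoted above can be read as a small rule set. The Python sketch below encodes them under stated assumptions: the `Patient` fields and the `classify_case` helper are hypothetical (not part of RPSS), “within the past 10 days” is taken as an onset of at most 10 days, and a hospitalized patient who meets the SARI criteria is labelled SARI rather than ILI, consistent with the Discussion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical surveillance record; field names are illustrative only."""
    measured_temp_c: Optional[float]  # None if temperature was not measured
    history_of_fever: bool
    cough: bool
    days_since_onset: int
    hospitalized: bool


def classify_case(p: Patient) -> Optional[str]:
    """Return "SARI", "ILI", or None following the quoted WHO wording."""
    measured_fever = p.measured_temp_c is not None and p.measured_temp_c >= 38.0
    # Both definitions require cough and onset within the past 10 days.
    if not (p.cough and p.days_since_onset <= 10):
        return None
    # SARI accepts reported OR measured fever, but requires hospitalization.
    if p.hospitalized and (measured_fever or p.history_of_fever):
        return "SARI"
    # ILI requires a measured temperature of at least 38 degrees C.
    if measured_fever:
        return "ILI"
    return None


if __name__ == "__main__":
    print(classify_case(Patient(38.5, False, True, 3, False)))  # outpatient, measured fever
    print(classify_case(Patient(None, True, True, 5, True)))    # inpatient, reported fever only
```

Giving SARI precedence means a hospitalized patient who would also satisfy ILI is counted once, as SARI.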
2.2 Pathogen screening and viral genome assembly
Details of the screening for HCoVs and other respiratory pathogens (Supporting Information: Table 1) and of metatranscriptomic sequencing are described in the Supporting Information Methods. All full-length HCoV genomes were assembled and then annotated against specified reference genomes (Supporting Information Methods). All viral genomes obtained in this study were deposited in GenBank, as listed in Supporting Information: Table 2.
2.3 Phylogenetic analyses
All assembled complete genomes were aligned with available sequences from GenBank, and ambiguously aligned regions were removed. Phylogenetic trees of HCoVs were constructed by the maximum likelihood method with 1000 bootstrap replicates (Supporting Information Methods). The trees were rooted with their outgroups and visualized using the ggtree package (version 3.0.4)21 in R software (version 4.1.1) (https://www.R-project.org/). Genotypes were assigned according to the lineage distribution in the complete-genome phylogenetic trees.
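Bootstrap support is obtained by re-estimating the tree on pseudo-alignments whose columns are resampled with replacement. The Python sketch below shows only that resampling step and the final support percentage, under stated assumptions: the function names are hypothetical, inferring a maximum likelihood tree for each replicate (the program used is described in the Supporting Information) is left out, and the fixed seed is an arbitrary choice for reproducibility.

```python
import random
from typing import Dict, List


def bootstrap_alignments(alignment: Dict[str, str], replicates: int = 1000,
                         seed: int = 1) -> List[Dict[str, str]]:
    """Nonparametric bootstrap: resample alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must be aligned to the same length")
    rng = random.Random(seed)
    pseudo_alignments = []
    for _ in range(replicates):
        # Draw `length` column indices with replacement; the same columns are
        # used for every taxon so that each site pattern stays intact.
        columns = [rng.randrange(length) for _ in range(length)]
        pseudo_alignments.append(
            {name: "".join(alignment[name][c] for c in columns) for name in names})
    return pseudo_alignments


def bootstrap_support(clade_present: List[bool]) -> float:
    """Percentage of replicate trees that recover a given clade."""
    return 100.0 * sum(clade_present) / len(clade_present)
```

Each pseudo-alignment would be passed to the tree builder, and a clade's support is the share of replicate trees containing it (for example, 950 of 1000 trees gives 95%).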
2.4 Evolutionary analyses
The number of nonsynonymous substitutions per nonsynonymous site (dN), the number of synonymous substitutions per synonymous site (dS), and the dN/dS ratio were estimated for the key genes shared by all four HCoVs (ORF1ab, S, M, N, E, and RNA-dependent RNA polymerase [RdRp]) using the SLAC and MEME algorithms on the Datamonkey web server (https://www.datamonkey.org).22 We then analyzed substitutions in the spike glycoprotein of each genotype using BioAider (version 1.334)23 (Supporting Information Methods).
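What dN/dS measures can be illustrated with a much simpler pairwise counting method than SLAC and MEME, which operate on a phylogeny: the Nei–Gojobori approach. A ratio below 1 means nonsynonymous changes are observed less often per site than synonymous ones, which is read as negative (purifying) selection. The Python sketch below is a swapped-in illustration, not the Datamonkey implementation; it assumes two aligned, ungapped, in-frame coding sequences without stop codons, counts changes to stop codons as nonsynonymous when computing sites, discards mutational pathways through stop codons, and applies the Jukes–Cantor correction. All function names are hypothetical.

```python
import itertools
import math
from typing import Tuple

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites of a sense codon; changes to stop count as nonsynonymous."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                sites += 1 / 3
    return sites


def codon_differences(c1: str, c2: str) -> Tuple[float, float]:
    """Synonymous and nonsynonymous differences averaged over mutational pathways."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    syn_total = nonsyn_total = 0.0
    n_paths = 0
    for order in itertools.permutations(diff):
        current, syn, nonsyn = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        else:  # pathway completed without hitting a stop codon
            syn_total += syn
            nonsyn_total += nonsyn
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return syn_total / n_paths, nonsyn_total / n_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple substitutions at a site."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("proportion too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1: str, seq2: str) -> Tuple[float, float, float]:
    """Return (dN, dS, dN/dS) for two aligned, ungapped, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            raise ValueError("stop codons are not allowed")
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        sd, nd = codon_differences(c1, c2)
        syn_diffs += sd
        nonsyn_diffs += nd
    if syn_sites == 0 or nonsyn_sites == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    return d_n, d_s, (d_n / d_s if d_s > 0 else float("nan"))
```

A single synonymous change (for example, CTG to CTA, both leucine) gives dN = 0 and therefore dN/dS = 0; when dS is zero the ratio is undefined and the sketch returns NaN.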
2.5 Statistical analysis
Descriptive statistics were calculated for all variables. Continuous variables were summarized as medians with ranges or interquartile ranges (IQRs), and categorical variables as frequencies and proportions. The Shapiro–Wilk test was used to assess the normality of the data; when its p value was below 0.1, the Wilcoxon rank-sum test was used to compare continuous variables between groups. The Kruskal–Wallis rank-sum test was used to compare age distributions among the four HCoV infection groups. Pearson's χ2 test or Fisher's exact test was applied to compare categorical variables. A two-sided p value of less than 0.05 was considered significant. For model building, stepwise variable selection based on the Akaike information criterion was performed with the MASS package (version 7.3-58.1),24 using a threshold of p < 0.05. Multivariate logistic regression was then conducted to explore factors related to SARI, including age, sex, residence, infection group, and coinfection. Odds ratios (ORs) with 95% confidence intervals (CIs) were estimated from the fitted model. All statistical analyses were conducted in R software.
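The Wilcoxon rank-sum (Mann–Whitney U) test compares the ranks of the pooled observations rather than their values, which is why it is used when normality is rejected. The analyses here were run in R; the Python sketch below is only an illustration, assuming average ranks for ties, a tie-corrected variance, and a normal approximation without continuity correction. R's `wilcox.test` may instead report an exact p value or apply a continuity correction, so p values can differ slightly. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p) via a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p
```

For example, `rank_sum_test(ages_sari, ages_ili)` with two hypothetical age lists would compare SARI and ILI ages without assuming normality.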
3 RESULTS
3.1 Characteristics of patients infected with HCoVs
A total of 15 677 patients with ILI or SARI met the inclusion criteria and submitted samples for respiratory pathogen testing from January 2016 to December 2019. Overall, 321 patients were positive for HCoVs, a positive rate of 2.0% (95% CI, 1.8%–2.3%); of these, 60 (18.7%) were positive for HCoV-229E, 123 (38.3%) for HCoV-NL63, 130 (40.5%) for HCoV-OC43, and 8 (2.5%) for HCoV-HKU1. The 321 positive specimens comprised 171 nasopharyngeal swabs (13 positive for HCoV-229E, 82 for HCoV-NL63, 70 for HCoV-OC43, and 6 for HCoV-HKU1), 131 sputa (42 for HCoV-229E, 34 for HCoV-NL63, 53 for HCoV-OC43, and 2 for HCoV-HKU1), and 19 lower respiratory specimens (5 for HCoV-229E, 7 for HCoV-NL63, and 7 for HCoV-OC43).
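The positive rate, its 95% CI, and the share of each HCoV can be reproduced from the counts above. The interval method is not stated in the paper; the Python sketch below assumes the Wilson score interval (a Wald interval rounds to the same 1.8%–2.3% here), and the function name is hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_ci(positives: int, total: int, z: float = Z_95) -> Tuple[float, float, float]:
    """Proportion with a Wilson score interval (assumed method)."""
    p = positives / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, centre - half, centre + half


if __name__ == "__main__":
    p, lo, hi = wilson_ci(321, 15677)
    print(f"{p:.1%} ({lo:.1%}-{hi:.1%})")  # 2.0% (1.8%-2.3%)
    for name, n in [("HCoV-229E", 60), ("HCoV-NL63", 123),
                    ("HCoV-OC43", 130), ("HCoV-HKU1", 8)]:
        print(name, f"{n / 321:.1%}")  # share among the 321 positives
```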
There was no significant difference in sex distribution among the four HCoV infection groups (p = 0.74). The age of HCoV-positive cases ranged from 0 to 98 years, with a median of 53 (interquartile range [IQR]: 10–69) years, and differed significantly among the four infection groups (p = 0.0009). Compared with the other infection groups, HCoV-229E infection occurred more often in urban than in suburban areas (p < 0.0001). A higher proportion of HCoV-229E (80.0%) and HCoV-OC43 (56.2%) infections than of HCoV-NL63 (45.5%) and HCoV-HKU1 (37.5%) infections presented as SARI, with a significant difference among the four groups (p < 0.0001). In total, 133 (41.4%) patients were coinfected with other respiratory pathogens, and coinfection rates differed significantly among the four infection groups (p = 0.013). Of the 180 SARI cases, 50.6% were coinfected with other pathogens, much higher than the 29.8% among the 141 ILI cases (p = 0.0003) (Supporting Information: Table 3). Of the 180 SARI patients, 108 (60.0%) had at least one underlying disease (Table 1).
Characteristic | Total (n = 321) | HCoV-229E (n = 60) | HCoV-NL63 (n = 123) | HCoV-OC43 (n = 130) | HCoV-HKU1 (n = 8) | p Value
---|---|---|---|---|---|---
Gender | | | | | | 0.74b
Female | 155 (48.3%) | 31 (51.7%) | 62 (50.4%) | 58 (44.6%) | 4 (50.0%) |
Male | 166 (51.7%) | 29 (48.3%) | 61 (49.6%) | 72 (55.4%) | 4 (50.0%) |
Median age (IQR), years | 53 (10, 69) | 63 (53, 74) | 29 (5, 68) | 53 (22, 68) | 51 (6, 73) | 0.0009c
Location | | | | | | <0.0001b
Urban | 188 (58.6%) | 51 (85.0%) | 59 (48.0%) | 74 (56.9%) | 4 (50.0%) |
Suburban | 133 (41.4%) | 9 (15.0%) | 64 (52.0%) | 56 (43.1%) | 4 (50.0%) |
Diagnosis | | | | | | <0.0001b
SARI | 180 (56.1%) | 48 (80.0%) | 56 (45.5%) | 73 (56.2%) | 3 (37.5%) |
ILI | 141 (43.9%) | 12 (20.0%) | 67 (54.5%) | 57 (43.8%) | 5 (62.5%) |
Coinfection | 133 (41.4%) | 34 (56.7%) | 54 (43.9%) | 42 (32.3%) | 3 (37.5%) | 0.013b
Underlying diseasesa | | | | | | <0.044b
Respiratory diseases | 41 (22.8%) | 14 (29.2%) | 11 (19.6%) | 16 (21.9%) | 0 (0.0%) |
Endocrine diseases | 31 (17.2%) | 8 (16.7%) | 11 (19.6%) | 11 (15.1%) | 1 (33.3%) |
CVDs | 83 (46.1%) | 24 (50.0%) | 25 (44.6%) | 32 (43.8%) | 2 (66.7%) |
Other diseases | 26 (14.4%) | 12 (25.0%) | 2 (3.6%) | 11 (15.1%) | 1 (33.3%) |
Without underlying diseases | 72 (40.0%) | 13 (27.1%) | 28 (50.0%) | 31 (42.5%) | 0 (0.0%) |
- Note: Data are n (%) unless specified.
- Abbreviations: CVDs, cardiovascular diseases in ICD-11; endocrine diseases, endocrine, nutritional or metabolic diseases in ICD-11; ILI, influenza-like illness; IQR, interquartile range; other diseases, other underlying diseases (e.g., diseases of the skin); respiratory diseases, diseases of the respiratory system in ICD-11; SARI, severe acute respiratory infection.
- a Underlying diseases were available only from the medical records of SARI patients; percentages are of SARI patients in each group, and a patient may have more than one disease.
- b Fisher's exact test.
- c Kruskal-Wallis rank sum test between age distributions.
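The SARI-versus-ILI coinfection comparison reported above (50.6% vs. 29.8%, p = 0.0003) can be checked from the counts implied by the percentages: 50.6% of 180 SARI cases is 91 and 29.8% of 141 ILI cases is 42, which together give the 133 coinfected patients. Assuming a Pearson χ2 test with Yates continuity correction (the default of R's `chisq.test` for 2 × 2 tables; the paper does not state which variant was used), the Python sketch below gives χ2 ≈ 13.2 and p ≈ 0.0003, in line with the reported value. The function name is hypothetical.

```python
import math
from typing import Tuple


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square with Yates continuity correction for [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Subtract 0.5 from |O - E| but never go below zero, as R's chisq.test does.
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p


if __name__ == "__main__":
    # Rows: SARI, ILI; columns: coinfected, not coinfected.
    stat, p = chi2_yates_2x2(91, 180 - 91, 42, 141 - 42)
    print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

For 1 degree of freedom, the chi-square upper tail equals `erfc(sqrt(x / 2))`, so no statistics library is needed.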
Of the 15 677 respiratory samples, 3827 were collected in 2016, 3893 in 2017, 3986 in 2018, and 3971 in 2019. The HCoV positive rate varied by year (p < 0.0001) (Supporting Information: Figure 2), from a low of 1.4% in 2017 to a high of 3.2% in 2018, and the yearly fluctuations followed different patterns among the four HCoV infection groups (Figure 1A). Seasonality was evident for HCoV-NL63, HCoV-OC43, and HCoV-229E infections, with HCoV-NL63 and HCoV-OC43 peaking around September and HCoV-229E around December (Figure 1B). The mean age of SARI patients was significantly higher than that of ILI patients in each infection group (all p < 0.05) (Figure 1C). Patients coinfected with other respiratory pathogens were significantly older than those without coinfection (p < 0.0001) (Supporting Information: Figure 3). Among the 133 patients with coinfection, the most common coinfecting pathogen was Stenotrophomonas maltophilia (22/133, 16.5%). The most common coinfecting pathogens were influenza A virus and Pseudomonas aeruginosa (7 cases each) for HCoV-229E, S. maltophilia (11) for HCoV-NL63, and Haemophilus influenzae (10) for HCoV-OC43 (Figure 1D). Data on underlying diseases were extracted from medical records, which were available only for SARI patients. The proportion of SARI patients with underlying diseases varied among the HCoV infection groups (Figure 1E).

We performed multivariate logistic regression with the type of disease (SARI or ILI) as the dependent variable and possible risk factors for SARI as independent variables. Increasing age, male sex, suburban residence, and coinfection were significant risk factors for SARI, with adjusted ORs of 1.02 (95% CI: 1.02–1.03), 1.82 (95% CI: 1.10–3.03), 1.85 (95% CI: 1.09–3.21), and 1.84 (95% CI: 1.09–3.12), respectively (Supporting Information: Table 4).
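Adjusted ORs and their 95% CIs are obtained by exponentiating a logistic regression coefficient and its Wald interval. The Python sketch below shows that conversion and its inverse, which recovers an approximate coefficient and standard error from a reported interval. The helper names are hypothetical, and the inverse assumes a symmetric Wald interval on the log-odds scale; rounding of the printed bounds explains why 1.10–3.03 maps back to an OR of about 1.83 rather than 1.82.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def odds_ratio_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """OR and Wald CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_ci(lower: float, upper: float, z: float = Z_95) -> Tuple[float, float]:
    """Approximate (beta, SE) from a reported OR interval.

    Assumes the interval is a symmetric Wald interval on the log-odds scale."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


if __name__ == "__main__":
    beta, se = coef_from_ci(1.10, 3.03)  # male sex, as reported above
    print(round(math.exp(beta), 2))       # 1.83 (bounds were rounded)
    print(round(1.02 ** 10, 2))           # 1.22 per decade, if 1.02 is per year of age
```

For a continuous covariate such as age, the OR applies per unit increase; assuming age was entered in years, an OR of 1.02 compounds to roughly 1.22 over a decade.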
3.2 Continuous emergence of novel lineages in HCoVs
From the 321 positive samples, 179 complete or nearly complete HCoV genome sequences were obtained, including 43 HCoV-229E, 58 HCoV-NL63, 74 HCoV-OC43, and 4 HCoV-HKU1 genomes (Supporting Information: Table 2). We then performed phylogenetic analyses based on these 179 complete genome sequences and those deposited in GenBank. The 43 full-length HCoV-229E sequences from this study formed an independent cluster, phylogenetically distinct from the known genotypes,25 that constitutes a distinct lineage (Figure 2A). In addition, 13 strains from Japan, Haiti, the United States, and other regions of China also fell into this lineage, all detected during 2016–2020, suggesting that the emerging lineage might have become the currently dominant HCoV-229E lineage worldwide. A Bayesian phylogenetic tree constructed with MrBayes showed a similar topology (Supporting Information: Figure 4A), supporting these results.

Phylogenetic analyses of previously available HCoV-NL63 sequences classified HCoV-NL63 into genotypes A, B, and C.26, 27 Genotype A can be further divided into three subgenotypes (A1, A2, and A3), as can genotype C (C1, C2, and C3). Phylogenetic analyses including the 58 HCoV-NL63 genomes newly sequenced in this study showed that 35 clustered in an emerging lineage derived from genotype B, whereas 5 fell within genotype B, 8 within subgenotype C2, and 10 within subgenotype C3 (Figure 2B and Supporting Information: Figure 4B).
HCoV-OC43, the most common seasonal HCoV, has been divided into 11 genotypes (A–K) based on phylogenetic analysis of the S gene.28 Our 74 HCoV-OC43 genome sequences fell into four genotypes and an emerging lineage in the phylogenetic tree (Figure 2C and Supporting Information: Figure 4C). The two recently reported genotypes J and K had distinct evolutionary origins, with genotype J evolving from genotype H and genotype K from genotype I. Notably, the emerging lineage identified in this study occupied a relatively independent position in the phylogenetic tree and apparently evolved from genotype G. The seven strains in the emerging lineage were detected at five sentinel hospitals in three districts, with no sign of a localized outbreak.
Since HCoV-HKU1 infection was first identified in 2004,4 most previously reported full-length genome sequences have come from China, and the rest from the United States and Japan. These sequences belong to three genotypes (A, B, and C). Four complete genomes were obtained in this study, one in genotype A and three in genotype B (Figure 2D and Supporting Information: Figure 4D). In a previous study, a genotype A strain was also identified in Beijing, and a genotype C strain was detected in 2015 in Hebei Province, which neighbors Beijing.27
3.3 Evolutionary characteristics of HCoVs
To assess the selection pressure on different HCoV genes, we estimated dN, dS, and dN/dS for each key gene. The dN/dS ratio of all key genes (ORF1ab, S, M, N, E, and RdRp) in each HCoV was less than 1, indicating that the HCoVs as a whole are under negative selection pressure (Supporting Information: Table 5). The conserved RdRp region had a relatively low dN/dS ratio, as previously reported.17 Nonsynonymous substitutions were most frequent in the first three nonstructural proteins (nsp1, nsp2, and nsp3) and the S1 region, which had significantly more nonsynonymous substitutions than other regions (Supporting Information: Figure 5). Frequent substitutions in these regions might alter viral infectivity and host immune responses to HCoV infection.12
To identify key substitution sites, we then analyzed substitutions in the S protein of every genotype and emerging lineage (Figure 3). In HCoV-229E, amino acid substitutions accumulated gradually over time. Of the 79 sites with a substitution frequency above 50%, 34 were detected in genotype 3, which first appeared in 1993, and were maintained in subsequent genotypes. All HCoV-229E strains in the emerging lineage carried 13 new substitutions: R89E, F90V, T91I, V288E, Q311G, K349R, G358P, N365D, N377I, D391A/Y, Y406G, S407Y, and T971V (Figure 3A and Supporting Information: Figure 6A). Of these, Q311G, G358P, Y406G, and S407Y were located in receptor binding loops.
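Per-site substitution frequencies such as the “above 50%” threshold can be computed directly from an S protein alignment. BioAider was used in this study; the Python sketch below is a hypothetical stand-in, assuming that sequences are already aligned to a reference, that sites are numbered by reference residues (reference gap columns are skipped), that gaps and ambiguous residues (“X”) are not counted as substitutions, and that the denominator is all aligned sequences. Sites are labelled in the reference-position-alternative notation used above (e.g., D391A/Y).

```python
from collections import Counter
from typing import List, Tuple


def frequent_substitutions(reference: str, sequences: List[str],
                           threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Sites where the share of sequences differing from the reference exceeds threshold."""
    if not sequences:
        return []
    if any(len(s) != len(reference) for s in sequences):
        raise ValueError("sequences must be aligned to the reference")
    results = []
    ref_pos = 0
    for col, ref_aa in enumerate(reference):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference numbering
        ref_pos += 1
        alts = Counter(s[col] for s in sequences if s[col] not in (ref_aa, "-", "X"))
        freq = sum(alts.values()) / len(sequences)
        if freq > threshold:
            # Most frequent alternative residue first, e.g. "D391A/Y".
            alt_label = "/".join(aa for aa, _ in alts.most_common())
            results.append((f"{ref_aa}{ref_pos}{alt_label}", freq))
    return results
```

For example, with reference `MKTAY` and sequences `MKTAY`, `MRTAY`, and `MRTGY`, only K2R (2 of 3 sequences) passes the 50% threshold.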

The HCoV-NL63 strains in this study belonged to genotype B, subgenotypes C2 and C3, and an emerging lineage, which showed a marked increase in S protein substitution sites compared with other genotypes. Notably, a series of 31 substitutions at positions 1–304 was found in genotype B and the emerging lineage, both of which predominated in this study (Supporting Information: Figure 6B). Of these 31 substitutions, 26 first appeared in region 1–201 of subgenotype A2 in 1988. The other five, in region 1–231, were highly similar to those of subgenotype C3 (Figure 3B).
The substitutions in HCoV-OC43 could be roughly divided into two categories: one comprising genotypes A, E, B, H, and J, and the other comprising genotypes C, D, F, G, I, and K and the emerging lineage. The two categories shared 25 substitutions but differed in the remainder, especially in the S2 region (Figure 3C). The emerging lineage and genotypes I and K found in this study had even more substitutions (Supporting Information: Figure 7A), indicating ongoing active evolution. The novel substitution sites were mainly located in the receptor binding domain, which is known to be associated with adaptation to humans and escape from humoral immune responses.29 A substitution feature unique to HCoV-OC43 was the insertion and deletion of different fragments, at residues 24, 259, and 495–501. Genotype K, mainly identified in this study, showed higher diversity and more substitutions, suggesting that more novel lineages might arise from genotype K in the future.
In contrast to the gradual substitution mode of the other three HCoVs, there was no obvious temporal evolutionary relationship among genotypes A, B, and C of HCoV-HKU1 circulating in the human population. This different substitution mode suggests that the three genotypes might have diverged in animals before spilling over to humans. We therefore performed alignment analysis within each genotype and found that genotype B showed more substitutions than the other two (Figure 3D). Notably, the three genotype B strains among the four HCoV-HKU1 genomes sequenced in this study showed cumulative substitutions, as did HCoV-229E (Supporting Information: Figure 7B).
4 DISCUSSION
In this study, we conducted multicenter surveillance to elucidate the epidemiological and evolutionary characteristics of HCoVs in Beijing, China, a metropolis with a population of over 25 million. This study not only describes the epidemiological profile of acute respiratory illness caused by HCoVs but also provides 179 whole genome sequences of HCoVs. These data enrich viral genomic resources and should contribute to a better understanding of the evolutionary and epidemiological characteristics of HCoVs.
The HCoV infection rate among patients in this study is comparable to that reported in Hong Kong, China,30 and lower than those in the United States,31 Spain,32 Japan,33 Kenya,34 and Shanghai, China.35 The most commonly detected HCoV in this study was HCoV-OC43, followed by HCoV-NL63 and HCoV-229E. These relative proportions are consistent with previous studies34, 35 but differ from those in Hong Kong, China, where HCoV-HKU1 was more common than HCoV-229E.30 The sex distribution did not differ significantly among the four HCoV infection groups. This finding differs from that in the United Kingdom, where HCoV-OC43 infection was more frequent in males than in females.36 The proportion of HCoV-229E infection was significantly higher in urban than in suburban areas, which might be partly explained by the higher population density in urban areas. However, because patients with HCoV-229E infection were markedly older than those in the other HCoV infection groups, consistent with a previous report,37 rapid population aging in urban areas is probably another reason for the higher frequency of HCoV-229E infection.
According to the case definition,20 a patient with ILI whose condition is severe enough to require hospitalization is classified as having SARI. The differences in the proportion of SARI among the four HCoV infection groups might reflect either intrinsic characteristics of each HCoV or the samples analyzed in this study. The inherent contribution of each HCoV to SARI needs further investigation in other populations. Using multivariate logistic regression, we further identified risk factors for SARI relative to ILI and found that older age, male sex, suburban residence, and coinfection with other pathogens were significantly associated with SARI. These findings may help improve the clinical management of patients with severe disease after HCoV infection.
Our phylogenetic analyses revealed that each HCoV was highly diverse, with multiple lineages co-circulating at the same time but evolving in different patterns. Specifically, strains from different countries fell into the emerging lineage of HCoV-229E, all detected during 2016–2020, suggesting that this lineage might have become the currently dominant HCoV-229E lineage worldwide. The simultaneous cocirculation of genotype B, subgenotypes C2 and C3, and the emerging lineage in the same area (Beijing) might represent the endemic pattern of HCoV-NL63, and the underlying transmission or evolutionary mechanisms deserve further investigation. The diverse evolutionary origins of genotypes J and K and the emerging lineage suggest that the great diversity of HCoV-OC43 might result from complex evolutionary pathways and that novel lineages could continuously emerge during its transmission in the human population. Together, these findings suggest that each HCoV follows its own evolutionary trajectory and gradually adapts to humans during long-term circulation in the human population.
To date, no vaccine has been developed for the seasonal HCoVs, so no vaccine-induced selection pressure acts on these viruses. Therefore, the observed substitutions, insertions, and deletions most likely arose naturally rather than under such external pressure, implying persistent substitution and evolution. Whether these continuous substitutions alter transmissibility or pathogenicity warrants continued surveillance. The S protein regions involved in immune recognition appear to be under the strongest selection pressure, suggesting that the main evolutionary direction of these HCoVs is associated with immune responses and the persistence of protection.9, 12 Notably, the substitutions in the receptor binding loops of HCoV-229E might affect binding to its receptor, human aminopeptidase N (hAPN).38 Overall, these findings suggest that complex evolution occurs as HCoVs circulate in the human population, leading to the emergence of novel lineages.
This study had some limitations. Although 179 complete HCoV genome sequences were obtained from 321 reverse transcription-polymerase chain reaction-positive samples, we could not obtain sequences from the remaining samples because of low viral loads and limited sequencing capacity. Only one sample was collected from each patient, at an early stage of infection, so the viral load might not have been high at the time of collection. In addition, although this study focused on epidemiological and evolutionary analyses, detailed medical records were not available for ILI patients, which prevented us from investigating possible factors related to virus diversity and disease severity.
In conclusion, coronaviruses will remain important threats to human health in the foreseeable future. Our findings, together with previous reports, demonstrate that novel HCoV lineages have been emerging continuously for decades. This study provides a reference for the possible evolutionary trajectories of coronaviruses and suggests that more variants might emerge in the future.
AUTHOR CONTRIBUTIONS
Wu-Chun Cao, Fang Huang, and Na Jia conceived, designed, and supervised the study. Cheng Gong, Hui Xie, Zhen-Yong Ren, and Mao-Zhong Li collected the data. Cheng Gong, Xiao-Ming Cui, Hui Xie, Qian Wang, Ya-Wei Zhang, Luo-Yuan Xia, Ming-Zhu Zhang, Li-Feng Du, and Jie Zhang checked sample quality and prepared samples for sequencing. Run-Ze Ye, Hang Fan, Jin-Yue Liu, Lin Zhao, Ze-Hui Li, and Yu-Yu Li were responsible for data analysis. Run-Ze Ye, Jin-Yue Liu, Lin Zhao, Ze-Hui Li, Yu-Yu Li, Nuo Cheng, Wenqiang Shi, and Jia-Fu Jiang prepared the figures. Wu-Chun Cao, Fang Huang, Na Jia, Run-Ze Ye, Cheng Gong, Hang Fan, Xiao-Ming Cui, and Jin-Yue Liu performed data interpretation. Wu-Chun Cao, Run-Ze Ye, Fang Huang, Na Jia, and Xiao-Ming Cui wrote the manuscript.
ACKNOWLEDGMENTS
This study was supported by grants from the Natural Science Foundation of China (81621005), the State Key Research Development Program of China (2019YFC1200505 and 2021YFC2301302), the Capital's Funds for Health Improvement and Research (2021-1G-3015 and 2020-4-3014), and the National Major Science and Technology Project for Control and Prevention of Major Infectious Diseases of China (2017ZX10103004).
CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.
ETHICS STATEMENT
As part of the RPSS, the study protocol was approved by the Ethics Committee of Beijing CDC. Before enrollment, the purpose, procedures, potential health impacts, and benefits of this study were explained carefully to participants or their care providers, and written informed consent was obtained.
Open Research
DATA AVAILABILITY STATEMENT
All whole genomes obtained in this study have been deposited in GenBank under Accession numbers ON553961–ON554139.