Systematic review and meta-analyses of superspreading of SARS-CoV-2 infections
Zhanwei Du, Chunyu Wang and Caifen Liu contributed equally to this manuscript.
Abstract
Superspreading, or overdispersion in transmission, is a feature of SARS-CoV-2 transmission that results in surging epidemics and large clusters of infection. The dispersion parameter is a statistical parameter used to characterize and quantify heterogeneity. In the context of measuring transmissibility, it serves as a measure of superspreading potential in a population, under the assumption that the collective offspring distribution follows a negative-binomial distribution. We conducted a systematic review and meta-analysis of globally reported dispersion parameters of SARS-CoV-2 infection. All searches were carried out on 10 September 2021 in PubMed for articles published from 1 January 2020 to 10 September 2021. Multiple estimates of the dispersion parameter have been published in 17 studies, with differences that could be related to where and when the data were obtained, across 8 countries (China, the United States, India, Indonesia, Israel, Japan, New Zealand and Singapore). High heterogeneity was reported among the included studies. The mean estimates of the dispersion parameter range from 0.06 to 2.97 over eight countries; the pooled estimate was 0.55 (95% CI: 0.30, 0.79), with means varying across countries and decreasing slightly with increasing reproduction number. The expected proportion of cases accounting for 80% of all transmissions is 19% (95% CrI: 7, 34) globally. The study location and method were found to be important drivers of the diversity in estimates of the dispersion parameter. Under high superspreading potential, larger outbreaks could still occur through travel-related importation of the virus even when an epidemic seems to be under control.
1 INTRODUCTION
A novel coronavirus (SARS-CoV-2) was first identified in Wuhan, China, in early 2020 and rapidly spread throughout the world. The World Health Organization (WHO) declared a pandemic on 11 March 2020 (Wan, 2020). As of 2 July 2022, over 548 million confirmed COVID-19 cases and 6.34 million deaths have been reported (World Health Organization, 2022a). Worldwide, five variants of concern (VOC: Alpha, Beta, Gamma, Delta and Omicron) and eight variants of interest (VOI: Epsilon, Zeta, Eta, Theta, Iota, Kappa, Lambda and Mu) have been identified by WHO to date (World Health Organization, 2022b). Some of these variants have exhibited increased transmissibility and severity compared to wild-type SARS-CoV-2, with some also able to partially evade immunity conferred by prior infection or vaccination (Garcia-Beltran et al., 2021).
The dispersion parameter (k) is a statistical parameter used to characterize and quantify heterogeneity in certain distributions. In the context of measuring transmissibility, overdispersion in transmission has often been estimated by assuming that the collective offspring distribution follows a negative-binomial distribution (Lloyd-Smith et al., 2005; Su et al., 2020). Specifically, the variance of the number of secondary infections from each case is R(1 + R/k), where R is the mean and k is the dispersion parameter. A small value of k indicates increased heterogeneity in transmission and therefore a high potential for superspreading, in which a few infectious cases account for most secondary transmissions (Gao et al., 2019). Accurate estimates of k are essential for determining the potential need for, and intensity of, public health and social measures (PHSMs) for disease control. When superspreading potential is low, relaxing PHSMs to reopen societies becomes feasible in low transmission scenarios. Under high superspreading potential, however, larger outbreaks could still occur even when an epidemic seems to be under control.
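For reference, the negative-binomial offspring distribution in the mean and dispersion parameterization commonly used in this literature can be written as follows. The symbol Z for the number of secondary cases generated by one case is our notation for illustration and does not appear in the text above.

```latex
\[
P(Z = z) \;=\; \frac{\Gamma(k+z)}{z!\,\Gamma(k)}
\left(\frac{k}{R+k}\right)^{k}
\left(\frac{R}{R+k}\right)^{z},
\qquad z = 0, 1, 2, \dots
\]
\[
\mathbb{E}[Z] = R,
\qquad
\operatorname{Var}(Z) = R\left(1 + \frac{R}{k}\right).
\]
```

As k grows large the variance approaches R, the Poisson case with little individual variation, whereas k below 1 gives a variance well above the mean and a heavy right tail.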
For SARS and MERS, most infections are caused by a small proportion of cases, with the dispersion parameter ranging from 0.06 to 2.94 (Wang et al., 2021). However, a comprehensive review and comparison of the superspreading potential of COVID-19 and its uncertainty over countries is still lacking. We carried out a systematic review and meta-analysis of published estimates of the dispersion parameter, aiming to estimate the pooled k of SARS-CoV-2 infections.
2 MATERIALS AND METHODS
2.1 Search strategy and selection criteria
All searches were carried out on 10 September 2021 in PubMed for articles published from 1 January 2020 to 10 September 2021. We included all relevant articles that were published in peer-reviewed journals, together with 8 articles recommended by experts. Search terms for superspreading for COVID-19 variants included (#1) ‘COVID-19’ OR ‘SARS-COV-2’ OR ‘2019-nCov’ OR ‘Coronavirus 2019’ OR ‘2019 coronavirus’ OR ‘coronavirus Wuhan’ OR ‘pneumonia Wuhan’ and (#2) ‘Superspreader’ OR ‘Spreader’ OR ‘Superspreader event’ OR ‘Super-spreader’ OR ‘Super-spreader hosts’ OR ‘Super-spreading’ OR ‘Superspreading’ OR ‘Overdispersion’ OR ‘Dispersion parameter’ OR ‘20/80 rule’ OR ‘dispersion parameter’, and the final search term was #1 AND #2. After reading the abstract and full text, we included studies in which estimates of the dispersion parameter were reported along with their uncertainty intervals and estimation periods. We excluded other systematic reviews and meta-analyses from our analyses but included relevant studies mentioned in these reviews. Finally, 17 studies were included, with publication dates between 20 March 2020 and 3 September 2021.
2.2 Data extraction
All data were extracted independently and entered in a standardized form by two co-authors (CW and CL). Conflicts over the inclusion of studies and the retrieved estimates were resolved by a third co-author (ZD). We extracted the estimates of the dispersion parameter of COVID-19 superspreading together with the corresponding 95% or 90% confidence interval (CI), the 95% credible interval (CrI), or the 95% range across 500 instances of the reconstructed transmission tree (95% Range). We converted 90% CIs to 95% CIs for the meta-analysis. For each selected study, we also extracted study information (i.e., estimation period and location), the model used in estimation, measurements of transmissibility and heterogeneity (i.e., ‘20/80’ rule and dispersion parameter), and study population and settings (i.e., type of cases) (see Supplementary Materials for details).
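The text does not state how 90% CIs were converted to 95% CIs. One common approach, sketched below purely as an illustration, assumes the interval is symmetric and approximately normal, recovers the standard error from the 90% half-width, and rescales it. The function name and the example numbers are hypothetical.

```python
from statistics import NormalDist


def ci90_to_ci95(estimate, lower90, upper90):
    """Rescale a 90% CI to a 95% CI.

    Assumption (not from the source): the interval is symmetric around the
    estimate and based on a normal approximation.
    """
    z90 = NormalDist().inv_cdf(0.95)    # two-sided 90% quantile, about 1.645
    z95 = NormalDist().inv_cdf(0.975)   # two-sided 95% quantile, about 1.960
    se = (upper90 - lower90) / (2 * z90)  # standard error implied by the 90% CI
    return estimate - z95 * se, estimate + z95 * se


# Hypothetical example: k = 0.50 with a 90% CI of (0.40, 0.60)
lower95, upper95 = ci90_to_ci95(0.50, 0.40, 0.60)
print(round(lower95, 3), round(upper95, 3))
```

Because k is positive and its intervals are often skewed, a conversion on the log scale may be more appropriate in practice; the linear version is shown only for simplicity.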
2.3 Estimation of dispersion parameter in studies reporting the ‘20/80’ rule
2.4 Statistical analysis
We used the I² index to assess heterogeneity between studies, classified into three categories: I² < 25% (low heterogeneity), I² = 25%–75% (moderate heterogeneity) and I² > 75% (high heterogeneity). Because of the high I² value obtained in our results, as well as the significance of the Cochran Q test, a random-effects model was used to perform the meta-analysis. Finally, a meta-regression analysis using a mixed-effects model was conducted to quantify the association between study location and the estimate of the dispersion parameter. Analyses were conducted in R version 4.1.1.
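The quantities named above (Cochran's Q, I² and a random-effects pooled estimate) can be illustrated with a short sketch. The analyses in this study were run in R and the text does not name the between-study variance estimator, so the DerSimonian-Laird estimator used here is an assumption, as is the approximation of each standard error by the CI half-width divided by 1.96. All function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.959964):
    """Approximate a standard error from a 95% CI (assumes a symmetric interval)."""
    return (upper - lower) / (2 * z)


def random_effects(estimates, ses):
    """DerSimonian-Laird random-effects pooling (an assumed choice of estimator)."""
    w = [1 / s ** 2 for s in ses]                       # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I² in percent
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]            # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.959964 * se_pooled, pooled + 1.959964 * se_pooled)
    return {"pooled": pooled, "ci": ci, "Q": q, "I2": i2, "tau2": tau2}


# Illustration with four rows of Table 1 only (not the full analysis)
rows = [(0.30, 0.23, 0.39), (0.43, 0.29, 0.67), (0.06, 0.05, 0.07), (2.97, 2.86, 3.08)]
result = random_effects([r[0] for r in rows], [se_from_ci(r[1], r[2]) for r in rows])
print(result)
```

This sketch pools only a subset of rows for illustration, so its output is not expected to match the pooled estimate reported in the Results.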
3 RESULTS
We identified 114 studies published from 1 January 2020 to 10 September 2021 by searching PubMed and additionally included 8 studies from our own reference list. Of these, 59 studies were excluded through title and abstract screening, leaving 55 studies for full-text assessment. A total of 17 of them were finally included in this study, providing 45 estimates. The detailed selection process is illustrated in Figure 1. The included studies are based on data from eight countries (China, the United States, India, Indonesia, Israel, Japan, New Zealand and Singapore) and use three methods (negative binomial distribution, zero-truncated negative binomial distribution and phylodynamic analysis) (Table 1).

Study | Method | Dispersion parameter (k), (95% CI) | Period | Region |
---|---|---|---|---|
Sun et al. (2021) | Negative binomial distribution | 0.30 (0.23, 0.39) | 2020-1-16 to 2020-4-3 | Mainland China |
Adam et al. (2020) | Negative binomial distribution | 0.43 (0.29, 0.67) | 2020-1-23 to 2020-4-28 | Hong Kong, China |
Bi et al. (2020) | Negative binomial distribution | 0.58 (0.35, 1.18) | 2020-1-14 to 2020-2-12 | Mainland China |
He et al. (2020) | Negative binomial distribution | 0.70 (0.59, 0.98) | 2020-1-15 to 2020-2-29 | Mainland China |
Hasan et al. (2020) | Negative binomial distribution | 0.06 (0.05, 0.07) | 2020-3-2 to 2020-3-31 | Indonesia |
Hasan et al. (2020) | Negative binomial distribution | 0.20 (0.09, 0.31) | 2020-3-19 to 2020-4-7 | Indonesia |
Kwok et al. (2020) | Negative binomial distribution | 2.30 (0.02, 4.58) | By 2020-3-3 | Hong Kong, China |
Kwok et al. (2020) | Negative binomial distribution | 0.51 (0.21, 1.59) | By 2020-3-3 | Japan |
Kwok et al. (2020) | Negative binomial distribution | 1.78 (0.09, 3.47) | By 2020-3-3 | Singapore |
Lau et al. (2020) | Negative binomial distribution | 0.63 (0.54, 0.85) | 2020-3-1 to 2020-4-3 | USA |
Lau et al. (2020) | Negative binomial distribution | 0.66 (0.60, 0.71) | 2020-3-1 to 2020-4-3 | USA |
Lau et al. (2020) | Negative binomial distribution | 0.62 (0.54, 0.75) | 2020-3-1 to 2020-4-3 | USA |
Lau et al. (2020) | Negative binomial distribution | 0.64 (0.53, 0.75) | 2020-3-1 to 2020-4-3 | USA |
Lau et al. (2020) | Negative binomial distribution | 0.39 (0.37, 0.44) | 2020-3-1 to 2020-4-3 | USA |
Miller et al. (2020) | Phylodynamic analysis | 2.97 (2.86, 3.08) | By 2020-4-22 | Israel |
Tariq et al. (2020) | Negative binomial distribution | 0.11 (0.05, 0.25) | 2020-1-23 to 2020-3-17 | Singapore |
Wang et al. (2020) | Phylodynamic analysis | 0.23 (0.13, 0.38) | 2019-12-24 to 2020-2-14 | Mainland China |
Zhao et al. (2021) | Negative binomial distribution (zero-truncated framework) | 0.37 (0.29, 0.48) | 2020-1-15 to 2020-2-29 | Mainland China |
Zhao et al. (2021) | Negative binomial distribution (zero-truncated framework) | 0.32 (0.15, 0.64) | 2020-1-23 to 2020-4-28 | Hong Kong, China |
Zhao et al. (2021) | Negative binomial distribution (zero-truncated framework) | 0.18 (0.01, 1.79) | 2020-1-21 to 2020-2-26 | Mainland China |
Zhang et al. (2020) | Negative binomial distribution | 0.25 (0.13, 0.88) | 2020-1-21 to 2020-2-26 | Mainland China |
Shi et al. (2021) | Negative binomial distribution | 0.21 (0.13, 0.33) | 2020-1-21 to 2020-4-10 | Mainland China |
James et al. (2021) | Negative binomial distribution | 0.29 (0.10, 2.05) | 2020-3-25 to 2020-4-22 | New Zealand |
Kremer et al. (2021) | Negative binomial distribution | 0.43 (0.38, 0.49) | 2020-1-23 to 2020-4-18 | Hong Kong, China |
Kremer et al. (2021) | Negative binomial distribution | 0.50 (0.50, 0.51) | By 2020-8-1 | India |
Kremer et al. (2021) | Negative binomial distribution | 0.56 (0.29, 0.83) | By 2020-12-31 | Rwanda |
Endo et al. (2020) | Negative binomial distribution | 0.10 (0.05, 0.20) | By 2020-2-27 | Global |
Riou and Althaus (2020) | Negative binomial distribution | 0.54 (0.01, 8.18) | By 2020-1-18 | Global |
High heterogeneity was reported among the included studies (I² = 100% and p < .0001). The mean estimates of the dispersion parameter (k) range from 0.06 to 2.97 over eight countries. The pooled estimate of k was 0.55 (95% CI: 0.30, 0.79), with means varying across countries (Figure 2) and decreasing slightly with increasing reproduction number (Figure 3). The global estimates are 0.54 (95% CI: 0.01, 8.18) in January 2020 (Riou & Althaus, 2020) and 0.10 (95% CI: 0.05, 0.20) in February 2020 (Endo et al., 2020). The expected proportion of cases accounting for 80% of infections is 19% (95% CrI: 7, 34) over countries (Table 1).
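To illustrate how a value of k translates into a statement such as "x% of cases account for 80% of transmission", the sketch below ranks cases by their number of secondary infections under a negative-binomial offspring distribution and accumulates transmissions from the most infectious cases downward. This is one common way to make the calculation and is not necessarily the procedure used in this study. The reproduction number R = 2.5 in the example is a hypothetical input, and the function names are ours.

```python
import math


def nb_pmf(z, R, k):
    """Negative-binomial probability of z secondary cases (mean R, dispersion k)."""
    log_p = (math.lgamma(k + z) - math.lgamma(k) - math.lgamma(z + 1)
             + k * math.log(k / (R + k)) + z * math.log(R / (R + k)))
    return math.exp(log_p)


def proportion_responsible(R, k, share=0.8, z_max=2000):
    """Smallest proportion of cases that generates `share` of all transmissions.

    Assumption: offspring counts above z_max carry negligible probability.
    """
    target = share * R          # expected transmissions per case to be covered
    cases = 0.0                 # cumulative proportion of cases, most infectious first
    transmissions = 0.0         # cumulative expected transmissions from those cases
    for z in range(z_max, 0, -1):
        p = nb_pmf(z, R, k)
        if transmissions + z * p >= target:
            # Only part of this class is needed to reach the target
            return cases + (target - transmissions) / z
        cases += p
        transmissions += z * p
    return cases


# Pooled k from this review with a hypothetical R of 2.5
print(round(proportion_responsible(2.5, 0.55), 3))
```

Smaller values of k concentrate transmission in fewer cases, so the returned proportion falls as k decreases for a fixed R.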


The meta-regression analysis was conducted on the reported k estimates, which allowed us to explore the potential association between study attributes (e.g. location, method or age group) and the estimated dispersion parameter (Figure 4). Including country, age group or method as a categorical variable, we found that the study location was closely associated with the reported dispersion parameter (p < .0001).

4 DISCUSSION
For SARS-CoV-1, SARS-CoV-2 and MERS-CoV, most infections are caused by a small proportion of people. During the 2003 SARS epidemic, 76 infections arose from 1 hospitalized patient in Beijing, China (Shen et al., 2004). During the 2015 MERS outbreak, 5 patients led to 154 secondary infections in South Korea (Chun, 2016). In the early COVID-19 outbreak, around 10% of cases in countries outside China accounted for 80% of secondary cases (Endo et al., 2020). However, population-level epidemiological measures (e.g. the basic reproduction number) usually hide immense variation at the individual level (Du, Hong et al., 2022; Du, Javan et al., 2020; Du, Liu et al., 2022; Du, Tian et al., 2022; Du, Xu et al., 2020). We thus carried out a systematic review and meta-analysis of 17 studies on the dispersion parameter to characterize COVID-19 superspreading.
Estimation of the dispersion parameter from individual case data requires accurate observation of transmission chains, usually collected through contact tracing or phylodynamic analysis, and can be biased, for example by reporting bias, estimation methods and transmission scenarios. The negative binomial model with the zero-truncated framework reduces the estimation bias of the dispersion parameter when index cases with zero secondary cases are under-ascertained, for example, in China (Zhao et al., 2021). Estimating and monitoring changes in the dispersion parameter are thus critical for determining the type and stringency of public health and social measures (PHSMs) needed to reduce the occurrence of superspreading events, although we found that no estimate is yet available for the Delta variant or any other variant. Japan recognized the importance of superspreading in February 2020, implemented cluster-focused backward contact tracing and promoted awareness among people at risk of infection by closing higher-risk locations. The World Health Organization's Western Pacific Region followed in July 2020, limiting the number of people gathering indoors to curb the spread of the virus. Restaurants were estimated to account for 20% of transmissions if all businesses were to reopen in 2020 in the United States (Chang et al., 2021). Such measures can mitigate the impact of superspreading events, which are expected to be major drivers in early epidemics.
In a recent systematic review of COVID-19 superspreading covering studies up to 10 February 2021, the estimates of the dispersion parameter for COVID-19 range from 0.01 in the United States to 5 in Israel (Wang et al., 2021). We include most of their studies together with those published by 10 September 2021, and re-estimate them under some simple assumptions to obtain the pooled estimates and conduct the meta-analysis. The major difference is the lower limit of the dispersion parameter, which is estimated to be 0.01 in the United States in the published review (Wang et al., 2021). In contrast, we directly extracted the estimates from figures in the original study, which range from 0.39 to 0.66 before the shelter-in-place order, so the lower limit becomes 0.06, the estimate for Indonesia (Table 1). Finally, the pooled estimate from our analysis indicates that the dispersion parameter of COVID-19 was likely to be 0.55 (95% CI: 0.30, 0.79), close to that of India, China and the United States (Figure 2).
The estimate of the dispersion parameter in Israel is 2.97 (2.86, 3.08), the highest among the 8 study countries, which may be attributable to strict PHSMs and border control strategies before the first local case (Wang et al., 2021). These control measures would prevent substantial numbers of imported cases, which typically trigger superspreading events (Adam et al., 2020; Wang et al., 2021).
Our study has several limitations. Most articles included in our study used publicly available data. Some studies in our review might have used overlapping data, leading to double counting in the pooled estimates. With the recent emergence of variants that may be more transmissible and evade immunity acquired through prior infection or vaccination, the future of the pandemic is highly uncertain. Meanwhile, SARS-CoV-2 viruses are constantly evolving through mutation; genetic variants have emerged and circulated around the world, which may modify individual infectiousness profiles. The impact of variants on overdispersion, perhaps through increased transmissibility, remains unclear. Our pooled estimate is based on transmission of the wild-type virus in early 2020, which may not be generalizable to the dominant Delta variant, and future studies will be needed to make the comparison. Our searches were carried out on 10 September 2021 in PubMed, and many studies have been published since. For example, Akhmetzhanov et al. (2021) estimated the dispersion parameter for the Epsilon variant in Taiwan during January and February 2021.
In conclusion, multiple estimates of the dispersion parameter have been published in 17 studies, with differences that could be related to where and when the data were obtained. The study location and method were found to be important drivers of the diversity in estimates of the dispersion parameter.
ACKNOWLEDGEMENTS
We acknowledge the financial support from the AIR@InnoHK Programme from Innovation and Technology Commission of the Government of the Hong Kong Special Administrative Region, the Collaborative Research Fund (Project No. C7123-20G) of the Research Grants Council of the Hong Kong SAR Government, Seed Fund for Basic Research for New Staff of the University of Hong Kong (grant no. 202009185062), National Natural Science Foundation of China (grant no. 72104208), and Health and Medical Research Fund, Food and Health Bureau, Government of the Hong Kong Special Administrative Region (grant no. 21200632), Natural Science Foundation of Jilin Provincial Science and Technology Department (grant no. 20180101332JC) and the Science and Technology Project of the Jilin Provincial Education Department (grant no. JJKH20210135KJ). The funders of the study had no role in study design, data collection, data analysis, data interpretation or writing of the report.
CONFLICT OF INTEREST
BJC reports honoraria from AstraZeneca, Sanofi Pasteur, GSK, Moderna and Roche. The authors report no other potential conflict of interest.
AUTHOR CONTRIBUTIONS
ZD, CW, CL and BJC: conceived the study, designed statistical and modelling methods, conducted analyses, interpreted results, wrote and revised the manuscript; YB, SP, DA, LW, PW and EL: interpreted results and revised the manuscript.
CODE AVAILABILITY
Code used for data analysis is freely available upon request.
ETHICS STATEMENT
No ethical approval was required as this is a review study with no original research data.
DATA AVAILABILITY STATEMENT
All data were collected from open sources, with a detailed description in the Supplementary Method.