About the origin of the first two SARS-CoV-2 infections in Italy: Inference not supported by appropriate sequence analysis
In the February 5, 2020 issue of the Journal of Medical Virology, Giovannetti et al. [1] published a paper entitled “The first two cases of 2019-nCoV in Italy: where they come from?”
In this paper, a phylogenetic and evolutionary analysis was applied to the virus identified in the first two subjects diagnosed in Italy with 2019-nCoV infection (the virus has recently been renamed SARS-CoV-2 [2]): two Chinese spouses who had arrived in Italy as tourists. The diagnosis was made by the virology team directed by Maria R. Capobianchi at the National Institute of Infectious Diseases (INMI) in Rome, Italy, where the patients are currently hospitalized. Partial sequencing of the M gene (322 bp, positions 26690-27012 of reference WH-01, MN908947) was performed for confirmatory purposes and showed that, in this region, the viral strains harbored by these patients were identical to the prototype strain from China. These sequences were promptly shared through both GenBank (accession numbers MT008022 and MT008023) and GISAID [3] (EPI_ISL_406959 and EPI_ISL_406960). When the paper by Giovannetti et al. was published, no other sequences from these two patients were publicly available, and the only full genome sequence, recently submitted to GenBank (accession number MT066156), is not yet publicly available. Therefore, although the paper by Giovannetti et al. does not identify the sequences by GenBank accession number or GISAID ID, the sequences from Italy included in the analysis must be the publicly available 322 bp sequences obtained in our laboratory.
Using a Bayesian molecular clock phylogenetic approach, the authors analyzed these short sequences together with 52 genome sequences from other sources to infer the “origin” of the strain infecting the two patients diagnosed by our group, dating it to January 19, 2020. It should be emphasized that this short region shows very limited variability. In fact, according to the data available in GISAID as of February 5, 2020 (the publication date of the paper by Giovannetti et al.), the sequence of this region was identical in all sequenced SARS-CoV-2 strains, except for a few substitutions in BetaCoV/USA/IL1/2020_EPI_ISL_404253, BetaCoV/USA/CA1/2020_EPI_ISL_406034, and BetaCoV/Shenzhen/SZTH-001/2020_EPI_ISL_406592. Given the high level of conservation of 2019-nCoV genomes [4, 5], reliable conclusions from phylogenetic and evolutionary reconstruction can only be based on sufficiently long sequences. However, we could not reproduce the results of this study using the recently obtained full genome sequence of the first isolate from Italy, because none of the sequences included in the Bayesian phylogenetic analysis is identified by an accession number; this also applies to the short sequences from the Italian cases obtained in our laboratory and shared through GenBank/GISAID. Therefore, given the limited informative content of the short sequences from the Italian isolates included in the analysis, the conclusions of this study on the origin of these viral strains should be considered misleading and not scientifically supported.
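The point about the limited informative content of a short, highly conserved region can be illustrated with a minimal Python sketch. It counts, in an alignment, the columns that vary and the columns that are parsimony-informative (at least two character states, each shared by at least two sequences). The function name `site_stats` and the toy sequences are hypothetical and are not taken from GenBank or GISAID; this is an illustration of the general argument, not a reanalysis of the data discussed above.

```python
from collections import Counter


def site_stats(alignment):
    """Return (variable_sites, parsimony_informative_sites) for aligned sequences.

    Gaps ('-') and ambiguous bases ('N') are ignored when counting states.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    variable = 0
    informative = 0
    for i in range(length):
        # Count the nucleotide states observed at column i
        counts = Counter(seq[i] for seq in alignment if seq[i] not in "-N")
        if len(counts) > 1:
            variable += 1
            # Informative only if at least two states each occur in >= 2 sequences
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return variable, informative


# Hypothetical toy fragments, NOT real SARS-CoV-2 sequences
toy = [
    "ATGGCAGATTCC",
    "ATGGCAGATTCC",
    "ATGGCAGATTCC",
    "ATGGCTGATTCC",  # a single substitution carried by only one sequence
]
print(site_stats(toy))  # (1, 0)
```

In this toy case the only variable column is a singleton, so no column groups sequences together; a region that is identical across nearly all strains offers little or no signal to resolve their relationships.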
CONFLICT OF INTERESTS
The authors declare that there are no conflicts of interest.
AUTHOR CONTRIBUTIONS
MRC, FC, and EL conceived the idea, and MRC, FM, and GI contributed to writing the commentary. FM provided statistical support. All authors reviewed and approved the final version.