What is the origin of the SARS-CoV-2 coronavirus and the COVID-19 epidemic?
Text updated on 2020-12-22
The SARS-CoV-2 virus is related to the bat coronaviruses of southern China. We still don't know how the first humans were infected. The origin of SARS-CoV-2 has received little attention from researchers, even though the scientific and political stakes are high.
Since the virus was detected in Wuhan in December 2019, numerous scenarios have been proposed for the origin of SARS-CoV-2, the virus that causes COVID-19.
Where does SARS-CoV-2 come from?
- The SARS-CoV-2 coronavirus belongs to the genus Betacoronavirus. It is related to coronaviruses that naturally infect bats.
- The closest known relative of SARS-CoV-2 is RaTG13 (96% sequence identity, or about 1,100 mutations scattered across the roughly 30,000 letters of the genome, which suggests the two viruses diverged a few decades ago). It was collected in 2013 from a horseshoe bat (Rhinolophus affinis) in an abandoned mine in Yunnan (1,900 km from Wuhan) where, in 2012, several people had developed severe pneumonia after clearing out bat droppings.
- RaTG13 is not the ancestor of SARS-CoV-2 but a cousin. There must therefore be another bat virus from which both RaTG13 and SARS-CoV-2 are derived. We will call it the "ancestor virus"; it probably originated in southern China.
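The figures given for RaTG13 can be checked with simple arithmetic. The short Python sketch below converts a percent identity into an expected number of differing positions, then makes a rough strict-molecular-clock estimate of how long ago two lineages split. The 30,000-letter genome length and the 96.2% identity come from the text and its notes; the substitution rates used (5 × 10⁻⁴ and 10⁻³ substitutions per site per year) are illustrative assumptions, not values taken from the sources. A constant-rate clock also ignores recombination and rate variation between lineages, so the result only illustrates the order of magnitude behind "a few decades".

```python
def expected_differences(genome_length: int, identity: float) -> int:
    """Number of differing positions implied by a fractional sequence identity."""
    return round(genome_length * (1.0 - identity))


def naive_divergence_years(differences: int, genome_length: int,
                           rate_per_site_per_year: float) -> float:
    """Strict-clock estimate of the time since two lineages split.

    Both lineages accumulate substitutions independently after the split,
    hence the factor 2 in the denominator. This is a simplification for
    illustration only (no recombination, no saturation, constant rate).
    """
    p_distance = differences / genome_length
    return p_distance / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    GENOME_LENGTH = 30_000  # rounded genome length used in the text
    IDENTITY = 0.962        # RaTG13 vs SARS-CoV-2 identity reported in the notes
    diffs = expected_differences(GENOME_LENGTH, IDENTITY)
    print(diffs)  # 1140, consistent with "about 1,100 mutations"
    # ASSUMED illustrative substitution rates (per site per year), not from the sources
    for rate in (5e-4, 1e-3):
        years = naive_divergence_years(diffs, GENOME_LENGTH, rate)
        print(rate, round(years))
```

With these assumed rates the sketch gives roughly 20 to 40 years, in line with the "few decades" mentioned above; published dating studies rely on considerably more elaborate phylogenetic models.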
How did SARS-CoV-2 form from the ancestor virus?
- SARS-CoV-2, RaTG13, and other bat viruses have similar sequences, indicating that they are related.
- SARS-CoV-2 has evolved from the ancestor virus and contains short sequences that allow it to bind to and infect human cells.
- SARS-CoV-2 does not contain any sequence fragments previously published in public coronavirus databases. It can therefore be deduced that SARS-CoV-2 is not a virus constructed in the laboratory by assembling known sequences.
- Small pieces of coronavirus RNA are identical to sequences present in the HIV-1 virus genome. Phylogenetic studies indicate that these are insertions that occurred at different times in the evolutionary history of the virus and they are so small that this similarity between the genomes of the two viruses is most likely coincidental. There is no current data showing that SARS-CoV-2 results from recombinations with HIV-1.
How did the epidemic start?
What we know
- The SARS-CoV-2 sequences from the beginning of the epidemic are extremely similar to one another (differing by only a few nucleotides), whereas today's sequences are much more diverse. The entire epidemic therefore developed from a single initial infectious event, which can be dated to between October and early December 2019. This means that the first COVID-19 patients sampled in Wuhan all belonged to this outbreak. However, the first human (patient zero) has not been identified and may have been infected earlier: the virus may have been circulating silently in the population before triggering the first detected outbreak.
- The South China Seafood Market in Wuhan probably played a role in the spread of the outbreak, but it is not where the first human was infected, because the patients who first showed symptoms in early December 2019 had no connection to this market.
What we don't know
- Where and when the first human (patient zero) was infected.
- Whether there was an intermediate host between bats and humans. For the other two coronaviruses that have caused human epidemics, an intermediate host has been identified: camels for MERS-CoV and civets for SARS-CoV. These hosts constituted a reservoir in which the virus could evolve rapidly and acquire its ability to infect humans. The hypothesis that pangolins served as an intermediate host for SARS-CoV-2 is not supported by convincing data.
- Whether the virus first circulated silently in a human population before acquiring its ability to spread efficiently.
Thus, as of December 2020, key elements are still missing to understand the origin of the epidemic. Several scenarios are possible:
- The first is that a human was infected directly by a bat virus, which then spread within the human population.
- The second is that an animal served as an intermediate host between bats and humans and allowed the virus to acquire its highly infectious character. This scenario is plausible because it has already been observed in the SARS and MERS epidemics. However, it has not been demonstrated for SARS-CoV-2 because no intermediate host has yet been identified.
- The third is that SARS-CoV-2 could have been stored in a laboratory and then accidentally released. This scenario has fuelled conspiracy theories about the origin of the epidemic.
At this time, none of these three scenarios can be excluded. To determine which one is correct, it would be necessary to identify viruses very similar to SARS-CoV-2, either in an intermediate host or in patient samples taken before the start of the epidemic.
What are the elements that have led to certain conspiracy theories?
- The Wuhan Institute of Virology is home to the team of Zheng-Li Shi, a specialist in bat coronaviruses, who published the first genome of SARS-CoV-2 and that of RaTG13. The institute is located 15 km from the seafood market where the epidemic began to spread. One of the buildings of the Wuhan Centre for Disease Control, which also conducts research on coronaviruses, is 300 m from the market. Given the possibility of a laboratory accident, Zheng-Li Shi was quickly questioned about the possible origin of SARS-CoV-2 and the research carried out in her laboratory. She disclosed information related to the origin of the virus only sparingly, in July 2020 and again in November 2020. The removal of a bat virus database that her team had published in 2019, together with conflicts of interest within the commissions set up by The Lancet and the WHO to investigate the origin of SARS-CoV-2, has fuelled suspicion.
- Luc Montagnier, Nobel Prize winner for his work on HIV, claimed that the virus had been artificially created. This claim followed an erroneous preprint, never peer-reviewed, that was quickly withdrawn.
In February 2020, the Chinese government tightened safety regulations for its biological research laboratories. Research into viruses with pandemic potential is important, but it is a double-edged sword: it can improve our knowledge, but it can also increase the risk of a pandemic if laboratory accidents occur.
In conclusion, we do not know how the epidemic started. Whatever the origin of SARS-CoV-2, this pandemic has shown us how fragile the human population is in the face of new contagious diseases.
On February 3, 2020, Zheng-Li Shi's team published the first sequence of SARS-CoV-2 and that of RaTG13, a bat virus 96.2% identical to it, which corresponds to about 1,100 mutations scattered among the 30,000 letters of the genome. Zhou, P., Yang, X. L., Wang, X. G., Hu, B., Zhang, L., Zhang, W., ... & Shi, Z.-L. (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature, 579(7798), 270-273.
Zheng-Li Shi's answers to questions from the journal Science, obtained after 3 months and published on July 24, 2020. They make no mention of the mine or of the 2012 pneumonia cases. Traces of SARS-CoV-2 were detected in environmental samples from doorknobs, soil, and sewage at the Wuhan seafood market, but not in samples of frozen animals. Cohen, J. (2020). Trump "owes us an apology". Chinese scientist at the center of COVID-19 origin theories speaks out. Science.
Addendum to Nature's February 2020 article, published in November 2020, explaining for the first time the link between RaTG13 and a mine in which several people contracted severe pneumonia in 2012. Zhou, P., Yang, X. L., Shi, Z.-L. (2020). Addendum: A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature, 588, E6.
Analysis of sequences from SARS-CoV-2 and related viruses to reconstruct their evolutionary history. The article rules out deliberate modification of the virus because no portions of sequences unique to SARS-CoV-2 have been identified in the databases. Two hypotheses are considered: the contamination of a human by a virus that then mutated into SARS-CoV-2, or evolution in an animal reservoir followed by human contamination. The authors interpret the existence of sequences shared between pangolin viruses and SARS-CoV-2 as indicative of natural mutations. The possibility of laboratory manipulation followed by accidental leakage is ruled out on the basis that the laboratory in question never mentioned having isolated a virus extremely close or even identical to SARS-CoV-2. Andersen, K. G., Rambaut, A., Lipkin, W. I., Holmes, E. C., & Garry, R. F. (2020). The proximal origin of SARS-CoV-2. Nature Medicine, 26(4), 450-452.
A preprint (an article that has not yet been validated by the scientific community or published) contained erroneous information about the link between the HIV and SARS-CoV-2 sequences. The article was quickly withdrawn. Pradhan, P., Pandey, A. K., Mishra, A., Gupta, P., Tripathi, P. K., Menon, M. B., ... Kundu, B. (2020). Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag. bioRxiv, 2020.01.30.927871.
Analysis of sequences from SARS-CoV-2 and related viruses to reconstruct their evolutionary history. Current data do not allow us to state with certainty whether SARS-CoV-2 is the result of a zoonotic emergence (from animal to human) or of an accidental escape of a laboratory strain. The role of the pangolin as an intermediate host is not clearly supported by the current data. The similarities between SARS-CoV-2 and HIV-1 are small and may therefore have occurred by chance. Sallard, E., Halloy, J., Casane, D., Decroly, E., & van Helden, J. (2020). Tracing the origins of SARS-COV-2 in coronavirus phylogenies. arXiv preprint arXiv:2011.12567.
The website https://nextstrain.org analyses published SARS-CoV-2 genome sequences and regularly publishes reports for the public. The April 10, 2020 report indicates that the common ancestor of the circulating viruses emerged in Wuhan, China, in late November or early December 2019. Bell, S. M., Müller, N., Wagner, C., Hodcroft, E., Hadfield, J., Neher, R., Bedford, T. (2020). Genomic analysis of COVID-19 spread. Nextstrain. Situation report 2020-04-10.
Of the first 41 COVID-19 patients hospitalized in Wuhan, 14 had no connection with the Wuhan seafood market (including the earliest case). NB: the WHO report of November 2020 actually mentions 124 confirmed COVID-19 patients in Wuhan at the end of December 2019. https://sciencespeaksblog.org/2020/11/28/five-questions-on-new-data-from-china-who-showing-124-confirmed-coronavirus-patients-in-december-2019/ Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., ... & Cheng, Z. (2020). Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet, 395(10223), 497-506.
The SARS-CoV and MERS-CoV viruses are derived from bat coronaviruses, and the first human infections came from civets sold in markets and from camels, respectively. Cui, J., Li, F., & Shi, Z. L. (2019). Origin and evolution of pathogenic coronaviruses. Nature Reviews Microbiology, 17(3), 181-192.
Description of the research project led by Peter Daszak and funded by the NIH that began in July 2019. The third part ("Aim 3") plans to test different coronavirus sequences to measure their infectivity. NIH Project 2R01AI110964-06: UNDERSTANDING THE RISK OF BAT CORONAVIRUS EMERGENCE.
An international team of 10 experts was appointed by the WHO to research the origin of SARS-CoV-2. This team includes Peter Daszak, who funds Zheng-Li Shi's team at the Wuhan Institute of Virology through the EcoHealth Alliance, which is a significant conflict of interest. McCarthy, S. (2020). WHO names line-up for international team looking into coronavirus origins. 25 Nov 2020. South China Morning Post. Last accessed 7 Dec 2020.
The commission set up by The Lancet to examine the origin of SARS-CoV-2 is led by Peter Daszak, who funds the Wuhan Institute of Virology through the EcoHealth Alliance. Nuki, P., Newey, S. (2020). Scientists to examine possibility Covid leaked from lab as part of investigation into virus origins. The Telegraph. 15 Sept 2020.
EcoHealth Alliance is a non-governmental organization that aims to protect people, animals, and the environment from emerging infectious diseases. It is led by Peter Daszak. It received $3 million in funding from the U.S. NIH over 5 years (2015-2019). Part of this money was sent to Zheng-Li Shi's team for field research and analysis of bat coronaviruses. According to Peter Daszak, "The China bat research project was funded entirely through the NIH grant". Aizenman, N. (2020). Why The U.S. Government Stopped Funding A Research Project On Bats And Coronaviruses. NPR. 29 April 2020. Last accessed 7 Dec 2020.
Internet archive of an article by Zheng-Li Shi's team published in 2019 describing a database containing more than 22,000 virus sequences. The article and the corresponding database disappeared from the scientific journal's website in 2020 and were no longer accessible as of December 10, 2020. The journal's other articles (including those with the immediately preceding and following DOIs, or digital object identifiers) remain available online. Yijie, T., Bei, L., Zijian, Z., Yan, Y., Kai, Z., Lili, M., Yuewei, W., Shi, Z.-L. (2019). Bat and rodent-borne viral pathogen database. Vol. 4 (4). doi: 10.11922/csdata.2019.0018.zh