Efforts to study the early stages of the coronavirus pandemic have received help from a surprising source. A biologist in the United States has ‘excavated’ partial SARS-CoV-2 genome sequences from the early days of the pandemic in its probable epicentre, Wuhan, China. The sequences had been deposited in, but later removed from, a US government database.
The partial genome sequences address an evolutionary conundrum about the early genetic diversity of the coronavirus SARS-CoV-2, although scientists emphasize that they do not shed light on its origins. Nor is it fully clear why researchers at Wuhan University asked for the sequences to be removed from the Sequence Read Archive (SRA), a repository for raw sequencing data maintained by the National Center for Biotechnology Information (NCBI), part of the US National Institutes of Health (NIH).
“These sequences are informative, they’re not transformative,” says Jesse Bloom, a viral evolutionary geneticist at the Fred Hutchinson Cancer Research Center in Seattle, Washington, who describes in a 22 June preprint how he recovered the sequences.
Bloom discovered the sequences after searching for genomic data from the pandemic’s early stages. A research paper from May 2020 contained a table of publicly available sequence data, which included entries Bloom had not come across. The sequences were associated with a paper in which researchers used nanopore-sequencing technology to detect SARS-CoV-2 genetic material in samples from people. That study was published in the journal Small in June 2020, having been posted on bioRxiv in March of that year.