Taking inspiration from ecological studies, a team of researchers led by the Museum and Columbia University has developed new software that can signal the emergence of COVID-19 variants. The new surveillance tool, which tracks information diversity across genomes, could be applied more broadly for other emerging viruses.
“I don’t think I would have had this idea if I didn’t work at the Museum,” said Apurva Narechania, a senior bioinformaticist in the Museum’s Institute for Comparative Genomics and lead author of the study describing the technique, which is published this week in the journal Genome Research.
The original intent, Narechania says, was to reimagine how we understand diversity in microbial species, which can be difficult to map onto an evolutionary tree because they undergo such rapid, horizontal evolution. Viruses have similarly dizzying rates of evolution: in SARS-CoV-2, the virus that causes COVID-19, rapid adaptations can lead to new strains that are more contagious or more severe than the original.
During the height of the pandemic, public health policy depended on researchers to track these changes by sequencing millions of genomes. But this surveillance was usually a few weeks to months behind the edge of the pandemic curve.
“Speed is key to responding to these evolving strains, but the traditional way of analyzing these sequences slows down surveillance techniques,” Narechania said.
With the onset of the COVID-19 pandemic, and working closely with colleagues in public health, Narechania turned his focus from bacteria to viruses, specifically SARS-CoV-2.
The team looked to a long-standing metric for comparing species diversity across environments called a Hill number, or the effective number of species in a sample. The higher a Hill number, the more diverse the sample. The new pandemic surveillance software adopts this ecological approach, but in the place of species, the researchers use strings of sequence information, and in the place of environments, they use genomes.
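To make the idea concrete, here is a minimal sketch of a Hill number computed over sequence strings rather than species. It is not the researchers' software. It assumes the "strings of sequence information" are fixed-length substrings (k-mers) pooled across a set of genomes, and the function names `kmer_counts`, `pooled_kmer_counts`, and `hill_number` are hypothetical, as are the toy sequences. The formula is the standard Hill number of order q, which at q = 1 reduces to the exponential of the Shannon entropy.

```python
import math
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def pooled_kmer_counts(genomes, k):
    """Add up k-mer counts across a collection of genomes (assumed pooling step)."""
    pooled = Counter()
    for genome in genomes:
        pooled.update(kmer_counts(genome, k))
    return pooled


def hill_number(counts, q=1.0):
    """Hill number of order q: the effective number of distinct items."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    proportions = [c / total for c in counts.values() if c > 0]
    if math.isclose(q, 1.0):
        # The general formula is undefined at q = 1; its limit is exp(Shannon entropy).
        return math.exp(-sum(p * math.log(p) for p in proportions))
    return sum(p ** q for p in proportions) ** (1.0 / (1.0 - q))


# Toy data (invented for illustration): three identical genomes, then the
# same set with one genome carrying a single-base change.
uniform_sample = ["ACGTACGTAC"] * 3
mixed_sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]

uniform_diversity = hill_number(pooled_kmer_counts(uniform_sample, k=3))
mixed_diversity = hill_number(pooled_kmer_counts(mixed_sample, k=3))
print(uniform_diversity, mixed_diversity)
```

In this toy case the sample containing the altered genome yields the higher Hill number, because the change introduces k-mers that the identical genomes lack. A rise of this kind is the sort of signal the article describes, although the actual tool's choice of strings, order q, and thresholds may differ.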
The researchers tested the software on SARS-CoV-2 sequence data and found that it accurately predicts variant emergence before the onset of sickness in the population.
“In a crisis of COVID-19’s scale and speed, eliminating the analysis lag can mean the difference between timely, reasonable public health response and failure to understand and anticipate the disease’s next turn,” said Barun Mathema, a professor of epidemiology at Columbia University’s Mailman School of Public Health and a corresponding author on the study.
The software is now on GitHub and freely available to non-commercial entities. Although it cannot characterize new variants, it can forewarn public health officials when a new strain is on the horizon. The researchers point to the software’s ability to detect new variants in wastewater as a particularly impactful potential application.
“We show that tracing a pandemic curve with these new metrics enables the use of sequence data as a real-time sensor, tracking both the emergence of variants over time and the extent of their spread,” Mathema said. “Our technique affords public health institutions the opportunity to create actionable policy based on a simple, quantitative measure.”