Gerstner Bioinformatics and Computational Biology Scholar Profiles
Current Gerstner Bioinformatics and Computational Biology Scholar Profiles
Dr. Audrey Lin
Research Focus — My research uses ancient biomolecules recovered from archaeological and museum specimens and objects to answer diverse questions about human-mediated evolutionary processes, including domestication, extinction, and the mechanisms of zoonoses. My multidisciplinary approach integrates tools and theories from the life sciences (palaeogenomics, biology, zoology), the humanities (history), and the social sciences (archaeology and cultural anthropology).
My primary research focus for the Gerstner Scholarship is using novel museomics methods to explore the genetic roots of virulence in historical viral diseases. Studying ancient viral genomes offers valuable insights into the dynamics of present-day emerging and re-emerging infectious diseases. Adaptive mutations are rarely observed over the course of a single infectious disease outbreak, but ancient viral genomes can let us observe ‘evolution in action’ by reconstructing the evolutionary history of how and why a pathogen's virulence may change over time.
Bats are the natural reservoir hosts for many coronaviruses (CoVs), and Rhinolophus bats in particular have been identified as the most likely origin of SARS-CoV-2, based on genetic sequences of bat CoVs collected through wildlife surveillance. However, there is an unknown number of extinct viral lineages that may have been the progenitors of SARS-CoV-2 and other CoVs – including those yet to emerge. I will use preexisting viral sequences and information on the co-roosting of different bat species to examine the network of co-evolutionary relationships among viruses and their bat hosts. I am developing methods to recover viral RNA from “wet” ethanol-preserved bats housed in the American Museum of Natural History. I will use bioinformatic methods to develop and optimize a pipeline for studying low-coverage virus genomes from museum specimens and to perform phylogenomic and phylodynamic analyses. These viral ‘time capsules’ can reveal lost viral diversity and the evolution of these current and future health threats.
Biography — Audrey T. Lin received her B.S. in Biological Sciences (2012) from the University of Nevada, Las Vegas, and her M.Sc. in Infection and Immunity (2014) from University College London. She received her D.Phil. from the Department of Zoology at the University of Oxford (2020), where she used bioinformatic methods to investigate genome evolution and its consequences for molecular rates, demographic spread, and physiological and functional change over time. While at Oxford, Audrey joined the Palaeogenomics & Bio-Archaeology Research Network (PalaeoBARN) under the supervision of Professor Greger Larson, where she made her research home in the field of palaeogenomics. Prior to joining AMNH, Audrey was in the Smithsonian National Museum of Natural History’s Anthropology Department, where she held the Peter Buck Postdoctoral Fellowship and the George Burch Postdoctoral Fellowship in Theoretical Medicine and Affiliated Theoretical Science. While at NMNH, she led several collections-based projects, including two fully funded projects aimed at recovering viral RNA from museum specimens to better understand the molecular evolution of zoonotic viruses of biomedical interest, particularly the 1918 pandemic influenza virus and bat coronaviruses. The last project she led there was a collaboration with Coast Salish weavers, Knowledge Keepers, Elders, and artists that used “Two-Eyed Seeing” to view the world through the combined strengths of Indigenous knowledge and western science. This holistic approach revealed the deep history and recent decline of the Coast Salish woolly dog due to colonialism.
Dr. Jose Barba-Montoya
Research Focus — I study patterns of molecular evolution and species diversification across the tree of life by integrating phylogenetics/phylogenomics with systematics and ecology. My primary focus lies in inferring evolutionary relationships and divergence times of species and subspecies. The inference of accurate timetrees is crucial for understanding major transitions in the evolution of genomes and species. Despite the abundance of data and powerful analysis methods, challenges persist in constructing reliable timetrees. Understanding the different sources of errors and uncertainties, as well as the strategies to address them, is fundamental for constructing an accurate tree of life. Given these considerations, another aspect of my research is dedicated to developing methods for phylogenomic analysis and molecular clock dating that effectively control sources of error and uncertainty.
Arachnid phylogeny has emerged as one of the most challenging problems in phylogenomics because of an ancient rapid radiation. Recent phylogenomic studies have both confirmed and refuted the monophyly of Arachnida, with implications for understanding the evolution of terrestrialization: did arachnids colonize land once or on multiple occasions? As a Gerstner Scholar in Bioinformatics and Computational Biology at the American Museum of Natural History, I am (1) investigating arachnid monophyly and the evolution of terrestrialization using phylogenomic data for all living orders of Chelicerata; (2) developing a bioinformatics pipeline for identifying specific genomic regions or loci that affect phylogenetic signal, causing incongruence between gene trees and species trees; (3) undertaking further phylogenomic and divergence dating analyses of specific arachnid orders to build a more complete dated tree of life for Arachnida; and (4) developing an algorithm for automatically updating dated phylogenies with new sequences. I expect that these methods will yield robust estimates of relationships and divergence times for Arachnida.
Biography — Jose Barba-Montoya received his BSc in Biology (2009) and MSc in Systematics (2012) from the Universidad Nacional Autónoma de México under the supervision of Professor Susana Magallón. His thesis research focused on the evolution of columnar cacti and pollination syndromes. He then received his PhD in Molecular Evolution (2017) from University College London under the supervision of Professor Ziheng Yang. His dissertation explored the use of phylogenomic data, combined with statistical summaries of the fossil record, to estimate divergence times of angiosperms and primates. Specifically, it examined how different interpretations of the fossil record, relaxed molecular clock models, and other factors affect Bayesian divergence time estimation. This work highlighted that the main factors affecting time estimates are data partitioning, fossil calibration uncertainty, the discrepancy between the specified time prior and the effective time prior, and the relaxed clock model. Prior to joining AMNH, Dr. Barba-Montoya was a postdoctoral fellow with Professor Sudhir Kumar at Temple University, where his research focused mainly on investigating and developing models and methods for phylogenetic inference and divergence time estimation. Overall, his work has had applied implications for molecular clock dating and has provided valuable insights into the evolution of angiosperms and primates.
Dr. Daniel Hooper
Research Focus - Gene flow following hybridization can play a large and often counterintuitive role in a variety of evolutionary processes related to trait evolution and the development of reproductive isolation. My research at the AMNH uses a comparative genomic approach to investigate topics such as: (a) how chromosome inversions evolve and how effective they are as barriers to gene flow; (b) the evolutionary forces that shape sex chromosome evolution, and the large contribution of sex chromosomes to speciation; and (c) how mitonuclear coevolution within species can lead to mitonuclear incompatibilities between species. Toward these aims, I am currently evaluating the potential of a new approach, called ‘haplotagging’, to recover population-level haplotype information for non-model organisms. This rapid-run, low-cost, high-throughput approach preserves haplotype information by generating linked-read sequence data from large DNA molecules that are uniquely barcoded with custom beadTags. I am testing the utility of these linked-read data for better understanding inversions, sex chromosomes, and mitonuclear coevolution in the long-tailed finch hybrid system.
Biography – Daniel M. Hooper received his B.S. from the University of California, Davis, in 2009 and his Ph.D. from the University of Chicago in 2017 with Dr. Trevor Price. His dissertation examined chromosome inversion evolution in birds, explicitly testing the hypothesis that inversions, as potent recombination modifiers, are favored by selection under particular conditions, namely maladaptive hybridization between divergent species or ecotypes. Drawing on decades of cytological work on passerines, evaluated in a phylogenetic context, and on comparative genomic analyses of an Australian hybrid zone, this work suggested that inversions may often contribute to speciation in birds. His postdoctoral work has built on these findings.
Alumni Gerstner Bioinformatics and Computational Biology Scholar Profiles
Dr. Kevin Deitz
Research Focus - The availability of thousands of individual genomes of multiple arthropod species presents an unparalleled opportunity to understand the impact of natural selection on genome evolution. My research addresses the following basic questions in genome biology, evolution, and speciation: What biological processes result in the evolution of genetic incompatibilities and reproductive isolation between incipient species? How is evolution facilitated, or constrained, by the available genetic variation within a population? How does selection act to maintain or remove exogenous elements that integrate into genomes? As a Gerstner Scholar in Bioinformatics and Computational Biology in the Institute of Comparative Genomics at AMNH, I am using open-source computational biology tools and developing new bioinformatic pipelines to address these questions in the context of rapid species radiations (Anopheles gambiae species complex) and global invasions (Aedes aegypti) of mosquitoes that are the primary vectors of some of the most devastating human diseases. I am investigating how heterogeneity in endogenous viral element integrations among arthropod populations and species impacts their resistance to viral infection, with an emphasis on arbovirus vectors such as the yellow fever mosquito Aedes aegypti. I also have ongoing research focused on understanding how recombination shapes nucleotide variation in arthropod genomes and impacts the rate of adaptive evolution.
Biography – Kevin C. Deitz received his BS in Conservation Biology from SUNY-ESF in 2008, and his MS (2011) and PhD (2017) in Entomology from Texas A&M University under the supervision of Michel Slotman. His thesis and dissertation research focused on the population, evolutionary, and speciation genetics of the Anopheles gambiae complex of African malaria mosquitoes. This work had applied implications for vector control but also provided basic insights into how genetic incompatibilities evolve and contribute to rapid species radiations. Prior to joining AMNH, Dr. Deitz was a postdoc with Peter Andolfatto at Princeton and Columbia Universities where his research focused on understanding the evolution of epistatic mutations in non-model Drosophila. Additionally, by leveraging the full genome sequences of hundreds of flies from across the Drosophila yakuba species clade, he analyzed thousands of orthologous genes in order to understand how variation in the effective population sizes of these species impacts the efficacy of selection in their genomes. This analysis helps us to understand how genetic variation within populations and species impacts their ability to adapt to their environment.
Dr. Victor Sojo
Research Focus – I am a trans-disciplinary evolutionary biologist with a background in (bio)chemistry and computer science. My main research interests are in the “major evolutionary transitions”—the times at which the greatest biological innovations occurred in the history of life on Earth. I focus on the roles that membranes (both the lipids and the proteins) played in some of these transitions.
I work mostly on early evolution, including the emergence of life itself, the origin of free-living cells, and the evolution of complex cells (eukaryotes), but I am generally interested in the evolution of membranes throughout life’s entire history.
Why membranes? Because membranes are crucial to almost all of life’s key processes, including respiration, cell-to-cell communication, reproduction, photosynthesis, vision, motion, feeding, parasitism/immunity, energetics, and more. Membrane-associated proteins amount to approximately one third of all proteins in the genome, and they compose over three quarters of the membranes themselves. Membranes are also the point of contact between the cell and the environment. As such, they are the direct target of over half of all known drugs, whereas the remaining drugs typically have to get through the membranes before they can exert their functions inside the cell. Understanding membranes can therefore have significant implications far beyond theoretical evolutionary biology, into physiology and medicine.
My main project at the AMNH is on the origin and early evolution of eukaryotes from an “endosymbiotic” merger of bacteria and archaea. I am studying the bioenergetic role of mitochondrial respiratory membranes in the early events of eukaryogenesis. As part of my research, I am performing analyses of horizontal gene transfers across the three domains of the Tree of Life. Both topics—the relations between the branches of the tree of life, and the complexity of gene transfers between them—are central to my research in general.
I use mostly computational methods (bioinformatics and modelling) and theoretical analysis in my research, but I can also occasionally—albeit rarely—be found in the lab.
Biography – Victor Sojo received a research undergraduate degree in Chemistry from the Universidad Central de Venezuela (UCV) in Caracas, followed by a Master’s in Computer Science, also at UCV. He then pursued a second Master’s, in Modelling Biological Complexity, from the University of London – University College London (UCL). Victor completed his PhD in Evolutionary Biology in 2016, also at UCL, with Nick Lane and Andrew Pomiankowski. Prior to becoming a Gerstner Scholar at the AMNH, Victor was a Life Sciences Fellow at the Institute for Advanced Study in Berlin, a European Molecular Biology Organization (EMBO) Postdoctoral Fellow at the Ludwig-Maximilian University of Munich, and a Short-Term Postdoctoral Fellow at RIKEN in Tokyo.
Dr. Marcelo Gehara
Research Focus – Evolution happens in space and time, and understanding the influence of these two variables on diversification processes is a major goal in evolutionary biology. Time is routinely modeled in population studies, but space is often considered only implicitly. My goal as a postdoctoral researcher at the AMNH is to develop a simulation-based method to test the influence of space on the genetic differentiation of populations. I am currently working on a spatially explicit population genetic model that uses different distance measures, such as linear spatial distance or environmental resistance distance, to simulate gene flow between populations. Under this model, known spatial patterns of isolation by distance and isolation by resistance can be replicated in silico. With this tool at hand, we will be able to train machine-learning algorithms to recognize these spatial influences in real datasets.
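A minimal sketch of the core idea behind distance-dependent gene flow, assuming migration decays exponentially with distance between demes. The decay kernel, the `scale` and `total_m` parameters, and the function names are illustrative assumptions, not the model described above. Passing a resistance-distance matrix in place of the Euclidean one would give an isolation-by-resistance variant.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def euclidean_distances(coords: Sequence[Tuple[float, float]]) -> Matrix:
    """Pairwise straight-line distances between population (deme) locations."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def migration_matrix(dist: Matrix, total_m: float = 0.01, scale: float = 1.0) -> Matrix:
    """Per-generation migration rates that decay exponentially with distance.

    Entry [i][j] (i != j) is the fraction of deme i's gene pool drawn from
    deme j; each row's off-diagonal entries sum to total_m, and the diagonal
    holds the non-migrant fraction 1 - total_m.
    """
    n = len(dist)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Hypothetical kernel: closer demes contribute more migrants.
        weights = [math.exp(-dist[i][j] / scale) if j != i else 0.0 for j in range(n)]
        total = sum(weights)
        if total > 0:
            for j in range(n):
                m[i][j] = total_m * weights[j] / total
        # A lone deme (no neighbours) keeps its whole gene pool.
        m[i][i] = 1.0 - total_m if total > 0 else 1.0
    return m

# Four demes along a line; the last one is far from the others.
demes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
m = migration_matrix(euclidean_distances(demes))
```

In this toy setup, deme 0 receives more migrants from deme 1 than from deme 2, and very few from the distant deme 3, which is the signature that isolation-by-distance simulations are meant to reproduce.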
Biography – Marcelo Gehara received his BA in biology from the Universidade Federal de Juiz de Fora (UFJF), Brazil, in 2005. He obtained a Master's degree in Zoology from the Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Brazil (2009), under the supervision of Sandro L. Bonatto, and a PhD in Zoology from the Technical University of Braunschweig, Germany (2013), under the supervision of Miguel Vences. Marcelo has authored several publications on the phylogeography of amphibians and reptiles from different parts of the world. Recently, he has been focusing his research on developing new methodologies for the study of evolution at the population level. Prior to being a postdoc at the RGGS, Marcelo was a postdoc in Frank Burbrink’s group, where he developed PipeMaster, an R package for simulation-based inference in population genetics.
Dr. Chase Nelson
Research Focus — Evolutionary genetics has long been plagued by a paucity of data. A great body of mathematical theory in the form of population genetics was developed well before data became available to test most of its predictions. Now, with the maturation of next-generation sequencing (NGS) technologies over the past decade, many previously intractable questions are within reach. In particular, sequencing large numbers of individuals, or pooled samples consisting of many individuals (pooled NGS), allows allele frequencies to be measured with remarkable resolution and important population genetic parameters to be estimated. My own research aims to develop computational tools (e.g., SNPGenie) for automating such analyses, allowing researchers to draw evolutionary inferences from NGS variant data (primarily single nucleotide polymorphisms, or SNPs). So far, my efforts have mainly involved data from within-host virus populations, including both natural isolates (e.g., arteriviruses infecting red colobus monkeys) and serial infection experiments (e.g., influenza viruses infecting ferrets).
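As an illustration of the kind of parameter such tools estimate from pooled NGS data, the sketch below computes nucleotide diversity (π) from the read counts at each site, as the probability that two reads drawn without replacement carry different alleles. It is a hypothetical minimal example, not SNPGenie's code; SNPGenie additionally partitions diversity into nonsynonymous and synonymous components (πN and πS), which this sketch does not attempt. The function names and input format are assumptions.

```python
from typing import Dict, List

def site_pi(counts: Dict[str, int]) -> float:
    """Probability that two reads sampled without replacement differ at one site.

    counts maps each observed nucleotide to the number of reads supporting it.
    """
    n = sum(counts.values())
    if n < 2:
        return 0.0  # fewer than two reads: no pairs to compare, treated as 0
    same_pairs = sum(c * (c - 1) for c in counts.values())  # ordered identical pairs
    return 1.0 - same_pairs / (n * (n - 1))

def mean_pi(sites: List[Dict[str, int]]) -> float:
    """Average per-site diversity over all sites, including monomorphic ones."""
    return sum(site_pi(s) for s in sites) / len(sites) if sites else 0.0

# Hypothetical read counts at three sites of a pooled sample.
sites = [{"A": 50, "G": 50}, {"C": 100}, {"T": 90, "C": 10}]
diversity = mean_pi(sites)
```

Monomorphic sites are kept in the average on purpose: π is a per-site quantity, so dropping invariant sites would inflate it.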
My research at the American Museum of Natural History focuses mainly on (1) within- and between-host viral evolution, especially of human papillomaviruses (HPVs); (2) determining orthology within a parsimony framework; and (3) the patterns of variation among, and the geographic distribution of, human immune alleles. I rely heavily on my Institute for Comparative Genomics mentor, Apurva Narechania, in addition to generous collaborators at the National Cancer Institute and elsewhere. Our multidisciplinary work involves the collection of samples from natural populations around the world, genome sequencing, variant calling, bioinformatic processing, and evolutionary analysis. With respect to viral evolution, specific questions to be addressed include the host and viral genetic determinants of carcinogenicity in HPV, the influence of host vs. pathogen co-evolutionary history, and the importance of Muller’s Ratchet-like processes for viral fitness. Work on orthology aims to update and expand the BigPlant OrthologID tool to incorporate more eukaryotic genomes and non-protein-coding DNA information.
Finally, human immune allele research aims to detect correlations between spatial and genetic distance, as well as to identify specific variants important in evolutionary history and immunology. It is hoped that this work will elucidate methods for drawing upon quantitative theory to make concrete predictions that will have relevance for questions both therapeutic (e.g., immune epitope identification for vaccine design) and theoretical (e.g., the importance of mutation accumulation and the relative contributions of selection and drift in molecular evolution).
Biography — Chase W. Nelson received his B.A. in Biology from Oberlin College in 2010, where he performed honors research on mutation accumulation and gene expression in Arabidopsis thaliana under Angela J. Roles. During this time, he also undertook research experiences in unsupervised motif discovery at Ohio University and the molecular biology and phylogenetics of maize at the University of Wyoming. While he completed his Ph.D. in Biological Sciences at the University of South Carolina, studying evolutionary bioinformatics under Austin L. Hughes from 2011 to 2016, he also participated in next-generation sequencing research under Wen-Hsiung Li at Academia Sinica (中央研究院) in Taipei, Taiwan. In his free time, Nelson pursues vocal and dance studies, writing, and learning Mandarin.
Dr. Matthew Aardema
Research Focus – My research addresses three key evolutionary and ecological questions: 1) do larger effective population sizes improve the effectiveness of selection, 2) can ecological specialization promote larger effective population sizes, and 3) does improved selection efficacy increase the rate of adaptive evolution? The answers to these questions will produce a better understanding of the factors that facilitate adaptation and will also shed light on population divergence and the formation of new species. I work predominantly in two systems: butterflies of the genus Papilio and the pathogenic, tick-vectored bacterium Anaplasma phagocytophilum. Within Papilio, I am investigating the factors that differ between a host-generalist species, Papilio glaucus (the Eastern Tiger Swallowtail), and a host-specialist species, Papilio troilus (the Spicebush Swallowtail). These species are similar in most life history characteristics and geographic ranges, yet differ greatly in the number of larval host species they can feed upon. Within A. phagocytophilum, I have discovered that different strains vary greatly in the number of mammalian hosts they infect (host range). I am now investigating what consequences this difference has for the evolutionary trajectory of this bacterium. In both systems, I use next-generation sequencing technologies and high-performance computing to carry out comparative genomic analyses. These analyses allow me to assess effective population sizes, gene flow, and selection within populations and species. Other computational projects I am currently working on include thermal adaptation in clownfish, population structure in South African elephants, and population diversity in a rodent malaria species.
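The first question above rests on a classic population-genetic result: the fate of a selected mutation depends on the product of effective population size and selection coefficient. The sketch below uses Kimura's diffusion approximation for the fixation probability of a single new mutation under additive selection (heterozygote fitness 1 + s) and compares it with the neutral expectation 1/(2N). It is a textbook illustration under those assumptions, not the analysis used in this research; the function name and parameters are hypothetical.

```python
import math
from typing import Optional

def fixation_probability(ne: float, s: float, n: Optional[float] = None) -> float:
    """Kimura's approximation u = (1 - e^(-4 Ne s p)) / (1 - e^(-4 Ne s)).

    ne: effective population size; n: census size (defaults to ne);
    s: additive selection coefficient (heterozygote fitness 1 + s).
    """
    n = ne if n is None else n
    p = 1.0 / (2.0 * n)  # initial frequency of a single new copy
    if s == 0:
        return p  # neutral case: fixation probability equals initial frequency
    x = 4.0 * ne * s
    a = x * p
    if s > 0:
        # 1 - e^(-y) computed as -expm1(-y) for accuracy with small y.
        return -math.expm1(-a) / -math.expm1(-x)
    # Deleterious case, rewritten as e^(b - y) * (1 - e^(-b)) / (1 - e^(-y))
    # so that exp() cannot overflow when |Ne * s| is large.
    y, b = -x, -a
    return math.exp(b - y) * -math.expm1(-b) / -math.expm1(-y)

# Advantage of a beneficial mutation (s = 0.001) relative to a neutral one.
ratio_small = fixation_probability(100, 0.001) / fixation_probability(100, 0.0)
ratio_large = fixation_probability(10_000, 0.001) / fixation_probability(10_000, 0.0)
```

With s = 0.001, the beneficial mutation is only about 1.2 times as likely as a neutral one to fix when Ne = 100, but about 40 times as likely when Ne = 10,000: the same selection coefficient is far more effective in the larger population.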
Biography - Matthew Aardema completed his B.S. in Zoology at Michigan State University (MSU) in 2008. He then earned an M.S. at MSU in Ecology, Evolutionary Biology and Behavior. From there he went to Princeton University to earn his Ph.D. in the Department of Ecology and Evolutionary Biology under the guidance of Prof. Peter Andolfatto. Upon completion of his Ph.D., Matthew joined the American Museum of Natural History as a Gerstner Scholar in Bioinformatics and Computational Biology.
Website - https://www.montclair.edu/profilepages/view_profile.php?username=aardemam
Dr. Robert Harbert
Research Focus - Understanding what drives the distribution of species is fundamental to the study of ecology, taxonomy, and systematics. The physiology of a given plant species restricts its distribution to a specific set of climate, soils, and biotic interactions.
I am broadly interested in elucidating spatial patterns in global plant biodiversity through computational methods, particularly across temporal and climatic gradients. To this end, I work with large online databases of aggregated primary biodiversity data, such as the Global Biodiversity Information Facility (GBIF), to develop quantitative methods for characterizing geographic distributions in terms of environmental factors, mainly climate. The unifying goal of my research is to apply standard, quantitative methods to primary biodiversity data to reliably characterize environmental niches in a likelihood modeling framework for all extant taxa.
The main product of my Ph.D. work was a likelihood modeling framework that estimates climate parameters given a local vegetation community composition and distribution data for all of those species. The Climate Reconstruction Analysis using Coexistence Likelihood Estimation (CRACLE) framework has been extensively tested on modern vegetation surveys. Relative to other related methods, CRACLE is distinctly Gleasonian in its view of vegetation assembly. That is, all species are treated as independent units in the model when parameterizing climate niche occupancy. Now, CRACLE is being turned towards paleoclimate modeling using Late Quaternary plant macrofossil communities.
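Treating species as independent units means that the community log-likelihood of a candidate climate value is simply the sum of each species' log-likelihood. A minimal sketch of that idea, assuming each species' climate niche is summarized as a normal density fitted to the climate values at its occurrence records. The normal niche, the grid search, the function names, and the data are all illustrative assumptions, not CRACLE's implementation.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

def fit_niche(values: Sequence[float]) -> Tuple[float, float]:
    """Summarize one species' climate niche as a normal (mean, sd) from occurrences."""
    sd = stdev(values)  # requires at least two occurrence values
    if sd == 0:
        raise ValueError("niche needs at least two distinct climate values")
    return mean(values), sd

def normal_logpdf(x: float, mu: float, sd: float) -> float:
    """Log density of a normal distribution at x."""
    return -math.log(sd * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sd ** 2)

def estimate_climate(community: Dict[str, List[float]], grid: Sequence[float]) -> float:
    """Grid value that maximizes the summed log-likelihood over co-occurring species.

    Summing log-densities treats every species as an independent unit.
    """
    niches = [fit_niche(v) for v in community.values()]
    return max(grid, key=lambda x: sum(normal_logpdf(x, mu, sd) for mu, sd in niches))

# Hypothetical mean annual temperatures (°C) at occurrence records of two species.
community = {
    "species_a": [10.0, 12.0, 14.0],
    "species_b": [14.0, 16.0, 18.0],
}
grid = [t / 10 for t in range(0, 301)]  # candidate temperatures 0.0 to 30.0 °C
best = estimate_climate(community, grid)
```

With normal niches, the maximum-likelihood estimate is the precision-weighted mean of the species' niche means; here both niches have the same spread, so the estimate falls midway between 12 °C and 16 °C, at 14 °C. A real analysis would use a more flexible density per species, which is why a generic grid search is shown rather than the closed form.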
Biography - Rob completed his B.S. in Biology at Roanoke College in 2011. From there, he went on to Cornell University to work on a Ph.D. in the School of Integrative Plant Science and the Section of Plant Biology, with a focus on Plant Systematics and Evolution, with Dr. Kevin C. Nixon. Upon completion of his Ph.D. in July 2016, he joined the American Museum of Natural History as a Gerstner Scholar in Bioinformatics and Computational Biology.
Website - https://www.stonehill.edu/directory/robert-s-harbert/
Dr. Martine Zilversmit
Research Focus – The goal of my research is to broaden our understanding of how genomes change over time by developing tools and models to study the genomes of non-model organism systems and overlooked genome regions. The focus of my work is genome structure evolution, particularly regions where it has been historically difficult to map variation or even to assemble the DNA sequence, including: gene families, low-complexity sequences, subtelomeric regions, and small regions of extreme sequence divergence. Although this “dark matter” of genome structure is often the most difficult to work with, particularly in non-model organisms, it can yield the most interesting results relevant to phenotypic change and understanding evolutionary rates. Towards this end, I have been using malaria parasites as my study system because of the large number of open questions about multiple aspects of structural change in their relatively large, but low-complexity, genomes.
My recent work in comparative genomics is grounded in my earlier work in systematics and evolutionary genetics, from my master’s research at the American Museum of Natural History and my PhD at Harvard University. The techniques of sequence alignment, tree building, and statistical models of evolution remain at the core of all my research. I have used these methods to examine the evolution of low-complexity regions in coding sequence (DePristo, Zilversmit, et al. 2006; Zilversmit et al. 2010), gene family evolution (Bethke et al. 2006; Ferreira, Zilversmit, and Wunderlich 2007; Zilversmit et al. 2013), and allelic and non-allelic homologous recombination (Zilversmit et al. 2010; Zilversmit et al. 2013). Analyzing all these phenomena both within and between species, in a phylogenetic framework that places comparisons in the context of ancestry and evolutionary relationships, has allowed me to explore the neutral versus adaptive aspects of their evolution.
A large part of my research program now focuses on developing tools and technologies that enable advances in comparative and evolutionary genomics. Recent developments in innovative, lower-cost sequencing technologies, and the wider use of parallel computer clusters, mean that scientists no longer need to rely on large sequencing centers for whole-genome data, or on data from just a few model organisms. Because we can now sequence, assemble, and annotate our own genomes, the diversity of genome data available for comparative studies, which are at the core of evolutionary genomics, can increase vastly. Currently, I am adapting both molecular and computational genomic methods to non-model organism systems to examine genome diversity and divergence using paired-end short-read and long-read data. For this work, I have designed interlinked genome analysis pipelines, using UNIX scripting and Python programs, for genome mapping (resequencing) with annotation, de novo genome assembly and annotation, and high-accuracy variant calling with in silico validation. The mapping pipeline includes a set of novel programs: one identifies regions of rapid evolutionary change that are not detectable by standard methods, and two others compute the intersection and complement of variant data from multiple individuals. I have applied these techniques successfully in inter- and intraspecific studies to identify loci associated with the evolution of drug resistance and increased disease virulence. In addition, I have been able to construct fine-scale genome maps of markers for population-level studies and quantitative trait locus analysis.
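Intersecting and complementing variant data across individuals amounts to set algebra on variant calls. A minimal sketch, assuming each individual's calls have already been normalized to hypothetical `(chrom, pos, ref, alt)` tuples (in practice they would first be parsed from VCF files). It illustrates the operations only and is not the pipeline's actual code.

```python
from typing import Dict, Set, Tuple

# Hypothetical normalized representation of one variant call.
Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

def shared_variants(calls: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Intersection: variants called in every individual."""
    if not calls:
        return set()
    return set.intersection(*calls.values())

def private_variants(calls: Dict[str, Set[Variant]]) -> Dict[str, Set[Variant]]:
    """Complement: for each individual, variants absent from all other individuals."""
    private = {}
    for name, variants in calls.items():
        others = set().union(*(v for other, v in calls.items() if other != name))
        private[name] = variants - others
    return private

calls = {
    "ind1": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "ind2": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
}
shared = shared_variants(calls)
private = private_variants(calls)
```

Keying on the full `(chrom, pos, ref, alt)` tuple rather than position alone keeps different alternate alleles at the same site distinct; real data would also need consistent left-normalization of indels before comparison.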
I am now applying these methods and pipelines to broader questions in malaria parasite evolution, including the origin of hemoglobin metabolism, which opened the way for these parasites to colonize red blood cells as their ecological setting. My research also addresses other rapidly evolving gene families, such as those that encode venom proteins.
Biography - Martine Zilversmit received her MS in Biology from NYU, working with Rob DeSalle on high-throughput genome sequencing techniques for studies in molecular evolution, and her PhD in Biology from Harvard University in the Department of Organismic and Evolutionary Biology, working with Daniel Hartl on genome structure evolution and the population genetics of malaria parasites.