Microbial 'omics



Study hard what interests you the most, in the most undisciplined, irreverent, and original manner possible.
Richard Feynman


The vast majority of life on our planet is microbial. An astonishing number of microbial organisms living in terrestrial and marine habitats represent a biomass that exceeds every living organism that can be seen by the naked eye, combined.

The inconceivable diversity of microbes allows them to synthesize or break down a wide array of chemical substrates, and govern biogeochemical cycles that make Earth a habitable planet for much less talented organisms (such as ourselves). Our own body is also home to a diverse assemblage of microbial cells. Bacteria that colonize our gastrointestinal tract help us by extracting energy from undigested carbohydrates, synthesizing vitamins, and metabolizing xenobiotics. Microbes are essential to Earth’s functioning at every scale, and understanding them is imperative for a complete understanding of life.

In our lab we combine our expertise and interest in microbiology and computation to investigate the diversity and functioning of naturally occurring microbial communities in environments ranging from the human gastrointestinal tract and oral cavity, to sewage, oceans, and soils. We develop software platforms and algorithms to make sense of high-throughput sequencing data for marker genes, metagenomes, and metatranscriptomes that open windows to microbial lifestyles.

We believe in interdisciplinary, hypothesis-driven, open, and collaborative science, and we strive to identify the best computational and experimental practices to investigate mechanisms by which microbes interact with their surroundings, evolve, disperse, and initiate and/or adapt to environmental change.

The following is an incomplete list of ongoing projects that we lead or participate in through collaborations.

Holistic approaches to investigate complex gastrointestinal diseases

Inflammatory bowel diseases (IBD) describe a number of prolonged inflammatory conditions of the human colon and small intestine that affect an increasing number of people. Substantial evidence links the occurrence of these chronic, relapsing conditions to aberrant immune responses to microbes that colonize the gut. Although there are remarkable shifts in microbial communities with the presence of IBD, there is no evidence for microbial members or metabolisms that are specific to the guts of individuals who suffer from IBD, and that are absent from healthy guts. The current understanding of mechanisms that lead to the development of these conditions is unfortunately limited. However, there is room for improvement in ways these diseases are studied. In collaboration with Gene Chang’s group at the University of Chicago, Mitch Sogin’s and Hilary Morrison’s groups at the Marine Biological Laboratory, and Dionysios Antonopoulos’ group at the Argonne National Laboratory, we study the microbial members of IBD patients and the evolution of these microorganisms through genome-resolved metagenomics using longitudinal sampling strategies that allow individuals to serve as their own controls. Our holistic approach includes combining shotgun metagenomics with cultivation of organisms of interest, associating our findings with host factors, and generating hypotheses to be tested in model systems.

Genome-resolved understanding of the human oral cavity

The oral cavity represents the receiving end of our digestive tract where the processing of food begins. As with every other mucosal surface on our body, bacteria colonize the oral cavity, and they play a critical role in health and disease states of the mouth. Some medical conditions in the oral cavity, such as tooth decay, gum diseases, root canal infections, and tonsillitis can result in systemic diseases. Hence, a complete microbial understanding of this environment has always been essential for medical reasons. Besides its immediate relevance for overall health, we believe the oral cavity represents a fascinating environment to study the ecology of microbes. Due to the lack of any physical barriers, and the continuous flow of saliva, microbes can disperse everywhere in the mouth. However, their distribution is far from random. While the microbial occupants of niches in the human mouth (such as tongue, cheeks, gums) form distinct communities, a universe of interactions emerges in this relatively small environment. We previously showed the differential distribution of very closely related microbial organisms in different oral sites at the marker gene level using our high-resolution computational approaches. Today, in collaboration with Jessica Mark Welch from the Marine Biological Laboratory, we go further in an attempt to develop a genome-resolved understanding of the oral cavity, and characterize the distribution of pangenomic traits across oral sites.

Studying public health through the guts of the urban ecosystem

Proper removal of waste is one of the most basic requirements of settled human communities. In fact, we owe our ability to live in such small geographical areas with such high population densities primarily to sewer infrastructures and their ability to effectively evacuate and treat human waste. Today, a modern sewer infrastructure is a critical component of every city. With its ubiquitous parts and components (i.e., toilets, pipes, drains, manholes, pumping stations, and treatment centers), this built environment represents a new and not yet well-characterized ecosystem for microbial life. Although the functioning of microbes in wastewater treatment efforts has been extensively studied due to their industrial applications for bioremediation, understanding the microbial life in the rest of the sewer infrastructure, especially the pipe systems, has not been a major area of interest. In a recent study led by Ryan Newton and Sandra McLellan from the University of Wisconsin-Milwaukee, we demonstrated that it is possible to predict the level of obesity in a given US city with more than 80% accuracy by only analyzing the microbial community signatures found in sewage samples. The link between the microbial community structure and the level of obesity as demonstrated by this finding suggests a potentially very important role for sewage in tracking public health. We pursue a deeper understanding of the ecology of the sewer ecosystems through marker genes and shotgun metagenomes in an attempt to develop baseline metrics for microbial signatures that can identify matters of public health, and environmental change.

Niche boundaries and genomic heterogeneity of marine microbial clouds

Marine microbes underpin large food webs in the ocean by utilizing a wide range of energy sources to create biomass for other organisms to consume. They are also responsible for the cycling and bioavailability of crucial elements, including carbon, phosphorus, and nitrogen, among many others. This unseen majority of life in the oceans contributes about half of the oxygen in the atmosphere, and represents more than 98% of all biomass in the oceans and seas. Despite their great importance, our understanding of the diversity, functioning, and evolution of marine microbial life is far from being complete. This is largely due to our inability to bring microbial life into the lab environment for comprehensive analyses: in most cases it is very challenging to successfully isolate individual microbes from their interactive and complex environment, and keep them functionally alive. To bypass this limitation we are using state-of-the-art molecular and computational approaches to recover the genomic content of eukaryotic, bacterial, and archaeal microbial organisms directly from the environment using shotgun metagenomes. We also use shotgun metagenomic data to determine the niche boundaries and genomic heterogeneity of SAR11, the most abundant group of marine microbes. In collaboration with Stephen Giovannoni from Oregon State University, our efforts mostly focus on identifying widespread SAR11 clouds, characterizing their heterogeneity across geographic regions, and investigating the links between natural selection and the genomic heterogeneity we observe in these clouds.

Advanced software platforms for high-resolution microbial ‘omics

Computation today is at the core of every scientific discipline. Indeed, the fields of microbiology and microbial ecology, which rely on big data more and more, have dramatically benefited from the advances in computation during the last decade. The importance of computation in life sciences puts our lab in a lucky situation. That said, we try to use our skills in computation wisely. How many of the scientific questions we dare to ask depend on the availability of computational solutions that can facilitate the investigation of those questions? Although the inherent link between the tool and thinking will continue to bind them together, we believe it must be mostly intellectual curiosity that drives the direction of science, and not the comfort of what is available. The agreement we have with ourselves in our lab is to keep the biology as the sole inspiration of our direction, and never let computational conveniences find their way into our thinking at the expense of our ability to explore fundamental questions. We strive to create software that allows users to get their hands dirty with their data, without imposing boilerplate analysis practices. In most cases the questions we are interested in require precise computational approaches that offer enough resolution to detect subtle changes. These needs resulted in various software solutions we proposed, including oligotyping and minimum entropy decomposition for the analysis of marker gene data, and anvi’o for comparative genomics, metagenomics, metatranscriptomics, and visualization of complex data. We intend to maintain our flexibility, and let the incoming questions shape and re-shape our software.

Our 2 cents

Some vocabulary we try to use, and promote as much as we can:

  • “Bacteria and archaea” to describe the two major domains of life, instead of “prokaryotes”. Because major scientific advances should not be ignored (here is an opinion piece from Norman Pace: “It’s time to retire the prokaryote”).

  • “Single-nucleotide variant” (SNV) to describe nucleotide positions with variation that emerges from the mapping of short reads from environmental shotgun metagenomes to a genomic context, instead of “single-nucleotide polymorphism” (SNP). SNP has a very specific definition, which makes it inappropriate to use in the context of metagenomic mapping.

  • “Microbial clouds” to describe an assemblage of co-existing microbial genomes in an environment that are similar enough to map to the context of the same reference genome, instead of “microbial populations”. Because the definition of the term population makes it irrelevant to microbiology.

  • “Metagenome-assembled genomes” (MAGs) to describe bins of assembled contigs from shotgun metagenomic data, instead of “draft genomes”. Because the term draft genome has long been used to describe not-yet-finalized genomes acquired from the whole genome sequencing and assembly of cultured microbial isolates.

  • “Marker gene amplicons” to describe the high-throughput sequencing data of marker genes targeted by (universal) primers, instead of “metagenomes”. Metagenomics is broadly defined as the study of genetic material directly recovered from a sample. With some effort, this broad definition could include marker gene amplicon surveys, but it really should not. The authors who coined the term associate the ‘metagenome’ with having access to the collective genomes of microbes in an environment. Today the term metagenomics is mostly used to describe shotgun sequencing of the environmental DNA in order to explore the functional potential or community composition of a given community at the level of metagenomic short reads (without assembly), assembled contiguous DNA segments (after assembly), or metagenome-assembled genomes (after assembly and binning). Marker gene surveys, including the ones that amplify hyper-variable regions of the ribosomal RNA genes, do not fit into what metagenomics describes, and the misuse of the term creates a lot of confusion.

  • “kbp”, “Mbp”, or “Gbp” to communicate the number of base pairs in a contig, or a genome, instead of “KB”, “MB”, or “GB”. Because the latter are commonly used to describe the amount of digital information in ‘bytes’, and the alternative use is not appropriate. Although it appears in the literature quite often, the use of “Kb”, “Mb”, or “Gb” is not very suitable either, since these units are commonly used to quantify digital information in ‘bits’.
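For those who report sequence lengths from scripts, the unit convention in the last item could be applied with a small helper along these lines. This is only an illustrative sketch: the function name `format_bp` and the choice of two decimal places are our assumptions for the example, not part of any existing tool.

```python
def format_bp(n_bp: int) -> str:
    """Format a base-pair count with bp, kbp, Mbp, or Gbp (hypothetical helper)."""
    if n_bp < 0:
        raise ValueError("base-pair count must be non-negative")

    # Try the largest unit first and use the first one that fits.
    for factor, unit in ((10**9, "Gbp"), (10**6, "Mbp"), (10**3, "kbp")):
        if n_bp >= factor:
            return f"{n_bp / factor:.2f} {unit}"

    # Anything shorter than 1,000 base pairs is reported as is.
    return f"{n_bp} bp"


if __name__ == "__main__":
    # Example lengths chosen only for illustration.
    for length in (999, 1_500, 4_641_652, 3_100_000_000):
        print(format_bp(length))
```

Keeping the unit labels in one place makes it harder for “KB” or “Mb” to slip into a table or figure legend by accident.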

If you believe we need corrections, please don’t hesitate to write to us.