Public Data and Reproducible Bioinformatics
Table of Contents
- Ecology and evolution of a cryptic plasmid
- Plasmid systems and their ecology
- Metabolic independence of the gut microbiome
- Structure-informed microbial population genetics
- Studying hypervariability in genomes through metagenomes
- High molecular weight DNA extraction protocols
- Metabolic competency of gut microbes in health and disease
- Niche partitioning in the human oral cavity
- Accurate and complete genomes from metagenomes
- Population Genetics of SAR11 through SAAVs
- Refining Metagenome-Assembled Genomes
- A multi’omics Spiroplasma analysis
- A Wolbachia Plasmid
- The Prochlorococcus Metapangenome
- Genomes from Tara Oceans Metagenomes
- Genome-resolved Fecal Microbiota Transplantation
- Bacteroides in Pouchitis
- Tardigrade Assembly Re-analysis
- Anvi’o Methods Paper
This page serves public (and often reproducible) data items that underlie key findings in our publications.
If you would like to interactively explore any of the data items, you may need to install anvi’o or use it via our Docker containers without installation. See anvi’o programs and artifacts that can be used to further explore reproducible data items below.
Please do not hesitate to get in touch if something is missing. You can send us an e-mail, or find us on Discord:
Bonus blog post: Fantastic Data and How to Share Them: A Plea to Journal Editors and Reviewers.
Ecology and evolution of a cryptic plasmid
- Here is a Twitter thread that explains some of the interesting aspects of pBI143 ecology as well as the practical implications having a human gut-specific and highly conserved genetic entity, copy-number of which responds to stress.
- You can find the plasmid sequences to look for pBI143 in your metagenomes, reproducible data items to re-investigate metagenomic read recruitment results, and bioinformatics workflows to elucidate population genetics of pBI143 here.
Reproducible bioinformatics workflow
https://merenlab.org/data/pBI143/ gives access to reproducible bioinformatics analyses that underlie our key findings.
Plasmid systems and their ecology
- Uses state-of-the-art machine learning strategies to identify over 60,000 plasmids from human gut metagenomes, which represents a 200-fold increase in the number of known plasmids to date that were detectable in healthy humans.
- Defines hundreds of 'plasmid systems', and demonstrates that naturally occurring plasmids are not static entities, but their evolution is driven by the need to respond to the environment, and their ecology cannot be simply explained by bacterial taxonomy and distribution patterns of their putative hosts.
- Here is a Twitter thread that goes through some of the interesting aspects if this work.
Raw and/or reproducible data items
- doi:10.5281/zenodo.5732024 gives access to reference plasmids and chromosomes used for training as well as the annotations of genes that are identified in them.
- doi:10.5281/zenodo.5843600 gives access to the PlasX model, data for known as well as predicted plasmids, predicted plasmid sequences, plasmid systems, and other key intermediate data files such as plasmid pairwise alignments and MMSeqs2 clusters.
- doi:10.5281/zenodo.8175278 serves information on all metagenomes used in our study to predict plasmids, including ~36 million assembled contigs per metagenome as well as estimated abundances of indivudal bacterial taxa in these metagenomes.
- doi:10.5281/zenodo.5730987 serves every gene call in contigs from publicly available gut metagenomes we used. Annotations of these genes based on COGs, PFAM domains, and MMseqs clusters are also available at doi:10.5281/zenodo.5731658.
- We deposited the long- and short-read sequencing data from Bacteroides fragilis isolates under the NCBI BioProject PRJNA782184.
- The source code for PlasX, the machine learning algorithm to perform plasmid prediction is available through its GitHub repository here, and the source code for MobMess, the software tool for mobile element seqeuence dereplication is availble through its GitHub repository here.
Metabolic independence of the gut microbiome
- Demonstrates high metabolic independence of the gut microbiome in individuals with inflammatory bowel disease via an analysis of hundreds of publicly-available gut metagenomes and reference genomes.
- Presents a logistic regression classifier that differentiates between individuals with IBD and healthy people using metabolic pathways associated with high metabolic independence.
-Shows the relevance of high metabolic independence for microbial resilience in general gut stress via the application of the classifier to an antibiotic time-series.
Reproducible bioinformatics workflow
https://merenlab.org/data/ibd-gut-metabolism/ gives access to our reproducible workflow.
Structure-informed microbial population genetics
- Demonstrates a quantifiable link between (1) the magnitude of selective pressures over key metabolic genes (e.g., glutamine synthase of the central nitrogen metabolism), (2) the availability of key nutrients in the environment (e.g., nitrate), and (3) the maintenance of nonsynonymous variants near protein active sites.
- Shows that the interplay between selective pressures and protein structures also maintains synonymous variants -- revealing a quantifiable link between translational accuracy and fluctuating selective pressures.
- Comes with a reproducible bioinformatics workflow that offers detailed access to computational steps used in the study that spans from metagenomic read recruitment and profiling to the integration of environmental variants and predicted protein structures.
Reproducible bioinformatics workflow
https://merenlab.org/data/anvio-structure gives access to our bioinformatics workflow.
Studying hypervariability in genomes through metagenomes
- Demonstrates a workflow that gives access to the extent intra-population hypervariability of DGRs and their ecology through the analysis of metagenomes.
- A complete bioinformatics workflow that uses anvi'o and oligotyping to study DGR activity in metageomes is available here.
Reproducible bioinformatics workflow
https://merenlab.org/tutorials/dgrs-in-metagenomes/ gives access to our bioinformatics workflow.
High molecular weight DNA extraction protocols
- It turns out the protocol that works best for sequencing DNA from microbial isolates may not be the most effetive method for long-read sequencing of metagenomes ¯\_(ツ)_/¯
- A reproducible bioinformatics workflow is available here. Detailed lab protocols for HMW DNA extraction methods mentioned in the study are here.
Reproducible bioinformatics workflow
https://merenlab.org/data/hmw-dna-extraction-strategies/ gives access to our bioinformatics workflow.
Detailed lab protocol
https://merenlab.org/data/hmw-dna-extraction-strategies/protocol/ gives access to detailed lab protocols.
Raw and/or reproducible data items
- PRJNA703035: The raw sequencing data.
- doi:10.6084/m9.figshare.14141228: Data products that give reproducible access to anvi’o contigs databases for assembled long-read sequences.
- doi:10.6084/m9.figshare.14141414: Data products that give reproducible access to anvi’o contigs databases for unassembled long-read sequences.
- doi:10.6084/m9.figshare.14141819: Data products that give reproducible access to anvi’o contigs databases for assembled shotgun metagenomes.
- doi:10.6084/m9.figshare.14141918: Supplementary Tables and Figures
Metabolic competency of gut microbes in health and disease
- Includes an observation that links the presence of superior metabolic competence in bacterial populations to their expansion in IBD.
- Here is a Twitter thread by Andrea that explains the key points of the study, and here is another one by Meren that details what are the critical learnings from it.
Reproducible bioinformatics workflow
https://merenlab.org/data/fmt-gut-colonization gives access to our bioinformatics workflow.
Raw and/or reproducible data items
- We deposited the raw sequencing data for donor and recipient metagenomes under the NCBI BioProject PRJNA701961 (see Supplementary Table 1 for accession numbers and the details of each sample).
- doi:10.6084/m9.figshare.14331236: Data products that give reproducible access to donor and recipient metagenomes, metagenome-assembled genomes, and read recruitment results.
- doi:10.6084/m9.figshare.14225840: Data products that give reproducible access to genomes and their metabolic potential we have used to investigatie metabolic competence among microbes from healthy individuals and individuals with IBD.
- doi:10.6084/m9.figshare.14138405: Supplementary Tables.
Niche partitioning in the human oral cavity
- Demonstrates that TM7s split into tongue specialists and plaque specialists, and plaque TM7s group with environmental TM7s, leading to an hypothesis that the dental plaque may have served as a stepping stone for environmental microbes to adapt to host environments at least for some clades of microbes
- "Microbes in dental plaque look more like relatives in soil than those on the tongue", EurekAlert.
- Anonymous reviewer comments and responses.
Raw and/or reproducible data items
- We deposited the short-read sequencing data for amplicons and shotgun metagenomes as well as key MAGs that emerged from our study under the NCBI BioProject PRJNA625082. The Supplementary Table 1 of our study lists accession numbers for each sample individually.
- We deposited the long-read sequencing data in the NCBI’s SRA database for individuals C-F (SRR11547007), C-M (SRR11547005, SRR11547006), and L (SRR11547004).
- doi:10.6084/m9.figshare.12217799: FASTA files for co-assembled metagenomes per individial.
- doi:10.6084/m9.figshare.12217805: Anvi’o split profiles for each of the 790 MAGs.
- doi:10.6084/m9.figshare.12217811: The TM7 pangenome.
- doi:10.6084/m9.figshare.11634321: Supplementary Tables, Supplementary Figures, and the Supplementary Information file.
Accurate and complete genomes from metagenomes
- Case studies include a demonstration of how single-copy core genes can fail to predict the quality of metagenome-assembled genomes, and automated strategies that yield tens of thousands of metagenome-assembled genomes will include extensive contamination.
- Promotes approaches to reconstruct 'complete' genomes from metagenomes and the use of GC skew as a metric for checking genome correctness.
- Tutorial on scaffold extension and gap closing, reproducible workflow for binning and phylogenomics of a Parcubacterium genome from human blood metagenomes.
Reproducible bioinformatics workflow
http://merenlab.org/data/parcubacterium-in-hbcfdna/ gives access to the bioinformatics workflow that details the reconstruction of a CPR genome from human blood cell-free DNA metagenomes, and it phylogenomic placement.
https://ggkbase-help.berkeley.edu/genome_curation/scaffold-extension-and-gap-closing/ gives access to a step-by-step procedure for scaffold extension and gap closing to dramatically improve the quality of metagenome-assembled genomes (MAGs).
Population Genetics of SAR11 through SAAVs
- A first attempt to link population genetics and the predicted protein structures to explore in silico the intersection beetween protein biochemistry and evolutionary processes acting on an environmental microbe.
- An application of metapangenomics to define subclades of SAR11 based on gene content and ecology.
- Reproducible bioinformatics workflow is here. Reviewer criticism and our responses are also available.
Reproducible bioinformatics workflow
http://merenlab.org/data/sar11-saavs gives access to our complete bioinformatics workflow.
Raw and/or reproducible data items
- doi:10.5281/zenodo.835218 Reproducible data items that quantify read recruitment of SAR11 isolates across surface ocean metagenomes.
- doi:10.6084/m9.figshare.5248453: Static HTML output that shows the distribution of SAR11 isolates across surface ocean metagenomes.
- doi:10.6084/m9.figshare.5248459: The reproducible SAR11 metapangenome.
- doi:10.6084/m9.figshare.5248435: Self-contained anvi’o profile for HIMB83 across all metagenomes.
- doi:10.6084/m9.figshare.5248447: Single-nucleotide variants of HIMB83 across metagenomes.
- doi:10.5281/zenodo.835218: SAR11 single-amino acid variants on predicted protein structures.
Refining Metagenome-Assembled Genomes
- A reproducible workflow to detail the steps of genome refinement and make available the refined versions of some key genomes.
Reproducible bioinformatics workflow
http://merenlab.org/data/refining-espinoza-mags gives access to our complete bioinformatics workflow.
A multi’omics Spiroplasma analysis
- It is a particularly good example that demonstrates how pangenomics can reveal appropriate targets for high-resolution phylogenomics.
- A fully reproducible bioinformatics workflow for this multi'omics analysis is here. Anvi'o databases to interactively reproduce and explore the Spiroplasma pangenome is also available.
Reproducible bioinformatics workflow
http://merenlab.org/data/spiroplasma-pangenome/ gives access to the complete bioinformatics workflow we used in this study.
A Wolbachia Plasmid
- Yet another application of metapangenomics and an applicatoin of minION long-read sequencing on extremely low-biomass samples.
- Reproducible bioinformatics workflow with all data items, and a 'behind the paper' blog post by Julie Reveillaud.
Reproducible bioinformatics workflow
http://merenlab.org/data/wolbachia-plasmid gives access to our complete bioinformatics workflow.
Raw and/or reproducible data items
- The raw sequencing data for shotgun metagenomes are available in the European Nucleotide Archive via accession code PRJEB26028.
- doi:10.6084/m9.figshare.7306784: Raw sequencing data for the MinION run for Wolbachia ovary samples.
- doi:10.6084/m9.figshare.6263867: FASTA files for individual assemblies of each ovary metagenome.
- doi:10.6084/m9.figshare.6292040: The four metagenome-assembled Wolbachia genomes.
- doi:10.6084/m9.figshare.6380015: Artificially circularized individual sequences for the plasmid pWCP.
- doi:10.6084/m9.figshare.6263876: The anvi’o merged profile databases for each Culex pipiens metagenome.
- doi:10.6084/m9.figshare.6291650: The anvi’o pan database and the genomes storage for the Wolbachia metapangenome.
The Prochlorococcus Metapangenome
- Metapangenomes reveal to what extent genes that may be linked to the ecology and fitness of microbes are conserved within a phylogenetic clade.
- Reproducible bioinformatics workflow.
Reproducible bioinformatics workflow
http://merenlab.org/data/prochlorococcus-metapangenome gives access to our complete bioinformatics workflow.
Raw and/or reproducible data items
- The original TARA Oceans metagenomes are available through the European Bioinformatics Institute (ERP001736) and NCBI (PRJEB1787).
- doi:10.6084/m9.figshare.5447221: 31 isolate genomes and 74 single-amplified genomes for Prochlorococcus we used in this study.
- doi:10.6084/m9.figshare.5447224: The anvi’o merged profile database and the static HTML summary output for Prochlorococcus isolate genomes across 93 TARA Oceans Project surface ocean metagenomes.
- doi:10.6084/m9.figshare.5447227: The metapangenome of Prochlorococcus isolates.
- doi:10.6084/m9.figshare.5447233: The metapangenome of Prochlorococcus single-amplified genomes.
- doi:10.6084/m9.figshare.5447233: Extended pangenome of Prochlorococcus isolate and single-amplified genomes.
Genomes from Tara Oceans Metagenomes
- Nearly 1,000 non-redundant, high-quality bacterial, archaeal, and eukaryotic population genomes from TARA Oceans metagenomes.
- A "behind the paper" blog post by Meren, a press release by the MBL, and an extensive description of the bioinformatics workflow.
Reproducible bioinformatics workflow
http://merenlab.org/data/tara-oceans-mags gives access to our complete bioinformatics workflow.
Raw and/or reproducible data items
- The original TARA Oceans metagenomes are available through the European Bioinformatics Institute (ERP001736) and NCBI (PRJEB1787).
- doi:10.6084/m9.figshare.4902920: Our raw assembly outputs per region.
- doi:10.6084/m9.figshare.4902917: All amino acid sequences in our raw assemblies.
- doi:10.6084/m9.figshare.4902923: FASTA files for 957 non-redundant metagenome-assembled genomes.
- doi:10.6084/m9.figshare.4902941: Self contained anvi’o profiles for each non-redundant MAG (each of which can be visualized interactively through the program
anvi-interactive
offline). - doi:10.6084/m9.figshare.4902926: A static HTML output for the anvi’o merged profile database for non-redundant MAGs (double-click the index.html file after download).
- doi:10.6084/m9.figshare.4902938: Main and Supplementary Tables and Figures. Which includes Figure 1 (geographically bounded metagenomic co-assemblies), Figure 2 (the nexus between phylogeny and function of HBDs), Figure 3 (Phylogeny of nitrogen fixation genes), Figure 4 (the abundance of nitrogen-fixing populations of Planctomycetes and Proteobacteria across oceans), Supplementary Figure 1 (phylogenetic analysis of nifH genes), Supplementary Table 1 (the summary of the 93 metagenomes from TARA Oceans, and the twelve geographic regions they represent), Supplementary Table 2 (the summary of the co-assembly and binning outputs for each metagenomic set), Supplementary Table 3 (genomic features of 957 MAGs from the non-redundant genomic database including the taxonomy for each MAG, the mean coverage, relative distribution, detection and number of recruited reads for each MAG across the 93 metagenomes, etc), Supplementary Table 4 (the 16S rRNA gene sequence identified in HBD-09), Supplementary Table 5 (genomic features, Pearson correlation (based on the relative distribution in 93 metagenomes), Supplementary Table 6 (RAST subsystems and KEGG modules for the nine HBDs), Supplementary Table 7 (nifH gene sequences in MAGs, orphan scaffolds, as well as the reference sequence γ-24774A11, along with their mean coverage across the 93 metagenomes), Supplementary Table 8 (Genomic features of 30,244 bins manually characterized from the 12 metagenomic sets. Completion and redundancy estimates are based on the average of four bacterial single-copy gene collections), Supplementary Table 9 (KEGG annotation for 1,077 MAGs), and Supplementary Table 10 (Relative distribution of 1,077 MAGs across the 93 metagenomes).
- An interactive visualization for the phylogenomic analysis of 432 Proteobacteria and 43 Planctomycetes metagenome-assembled genomes from our database of 957 non-redundant MAGs is also available: here.
Genome-resolved Fecal Microbiota Transplantation
- Bacteroidales: high-colonization rate. Clostridiales: low colonization rate. Colonization success is negatively correlated with the number of genes related to sporulation.
- MAGs with the same taxonomy showed different colonization properties, highlighting the importance of high-resolution analyses.
- Populations colonized both recipients were also prevalent in the HMP cohort (and the ones that did not, distribute sporadically across the HMP cohort).
Raw and/or reproducible data items
- While the accession ID SRP093449 serves serves the raw shotgun metagenomic data through NCBI’s Short Read Archive, this FigShare collection gives access to all public data items detailed below.
- doi:10.6084/m9.figshare.4792633: Files for anvi’o manual interactive for a quick visualization of the distribution of 92 donor MAGs from Lee and Khan et al study (panel a in the figure above). Follow these steps for a quick interactive visualization:
# download the file
wget https://ndownloader.figshare.com/files/7879036 -O ANVIO-FMT-D-R01-R02-QUICK-VISUALIZATION.tar.gz
# unpack
tar -zxvf ANVIO-FMT-D-R01-R02-QUICK-VISUALIZATION.tar.gz
# go into the directory
cd ANVIO-FMT-D-R01-R02-QUICK-VISUALIZATION
# run anvi-interacive
anvi-interactive -p profile.db \
-s samples.db \
-t tree.txt \
-d data.txt \
--manual \
--title "The distribution of 92 donor MAGs"
- doi:10.6084/m9.figshare.4792627: A static HTML output for the anvi’o merged profile database for the 92 donor MAGs. This static web site contains FASTA files for MAGs, coverage and detection values, functional annotations, and other essential information.
- doi:10.6084/m9.figshare.4792621: The full anvi’o merged profile and contigs databases for the 92 donor MAGs (for each MAG gives access to the detailed information shown in panel b and panel c in the figure above). Among many other things you can do with these anvi’o files, you can use the program
anvi-refine
to study any MAG in the study in detail. MAGs are described in a collection namedMAGs
. Here is a quick example:
# download the merged anvi'o profile for the study
wget https://ndownloader.figshare.com/files/7879024 -O ANVIO-FMT-D-R01-R02-MERGED-PROFILE.tar.gz
# unpack the archive, and get into the directory
tar -zxvf ANVIO-FMT-D-R01-R02-MERGED-PROFILE.tar.gz && cd ANVIO-FMT-D-R01-R02-MERGED-PROFILE
# take a look at the MAGs stored in the profile
anvi-script-get-collection-info -p PROFILE.db -c CONTIGS.db -C MAGs
Auxiliary Data ...............................: Found: CONTIGS.h5 (v. 1)
Contigs DB ...................................: Initialized: CONTIGS.db (v. 8)
Bins in collection "MAGs"
===============================================
FMT-Donor_MAG_00027 :: PC: 95.98%, PR: 3.30%, N: 129, S: 1,914,574, D: bacteria (0.99)
FMT-Donor_MAG_00069 :: PC: 82.93%, PR: 7.68%, N: 154, S: 1,168,783, D: bacteria (0.91)
FMT-Donor_MAG_00025 :: PC: 96.34%, PR: 3.99%, N: 139, S: 2,740,980, D: bacteria (1.00)
FMT-Donor_MAG_00068 :: PC: 76.07%, PR: 0.72%, N: 109, S: 1,012,764, D: bacteria (0.77)
FMT-Donor_MAG_00041 :: PC: 89.80%, PR: 3.63%, N: 282, S: 2,471,626, D: bacteria (0.93)
FMT-Donor_MAG_00010 :: PC: 100.00%, PR: 4.38%, N: 224, S: 3,125,246, D: bacteria (1.04)
(...)
# pick one, and visualize it interactively
anvi-refine -p PROFILE.db -c CONTIGS.db -C MAGs -b FMT-Donor_MAG_00054
# which should play out just like in the following video
This is what you should see after entering the last command in your terminal:
- doi:10.6084/m9.figshare.4793761: Individual anvi’o profiles for the occurrence of each the FMT donor MAG across 151 HMP gut metagenomes (for each MAG gives access to the information shown in panel d in the figure above).
- doi:10.6084/m9.figshare.4792645: Individual figures that show the detection of 92 donor MAGs in 151 HMP gut metagenomes.
Bacteroides in Pouchitis
Raw and/or reproducible data items
- Anvi’o profiles: https://doi.org/10.6084/m9.figshare.3851364
- Primary and supplementary figures: https://doi.org/10.6084/m9.figshare.3851481.v2
- Supplementary tables: https://doi.org/10.6084/m9.figshare.3851478.v2
A blog post on data reproducibility
A blog post on how (and why) to reproduce these data: Bacteroides Genome Variants, and a reproducible science exercise with anvi’o.
For an example on how to re-analyze these anvi’o profiles, please click here.
Tardigrade Assembly Re-analysis
- A re-analysis of the first released Tardigrade genome reveals a likely symbiont among other contaminants.
- A practical approach to estimate the number bacterial genomes in an assembly.
Raw and/or reproducible data items
- This link will download the archive file for anvi’o profile for merged datasets (221 Mb). The run script in this archive will automatically start the anvi’o interactive interface, and draw Figure 1 (compatible with anvi’o v1.2.2, for which a docker container is available).
- Visit this adress to view the anvi’o summary for genomic selections in Figure 1. Alternatively, you can use this address to download this static HTML output to view on your own computer.
-
Following links give access to the individual genome files:
- This link will download everything necessary to recreate -an unpolished version of- Figure 2 appears in the manuscript, including the run script that will run the process automatically (compatible both with v1 and v2 branches of anvi’o).
- Following links give access to media files and supplementary tables:
- Figure 1. Holistic assessment of the tardigrade genome release from Boothby et al. (2015). Dendrogram in the center organizes scaffolds based on sequence composition and coverage values in data from 11 DNA libraries. Scaffolds larger than 40 kbp were split into sections of 20 kbp for visualization purposes. Splits are displayed in the first inner circle and GC-content (0-71%) in the second circle. In the following 11 layers, each bar represents the portion of scaffolds covered by short reads in a given sample. The next layer shows the same information for RNA-Seq data. Scaffolds harboring genes used by Boothby et al. to support the expended HGT hypothesis is shown in the next layer. Finally, the most outer layer shows our selections of scaffolds as draft genome bins: the curated tardigrade genome (selection number 1), as well as three near-complete bacterial genomes originating from various contamination sources (selection number 2, 3, and 4).
- Figure 2. Occurrence of the 139 bacterial single-copy genes reported by Campbell et al. (2013) across scaffold collections. The top two plots display the frequency and distribution of single-copy genes in the raw tardigrade genomic assembly generated by Boothby et al. (2015), and Koutsovoulos et al. (2015), respectively. The bottom two plots display the same information for each of the curated tardigrade genomes. Each bar represents the squared-root normalized number of significant hits per single-copy gene. The same information is visualized as box-plots on the left side of each plot.
- Supplementary Figire 1. Visualization and curation of the raw tardigrade genome assembly from Koutsovoulos et al. (2015). In the left panel (curation step I), 24,841 scaffolds that were longer than 1 kbp from the raw assembly were clustered based on sequence composition and coverage values in data from the two Illumina sequencing libraries (the inner dendrogram). Scaffolds longer than 40 kbp were split into sections of 20 kbp for visualization purposes. The second layer shows the GC-content for each scaffold. Next two view layers represent the log-normalized mean coverage values for scaffolds in the two sequencing datasets. Finally, our scaffold selections (tardigrade draft 01 and six bacterial draft genomes) are displayed in the outer layer. In the right panel (curation step II), the 15,839 scaffolds from the tardigrade selection from step I were clustered based on sequence composition only for a more precise curation. Additional scaffold selections (tardigrade draft 02 and two bacterial draft genomes) are displayed in the outer layer.
- Supplementary Table 1. *Summary of H. dujardini and bacterial genomes identified from the raw assembly results of Boothby et al. (2015) and Koutsovoulos et al. (2015). * Inferred from Boothby et al. (2015) and Koutsovoulos et al. (2015) publications. ** Scores were calculated using bacterial single copy genes from Campbell et al. (2013) and are only used to assess bacterial contamination levels in the eukaryotic assembly results.
- Supplementary Table 2. Summary of functions identified by RAST in the bacterial draft genome #2 (selection #3 in Fig. 1).
- Supplementary Table 3. Summary of HMM hits for each bacterial single-copy gene (collection of 139 from Campbell et al. (2013)) identified in 1) the raw assembly by Boothby et al. (2015), 2) the raw assembly by Koutsovoulos et al. (2015), 3) the curated draft genome of Hypsibius dujardini from Boothby et al. assembly in this study, and 4) the curated draft genome of H. dujardini from Koutsovoulos et al. (2015).
Everything mentioned on this page can be cited using doi 10.6084/m9.figshare.2067057.
Anvi’o Methods Paper
- Binning, and single-nucleotide variant analysis of a human gut time series metagenome.
- Re-analysis of cultivar genomes, metagenomes, and metatranscriptomes associated with the Deepwater Horizon oil spill.
Raw and/or reproducible data items
The anvi’o profiles here will run with a much earlier version of anvi’o. If you would like to work with them, please checkout your anvi’o codebase to this commit. Please don’t hesitate to write us if you need assistance.
Daily Infant Gut Samples by Sharon et al.
Raw data and anvi’o results for the section on supervised binning and the analysis of the variability in genome bins.
- Visit this address to try the anvi’o interactive interface on the infant gut data, which provides the basis for Figure 2.
- You can view the summary of the 13 bins, or you can download the browsable output.
- This Github repository gives access to the code that generates Figure 3 (see the relevant directory).
- You can download the output of
anvi-merge
(the merged profile db, and the annotation db) for the infant gut metagenomes from here.
Pensacola Beach Samples by Overholt et al. and Rodriguez-R et al.
Raw data and anvi’o results for the section on linking cultivar genomes with metagenomes.
- While this address gives access to the anvi’o summary of the ten cultivar genomes (download), this one serves the 56 metagenomic bins (download) shown in Figure 4.
- You can download the output of
anvi-merge
for the mapping of metagenomes to Overholt cultivars from here, and the output foranvi-merge
for metagenomic bins is available here.
Gulf of Mexico Samples by Mason et al., and Yergeau et al.
Results for the section on linking metagenomes, metatranscriptomes, and single-cell genomes.
- You can downlad the three main bins shown in Figure 5 panel B) here: DWH Unknown, DWH Cryptic, DWH O. desum.
- You can view the summary of the three bins identified in the metagenomic assembly (download) (Figure 5 panel B).
- This Github repository also gives access to the code that generates Figure 5 panel C (see the relevant directory).
- You can download the output of
anvi-merge
for the mapping of all samples against the assembly of SAGs from here, and the output foranvi-merge
for mapping to metagenomic contigs is available here.
Media and Supplementary files
- Tables: Additional file 1, Additional file 2, Additional file 3.
- Figures: Figure 1, Figure 2, Figure 3, Figure 4, Figure 5, Supplementary Figure 1, Supplementary Figure 2.