This page serves the publicly available data mentioned in our publications, or in publications in which our group is involved. Please do not hesitate to get in touch if something is missing.
Vineis et al. (2016)
The paper itself is here: Patient-Specific Bacteroides Genome Variants in Pouchitis.
Here is a blog post on it: Bacteroides Genome Variants, and a reproducible science exercise with anvi’o.
Data for the paper:
Anvi’o profiles: https://dx.doi.org/10.6084/m9.figshare.3851364.v1
Primary and supplementary figures: https://dx.doi.org/10.6084/m9.figshare.3851481.v2
Supplementary tables: https://dx.doi.org/10.6084/m9.figshare.3851478.v2
The anvi’o profiles article in this data collection contains 22 items:
For an example on how to re-analyze these anvi’o profiles, please click here.
Delmont & Eren (2016)
This link will download the archive file for the anvi’o profile for the merged datasets (221 Mb). The run script in this archive will automatically start the anvi’o interactive interface and draw Figure 1 (compatible with anvi’o v1.2.2, for which a docker container is available).
The following links give access to the individual genome files:
This link will download everything necessary to recreate an unpolished version of Figure 2 as it appears in the manuscript, including the run script that will run the process automatically (compatible with both the v1 and v2 branches of anvi’o).
The following links give access to media files and supplementary tables:
Figure 1. Holistic assessment of the tardigrade genome release from Boothby et al. (2015). The dendrogram in the center organizes scaffolds based on sequence composition and coverage values in data from 11 DNA libraries. Scaffolds larger than 40 kbp were split into sections of 20 kbp for visualization purposes. Splits are displayed in the first inner circle and GC-content (0-71%) in the second circle. In the following 11 layers, each bar represents the portion of scaffolds covered by short reads in a given sample. The next layer shows the same information for RNA-Seq data. Scaffolds harboring genes used by Boothby et al. to support the expanded HGT hypothesis are shown in the next layer. Finally, the outermost layer shows our selections of scaffolds as draft genome bins: the curated tardigrade genome (selection number 1), as well as three near-complete bacterial genomes originating from various contamination sources (selection numbers 2, 3, and 4).
Figure 2. Occurrence of the 139 bacterial single-copy genes reported by Campbell et al. (2013) across scaffold collections. The top two plots display the frequency and distribution of single-copy genes in the raw tardigrade genomic assemblies generated by Boothby et al. (2015) and Koutsovoulos et al. (2015), respectively. The bottom two plots display the same information for each of the curated tardigrade genomes. Each bar represents the square-root normalized number of significant hits per single-copy gene. The same information is visualized as box-plots on the left side of each plot.
Supplementary Figure 1. Visualization and curation of the raw tardigrade genome assembly from Koutsovoulos et al. (2015). In the left panel (curation step I), 24,841 scaffolds that were longer than 1 kbp from the raw assembly were clustered based on sequence composition and coverage values in data from the two Illumina sequencing libraries (the inner dendrogram). Scaffolds longer than 40 kbp were split into sections of 20 kbp for visualization purposes. The second layer shows the GC-content for each scaffold. The next two layers represent the log-normalized mean coverage values for scaffolds in the two sequencing datasets. Finally, our scaffold selections (tardigrade draft 01 and six bacterial draft genomes) are displayed in the outer layer. In the right panel (curation step II), the 15,839 scaffolds from the tardigrade selection from step I were clustered based on sequence composition only for a more precise curation. Additional scaffold selections (tardigrade draft 02 and two bacterial draft genomes) are displayed in the outer layer.
Supplementary Table 1. Summary of H. dujardini and bacterial genomes identified from the raw assembly results of Boothby et al. (2015) and Koutsovoulos et al. (2015). * Inferred from the Boothby et al. (2015) and Koutsovoulos et al. (2015) publications. ** Scores were calculated using bacterial single-copy genes from Campbell et al. (2013) and are only used to assess bacterial contamination levels in the eukaryotic assembly results.
Supplementary Table 2. Summary of functions identified by RAST in the bacterial draft genome #2 (selection #3 in Fig. 1).
Supplementary Table 3. Summary of HMM hits for each bacterial single-copy gene (collection of 139 from Campbell et al. (2013)) identified in 1) the raw assembly by Boothby et al. (2015), 2) the raw assembly by Koutsovoulos et al. (2015), 3) the curated draft genome of Hypsibius dujardini from the Boothby et al. (2015) assembly in this study, and 4) the curated draft genome of H. dujardini from the Koutsovoulos et al. (2015) assembly.
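The captions for Figure 1 and Supplementary Figure 1 note that scaffolds longer than 40 kbp were split into sections of 20 kbp for visualization. A minimal sketch of one way such a split could be computed is shown here. The function name `split_scaffold` is hypothetical, and folding the leftover bases into the final section is an assumption made for illustration; it is not necessarily how anvi’o handles the remainder.

```python
def split_scaffold(length, split_length=20_000):
    """Return 0-based, half-open (start, end) sections for a scaffold.

    Scaffolds no longer than twice `split_length` (40 kbp by default)
    are kept whole. Longer scaffolds are cut into `split_length` pieces,
    and any leftover bases are folded into the final section
    (an assumption for this sketch).
    """
    if length <= 2 * split_length:
        return [(0, length)]

    n_sections = length // split_length
    sections = [
        (i * split_length, (i + 1) * split_length) for i in range(n_sections)
    ]
    # Extend the last section so it reaches the end of the scaffold.
    sections[-1] = (sections[-1][0], length)
    return sections


if __name__ == "__main__":
    for scaffold_length in (30_000, 40_000, 65_000):
        print(scaffold_length, split_scaffold(scaffold_length))
```

With this choice, every section is at least 20 kbp long and no section shorter than that is left dangling at the end of a scaffold.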
Everything mentioned on this page can be cited using doi 10.6084/m9.figshare.2067057.
Eren et al. (2015)
The paper itself is here: Anvi’o: an advanced analysis and visualization platform for ‘omics data.
The anvi’o profiles here will run with a much earlier version of anvi’o. If you would like to work with them, please check out your anvi’o codebase at this commit. Please don’t hesitate to write to us if you need assistance.
Daily Infant Gut Samples by Sharon et al. Raw data and anvi’o results for the section on supervised binning and the analysis of the variability in genome bins.
- Visit this address to try the anvi’o interactive interface on the infant gut data, which provides the basis for Figure 2.
- You can view the summary of the 13 bins, or you can download the browsable output.
- This GitHub repository gives access to the code that generates Figure 3 (see the relevant directory).
- You can download the output of anvi-merge (the merged profile db, and the annotation db) for the infant gut metagenomes from here.
Pensacola Beach Samples by Overholt et al. and Rodriguez-R et al. Raw data and anvi’o results for the section on linking cultivar genomes with metagenomes.
- While this address gives access to the anvi’o summary of the ten cultivar genomes (download), this one serves the 56 metagenomic bins (download) shown in Figure 4.
- You can download the output of anvi-merge for the mapping of metagenomes to Overholt cultivars from here, and the output of anvi-merge for metagenomic bins is available here.
Gulf of Mexico Samples by Mason et al. and Yergeau et al. Results for the section on linking metagenomes, metatranscriptomes, and single-cell genomes.
- You can view the summary of the two bins identified in the assembly of the single-cell genomes (download) (Figure 5 panel A).
- You can view the summary of the three bins identified in the metagenomic assembly (download) (Figure 5 panel B).
- This GitHub repository also gives access to the code that generates Figure 5 panel C (see the relevant directory).
- You can download the output of anvi-merge for the mapping of all samples against the assembly of SAGs from here, and the output of anvi-merge for mapping to metagenomic contigs is available here.
Media and Supplementary files.