Microbial 'omics

a post by A. Murat Eren (Meren)

and Özcan C. Esen

Table of Contents

This document last updated on April 1st, 2016.

Anvi’o is a comprehensive ‘omics platform with a large codebase to perform wide range of tasks, and in-depth analyses of large datasets.

With anvi’o you can do metagenomic binning, characterize single-nucleotide variation, study bacterial pangenomes, benchmark software tools, predict number of bacterial genomes in a metagenomic assembly, or even remove contamination from eukaryotic assembly projects. Anvi’o’s ‘versatility’ partly comes from its integrated visualization framework that allows the user to see all these different types of data, and interact with them.

The anvi’o interactive interface is a fully customizable visualization environment that is accessible through an intuitive interface to efficiently visualize complex data. It can handle large datasets, and it’s source code is freely available within the anvi’o platform.

Although it is fully integrated with core anvi’o operations detailed in the metagenomic workflow tutorial, the visualization environment can be initiated in an ad hoc manner by using the anvi-interactive program with --manual-mode flag, or through anvi’server, without an anvi’o installation. In summary, if you have a matrix file, anvi’o may be useful to you to generate high-quality, publication-ready figures with mouse clicks.

The purpose of this article is to provide a more detailed description of the interface by demonstrating the data types the interface can work with, and later details of the user interface.


A little note on our ongoing project, anvi'server

To make the anvi’o interactive interface more accessible, we teamed up with Tobias Paczian, and with his remarkable efforts created a web service. This new service, which we call anvi’server, is now running at http://anvi-server.org. Through anvi’server, you can perform anvi’o visualizations by uploading your data through a simple interface:

Topics covered for the remainder of this article are directly applicable to the interactive interface whether it is accessed through local anvi’o installations, or through the anvi’server.

Please note that the anvi’server is under active development, and your testing efforts will be greatly appreciated. Please don’t hesitate to get in touch with us if you have any questions.


1. Data types and input files

The purpose of this section is to provide examples for each of the data type (and input files) the interactive interface can work with. For this, I will start with a simple tree, and add layers step by step to describe different data types.

You can follow these examples in two ways:

  • Using anvi-interactive in your terminal: For each data type I will either provide a link to the files used in the example command line, or give an example file structure so you can try them on your own files.
  • Using http://anvi-server.org: The other option is to use our new anvi’server without installing anvi’o. If your only purpose with the interactive interface is to do an ad hoc visualization, I think this would be the best way to go. Otherwise you can read about the ways to install the platform on your own server or laptop.

Command lines mentioned in this article are run on anvi’o version 2 or later. You can check your verison using anvi-profile -v.

OK. Let’s start.

1.1 Newick tree

The least you can do with the anvi’o interactive interface would be to visualize a newick-formatted tree. The tree file I use for this example contains 300 leaves. This is how I run the interactive interface from the command line:

anvi-interactive -t tree.txt -p profile.db --title 'Interface Demo I' --manual

Which gives me this:

Here is the anvi’server link for this visualization: http://anvi-server.org/public/meren/interface_demo_I

1.2 Numerical

Let’s assume you for each item you have in the previous tree, you have multiple numerical values you want to overlay on the tree in a TAB-delimited matrix file that looks like this:

contig c1 c2 c3
cathetus 13.66596066 9.590942918 46.01477372
centrist 11.32571669 10.08709331 36.11828559
cascaded 10.82858312 6.312884813 34.50972839
crocking 12.46382532 6.595503878 42.43493547
cinchona 12.78031823 8.77463282 30.56242617
couchant 11.90769144 4.366432391 46.86065633
creviced 12.88691388 9.368377696 42.06930372
clovered 10.51030592 8.200407215 27.51057477
contempt 11.25596871 7.310381191 28.93509622
(…) (…) (…) (…)

This data can be visualized along with the tree this way:

anvi-interactive -t tree.txt -d view_data.txt -p profile.db --title 'Interface Demo II' --manual

Here is the anvi’server link for this visualization: http://anvi-server.org/public/meren/interface_demo_II

If you only have this matrix file but not the tree file, you can run this command to get the tree file:

anvi-matrix-to-newick view_data.txt -o tree.txt

1.3 Text

Adding a text layer is as simple as adding a column of text values in your matrix file:

contig c1 c2 c3 text
backrest 11.68762604 31.81211217 16.14890468 nmwje
backward 8.383113248 36.27705135 12.26495265 bqmyujrpsrddoefhi
backwind 14.30588649 33.19818058 13.90379515 hkferlchpmzix
backyard 12.98490431 35.1528258 14.2861336 advoebfkyhmg
bacteria 6.655636411 34.45757002 13.67026608 lqmcwnhywco
bacterin 7.664397508 35.28860294 16.85214201 vxqdmn
baetylus 13.72179583 32.81162715 14.83756192 fkgpydiowgyhfxxwlpj
bagpiped 9.095058043 33.0406183 13.48649074 ijmnur
balconet 10.94823472 33.03846226 16.87202682 ecizgs
(…) (…) (…) (…) (…)

Now I can run the same command on this updated matrix file:

anvi-interactive -t tree.txt -d view_data.txt -p profile.db --title 'Interface Demo III' --manual

And here is the result:

Anvi’server link: http://anvi-server.org/public/meren/interface_demo_III

Please note that when I first visualized only the tree file, there were labels for each leaf. However, when I visualized the tree file along with the data, labels disappeared. Anvi’o is designed to visualize trees with thousands of items, where having labels rarely is useful. Therefore the default behavior of the visualization interface is to omit labels when there is data, and labels are shown only when the user wants to visualize a single tree. If you would like to show labels, you can simply use the text data type to duplicate the first column in your matrix file:

contig labels c1 c2 c3 text
backrest backrest 11.68762604 31.81211217 16.14890468 nmwje
backward backward 8.383113248 36.27705135 12.26495265 bqmyujrpsrddoefhi
backwind backwind 14.30588649 33.19818058 13.90379515 hkferlchpmzix
backyard backyard 12.98490431 35.1528258 14.2861336 advoebfkyhmg
bacteria bacteria 6.655636411 34.45757002 13.67026608 lqmcwnhywco
bacterin bacterin 7.664397508 35.28860294 16.85214201 vxqdmn
baetylus baetylus 13.72179583 32.81162715 14.83756192 fkgpydiowgyhfxxwlpj
bagpiped bagpiped 9.095058043 33.0406183 13.48649074 ijmnur
balconet balconet 10.94823472 33.03846226 16.87202682 ecizgs
(…) (…) (…) (…) (…) (…)

1.4 Categorical

Here I am adding a column of categorical data in the same file:

contig c1 c2 c3 categorical text
backrest 11.68762604 31.81211217 16.14890468 a nmwje
backward 8.383113248 36.27705135 12.26495265 a bqmyujrpsrddoefhi
backwind 14.30588649 33.19818058 13.90379515 b hkferlchpmzix
backyard 12.98490431 35.1528258 14.2861336 b advoebfkyhmg
bacteria 6.655636411 34.45757002 13.67026608 b lqmcwnhywco
bacterin 7.664397508 35.28860294 16.85214201 c vxqdmn
baetylus 13.72179583 32.81162715 14.83756192 c fkgpydiowgyhfxxwlpj
bagpiped 9.095058043 33.0406183 13.48649074 c ijmnur
balconet 10.94823472 33.03846226 16.87202682 c ecizgs
(…) (…) (…) (…) (…) (…)

The same command line for the matrix file above:

anvi-interactive -t tree.txt -d view_data.txt -p profile.db --title 'Interface Demo IV' --manual

Produces this:

Anvi’server link: http://anvi-server.org/public/meren/interface_demo_IV

Some of you are probably asking themselves ‘what is the difference between text and categorical data?’. Very good question! Essentially they are both the same. If the unique number of items in a given dataset of non-integer values more than 12, the interactive interface assumes that this is a text layer, and instead of assigning random colors to each item, it shows it as such. In contrast, if there are 12 or less unique values, the interface by default treats it as a categorical data layer, and assigns random colors. Of course these colors can be changed quite easily through the interface, or programmatically by processing the ‘state’ file (more on this later).

1.5 Stacked bars

Stacked bars are a bit tricky compared to the other data types, but nothing too complicated. Here is an example addition to our data file:

contig c1 c2 c3 categorical bars!A;B;C text
backrest 11.68762604 31.81211217 16.14890468 b 278;23;1 nmwje
backward 8.383113248 36.27705135 12.26495265 b 249;52;2 bqmyujrpsrddoefhi
backwind 14.30588649 33.19818058 13.90379515 b 269;32;3 hkferlchpmzix
backyard 12.98490431 35.1528258 14.2861336 b 205;96;4 advoebfkyhmg
bacteria 6.655636411 34.45757002 13.67026608 b 263;38;5 lqmcwnhywco
bacterin 7.664397508 35.28860294 16.85214201 b 298;3;6 vxqdmn
baetylus 13.72179583 32.81162715 14.83756192 b 219;82;7 fkgpydiowgyhfxxwlpj
bagpiped 9.095058043 33.0406183 13.48649074 b 212;89;8 ijmnur
balconet 10.94823472 33.03846226 16.87202682 b 289;12;9 ecizgs
(…) (…) (…) (…) (…) (…) (…)

As you can see, the header field contains multipart information. What is before the exclamation mark is the ‘name’ of this data, which will appear in the interface as a label. The column separated values after the exclamation mark are subsets of the data. This header notation indicates that there will be three numerical values will be present in each following field.

The command line for this matrix:

anvi-interactive -t tree.txt -d view_data.txt -p profile.db --title 'Interface Demo V' --manual

Produces this:

Anvi’server link: http://anvi-server.org/public/meren/interface_demo_V

1.6 Samples information

The samples information file contains one or more data types (numerical, categorical, or stacked-bar data) to provide contextual data for layers of interest. Layers of interest are, in most cases, also correspond to samples in the study. In our test case, these layers correspond to c1, c2 and c3 (see the previous matrix file).

Here is an example samples information file for the view data matrix file we have been using in previous steps for visualization:

samples numerical_01 numerical_02 categorical stacked_bar!X;Y;Z
c1 100 5 A 1;2;3
c2 200 4 B 2;3;1
c3 300 3 B 3;1;2

Please note that this time the first column is composed of layer names that appeared as rows in the data matrix file. If you are using the interactive interface via the command line, you first need to crate a samples database using this file, if you are using anvi’server, you can simply provide the TAB-delimited file via the new project window. Here is the command line (assuming the view data is identical to the one used in the previous example):

anvi-gen-samples-info-database --samples-information samples-information.txt -o samples.db
anvi-interactive -t tree.txt -d view_data.txt -p new_profile.db --title 'Interface Demo VI' --manual -s samples.db

Which produces this:

Please consider reading this article if you are interested in learning more about the samples database.

Anvi’server link: http://anvi-server.org/public/meren/interface_demo_VI

1.7 Samples order

The samples order file contains information about different orderings of samples, so the user can access to this information from the interface to arrange layers of interest. An order can be a comma separated list of sample names, or it can be a newick tree for the organization of samples.

A samples order file can be used with or without a samples information file (see anvi-gen-samples-db -h for help if you are following these examles from your terminal). Here is an example samples order file for the data matrix file we have been using:

attributes basic newick
test_tree   (c2:0.0370199,(c1:0.0227268,c3:0.0227268)Int3:0.0370199);
test_list c3,c1,c2  

There can be as many lines as you like for different orderings, however, there has to be only two columns, both of which should contain all names for layers of interest.

Similar to the utilization of samples information file, you need to create a samples database for this file as well. To make things simpler, I will use the previous samples information file, along with this new order file to create a new samples database prior to calling anvi-interactive:

anvi-gen-samples-info-database -D samples-information.txt -R samples-order.txt -o samples.db
anvi-interactive -t tree.txt -d view_data.txt -p new_profile.db --title 'Interface Demo VII' --manual -s samples.db

In this new display you will find out that the two orderings (test_tree and test_list) appears in the ‘Sample order’ combo box under in the ‘Samples’ tab on the left panel, along with all the orderings autmoatically generated based on the samples information file:

Selecting the test_tree order, and re-drawing the tree will result in a tiny dendrogram for the layers of interst:

Anvi’server link (don’t forget to play with the Samples order combo box): http://anvi-server.org/public/meren/interface_demo_VII

2. Using the anvi’o interactive interface

We tried our best to make anvi’o interface intuitive, easy-to-learn, and easy-to-use. In an ideal world, you shouldn’t need a tutorial to start using it, and learn it through trial and error. But here is a very general overview of the major components of the interface.

If you happen to realize there is something missing, or it would have been helpful to you if something was better explained in this document, please do not hesitate to drop us a line.

2.1 An overview of the display

The interactive interface has two major areas of interaction: the space for visualization on the right, and the left panel. The left panel gives access to various controls to work with the data visualized, and improve the presentataion of it.

2.2 The left panel

At the bottom of the layers tab there is a section with tiny controls that are available in all tabs. Through these controls you can,

  • Create or refresh the display when necessary using the draw button (some changes require you to do that),
  • Zoom in, zoom out, and center the display.
  • Download your display as an SVG file.

See the post “Working with SVG files anvi’o generate” for tips about how to work with large SVG files using Inkscape.

2.2.1 Layers tab

Through the layers tab you can,

  • Change general settings for the tree (i.e., switching between circle or rectengular displays, changing tree radius or width), and layers (i.e., editing layer margins, or activating custom layer margins).
  • Load or save states to store all visual settings, or load a previously saved state.
  • Customize individual layers by switching between different display modes depending on the layer type (i.e., ‘text’ or ‘color’ mode for categorical layers, or ‘bar’ or ‘intensity’ mode for numerical layers), set normalization (i.e., ‘square-root’, or ‘log’ normalization), minimum, and maximum cutoff values for numerical layers, or set layer height, and layer margin (i.e., its distance from the previous layer).
  • Use the multi-selector at the bottom to change settings for multiple layers at once.

2.2.2 Samples tab

Samples tab is for the additional data you provide the interface through a samples database (see samples order and samples infomration sections above). Through this layer you can,

  • Change the order of layers using automatically-generated or user-provided orders of layers using the Sample order combo box,
  • Customize individual samples information entries.

Changes in this tab can be reflected to the current display without re-drawing the entire tree unless the sample order is changed.

2.2.3 Bins tab

Anvi’o allows you to create selections of items shown in the display (whether they are metagenomic contigs, 16S rRNA tags, or any other type of information). Bins tab allow you to maintain these selections. Any selection on the tree will be added to active bin in this tab (the state radio button next to a bin defines its activity). Through this tab you can,

  • Create or delete bins, set bin names, change the color of a given bin, or sort bins based on their name, the number of units they carry, or completion and contamination estimates (completion / contamination estimates are only computed for genomic or metagenomic analyses).
  • View the number of selected units in a given bin, and see the list of names in the selection by clicking the button that shows the number of units described in the bin.
  • Store a collection of bins, or load a previously stored collection.

2.2.4 Mouse tab

The mouse tab displays the value of items underneath the mouse pointer while the user browse the tree.

Displaying the numerical or categorical value of an item shown on the tree is not an easy task. We originally thought that displaying pop-up windows would solve it, but besides the great overhead, it often became a nuisance while browsing parts of the tree. We could show those pop-up displays only when use clicks on the tree, however click-behavior is much more appropriate to add or remove individual items from a bin, hence, it wasn’t the best solution either. So we came up with the ‘mouse tab’. You have a better idea? I am not surprised! We would love to try improve your experience: please enter an issue, and let’s discuss.

2.2.5 Search tab

It does what the name suggests. Using this tab you can,

  • Build expressions to search items visualized in the main display.
  • Highlight matches, and append them to, or remove them from the selected bin in the Bins tab.

Tips and tricks

Here are some small conveniences that may help the interface serve you better (we are happy to expand these little tricks with your suggestions).

  • You can zoom to a section of the display by making a rectangular selection of the area while the pressing the shift button.

  • You can click an entire branch to add items into the selected bin, and remove them by right-clicking a branch.

  • If you click a branch while pressing the Command or CTRL button, it will create a new bin, and add the content of the selection into that bin.

  • By pressing 1, 2, 3, 4, and 5, you can go between Layers, Bins, Samples, Mouse, and Search tabs!