anvi-estimate-metabolism [program]

Reconstructs metabolic pathways and estimates pathway completeness for a given set of contigs.

Provides

kegg-metabolism

Requires or uses

contigs-db kegg-db kegg-functions profile-db collection bin external-genomes internal-genomes metagenomes

Usage

anvi-estimate-metabolism predicts the metabolic capabilities of organisms based on their genetic content. It relies upon kegg-functions and metabolism information from the KEGG resource, which is stored in kegg-db.

The metabolic pathways that this program currently considers are those defined by KOs in the KEGG MODULES resource. Each KO represents a gene function, and a KEGG module is a set of KOs that collectively carry out the steps in a metabolic pathway. Therefore, for this to work, you need to have annotated your contigs-db with hits to the KEGG KOfam database by running anvi-run-kegg-kofams prior to using this program.

Given a properly annotated contigs-db, this program determines which KOs are present and from those determines the completeness of each KEGG module. The results are described in a set of output text files, collectively referred to as kegg-metabolism.

Running metabolism estimation on a single contigs database

There are several possible inputs to this program. For single genomes (isolate genomes or MAGs, for example), you can provide a contigs-db. If your contigs-db describes a metagenome rather than a single genome, you can provide the flag --metagenome-mode. In metagenome mode, KOfam hits in the contigs-db are analyzed as though they belong to one collective genome, despite the fact that the sequences represent multiple different populations. Alternatively, if you have binned your metagenome sequences into separate populations and would like metabolism estimation to be run separately on each bin, you can provide a profile-db and a collection.

Estimation for a single genome

anvi-estimate-metabolism -c CONTIGS.db

Estimation for a metagenome

anvi-estimate-metabolism -c CONTIGS.db --metagenome-mode

Estimation for bins in a metagenome

You can estimate metabolism for each bin in a collection:

anvi-estimate-metabolism -c CONTIGS.db -p PROFILE.db -C COLLECTION_NAME

You can also provide a specific bin in that collection to run on:

anvi-estimate-metabolism -c CONTIGS.db -p PROFILE.db -C COLLECTION_NAME -b BIN_NAME

Or, you can provide a specific list of bins in a text file:

anvi-estimate-metabolism -c CONTIGS.db -p PROFILE.db -C COLLECTION_NAME -B bin_ids.txt

Each line in the bin_ids.txt file should be a bin name from the collection.
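
As a minimal sketch of the file described above, the following writes a bin_ids.txt with one bin name per line. The bin names (Bin_1, Bin_2, Bin_3) are hypothetical placeholders; use the names that actually exist in your collection.

```shell
# Hypothetical bin names, one per line; replace with bins from your own collection.
printf '%s\n' Bin_1 Bin_2 Bin_3 > bin_ids.txt

# Inspect the file before passing it to the program.
cat bin_ids.txt

# Then run (needs your own databases, so it is left commented out here):
# anvi-estimate-metabolism -c CONTIGS.db -p PROFILE.db -C COLLECTION_NAME -B bin_ids.txt
```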

Running metabolism estimation on multiple contigs databases

If you have a set of contigs databases of the same type (i.e., all of them are single genomes, or all are binned metagenomes), you can analyze them all at once. To do so, put the relevant information for each contigs-db into a text file and pass that text file to anvi-estimate-metabolism. The program will then run estimation individually on each contigs database in the file. The estimation results for each database will be aggregated and printed to the same output file(s).

Estimation for multiple single genomes

Multiple single genomes (also known as external-genomes) can be analyzed with the same command by providing an external genomes file to anvi-estimate-metabolism. To see the required format for the external genomes file, see external-genomes.

anvi-estimate-metabolism -e external-genomes.txt
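
For illustration, here is one way to build a small external genomes file by hand. This sketch assumes the two-column, tab-delimited layout (name, contigs_db_path) described on the external-genomes page; the genome names and paths are hypothetical placeholders.

```shell
# Header row, then one row per genome (tab-delimited).
# Names and paths below are placeholders, not real files.
printf 'name\tcontigs_db_path\n' > external-genomes.txt
printf 'genome_A\t/path/to/genome_A-CONTIGS.db\n' >> external-genomes.txt
printf 'genome_B\t/path/to/genome_B-CONTIGS.db\n' >> external-genomes.txt

cat external-genomes.txt
```

If in doubt about the expected columns, the external-genomes page is the authority; the sketch above only shows the general shape of the file.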

Estimation for multiple metagenomes

Multiple metagenomes can be analyzed with the same command by providing a metagenomes input file. Metagenome mode will be used to analyze each contigs database in the file. To see the required format for the metagenomes file, see metagenomes.

anvi-estimate-metabolism -M metagenomes.txt

Estimation for multiple bins in different metagenomes

If you have multiple bins (also known as internal-genomes) across different collections or even different metagenomes, they can be analyzed with the same command by providing an internal genomes file to anvi-estimate-metabolism. To see the required format for the internal genomes file, see internal-genomes.

anvi-estimate-metabolism -i internal-genomes.txt
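
A hand-built internal genomes file might look like the sketch below. It assumes the five-column, tab-delimited layout (name, bin_id, collection_id, profile_db_path, contigs_db_path) described on the internal-genomes page; every name and path here is a hypothetical placeholder.

```shell
# Each row points at one bin in one collection of one profile/contigs database pair.
# All values below are placeholders.
printf 'name\tbin_id\tcollection_id\tprofile_db_path\tcontigs_db_path\n' > internal-genomes.txt
printf 'MAG_1\tBin_1\tCOLLECTION_A\t/path/to/sample1/PROFILE.db\t/path/to/sample1/CONTIGS.db\n' >> internal-genomes.txt
printf 'MAG_2\tBin_7\tCOLLECTION_B\t/path/to/sample2/PROFILE.db\t/path/to/sample2/CONTIGS.db\n' >> internal-genomes.txt

cat internal-genomes.txt
```

Note that the two rows deliberately reference different collections and different metagenomes, which is the situation this input type is meant for.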

Adjusting module completion threshold

KEGG module completeness is computed as the percentage of steps in the metabolic pathway that are ‘present’ based on the KOs found in the contigs database. If this completeness is larger than a certain threshold, then the entire module is considered to be ‘present’ in the genome or metagenome. By default, this module completion threshold is 0.75; that is, 75 percent of the KOs in a module must have a KOfam hit in the contigs database in order for the module to be considered ‘complete’ as a whole. This threshold can be adjusted with the --module-completion-threshold parameter.

Changing the module completion threshold

In this example, we change the threshold to 50 percent.

anvi-estimate-metabolism -c CONTIGS.db --module-completion-threshold 0.5
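
To make the effect of the threshold concrete, here is a deliberately simplified sketch of the comparison. It is not anvi'o's actual implementation: it assumes a hypothetical module with 5 steps, 3 of which are present, and simply compares the fraction of present steps against the threshold. The function name is_complete is made up for this example.

```shell
# Hypothetical module: 5 steps in total, 3 of them present.
steps_total=5
steps_present=3

# is_complete THRESHOLD
# Prints "complete" if the fraction of present steps is larger than THRESHOLD,
# and "incomplete" otherwise. awk handles the floating-point comparison.
is_complete() {
    awk -v p="$steps_present" -v t="$steps_total" -v thr="$1" \
        'BEGIN { if (p / t > thr) print "complete"; else print "incomplete" }'
}

is_complete 0.75   # 3/5 = 0.6 is below the default threshold
is_complete 0.5    # the same module passes a threshold of 0.5
```

The same module is reported as incomplete at the default threshold and complete at 0.5, which is why lowering the threshold generally increases the number of modules called complete.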

Controlling output

anvi-estimate-metabolism can produce a variety of output files. All will be prefixed with the same string, which by default is “kegg-metabolism”.

Changing the output file prefix

anvi-estimate-metabolism -c CONTIGS.db -O my-cool-prefix

This program has two major output options: long format (tab-delimited) output files and matrices.

Long format output has several preset “modes” as well as a “custom” mode in which the user can define the contents of the output file. Multiple modes can be used at once, and each requested “mode” will result in a separate output file. The default output mode is “modules” mode.

You can find more details on the output format by looking at kegg-metabolism.

Viewing available output modes

anvi-estimate-metabolism -c CONTIGS.db --list-available-modes

Using a non-default output mode

anvi-estimate-metabolism -c CONTIGS.db --kegg-output-modes kofam_hits

Using multiple output modes

anvi-estimate-metabolism -c CONTIGS.db --kegg-output-modes kofam_hits,modules

Viewing available output headers for ‘custom’ mode

anvi-estimate-metabolism -c CONTIGS.db --list-available-output-headers

Using custom output mode

anvi-estimate-metabolism -c CONTIGS.db --kegg-output-modes custom --custom-output-headers kegg_module,module_name,module_is_complete
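
Because long format output is tab-delimited, it can be filtered with standard command-line tools. The sketch below works on a toy stand-in file, not real program output: the file name, the rows, and the assumption that module_is_complete holds the strings True or False are all illustrative. Only the three column names come from the command above. The helper complete_modules is likewise made up for this example.

```shell
# Toy stand-in for a custom-mode output file (made-up rows, hypothetical file name).
printf 'kegg_module\tmodule_name\tmodule_is_complete\n' > toy-custom-output.txt
printf 'M00001\tmodule one (placeholder)\tTrue\n' >> toy-custom-output.txt
printf 'M00002\tmodule two (placeholder)\tFalse\n' >> toy-custom-output.txt

# complete_modules FILE
# Prints the IDs of modules flagged as complete. Columns are looked up by
# header name, so the column order in the file does not matter.
complete_modules() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $col["module_is_complete"] == "True" { print $col["kegg_module"] }
    ' "$1"
}

complete_modules toy-custom-output.txt
```

Looking columns up by header name keeps the filter working if you later request a different set or order of headers with --custom-output-headers.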

Matrix format is only available when working with multiple contigs databases. Several output matrices will be generated, each of which describes one KEGG module statistic such as completion score or presence/absence.

Obtaining matrix-formatted output

anvi-estimate-metabolism -i internal-genomes.txt --matrix-format
