A program that computes metabolic enrichment across groups of genomes and metagenomes.
This program computes metabolic module enrichment across groups of genomes or metagenomes and returns a functional-enrichment-txt file (throughout this text, we will use the term genome to describe both for simplicity).
To run this program, you must already have estimated the completeness of metabolic modules in your genomes using the program anvi-estimate-metabolism and obtained a “modules” mode output file (which is the default output mode of that program). In addition, you will need to provide a groups-txt file that declares which genome belongs to which group for the enrichment analysis.
How does it work?
Determine the presence of modules. Each module in the “modules” mode output has a completeness score associated with it in each genome, and any module with a completeness score over a given threshold (set by --module-completion-threshold) will be considered to be present in that genome.
Quantify the distribution of modules in each group of genomes. The distribution of a given module across genomes in each group will determine its enrichment. This is done by fitting a generalized linear model (GLM) with a logit link function in anvi-script-enrichment-stats, which produces a functional-enrichment-txt file.
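The counting that feeds the two steps above can be illustrated with a minimal Python sketch. It is not the program's actual implementation: the function name, the input layout (tuples of genome, module, completeness), and the example data are hypothetical. It also assumes that a score equal to the threshold counts as present, since a threshold of 1 would otherwise never mark any module as present. The GLM fit itself happens in anvi-script-enrichment-stats and is not reproduced here.

```python
from collections import defaultdict


def module_presence_by_group(rows, groups, threshold=0.75):
    """Count, per module and per group, the genomes in which the module is 'present'.

    rows: iterable of (genome, module, completeness) tuples (hypothetical layout).
    groups: dict mapping genome name -> group name.
    """
    # The documented range for the threshold is (0, 1].
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in the range (0, 1]")

    counts = defaultdict(lambda: defaultdict(int))
    for genome, module, completeness in rows:
        group = groups.get(genome)
        if group is None:
            continue  # genome has no group: ignored in this sketch
        # Assumption: a score equal to the threshold counts as present.
        if completeness >= threshold:
            counts[module][group] += 1
    return {module: dict(per_group) for module, per_group in counts.items()}


# Hypothetical example data.
rows = [
    ("G1", "M00001", 1.00),
    ("G2", "M00001", 0.80),
    ("G3", "M00001", 0.50),
    ("G1", "M00002", 0.75),
    ("G3", "M00002", 0.90),
]
groups = {"G1": "A", "G2": "A", "G3": "B"}
presence = module_presence_by_group(rows, groups)
```

Per-group counts like these, together with the number of genomes in each group, are the kind of input an enrichment test needs; a module found in most genomes of one group and few of another is a candidate for enrichment.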
See kegg-metabolism for more information on how to generate a “modules” mode output format from anvi-estimate-metabolism. Please note that the genome names in the modules file must match those that you will mention in the groups-txt file.
The default completeness threshold for a module to be considered ‘present’ in a genome is 0.75 (=75%). If you wish to change this, you can do so by providing a different threshold in the range (0, 1] using the --module-completion-threshold parameter.
By default, the column containing genome names in your MODULES.TXT file will have the header db_name, but in some cases the names of your genomes or metagenomes may be stored under a different column name (such as those cases where you did not run anvi-estimate-metabolism in multi-mode). In those cases, you can tell this program which column to use with the --sample-header parameter. For example, if your metagenome names are listed under the metagenome_name column, you would pass --sample-header metagenome_name.
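To make the role of the sample header concrete, here is a small Python sketch of looking up genome names under a configurable column. It is an illustration only: the function name is hypothetical, and it assumes the modules file is TAB-delimited with a header row.

```python
import csv
import io


def genome_names(modules_txt, sample_header="db_name"):
    """Return the sorted, unique names found under `sample_header`.

    modules_txt: an open text file (assumed TAB-delimited with a header row).
    """
    reader = csv.DictReader(modules_txt, delimiter="\t")
    if sample_header not in (reader.fieldnames or []):
        raise KeyError(f"column '{sample_header}' not found in the modules file")
    return sorted({row[sample_header] for row in reader})


# Hypothetical file content with names under 'metagenome_name' instead of 'db_name'.
content = "metagenome_name\tmodule\nMG1\tM00001\nMG2\tM00001\nMG1\tM00002\n"
names = genome_names(io.StringIO(content), sample_header="metagenome_name")
```

With the default header, the same input would raise an error because no db_name column exists, which is the situation the --sample-header parameter is meant to address.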
If you ran anvi-estimate-metabolism on a bunch of extra genomes but only want to include a subset of them in the groups-txt, that is fine. By default, any samples from the MODULES.TXT file that are missing from the groups-txt will be ignored. However, there is also an option to include those missing samples in the analysis, as one big group called ‘UNGROUPED’. To do this, you can use the --include-samples-missing-from-groups-txt parameter. Just be careful that if you are also using the --include-ungrouped flag (see below), any samples without a specified group in the groups-txt will also be included in the ‘UNGROUPED’ group.
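The interaction between these two options can be sketched in Python as follows. This is a hedged illustration rather than the program's code: the function and argument names are hypothetical, an empty string stands in for "no group specified", and it assumes that samples listed without a group are skipped unless --include-ungrouped is given.

```python
def assign_groups(samples, groups_txt, include_missing=False, include_ungrouped=False):
    """Map each sample to a group, mimicking the behavior described above.

    samples: sample names found in the modules file.
    groups_txt: dict of sample -> group name ('' when no group is specified).
    """
    assigned = {}
    for sample in samples:
        if sample not in groups_txt:
            # Missing from groups-txt: ignored unless explicitly included.
            if include_missing:
                assigned[sample] = "UNGROUPED"
        elif groups_txt[sample]:
            assigned[sample] = groups_txt[sample]
        elif include_ungrouped:
            # Listed without a group (assumption: otherwise skipped).
            assigned[sample] = "UNGROUPED"
    return assigned


# Hypothetical example: G2 is listed without a group, G3 is not listed at all.
samples = ["G1", "G2", "G3"]
groups_txt = {"G1": "A", "G2": ""}
default = assign_groups(samples, groups_txt)
with_missing = assign_groups(samples, groups_txt, include_missing=True)
with_both = assign_groups(samples, groups_txt, include_missing=True, include_ungrouped=True)
```

When both options are on, the two kinds of samples end up in the same ‘UNGROUPED’ group, which is why the combination deserves care.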