Start an anvi'o interactive display to see functions across genomes.
For a given annotation source for functions, this program will display distribution patterns of unique function names (or accession numbers) across genomes stored in anvi’o databases.
It is a powerful way to analyze differentially occurring functions for any source of annotation that is shared across all genomes.
The simplest way to run this program is as follows:
anvi-display-functions -e external-genomes \
                       --annotation-source KOfam \
                       --profile-db KOFAM-PROFILE.db
You can replace the annotation source based on what is available across your genomes. You can use the program anvi-db-info to see all available function annotation sources in a given contigs-db or genomes-storage-db. Please see functions for more information on functions and how to obtain them.
Please note that a profile-db will be automatically generated for you. Once it is generated, the same profile database can be visualized over and over again using anvi-interactive in manual mode, without having to retain any other files.
Aggregating functions using accession IDs
Once it is run, this program essentially aggregates all function names that occur in one or more genomes in the set of genomes found in input sources. The user can ask the program to use accession IDs to aggregate functions rather than function names:
anvi-display-functions -e external-genomes \
                       --annotation-source KOfam \
                       --profile-db KOFAM-PROFILE.db \
                       --aggregate-based-on-accession
The default setting, which aggregates on function names, is appropriate for most applications, but using accession IDs instead may be important for specific ones. The two approaches can give different results because multiple accession IDs in various databases may correspond to the same function. Aggregating on accession IDs can therefore over-split identical function annotations into multiple groups and mislead downstream enrichment analyses, which is why the default aggregation method uses function names.
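The difference between the two aggregation keys can be illustrated with a small, self-contained sketch. It does not call anvi'o at all: the genome names, accession IDs, and function names below are hypothetical toy data, and the pipeline only mimics the idea of grouping on one column or the other.

```shell
# Toy annotations (hypothetical data): genome, accession ID, function name.
toy_annotations=$(printf '%s\t%s\t%s\n' \
    genome_A ACC_0001 'DNA polymerase I' \
    genome_B ACC_0002 'DNA polymerase I' \
    genome_C ACC_0001 'DNA polymerase I' \
    genome_A ACC_0003 'Chaperonin GroEL' \
    genome_B ACC_0003 'Chaperonin GroEL')

# Count the groups formed when aggregating on function names (column 3).
groups_by_name=$(printf '%s\n' "$toy_annotations" | cut -f3 | sort -u | wc -l | tr -d ' ')

# Count the groups formed when aggregating on accession IDs (column 2).
groups_by_accession=$(printf '%s\n' "$toy_annotations" | cut -f2 | sort -u | wc -l | tr -d ' ')

echo "groups by function name: $groups_by_name"
echo "groups by accession ID:  $groups_by_accession"
```

In this toy table the same function name is reached through two different accession IDs, so grouping on accessions yields three groups where grouping on names yields two.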
Aggregating functions using all function hits
In some cases a gene may be annotated with multiple functions. This is a decision often made at the level of the function annotation tool. For instance, anvi-run-ncbi-cogs may yield two COG annotations for a single gene because the significance scores of both hits exceed the default cutoff. While this can be useful in anvi-summarize output, where things should be most comprehensive, having some genes annotated with multiple functions and others with a single function may over-split them (in this scenario, a gene with COGXXX and a gene with COGXXX;COGYYY would end up in different bins). Thus, anvi-display-functions will use the best hit for any gene that has multiple hits. This behavior can be turned off in the following way:
anvi-display-functions -e external-genomes \
                       --annotation-source KOfam \
                       --profile-db KOFAM-PROFILE.db \
                       --aggregate-using-all-hits
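The over-splitting described above can be sketched with toy data. This is only an illustration under stated assumptions: the gene names are hypothetical, the `;` separator follows the notation used in the text rather than any anvi'o storage format, and the first listed annotation is assumed to stand in for the best hit.

```shell
# Toy gene annotations (hypothetical data): gene name, COG annotation(s).
toy_hits=$(printf '%s\t%s\n' \
    gene_1 'COGXXX' \
    gene_2 'COGXXX;COGYYY' \
    gene_3 'COGYYY')

# Bins formed when every hit is kept: the combined annotation is its own bin.
bins_all_hits=$(printf '%s\n' "$toy_hits" | cut -f2 | sort -u | wc -l | tr -d ' ')

# Bins formed when only the first hit is kept (assumed here to be the best hit).
bins_best_hit=$(printf '%s\n' "$toy_hits" | cut -f2 | cut -d';' -f1 | sort -u | wc -l | tr -d ' ')

echo "bins using all hits: $bins_all_hits"
echo "bins using best hit: $bins_best_hit"
```

With all hits retained, gene_1 and gene_2 land in different bins even though they share COGXXX. Keeping a single hit per gene puts them in the same bin.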
The user also can keep only functions that occur in more than a minimum number of genomes:
anvi-display-functions -e external-genomes \
                       --annotation-source KOfam \
                       --profile-db KOFAM-PROFILE.db \
                       --min-occurrence 5
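As a rough illustration of what an occurrence filter does, the sketch below counts the number of distinct genomes each function appears in and keeps the functions that reach a threshold. The function and genome names are hypothetical, and the sketch assumes the threshold is inclusive (a function seen in exactly that many genomes is kept); the program's help output is the place to confirm the exact semantics of `--min-occurrence`.

```shell
# Toy occurrences (hypothetical data): function name, genome it was found in.
toy_occurrences=$(printf '%s\t%s\n' \
    func_1 genome_A \
    func_1 genome_B \
    func_1 genome_C \
    func_2 genome_A \
    func_3 genome_B \
    func_3 genome_C)

# Threshold for this sketch; assumed inclusive.
min_occurrence=2

# Deduplicate (function, genome) pairs, count genomes per function,
# then keep the functions whose count reaches the threshold.
kept_functions=$(printf '%s\n' "$toy_occurrences" \
    | sort -u \
    | cut -f1 \
    | uniq -c \
    | awk -v min="$min_occurrence" '$1 >= min { print $2 }')

printf '%s\n' "$kept_functions"
```

Here func_2 occurs in a single genome and is dropped, while func_1 and func_3 are kept.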
Combining genomes from multiple sources
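Alternatively, you can run the program on genomes combined from multiple sources, for instance by providing an external-genomes file together with an internal-genomes file or a genomes-storage-db.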
A real-world example
Assume we have a list of external-genomes that includes three different species of Bifidobacterium. Running the following command,
anvi-display-functions --external-genomes Bifidobacterium.txt \
                       --annotation-source COG20_FUNCTION \
                       --profile-db COG20-PROFILE.db \
                       --min-occurrence 3
would, by default, produce a display in which each layer is one of the genomes described in the external-genomes file, and each item is a unique function name in COG20_FUNCTION (which was obtained by running anvi-run-ncbi-cogs on each contigs-db in the external-genomes file) that was found in more than three genomes.
The outermost layer shows the function names.
A quick prettification through the interactive interface leads to a cleaner display of the three distinct species in this group, and of the functions that are uniquely enriched in each of them.