# Annotating an anvi'o contigs database with COGs

### a post by A. Murat Eren (Meren)

The COG workflow is for 2.1.0 and later versions of anvi’o. You can learn which version you have on your computer by typing anvi-profile --version in your terminal.

This article describes how to set up COGs on your system, and how to annotate your gene calls in an anvi’o contigs database with COGs.

This is different from importing functions into anvi’o: the workflow is run by anvi’o programs that use blastp or DIAMOND to search NCBI’s now-quite-outdated-and-not-maintained-but-still-awesome COG database, and it provides a one-step solution to the functional annotation problem.

Citation information! If you use this workflow, then in addition to the search algorithm you elect to use, you should also cite the following work: Expanded microbial genome coverage and improved protein family annotation in the COG database.

We make a lot of typos. Sometimes parameters change slightly across versions, and we often fail to keep tutorials up to date at all times. If you find a mistake on this page, or if you would like to change something within it, you can directly edit its source code on GitHub. We will be very thankful for your contribution 😇

## Annotating an anvi’o contigs database with COGs

It is quite simple. All you need is the program anvi-run-ncbi-cogs. Here is an example:
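A minimal invocation might look like the sketch below. `CONTIGS.db` is a placeholder for your own contigs database and the thread count is arbitrary. The guard around the command is only there so the snippet fails gracefully when the program is not on your `PATH`.

```shell
# Placeholder inputs: replace with your own contigs database and core count.
CONTIGS_DB="CONTIGS.db"
NUM_THREADS=20

# Run the annotation only if the program is available in this environment.
if command -v anvi-run-ncbi-cogs >/dev/null 2>&1; then
    anvi-run-ncbi-cogs -c "$CONTIGS_DB" --num-threads "$NUM_THREADS"
else
    echo "anvi-run-ncbi-cogs not found; activate the environment where it is installed." >&2
fi
```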

Just to give a ballpark idea: annotations for about 60,000 amino acid sequences were added into the contigs database in about 5 hours using 50 cores.

Well, if this is your first time, it will probably not go as smoothly as it does here, because you will first need to let anvi’o set up the COG distribution on your system.

## Setting up the COG distribution

This is something you will do only once (unless you have to do it again later for various reasons). During this step anvi’o will download the necessary files from ftp://ftp.ncbi.nih.gov/pub/COG/COG2014/data/, and reformat them so they can be used later by the anvi-run-ncbi-cogs program and other tools. The formatting step includes reorganizing information in the raw files, serializing a very large text file into a binary Python object for fast access while converting protein IDs to COGs, and finally generating BLAST and DIAMOND search databases.

Luckily, all you need to do is to run anvi-setup-ncbi-cogs. Here is an example:

Depending on your installation, you may have to run the following command with superuser privileges! If you want to avoid that, you can use the --cog-data-dir parameter to store the COG data under a path you can write to.
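Both options can be sketched as follows, assuming a POSIX shell. The directory name is a hypothetical placeholder, and the guard only keeps the snippet from failing when the program is not installed.

```shell
# Hypothetical target directory; any path you can write to should work.
COG_DATA_DIR="$HOME/anvio-cog-data"

if command -v anvi-setup-ncbi-cogs >/dev/null 2>&1; then
    # Default location (may need superuser privileges on some installations):
    #   anvi-setup-ncbi-cogs
    # User-writable location instead:
    anvi-setup-ncbi-cogs --cog-data-dir "$COG_DATA_DIR"
else
    echo "anvi-setup-ncbi-cogs not found; activate the environment where it is installed." >&2
fi
```

If you set up the data in a custom directory, anvi-run-ncbi-cogs will presumably need to be told about it as well. Recent versions appear to accept the same --cog-data-dir parameter, but it is worth confirming with `--help` on your installation.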

If you are on a server system and you don’t want the interface to ask you any questions, add the --just-do-it flag to your command line.

If something is wrong with your setup and you can’t figure out what the problem is, try running anvi-setup-ncbi-cogs with the --reset flag.
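The two flags above can be sketched as follows. The `setup_cogs` wrapper is a hypothetical helper, not something anvi’o ships; it prints the command instead of running it when anvi-setup-ncbi-cogs is not installed, which makes it safe to try anywhere.

```shell
# Hypothetical helper (not part of the distribution): runs the setup program
# if it is installed, otherwise prints the command it would have run.
setup_cogs() {
    if command -v anvi-setup-ncbi-cogs >/dev/null 2>&1; then
        anvi-setup-ncbi-cogs "$@"
    else
        echo "would run: anvi-setup-ncbi-cogs $*"
    fi
}

# Unattended setup on a server: do not stop to ask questions.
setup_cogs --just-do-it

# If the setup seems broken, start over (uncomment to use):
# setup_cogs --reset
```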

If you are still reading this page, there is probably a problem for which you couldn’t find an answer here. Sigh. OK. How about sending an e-mail? :(