
anvi-gen-variability-profile [program]

Generate a table that comprehensively summarizes the variability of nucleotide, codon, or amino acid positions. We call these single nucleotide variants (SNVs), single codon variants (SCVs), and single amino acid variants (SAAVs), respectively. Learn more here: http://merenlab.org/2015/07/20/analyzing-variability/.



Provides

variability-profile-txt

Requires or uses

contigs-db profile-db structure-db bin variability-profile splits-txt

Usage

This program takes the variability data stored within a profile-db and compiles it across samples into a single matrix that comprehensively describes your SNVs, SCVs, or SAAVs (a variability-profile-txt).

This program is described in more detail in the blog post linked above, so take a look at it for more details.

Let’s talk parameters

Here is a basic run with no bells or whistles:

anvi-gen-variability-profile -p profile-db \
                             -c contigs-db

You can add structural annotations by providing a structure-db.

anvi-gen-variability-profile -p profile-db \
                             -c contigs-db \
                             -s structure-db

Focusing on a subset of the input

You can focus on a specific collection, a bin, a set of genes (provided either as a file or as a comma-separated list of gene caller IDs), or a list of splits (in the form of a splits-txt).

anvi-gen-variability-profile -p profile-db \
                             -c contigs-db \
                             --gene-caller-ids GENE_1,GENE_2,GENE_3
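If your gene caller IDs are stored in a file with one ID per line but you would rather pass them inline, a minimal sketch for turning that file into the comma-separated form used by --gene-caller-ids is shown below. The file name my_genes.txt and the IDs in it are hypothetical.

```shell
# Hypothetical input file: one gene caller ID per line
printf '%s\n' 1245 1246 1301 > my_genes.txt

# Join all lines with commas (here: "1245,1246,1301")
GENE_IDS=$(paste -s -d, my_genes.txt)
echo "$GENE_IDS"

# The result can then be passed on the command line, e.g.:
# anvi-gen-variability-profile -p profile-db \
#                              -c contigs-db \
#                              --gene-caller-ids "$GENE_IDS"
```

Since the program can also take the IDs as a file, this conversion is purely a convenience for inline use.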

When providing a structure-db, you can also limit your analysis to only genes that have structures in your database.

anvi-gen-variability-profile -p profile-db \
                             -c contigs-db \
                             -s structure-db \
                             -C collection \
                             --only-if-structure

You can also choose to look only at data from specific samples by providing a file with one sample name per line. For example:

anvi-gen-variability-profile -p profile-db \
                             -c contigs-db \
                             -C collection \
                             --samples-of-interest my_samples.txt

where my_samples.txt looks like this:

DAY_17A
DAY_18A
DAY_22A
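A minimal way to create that file from the shell, with a couple of optional sanity checks, is sketched below. The whitespace check rests on an assumption: that a name padded with extra spaces would not match the sample name stored in the profile-db.

```shell
# Write one sample name per line (the names from the example above)
printf '%s\n' DAY_17A DAY_18A DAY_22A > my_samples.txt

# Optional sanity checks before running the program
# 1) duplicated names (no output means there are none)
sort my_samples.txt | uniq -d
# 2) empty lines or leading/trailing whitespace
grep -n -e '^$' -e '^[[:space:]]' -e '[[:space:]]$' my_samples.txt \
  || echo "no empty lines or stray whitespace found"
```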

SNVs vs. SCVs vs. SAAVs

Which type of variant you analyze depends on the engine parameter, which you can set to NT (nucleotides), CDN (codons), or AA (amino acids). The default is NT. Note that to analyze SCVs or SAAVs, you must have used the --profile-SCVs flag when running anvi-profile or anvi-merge.

For example, to analyze SAAVs, run:

anvi-gen-variability-profile -p profile-db \
                             -c contigs-db \
                             -s structure-db \
                             --engine AA

When analyzing single codon variants, you can skip the synonymity computation to reduce run time:

anvi-gen-variability-profile -p profile-db \
                             -c contigs-db \
                             -s structure-db \
                             --engine CDN \
                             --skip-synonymity
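Whether a codon change is synonymous comes down to whether both codons encode the same amino acid. The sketch below illustrates only that idea for a pair of codons, using the standard genetic code (NCBI translation table 1). It is not anvi'o's synonymity computation, which may be defined over all codons observed at a position. The function names translate and is_synonymous are hypothetical.

```shell
# Standard genetic code (NCBI table 1): the 64 codons are ordered by first,
# second, then third base, each cycling through T, C, A, G
CODE='FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# translate CODON: print the one-letter amino acid ('*' = stop, 'X' = invalid codon)
translate() {
  printf '%s\n' "$1" | awk -v code="$CODE" '
    BEGIN { idx["T"] = 0; idx["C"] = 1; idx["A"] = 2; idx["G"] = 3 }
    {
      c = toupper($0)
      if (c !~ /^[ACGT][ACGT][ACGT]$/) { print "X"; next }
      n = 16 * idx[substr(c, 1, 1)] + 4 * idx[substr(c, 2, 1)] + idx[substr(c, 3, 1)]
      print substr(code, n + 1, 1)
    }'
}

# is_synonymous CODON1 CODON2: succeed if both are valid codons for the same amino acid
is_synonymous() {
  aa1=$(translate "$1")
  aa2=$(translate "$2")
  [ "$aa1" != "X" ] && [ "$aa1" = "$aa2" ]
}

is_synonymous GCT GCC && echo "GCT -> GCC is synonymous"
is_synonymous GCT GAT || echo "GCT -> GAT is non-synonymous"
```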

Filtering the output

You can filter the output in various ways, so that you can get straight to the variable positions that you’re most interested in. Here are some of the filters that you can set:

  • The maximum number of variable positions that can come from a single split (e.g., to look at no more than two randomly chosen SCVs from each split)
  • The minimum and maximum departure from the reference or from the consensus
  • The minimum coverage in all samples (if a position's coverage falls below that value in even a single sample, the position will not be reported)
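anvi'o applies these filters itself when you set them. To make their logic concrete, the sketch below applies comparable filters to a small hypothetical table with awk. The column names (unique_pos_identifier, split_name, sample_id, coverage, departure_from_consensus) and the function names are assumptions for illustration, and the departure filter here works row by row, which may not match how the program applies it.

```shell
# Hypothetical long-format table: one row per (position, sample)
printf '%s\t%s\t%s\t%s\t%s\n' \
  unique_pos_identifier split_name sample_id coverage departure_from_consensus \
  1 c1_split_00001 DAY_17A 52 0.10 \
  1 c1_split_00001 DAY_18A 8 0.25 \
  2 c1_split_00001 DAY_17A 40 0.30 \
  2 c1_split_00001 DAY_18A 35 0.05 \
  3 c1_split_00002 DAY_17A 20 0.40 \
  3 c1_split_00002 DAY_18A 22 0.15 \
  > variability.txt

# min_cov_filter FILE MIN: drop every position whose coverage is below MIN in at
# least one sample (the file is read twice: first pass finds such positions)
min_cov_filter() {
  awk -F'\t' -v min="$2" '
    FNR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; if (NR != FNR) print; next }
    NR == FNR { if ($(col["coverage"]) + 0 < min) bad[$(col["unique_pos_identifier"])] = 1; next }
    !($(col["unique_pos_identifier"]) in bad)
  ' "$1" "$1"
}

# departure_filter FILE LO HI: keep rows whose departure from consensus is in [LO, HI]
departure_filter() {
  awk -F'\t' -v lo="$2" -v hi="$3" '
    FNR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; print; next }
    { d = $(col["departure_from_consensus"]) + 0 }
    d >= lo && d <= hi
  ' "$1"
}

# max_per_split FILE N [SEED]: keep at most N randomly chosen positions per split
# (all sample rows of a chosen position are kept together)
max_per_split() {
  awk -F'\t' -v n="$2" -v seed="${3:-1}" '
    FNR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (NR != FNR) {
        # Second pass begins: partial Fisher-Yates shuffle of each split s positions
        srand(seed)
        for (sp in k) {
          m = k[sp]
          for (i = 1; i <= m && i <= n; i++) {
            j = i + int(rand() * (m - i + 1))
            t = pos[sp, i]; pos[sp, i] = pos[sp, j]; pos[sp, j] = t
            keep[pos[sp, i]] = 1
          }
        }
        print
      }
      next
    }
    NR == FNR {
      # First pass: record the distinct positions found in each split
      sp = $(col["split_name"]); p = $(col["unique_pos_identifier"])
      if (!((sp, p) in seen)) { seen[sp, p] = 1; pos[sp, ++k[sp]] = p }
      next
    }
    ($(col["unique_pos_identifier"]) in keep)
  ' "$1" "$1"
}

min_cov_filter variability.txt 10     # position 1 is dropped (coverage 8 in DAY_18A)
departure_filter variability.txt 0.2 1
max_per_split variability.txt 1 42
```

Fixing the seed makes the random per-split selection reproducible; when it is omitted, this sketch falls back to a seed of 1.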

Adding additional information

You can also set --quince-mode, which reports the variability data across all samples for each reported position (even if that position isn’t variable in some samples). For example, if nucleotide position 34 of contig 1 was an SNV in one sample, the output would contain the data for nucleotide position 34 for all of your samples.
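To see what quince mode changes, the sketch below checks a hypothetical table for positions that are missing a row in some sample; on a table generated with --quince-mode it should print nothing. The column names and the function name incomplete_positions are assumptions, and only samples that appear somewhere in the table are taken into account.

```shell
# Hypothetical table produced without --quince-mode: position 2 was variable
# only in DAY_17A, so there is no row for it in DAY_18A
printf '%s\t%s\n' \
  unique_pos_identifier sample_id \
  1 DAY_17A \
  1 DAY_18A \
  2 DAY_17A \
  > non_quince.txt

# incomplete_positions FILE: print positions lacking a row for at least one of
# the samples present in the table
incomplete_positions() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      p = $(col["unique_pos_identifier"]); s = $(col["sample_id"])
      pos[p] = 1; samples[s] = 1; seen[p, s] = 1
    }
    END {
      n = 0
      for (s in samples) n++
      for (p in pos) {
        c = 0
        for (s in samples) if ((p, s) in seen) c++
        if (c < n) print p
      }
    }
  ' "$1"
}

incomplete_positions non_quince.txt   # prints 2
```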

You can also ask the program to report the contig names, split names, and gene-level coverage statistics.

