anvi-analyze-synteny [program]
Extract ngrams, as in 'co-occurring genes in synteny', from genomes.
Can provide
Can consume: genomes-storage-db, functions, pan-db
Usage
Briefly, anvi-analyze-synteny counts ngrams by converting contigs into strings of annotations from a user-defined source of gene annotation. An annotation source must be provided to create ngrams (a function annotation source, or gene cluster identities from a pan-db). anvi’o then slides a window of size N along the loci of interest to deconstruct them into ngrams, and counts their frequencies.
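The sliding-window counting can be sketched in a few lines of shell. This is a simplified illustration, not anvi'o's implementation: count_ngrams is a hypothetical helper, the input format (one contig per line, with the space-separated annotations of its genes in order) is an assumption, and each ngram is treated as an ordered string, so the sketch makes no attempt to reproduce how anvi'o handles gene orientation or unknown functions. The range argument uses the same x:y form as --ngram-window-range.

```shell
#!/usr/bin/env bash
# count_ngrams RANGE: hypothetical helper, not part of anvi'o.
# Reads one contig per line from stdin (space-separated gene annotations,
# in gene order) and prints "ngram<TAB>count" for every window size from
# x to y, where RANGE is formatted x:y (e.g. 2:3).
count_ngrams() {
  local range=$1
  local min=${range%%:*}   # text before the colon, e.g. 2
  local max=${range##*:}   # text after the colon, e.g. 3
  awk -v min="$min" -v max="$max" '
    {
      # slide a window of size n along the contig, one gene at a time
      for (n = min; n <= max; n++)
        for (i = 1; i + n - 1 <= NF; i++) {
          ngram = $i
          for (j = i + 1; j < i + n; j++) ngram = ngram "-" $j
          count[ngram]++
        }
    }
    END { for (k in count) printf "%s\t%d\n", k, count[k] }
  ' | LC_ALL=C sort
}

# Example: two contigs annotated with made-up function labels
printf 'A B C\nA B D\n' | count_ngrams 2:3
```

Here the 2-gram A-B occurs on both contigs and is counted twice, while every other 2-gram and 3-gram is counted once.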
Run for a given function annotation source
anvi-analyze-synteny -g genomes-storage-db \
--annotation-source functions \
--ngram-window-range 2:3 \
-o ngrams
For instance, if you have run anvi-run-ncbi-cogs on each contigs-db you used to generate your genomes-storage-db, your --annotation-source can be NCBI_COGS:
anvi-analyze-synteny -g genomes-storage-db \
--annotation-source NCBI_COGS \
--ngram-window-range 2:3 \
-o ngrams
Handling genes with unknown functions
By default, anvi-analyze-synteny ignores genes that have no function in the annotation source of interest. You can circumvent this either by providing a pan-db, in which case the program uses gene cluster identities as function names:
anvi-analyze-synteny -g genomes-storage-db \
-p pan-db \
--ngram-window-range 2:3 \
-o ngrams
or by explicitly asking the program to consider unknown functions, in which case the program would not discard ngrams that include genes without functions:
anvi-analyze-synteny -g genomes-storage-db \
--annotation-source functions \
--ngram-window-range 2:3 \
-o ngrams \
--analyze-unknown-functions
The disadvantage of the latter strategy is that since all genes with unknown functions will be considered the same, the frequency of ngrams that contain genes with unknown functions may be inflated in your final results.
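To see where the inflation comes from, consider a variant of a simple ngram counter in which genes without a function carry a single placeholder label. This is a hypothetical sketch, not anvi'o code: count_ngrams_unknown and the label UNKNOWN are made up, and the KEEP switch only mimics, in simplified form, the two behaviours described above (discarding ngrams that contain a gene without a function by default, versus keeping them as with --analyze-unknown-functions).

```shell
#!/usr/bin/env bash
# count_ngrams_unknown N KEEP: hypothetical helper, not part of anvi'o.
# Counts ngrams of size N from space-separated gene annotations on stdin
# (one contig per line). Genes without a function carry the label UNKNOWN.
# KEEP=0 drops every ngram containing UNKNOWN; KEEP=1 keeps them.
count_ngrams_unknown() {
  awk -v n="$1" -v keep="$2" '
    {
      for (i = 1; i + n - 1 <= NF; i++) {
        ngram = $i
        has_unknown = ($i == "UNKNOWN")
        for (j = i + 1; j < i + n; j++) {
          ngram = ngram "-" $j
          if ($j == "UNKNOWN") has_unknown = 1
        }
        if (has_unknown && !keep) continue   # default: discard the ngram
        count[ngram]++
      }
    }
    END { for (k in count) printf "%s\t%d\n", k, count[k] }
  ' | LC_ALL=C sort
}

# Two contigs whose second genes are unannotated (and may well differ)
contigs='A UNKNOWN C
A UNKNOWN D'
echo "default:"
printf '%s\n' "$contigs" | count_ngrams_unknown 2 0
echo "keeping unknowns:"
printf '%s\n' "$contigs" | count_ngrams_unknown 2 1
```

With the default, nothing is counted. When unknowns are kept, A-UNKNOWN is counted twice even though the two unannotated genes may encode different functions, which is exactly the inflation described above.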
Run with multiple annotations
If multiple gene annotation sources are provided (e.g., a pan-db for gene cluster identities as well as a function annotation source), you must define which one is used to create the ngrams with the parameter --ngram-source. The resulting ngrams are then re-annotated with the second annotation source, and the re-annotated ngrams are also reported.
anvi-analyze-synteny -g genomes-storage-db \
-p pan-db \
--annotation-source functions \
--ngram-source gene_clusters \
--ngram-window-range 2:3 \
-o ngrams
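The re-annotation step can be pictured as a lookup: ngrams are built from the --ngram-source (here gene clusters), and each element is then translated into the other annotation source. The sketch below is hypothetical and simplified. reannotate_ngrams, the gene cluster IDs and the function labels are made up, and it assumes a tab-separated table that maps each gene cluster to a single function label. In practice, genes within one gene cluster can carry different annotations, and this sketch makes no claim about how anvi'o resolves that.

```shell
#!/usr/bin/env bash
# reannotate_ngrams MAP: hypothetical helper, not part of anvi'o.
# MAP is a tab-separated file of "gene_cluster<TAB>function".
# stdin holds "ngram<TAB>count" lines whose ngram elements are gene cluster
# IDs joined by "-"; the output appends the translated ngram as a third column.
reannotate_ngrams() {
  awk -F '\t' -v OFS='\t' '
    NR == FNR { annot[$1] = $2; next }   # first file: load the mapping
    {
      k = split($1, gcs, "-")
      out = ""
      for (i = 1; i <= k; i++) {
        # gene clusters missing from the map get a placeholder label
        f = (gcs[i] in annot) ? annot[gcs[i]] : "UNKNOWN"
        out = (i == 1) ? f : out "-" f
      }
      print $1, $2, out
    }
  ' "$1" -
}

# Example with a throwaway mapping file
map=$(mktemp)
printf 'GC_001\tfuncA\nGC_002\tfuncB\n' > "$map"
printf 'GC_001-GC_002\t4\nGC_002-GC_003\t1\n' | reannotate_ngrams "$map"
rm -f "$map"
```

Keeping the original gene cluster ngram next to its translation mirrors the idea that both the ngrams and their re-annotation are reported.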
Test cases for developers
If you are following the anvi’o master branch on your computer, you can create a test case for this program.
First, go to your source code directory. Then run the following commands:
cd anvio/anvio/tests
./run_all_tests.sh
# set output dir
output_dir=sandbox/test-output
# make an external-genomes file
echo -e "name\tcontigs_db_path\ng01\t$output_dir/01.db\ng02\t$output_dir/02.db\ng03\t$output_dir/03.db" > $output_dir/external-genomes-file.txt
Run one or more alternative scenarios and check output files:
anvi-analyze-synteny -e $output_dir/external-genomes-file.txt \
--annotation-source COG_FUNCTION \
--ngram-window-range 2:3 \
-o $output_dir/synteny_output_no_unknowns.tsv
anvi-analyze-synteny -e $output_dir/external-genomes-file.txt \
--annotation-source COG_FUNCTION \
--ngram-window-range 2:3 \
-o $output_dir/synteny_output_with_unknowns.tsv \
--analyze-unknown-functions
anvi-analyze-synteny -e $output_dir/external-genomes-cps.txt \
--annotation-source COG_FUNCTION \
--ngram-window-range 2:3 \
-o $output_dir/tsv.txt \
--analyze-unknown-functions
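A quick way to check the output files is a small loop that confirms each one exists and is not empty. This is a generic shell sketch rather than an anvi'o tool: check_outputs is a hypothetical helper, and it only verifies that something was written, not that the tables are correct.

```shell
#!/usr/bin/env bash
# check_outputs FILE...: hypothetical helper, not part of anvi'o.
# Prints OK, MISSING or EMPTY for each file and returns non-zero
# if any file is missing or empty.
check_outputs() {
  local status=0 f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "MISSING: $f"; status=1
    elif [ ! -s "$f" ]; then
      echo "EMPTY:   $f"; status=1
    else
      echo "OK:      $f ($(wc -l < "$f" | tr -d ' ') lines)"
    fi
  done
  return "$status"
}
```

For example, after the runs above you could pass it "$output_dir"/synteny_output_no_unknowns.tsv and "$output_dir"/synteny_output_with_unknowns.tsv, and inspect the exit status.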
Additional Resources
Are you aware of resources that may help users better understand the utility of this program? Please feel free to edit this file on GitHub. If you are not sure how to do that, find the __resources__ tag in this file to see an example.