Extract ngrams, as in 'co-occurring genes in synteny', from genomes.
Briefly, anvi-analyze-synteny counts ngrams by converting contigs into strings of annotations drawn from a user-defined source of gene annotation. A function annotation source must be provided to create ngrams. Anvi’o then slides a window of size N along the loci of interest to deconstruct them into ngrams, and counts how often each ngram occurs.
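The sliding-window idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anvi’o's implementation: the function name `count_ngrams` and the labels `A`, `B`, `C` are hypothetical, and the sketch ignores details such as gene orientation.

```python
from collections import Counter


def count_ngrams(annotations, n):
    """Count every run of n consecutive annotations on one contig.

    `annotations` is the ordered list of function labels for the genes
    on a contig (a hypothetical input format for this sketch).
    """
    counts = Counter()
    # Slide a window of size n along the contig, one gene at a time.
    for start in range(len(annotations) - n + 1):
        counts[tuple(annotations[start:start + n])] += 1
    return counts


# Toy contig with made-up labels.
contig = ["A", "B", "C", "A", "B"]
ngram_counts = count_ngrams(contig, 2)
print(ngram_counts[("A", "B")])
```

A contig shorter than the window yields no ngrams at all, which is why the loop range can be empty.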
Run for a given function annotation source
Handling genes with unknown functions
By default, anvi-analyze-synteny ignores genes that have no function in the annotation source of interest. There are two ways around this. The first is to provide a pan-db, in which case the program uses gene cluster identities as function names.
The second is to explicitly ask the program to consider unknown functions with --analyze-unknown-functions, in which case it does not discard ngrams that include genes without functions.
The disadvantage of the latter strategy is that since all genes with unknown functions will be considered the same, the frequency of ngrams that contain genes with unknown functions may be inflated in your final results.
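The inflation effect can be seen in a small Python sketch. Everything here is hypothetical: the labels `unk1`, `unk2`, `unk3` stand for three different genes that lack an annotation, and `UNKNOWN` is a made-up shared label, not necessarily the one anvi’o uses.

```python
from collections import Counter


def count_bigrams(genes):
    """Count adjacent pairs of labels along one contig."""
    return Counter(zip(genes, genes[1:]))


# Hypothetical contig: three distinct genes lack a function annotation.
distinct = ["A", "unk1", "A", "unk2", "A", "unk3"]

# The same contig when every unannotated gene gets one shared label.
collapsed = ["A" if g == "A" else "UNKNOWN" for g in distinct]

distinct_counts = count_bigrams(distinct)
collapsed_counts = count_bigrams(collapsed)

# Kept apart, no pair repeats; collapsed, one pair appears three times.
print(max(distinct_counts.values()))
print(collapsed_counts[("A", "UNKNOWN")])
```

The three pairs that were unrelated before collapsing now count as the same ngram, so its frequency says little about real conservation.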
Run with multiple annotations
If multiple gene annotation sources are provided (e.g., a pangenome for gene cluster identities as well as a functional annotation source), the user must define which annotation source will be used to create the ngrams with the parameter --ngram-source. The resulting ngrams will then be re-annotated with the second annotation source, and both annotations will be reported.
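As a rough picture of what re-annotation means, the Python sketch below builds ngrams from one source and then translates them through a second one. The gene cluster names, the `cluster_to_function` lookup, and the `reannotate` helper are all hypothetical, and the output layout is an assumption rather than the format anvi’o writes.

```python
from collections import Counter

# Hypothetical contig expressed as gene cluster names (the ngram source).
gene_clusters = ["GC_01", "GC_02", "GC_03", "GC_01", "GC_02"]

# Hypothetical lookup from gene cluster to a second annotation source.
cluster_to_function = {"GC_01": "kinase", "GC_02": "transporter"}


def reannotate(ngram, lookup, missing="unknown"):
    """Translate each member of an ngram through the second source."""
    return tuple(lookup.get(member, missing) for member in ngram)


# Build ngrams (here, pairs) from the primary source and count them.
ngrams = Counter(zip(gene_clusters, gene_clusters[1:]))

# Report each ngram with its second annotation and its frequency.
report = {
    ngram: (reannotate(ngram, cluster_to_function), count)
    for ngram, count in ngrams.items()
}
print(report[("GC_01", "GC_02")])
```

Counting happens on the primary source only, so two ngrams that share the same secondary annotation are still reported separately.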
Test cases for developers
If you are following the anvi’o master branch on your computer, you can create a test case for this program.
First, go to your source code directory. Then run the following commands:
    cd anvio/anvio/tests
    ./run_all_tests.sh

    # set output dir
    output_dir=sandbox/test-output

    # make an external-genomes file
    echo -e "name\tcontigs_db_path\ng01\t$output_dir/01.db\ng02\t$output_dir/02.db\ng03\t$output_dir/03.db" > $output_dir/external-genomes-file.txt
Run one or more alternative scenarios and check output files:
    anvi-analyze-synteny -e $output_dir/external-genomes-file.txt \
                         --annotation-source COG_FUNCTION \
                         --window-range 2:3 \
                         -o $output_dir/synteny_output_no_unknowns.tsv

    anvi-analyze-synteny -e $output_dir/external-genomes-file.txt \
                         --annotation-source COG_FUNCTION \
                         --window-range 2:3 \
                         -o $output_dir/synteny_output_with_unknowns.tsv \
                         --analyze-unknown-functions

    anvi-analyze-synteny -e $output_dir/external-genomes-cps.txt \
                         --annotation-source COG_FUNCTION \
                         --window-range 2:3 \
                         -o $output_dir/tsv.txt \
                         --analyze-unknown-functions