# genome-similarity [artifact]

A CONCEPT-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).

## Required or used by

anvi-dereplicate-genomes

## Description

This is the output of anvi-compute-genome-similarity (which describes the level of similarity between all of the input genomes) or anvi-script-compute-ani-for-fasta (which describes the level of similarity between contigs in a fasta file).

The output of anvi-compute-genome-similarity will only be in this structure if you did not input a pan-db. Otherwise, the data will be put directly into the additional data tables of the pan-db. The same is true of anvi-script-compute-ani-for-fasta.

This is a directory (named by the user) that contains, for each of several metrics, both a dendrogram (NEWICK-tree) and a matrix of the similarity scores between each pair. Which metrics are reported depends on the program you asked anvi-compute-genome-similarity or anvi-script-compute-ani-for-fasta to use.
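As a rough illustration of that layout, the sketch below groups the files in such a directory by metric, so that each metric's tree and matrix can be found together. It assumes that each metric is stored as a pair of files sharing one base name, with `.newick` and `.txt` extensions; the function name `group_outputs_by_metric` is hypothetical and not part of anvi’o.

```python
from collections import defaultdict
from pathlib import Path


def group_outputs_by_metric(output_dir):
    """Map each metric name (the file stem) to its tree and matrix paths.

    Assumption: every metric is written as <metric>.newick and <metric>.txt.
    """
    grouped = defaultdict(dict)
    for path in sorted(Path(output_dir).iterdir()):
        if path.suffix == ".newick":
            grouped[path.stem]["tree"] = path
        elif path.suffix == ".txt":
            grouped[path.stem]["matrix"] = path
        # Any other file type is ignored.
    return dict(grouped)
```

Calling `group_outputs_by_metric("ANI")`, where `ANI` stands in for whatever output directory name you chose, would return one entry per metric, each with a `tree` and a `matrix` path when both files are present.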

For example, if you used PyANI’s ANIb (the default program), the output directory will contain the following twelve files. They are derived directly from the heatmaps generated by PyANI, converted into matrices and NEWICK files:

- ANIb_alignment_coverage.newick and ANIb_alignment_coverage.txt: contain the percent coverage (for query and subject)

- ANIb_percentage_identity.newick and ANIb_percentage_identity.txt: contain the percent identity

- ANIb_full_percentage_identity.newick and ANIb_full_percentage_identity.txt: contain the percent identity in the context of the length of the entire query and subject sequences (not just the aligned segment)

- ANIb_alignment_lengths.newick and ANIb_alignment_lengths.txt: contain the total aligned lengths

- ANIb_similarity_errors.newick and ANIb_similarity_errors.txt: contain the similarity errors (total number of mismatches, not including indels)

- ANIb_hadamard.newick and ANIb_hadamard.txt: contain the Hadamard matrix (the element-wise product of the identity and coverage matrices)
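To make the relationship between these matrices concrete, here is a minimal sketch that reads two of the `.txt` matrices and multiplies them element by element, which is the operation the Hadamard matrix is described as. It assumes that each `.txt` file is tab-delimited, with a header row of genome names and the row's genome name in the first column; check one of your own files before relying on this. The function names `read_matrix` and `hadamard` are hypothetical helpers, not anvi’o functions.

```python
import csv


def read_matrix(path):
    """Read a tab-delimited square matrix into {row_name: {column_name: float}}.

    Assumption: the first row holds column names and the first column holds
    row names; the top-left cell is only a label and is ignored.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        columns = next(reader)[1:]
        return {
            row[0]: {col: float(value) for col, value in zip(columns, row[1:])}
            for row in reader
            if row  # skip blank lines
        }


def hadamard(identity, coverage):
    """Element-wise product of two matrices that share row and column names."""
    return {
        row: {col: identity[row][col] * coverage[row][col] for col in identity[row]}
        for row in identity
    }
```

With an output directory named `ANI` (a placeholder), `hadamard(read_matrix("ANI/ANIb_percentage_identity.txt"), read_matrix("ANI/ANIb_alignment_coverage.txt"))` should give values comparable to those in ANIb_hadamard.txt, provided both matrices are on the same scale. Because coverage is reported for query and subject separately, the result is not necessarily symmetric.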
