profile-db [artifact]

A DB-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).

An anvi’o database that contains key information about the mapping of short reads from multiple samples to your contigs.

You can think of this as an extension of a contigs-db that contains information about how your contigs align with each of your samples. The vast majority of programs that use a profile database will also ask for the contigs database associated with it.

A profile database contains information about how short reads map to the contigs in a contigs-db. Specifically, for each sample, a profile database contains:

  • the coverage and abundance per nucleotide position for each contig
  • variants of various kinds (single-nucleotide, single-codon, and single-amino acid)
  • structural variants (e.g., insertions and deletions)

These terms are explained on the anvi’o vocabulary page.

This information is necessary to run anvi’o programs like anvi-cluster-contigs, anvi-estimate-metabolism, and anvi-gen-gene-level-stats-databases. You can also interact with a profile database using programs like anvi-interactive.

Technically, “profile-db” refers to a profile database that contains the data from several samples; in other words, the result of running anvi-merge on several single-profile-dbs. However, since a single-profile-db has much of the functionality of a profile-db, it might be easier to think of “profile database” as an umbrella term covering both single-profile-dbs and profile-dbs (which can also be called merged-profile-dbs). For simplicity’s sake, and since most users are dealing with multiple samples, the name was shortened to just profile-db. The following is a list of differences in functionality between a single profile database and a merged profile database:

How to make a profile database

If you have multiple samples

  1. Prepare your contigs-db
  2. Run anvi-profile with an appropriate bam-file. The output of this will give you a single-profile-db. You will need to do this for each of your samples, so the short reads of each sample must first have been mapped to your contigs and stored as a bam-file.
  3. Run anvi-merge on your contigs-db (from step 1) and your single-profile-dbs (from step 2). The output of this is a profile-db.
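
The multi-sample steps above can be sketched as a short Python script that assembles the corresponding commands. This is a minimal sketch under stated assumptions, not an official anvi’o workflow: the file names (contigs.db, SAMPLE-01.bam, SAMPLE-02.bam), the output directory names, the helper function names, and the rule used to derive sample names are all hypothetical. The flags shown (-i, -c, -o, -S) are the ones commonly used with anvi-profile and anvi-merge, but check each program’s --help output for your anvi’o version. By default the script only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def profile_command(bam, contigs_db, out_dir):
    """Build the anvi-profile call for one sample (step 2)."""
    # Assumption: derive the sample name from the BAM file name and swap
    # hyphens for underscores to keep the name simple.
    sample = Path(bam).stem.replace("-", "_")
    return ["anvi-profile", "-i", str(bam), "-c", str(contigs_db),
            "-o", str(out_dir), "-S", sample]


def merge_command(profile_dbs, contigs_db, out_dir):
    """Build the anvi-merge call over all single profiles (step 3)."""
    return ["anvi-merge", *map(str, profile_dbs),
            "-c", str(contigs_db), "-o", str(out_dir)]


def build_workflow(bams, contigs_db="contigs.db", merged_dir="SAMPLES-MERGED"):
    """Return one anvi-profile command per BAM file, then one anvi-merge."""
    commands, profile_dbs = [], []
    for bam in bams:
        out_dir = Path(bam).stem  # hypothetical: one output directory per sample
        commands.append(profile_command(bam, contigs_db, out_dir))
        # Assumption: anvi-profile writes PROFILE.db into its output directory.
        profile_dbs.append(Path(out_dir) / "PROFILE.db")
    commands.append(merge_command(profile_dbs, contigs_db, merged_dir))
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_workflow(["SAMPLE-01.bam", "SAMPLE-02.bam"]))
```

Calling run(..., dry_run=False) would execute the commands in order and stop at the first failure. Keeping command construction separate from execution makes it easy to review the exact calls before anything is written to disk.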

If you have a single sample

  1. Prepare your contigs-db
  2. Run anvi-profile with an appropriate bam-file. The output of this will give you a single-profile-db. See the single-profile-db page for more information, but essentially you can use a single-profile-db in place of a merged profile database to run most anvi’o functions.
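
For the single-sample case, the two steps above might look like the following sketch, which builds an anvi-profile call and then an anvi-interactive call that opens the resulting single profile. The file names and the function name are hypothetical. Passing --cluster-contigs to anvi-profile is an assumption made here so that the single profile carries an item order for display; confirm it with anvi-profile --help for your version. The script only prints the commands and does not run them.

```python
import shlex
from pathlib import Path


def single_sample_commands(bam, contigs_db="contigs.db", cluster_contigs=True):
    """Build the profiling and visualization commands for one sample."""
    out_dir = Path(bam).stem  # hypothetical output directory name
    profile = ["anvi-profile", "-i", str(bam), "-c", str(contigs_db),
               "-o", str(out_dir)]
    if cluster_contigs:
        # Assumption: request contig clustering during profiling, since no
        # anvi-merge step follows in the single-sample case.
        profile.append("--cluster-contigs")
    # Assumption: the single profile is written to <out_dir>/PROFILE.db.
    interactive = ["anvi-interactive", "-p", str(Path(out_dir) / "PROFILE.db"),
                   "-c", str(contigs_db)]
    return [profile, interactive]


if __name__ == "__main__":
    for cmd in single_sample_commands("SAMPLE-01.bam"):
        print(shlex.join(cmd))
```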

