# profile-db [artifact]

A DB-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).

## Description

An anvi’o database that contains key information about the mapping of short reads from multiple samples to your contigs.

You can think of this as an extension of a contigs-db that contains information about how your contigs align with each of your samples. The vast majority of programs that use a profile database will also ask for the contigs database associated with it.

A profile database contains information about how short reads map to the contigs in a contigs-db. Specifically, for each sample, a profile database contains:

• the coverage and abundance per nucleotide position for each contig
• variants of various kinds (single-nucleotide, single-codon, and single-amino acid)
• structural variants (e.g. insertions and deletions)

These terms are explained on the anvi’o vocabulary page.

This information is necessary to run anvi’o programs like anvi-cluster-contigs, anvi-estimate-metabolism, and anvi-gen-gene-level-stats-databases. You can also interact with a profile database using programs like anvi-interactive.

Technically, “profile-db” refers to a profile database that contains the data from several samples, in other words the result of running anvi-merge on several single-profile-dbs. However, since a single-profile-db has a lot of the functionality of a profile-db, it might be easier to think of “profile database” as an umbrella term covering both single-profile-dbs and profile-dbs (which can also be called merged-profile-dbs). For simplicity’s sake, since most users are dealing with multiple samples, the name was shortened to just profile-db. The following is a list of differences in functionality between a single profile database and a merged profile database:

## How to make a profile database

### If you have multiple samples

Profile databases, like contigs-dbs, are allowed to have different variants, though the only currently implemented variant, the trnaseq-profile-db, is for tRNA transcripts from tRNA-seq experiments. The default variant stored for “standard” profile databases is unknown. A variant should indicate that substantially different information is stored in the database. For instance, single codon variability (SCV) is applicable to protein-coding genes but not to tRNA transcripts, so SCV data is not recorded for the trnaseq variant. The trnaseq-workflow generates trnaseq-profile-dbs using a very different approach from anvi-profile.
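
If you are unsure whether a given file is a single or a merged profile database, or which variant it stores, one option is to look at the database's own metadata. The following is a minimal sketch, not an official anvi’o interface. It assumes that the profile database is a SQLite file with a key/value table named `self`, and that this table holds entries named `merged` (stored as 0 or 1) and `db_variant`. The table name and both key names are assumptions that may differ between anvi’o versions, and the helper functions `read_self_table` and `describe_profile_db` are hypothetical names introduced here for illustration.

```python
import sqlite3


def read_self_table(db_path):
    """Return the key/value metadata of a database file as a dict.

    Assumption: the file is a SQLite database with a table named `self`
    that has `key` and `value` columns.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT key, value FROM self").fetchall()
    finally:
        conn.close()
    return dict(rows)


def describe_profile_db(meta):
    """Summarize whether the metadata describes a single or merged profile.

    Assumption: `merged` is stored as 0/1 and `db_variant` names the variant.
    A missing or empty variant is reported as "unknown", matching the
    default described for standard profile databases.
    """
    kind = "merged" if str(meta.get("merged")) == "1" else "single"
    variant = meta.get("db_variant") or "unknown"
    return f"{kind} profile database (variant: {variant})"
```

For example, `describe_profile_db(read_self_table("PROFILE.db"))` would return a one-line summary for a file named `PROFILE.db` in the current directory, provided the assumptions above hold for your anvi’o version.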