anvi-gen-structure-database [program]

Identifies genes in your contigs database that encode proteins that are homologous to proteins with solved structures. If sufficiently similar homologs are identified, they are used as structural templates to predict the 3D structure of proteins in your contigs database.


Can provide

structure-db

Can consume

contigs-db, pdb-db

Usage

This program attempts to predict the 3D structures of proteins encoded by genes in your contigs-db using DIAMOND and MODELLER.

DIAMOND first searches your sequence(s) against a database of proteins with a known structure. This database is downloaded from the Sali lab, who created and maintain MODELLER, and contains all of the PDB sequences clustered at 95% identity.

If any good hits are found, they are selected as templates, and their structures are nabbed either from the RCSB directly, or from a local pdb-db database which you can create yourself with anvi-setup-pdb-database. Then, anvi’o passes control over to MODELLER, which builds a 3D alignment of your sequence to the template structures, and makes final adjustments to the resulting model based on empirical distributions of bond angles. For more information, check this blogpost.

The output of this program is a structure-db, which contains all of the modelled structures. Currently, the primary use of the structure-db is for interactive exploration with anvi-display-structure. You can also export your structures into external .pdb files with anvi-export-structures, or incorporate structural information in the variability-profile-txt with anvi-gen-variability-profile.

Basic run

Here is a simple run:

anvi-gen-structure-database -c contigs-db \
                            --gene-caller-ids 1,2,3 \
                            -o STRUCTURE.db

Following this, you will have the structures for genes 1, 2, and 3 stored in STRUCTURE.db, assuming reasonable templates were found. Alternatively, you can provide the name of a file containing the gene caller IDs (one ID per line) with the flag --genes-of-interest.
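As a minimal sketch of the --genes-of-interest route, the snippet below writes a file with one gene caller ID per line and passes it to the program. The file name genes_of_interest.txt and the database name CONTIGS.db are placeholders chosen for illustration, not names anvi’o requires. The run itself is guarded so the snippet does nothing beyond writing the file when anvi’o or the contigs-db is not available.

```shell
# Hypothetical file name; any path works.
GENES_FILE="genes_of_interest.txt"

# One gene caller ID per line, as --genes-of-interest expects.
printf '%s\n' 1 2 3 > "$GENES_FILE"

# Launch the run only if anvi'o is on the PATH and the contigs-db exists.
# CONTIGS.db is a placeholder for your own contigs-db.
if command -v anvi-gen-structure-database >/dev/null 2>&1 && [ -f CONTIGS.db ]; then
    anvi-gen-structure-database -c CONTIGS.db \
                                --genes-of-interest "$GENES_FILE" \
                                -o STRUCTURE.db
fi
```

This should be equivalent to passing --gene-caller-ids 1,2,3, but it is easier to maintain when the list of genes is long or generated by another tool.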

If you have already run anvi-setup-pdb-database and therefore have a local copy of representative PDB structures, make sure you use it by providing the --offline flag. If you put it in a non-default location, provide the path to your pdb-db:

anvi-gen-structure-database -c contigs-db \
                            --gene-caller-ids 1,2,3 \
                            --pdb-database pdb-db \
                            -o STRUCTURE.db
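The two pieces of advice above (use --offline, and add --pdb-database only for a non-default location) can be combined in a small wrapper. This is a sketch under stated assumptions: PDB_DB and PDB_ARGS are hypothetical shell variables introduced here, CONTIGS.db is a placeholder, and the path to the pdb-db is assumed to contain no spaces.

```shell
# PDB_DB is a hypothetical variable: leave it unset or empty if your pdb-db
# sits in the default location, otherwise set it to the path of your pdb-db.
PDB_DB="${PDB_DB:-}"

# --offline asks the program to use the local pdb-db instead of the RCSB.
PDB_ARGS="--offline"
if [ -n "$PDB_DB" ]; then
    PDB_ARGS="$PDB_ARGS --pdb-database $PDB_DB"
fi

# Launch the run only if anvi'o is on the PATH and the contigs-db exists.
if command -v anvi-gen-structure-database >/dev/null 2>&1 && [ -f CONTIGS.db ]; then
    # $PDB_ARGS is left unquoted on purpose so it splits into separate
    # arguments; this is why the path must not contain spaces.
    anvi-gen-structure-database -c CONTIGS.db \
                                --gene-caller-ids 1,2,3 \
                                $PDB_ARGS \
                                -o STRUCTURE.db
fi
```

For a custom location, the wrapper could be invoked with the variable set on the command line, for example PDB_DB=/path/to/pdb-db followed by the script name.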

To quickly get a very rough estimate for your structures, you can run with the flag --very-fast.

Advanced Parameters

Here, we will go through a brief overview of the MODELLER parameters that you are able to change. See this page for more information.

  • The number of models to be simulated. The default is 1.
  • The standard deviation of atomic perturbation of the initial structure (i.e. how much you change the position of the atoms before fine tuning with other analysis). The default is 4 angstroms.
  • The MODELLER database used. The default is pdb_95, which can be found here. This is the same database that is downloaded by anvi-setup-pdb-database.
  • The scoring function used to compare potential models. The default is DOPE_score.
  • The minimum percent identity cutoff for a template to be further considered.
  • The minimum alignment fraction that the sequence is covered by the template in order to be further considered.
  • The maximum number of templates that the program will consider. The default is 5.
  • The MODELLER program to use. The default is mod9.19, but anvi’o is somewhat intelligent and will look for the most recent version it can find.

For a case study on how some of these parameters matter, see here.

You also have the option to:

  • Skip the use of DSSP, which predicts beta sheets, alpha helices, certain bond angles, and relative solvent accessibility of residues.
  • Output all the raw data by providing a path to the desired directory with the flag --dump-dir.
