This program populates the tables that store HMM hits in an anvi'o contigs database.
HMMs in the context of anvi’o
In a nutshell, hidden Markov models are statistical models typically generated from known genes which enable ‘searching’ for similar genes in other sequence contexts.
The default anvi’o distribution includes numerous curated HMM profiles for single-copy core genes and ribosomal RNAs, and anvi’o can work with custom HMM profiles provided by the user. In anvi’o lingo, each of these HMM profiles, whether built-in or user-defined, is called an hmm-source.
anvi-run-hmms -c contigs-db
Multithreading will dramatically improve the performance of anvi-run-hmms. If you have multiple CPUs or cores, you may parallelize your search:
anvi-run-hmms -c contigs-db --num-threads 6
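If you would rather not hard-code the thread count, a small shell sketch like the one below can derive it from the machine. The helper name `pick_threads` and the cap of 6 are illustrative assumptions, not part of anvi’o; only `--num-threads` comes from the text above. The script prints the command rather than running it, so the result can be checked first.

```shell
# pick_threads AVAILABLE CAP -> prints the smaller of the two values.
# Hypothetical helper for this example, not part of anvi'o.
pick_threads() {
    available=$1
    cap=$2
    if [ "$available" -lt "$cap" ]; then
        echo "$available"
    else
        echo "$cap"
    fi
}

# Number of online cores; fall back to 1 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
threads=$(pick_threads "$cores" 6)

# Print the command instead of running it, so it can be inspected first.
echo "anvi-run-hmms -c contigs-db --num-threads $threads"
```

Capping the value is a design choice for shared machines; on a dedicated machine the cap can simply be set to the number of cores.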
You can also run this program on a specific built-in hmm-source:
anvi-run-hmms -c contigs-db -I Bacteria_71
Running anvi-run-hmms with a custom model is easy. All you need to do is create a directory with the necessary files and pass it with the -H flag:
anvi-run-hmms -c contigs-db -H MY_HMM_PROFILE
See the relevant section in the artifact hmm-source for details.
Changing the HMMER program
anvi-run-hmms will use HMMER’s hmmscan for amino acid HMM profiles, but you can use hmmsearch if you are searching a very large number of models against a relatively small number of sequences:
anvi-run-hmms -c contigs-db --hmmer-program hmmsearch
This flag has no effect when your HMM profile source is for nucleotide sequences (like any of the Ribosomal RNA sources). In those cases anvi’o will use nhmmscan.
Saving the HMMER output
If you want to see the output from the HMMER program (e.g., hmmscan) used to annotate your data, you can request that it be saved in a directory of your choosing. Please note that this only works when you are running on a single HMM source, as in the example below:
anvi-run-hmms -c contigs-db -I Bacteria_71 --hmmer-output-dir OUTPUT_DIR
If you do this, file(s) with the prefix hmm will appear in that directory, with the file extension indicating the format of the output file. For example, the table output format would be called
These resulting files are not exactly the raw output of HMMER because anvi’o does quite a bit of pre-processing on the raw input and output file(s) while jumping through some hoops to make the HMM searches multi-threaded. If this is causing you a lot of headache, please let us know.
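Because the HMMER output can only be saved for one HMM source per run, keeping the output for several sources means running the program once per source. A minimal sketch of that loop follows. The function name `hmm_commands` and the `hmmer-output/` directory layout are assumptions made for this example, and the second source name is only an illustration; substitute the sources available in your installation. The function only prints the commands so that they can be reviewed before anything is executed.

```shell
# hmm_commands CONTIGS_DB OUT_ROOT SOURCE...
# Prints one anvi-run-hmms command per HMM source, each with its own
# output directory. Hypothetical helper for this example, not part of anvi'o.
# Assumes paths and source names contain no spaces.
hmm_commands() {
    contigs_db=$1
    out_root=$2
    shift 2
    for src in "$@"; do
        echo "anvi-run-hmms -c $contigs_db -I $src --hmmer-output-dir $out_root/$src"
    done
}

# Example: one command per source (source names here are illustrative).
hmm_commands contigs-db hmmer-output Bacteria_71 Archaea_76
```

Once the printed commands look right, piping them to `sh` (for example `hmm_commands contigs-db hmmer-output Bacteria_71 | sh`) runs them in order.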
Requesting domain table output
No matter what, anvi’o will use the regular table output to annotate your contigs database. However, if you are using --hmmer-output-dir to store the HMMER output, you can also request a domain table output using the flag --get-domtable-output:
anvi-run-hmms -c contigs-db -I Bacteria_71 --hmmer-output-dir OUTPUT_DIR --get-domtable-output
In this case anvi’o will run HMMER using the --domtblout flag to generate this output file.
This flag will only work with HMM profiles made for amino acid sequences. Profiles for nucleotide sequences require the use of the program nhmmscan, which does not have an option to store domain output.
Please note that this output won’t be used to filter the hits added to the contigs database; it only gives you the file you need to investigate the coverage of HMM hits. You can, however, use the program anvi-script-filter-hmm-hits-table with this file to remove weak hits from your HMM hits table later.
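To look at hit coverage in the domain table by hand, one option is a short awk pass over the file. The sketch below assumes the saved file still follows HMMER’s standard --domtblout column layout (target length in column 3, HMM start and end in columns 16 and 17) and that hmmscan was used, so that the target is the model. Since anvi’o post-processes these files, check your own file before relying on this. The function name `model_coverage` is made up for this example.

```shell
# model_coverage DOMTBL_FILE
# For each domain hit, prints: query name, model name, and the fraction of
# the model covered by the alignment (tab-separated).
# Hypothetical helper for this example, not part of anvi'o.
# Assumes HMMER's standard --domtblout columns as written by hmmscan:
#   $1 = target (model) name, $3 = target (model) length,
#   $4 = query (gene) name, $16 = hmm from, $17 = hmm to.
model_coverage() {
    awk '
        /^#/ { next }                 # skip comment and header lines
        NF < 22 || $3 <= 0 { next }   # skip rows that do not look like hits
        {
            covered = $17 - $16 + 1
            printf "%s\t%s\t%.2f\n", $4, $1, covered / $3
        }
    ' "$1"
}
```

Called as `model_coverage path/to/domain_table_file`, a hit whose alignment spans HMM positions 11 to 110 of a 200-position model is reported with a coverage of 0.50. With hmmsearch the roles of target and query are swapped, so the model length would come from column 6 instead.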
Other things anvi-run-hmms can do
- Add the tag --also-scan-trnas to basically run anvi-scan-trnas for you at the same time. It’s very convenient. (But it only works if you are not using the -H flags at the same time because reasons.)