anvi-get-sequences-for-hmm-hits [program]
Get sequences for HMM hits from many inputs.
Table of Contents
- Can provide
- Can consume
- Usage
- Learn available HMM sources
- Get all sequences in a given HMM source
- Learn available genes in a given HMM source
- Get sequences for some genes in a given HMM source
- Get HMM hits in bins of a collection
- Get amino acid sequences for HMM hits
- Get HMM hits independently aligned and concatenated
- Want to play?
- Additional Resources
Can provide
genes-fasta
concatenated-gene-alignment-fasta
Can consume
contigs-db
profile-db
external-genomes
internal-genomes
hmm-source
hmm-hits
Usage
This program can work with anvi’o contigs-db, external-genomes, or internal-genomes files to return sequences for HMM hits identified through the default anvi’o hmm-sources (such as the domain-specific single-copy core genes) or user-defined hmm-sources (such as HMMs for specific antibiotic resistance gene families or any other targets).
Using it with the single-copy core genes in the default anvi’o HMMs makes it a very versatile tool for phylogenomics, as the user can define specific sets of genes to be aligned and concatenated.
Learn available HMM sources
anvi-get-sequences-for-hmm-hits -c contigs-db \
                                --list-hmm-sources

AVAILABLE HMM SOURCES
===============================================
* 'Bacteria_71' (type: singlecopy; num genes: 71)
* 'Archaea_76' (type: singlecopy; num genes: 76)
* 'Protista_83' (type: singlecopy; num genes: 83)
* 'Ribosomal_RNAs' (type: Ribosomal_RNAs; num genes: 12)
Get all sequences in a given HMM source
anvi-get-sequences-for-hmm-hits -c contigs-db \
                                --hmm-source Bacteria_71 \
                                -o genes-fasta
Learn available genes in a given HMM source
Please note that the flag --list-available-gene-names will give you the list of genes in an HMM collection (for example, Bacteria_71 in the use case below). It will not give you the list of genes in your genomes or metagenomes that match them. To generate a table of HMM hits across your genomes or metagenomes, use another program, anvi-script-gen-hmm-hits-matrix-across-genomes.
anvi-get-sequences-for-hmm-hits -c contigs-db \
                                --hmm-source Bacteria_71 \
                                --list-available-gene-names
* Bacteria_71 [type: singlecopy]: ADK, AICARFT_IMPCHas, ATP-synt, ATP-synt_A, Chorismate_synt, EF_TS, Exonuc_VII_L, GrpE, Ham1p_like, IPPT, OSCP, PGK, Pept_tRNA_hydro, RBFA, RNA_pol_L, RNA_pol_Rpb6, RRF, RecO_C, Ribonuclease_P, Ribosom_S12_S23, Ribosomal_L1, Ribosomal_L13, Ribosomal_L14, Ribosomal_L16, Ribosomal_L17, Ribosomal_L18p, Ribosomal_L19, Ribosomal_L2, Ribosomal_L20, Ribosomal_L21p, Ribosomal_L22, Ribosomal_L23, Ribosomal_L27, Ribosomal_L27A, Ribosomal_L28, Ribosomal_L29, Ribosomal_L3, Ribosomal_L32p, Ribosomal_L35p, Ribosomal_L4, Ribosomal_L5, Ribosomal_L6, Ribosomal_L9_C, Ribosomal_S10, Ribosomal_S11, Ribosomal_S13, Ribosomal_S15, Ribosomal_S16, Ribosomal_S17, Ribosomal_S19, Ribosomal_S2, Ribosomal_S20p, Ribosomal_S3_C, Ribosomal_S6, Ribosomal_S7, Ribosomal_S8, Ribosomal_S9, RsfS, RuvX, SecE, SecG, SecY, SmpB, TsaE, UPF0054, YajC, eIF-1a, ribosomal_L24, tRNA-synt_1d, tRNA_m1G_MT, Adenylsucc_synt
Get sequences for some genes in a given HMM source
anvi-get-sequences-for-hmm-hits -c contigs-db \
                                --hmm-source Bacteria_71 \
                                --gene-names Ribosomal_L27,Ribosomal_L28,Ribosomal_L3 \
                                -o genes-fasta
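When the gene list is long, typing the comma-separated --gene-names value by hand is error-prone. The following is a minimal sketch, assuming the gene names are available one per line. The helper join_gene_names and the file names CONTIGS.db and genes.fa are hypothetical and not part of anvi’o. The command is only printed, not executed, so the sketch runs without anvi’o installed.

```shell
# Join gene names (one per line on stdin, blank lines ignored) with commas.
# join_gene_names is a hypothetical helper, not an anvi'o command.
join_gene_names() {
    grep -v '^[[:space:]]*$' | paste -sd, -
}

# Example gene list, using names from the Bacteria_71 listing above.
gene_names=$(printf '%s\n' Ribosomal_L27 Ribosomal_L28 Ribosomal_L3 | join_gene_names)

# Print the command rather than running it (CONTIGS.db and genes.fa are placeholders).
echo anvi-get-sequences-for-hmm-hits -c CONTIGS.db \
     --hmm-source Bacteria_71 \
     --gene-names "$gene_names" \
     -o genes.fa
```

Dropping the leading echo would run the command for real. In practice the names could come from a file, for example `join_gene_names < my_genes.txt`, where my_genes.txt is again a placeholder.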
Get HMM hits in bins of a collection
anvi-get-sequences-for-hmm-hits -c contigs-db \
                                -p profile-db \
                                -C collection \
                                --hmm-source Bacteria_71 \
                                --gene-names Ribosomal_L27,Ribosomal_L28,Ribosomal_L3 \
                                -o genes-fasta
Get amino acid sequences for HMM hits
anvi-get-sequences-for-hmm-hits -c contigs-db \
                                -p profile-db \
                                -C collection \
                                --hmm-source Bacteria_71 \
                                --gene-names Ribosomal_L27,Ribosomal_L28,Ribosomal_L3 \
                                --get-aa-sequences \
                                -o genes-fasta
Get HMM hits independently aligned and concatenated
The resulting file can be used for phylogenomics analyses via anvi-gen-phylogenomic-tree or through more sophisticated tools for curating alignments and computing trees.
anvi-get-sequences-for-hmm-hits -c contigs-db \
                                -p profile-db \
                                -C collection \
                                --hmm-source Bacteria_71 \
                                --gene-names Ribosomal_L27,Ribosomal_L28,Ribosomal_L3 \
                                --get-aa-sequences \
                                --concatenate-genes \
                                --return-best-hit \
                                -o genes-fasta
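Before passing a concatenated alignment to a tree-building tool, it can be worth checking that every record has the same length, since most such tools expect this. The following is a minimal sketch that relies only on generic FASTA conventions and standard Unix tools. The helpers fasta_lengths and alignment_is_rectangular are hypothetical and not part of anvi’o, and the toy file stands in for the real output.

```shell
# Print "name<TAB>length" for every record in a FASTA file.
# Sequences wrapped over several lines are summed; gap characters count as columns.
fasta_lengths() {
    awk '/^>/ { if (seen) print name "\t" len
                seen = 1; name = substr($1, 2); len = 0; next }
         seen { len += length($0) }
         END  { if (seen) print name "\t" len }' "$1"
}

# Succeed only if the file has at least one record and all records share one length.
alignment_is_rectangular() {
    fasta_lengths "$1" |
        awk -F'\t' 'NR == 1 { first = $2 }
                    $2 != first { bad = 1 }
                    END { exit (NR == 0 || bad) }'
}

# Toy alignment standing in for the real output file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
>bin_1
MKV--LA
GGT
>bin_2
MKVAALA
GG-
EOF

fasta_lengths "$tmp"
if alignment_is_rectangular "$tmp"; then
    echo "all records have the same length"
else
    echo "records differ in length" >&2
fi
rm -f "$tmp"
```

To check a real result, call `alignment_is_rectangular` on the file written by -o instead of the toy file.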
Want to play?
You can play with this program using the anvi’o data pack for the infant gut data and by replacing the parameters above with appropriate ones in the following commands.
Download the latest version of the data from here:
doi:10.6084/m9.figshare.3502445
Unpack it:
tar -zxvf INFANTGUTTUTORIAL.tar.gz && cd INFANT-GUT-TUTORIAL
Import the collection merens:
anvi-import-collection additional-files/collections/merens.txt \
                       -p PROFILE.db \
                       -c CONTIGS.db \
                       -C merens
Then run the program, for example to get concatenated amino acid sequences for the best hits of three ribosomal genes in each bin:
anvi-get-sequences-for-hmm-hits -p PROFILE.db \
                                -c CONTIGS.db \
                                -C merens \
                                -o OUTPUT.fa \
                                --hmm-source Campbell_et_al \
                                --gene-names Ribosomal_L27,Ribosomal_L28,Ribosomal_L3 \
                                --return-best-hit \
                                --get-aa-sequences \
                                --concatenate-genes
Additional Resources
Are you aware of resources that may help users better understand the utility of this program? Please feel free to edit this file on GitHub. If you are not sure how to do that, find the __resources__ tag in this file to see an example.