Microbial 'omics

All programs and scripts in anvi'o

The latest version of anvi’o is v4. See the release notes.

Here you will find the anvi’o programs available in the latest stable version of the platform, along with their help menus. The contents of this file were last updated on 24 Feb 18 12:58:16, at which point anvi’o looked like this:

Key                               Value
Anvi'o version                    4
Profile DB version                23
Contigs DB version                10
Pan DB version                    8
Genome data storage version       6
Auxiliary data storage version    2

Summary

Main anvi’o programs (68) anvi-cluster-with-concoct, anvi-compute-completeness, anvi-delete-collection, anvi-delete-hmms, anvi-delete-misc-data, anvi-delete-state, anvi-display-contigs-stats, anvi-display-pan, anvi-experimental-organization, anvi-export-collection, anvi-export-contigs, anvi-export-functions, anvi-export-gene-calls, anvi-export-gene-coverage-and-detection, anvi-export-locus, anvi-export-misc-data, anvi-export-splits-and-coverages, anvi-export-splits-taxonomy, anvi-export-state, anvi-export-table, anvi-gen-contigs-database, anvi-gen-gene-consensus-sequences, anvi-gen-genomes-storage, anvi-gen-network, anvi-gen-phylogenomic-tree, anvi-gen-variability-matrix, anvi-gen-variability-network, anvi-gen-variability-profile, anvi-get-aa-counts, anvi-get-aa-frequencies, anvi-get-aa-sequences-for-gene-calls, anvi-get-dna-sequences-for-gene-calls, anvi-get-sequences-for-gene-clusters, anvi-get-sequences-for-hmm-hits, anvi-get-short-reads-from-bam, anvi-get-short-reads-mapping-to-a-gene, anvi-get-split-coverages, anvi-import-collection, anvi-import-functions, anvi-import-misc-data, anvi-import-state, anvi-import-taxonomy, anvi-init-bam, anvi-interactive, anvi-matrix-to-newick, anvi-mcg-classifier, anvi-merge, anvi-merge-bins, anvi-meta-pan-genome, anvi-migrate-db, anvi-oligotype-linkmers, anvi-pan-genome, anvi-profile, anvi-push, anvi-refine, anvi-rename-bins, anvi-report-linkmers, anvi-run-hmms, anvi-run-ncbi-cogs, anvi-saavs-and-protein-structures-summary, anvi-search-functions, anvi-self-test, anvi-setup-ncbi-cogs, anvi-show-collections-and-bins, anvi-show-misc-data, anvi-split, anvi-summarize, anvi-update-db-description.

Ad hoc anvi’o scripts (18) anvi-script-add-default-collection, anvi-script-checkm-tree-to-interactive, anvi-script-filter-fasta-by-blast, anvi-script-gen-CPR-classifier, anvi-script-gen-distribution-of-genes-in-a-bin, anvi-script-gen-short-reads, anvi-script-gen-vignette, anvi-script-gen_stats_for_single_copy_genes.py, anvi-script-gene-clusters-to-gene-calls, anvi-script-get-collection-info, anvi-script-get-collections-as-tab-delimited-matrix.py, anvi-script-get-prot-sequences.py, anvi-script-itep-to-data-txt, anvi-script-merge-collections, anvi-script-predict-CPR-genomes, anvi-script-reformat-fasta, anvi-script-run-eggnog-mapper, anvi-script-snvs-to-interactive.


Programs

Please let us know if there is something unclear in this output.

anvi-cluster-with-concoct

A program to cluster items in a merged anvi'o profile using CONCOCT, and optionally creating a collection in the profile database. This is especially useful if you need to have more control over the number of clusters to work with if you are planning to refine them manually later.

profile_db clustering collections

Usage

anvi-cluster-with-concoct [-h] -p PROFILE_DB -c CONTIGS_DB
                          [-o FILE_PATH] [--skip-store-in-db]
                          [-C COLLECTION_NAME]
                          [--num-clusters-requested INT]

Parameters

optional arguments:

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --skip-store-in-db    By default, analysis results are stored in the profile
                        database. Use this flag to skip that step.
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  --num-clusters-requested INT
                        How many clusters do you request? Default is 400.
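
As a rough illustration of how the flags above combine, the following Python sketch assembles an argument list for this program. The helper `build_concoct_command`, the file names `PROFILE.db` and `CONTIGS.db`, and the collection name are placeholders introduced for this example; only the flags themselves come from the help text above.

```python
import shlex


def build_concoct_command(profile_db, contigs_db, collection_name=None,
                          num_clusters=None, skip_store_in_db=False):
    """Assemble an anvi-cluster-with-concoct argument list (hypothetical helper)."""
    cmd = ["anvi-cluster-with-concoct", "-p", profile_db, "-c", contigs_db]
    if collection_name is not None:
        cmd += ["-C", collection_name]
    if num_clusters is not None:
        cmd += ["--num-clusters-requested", str(num_clusters)]
    if skip_store_in_db:
        cmd.append("--skip-store-in-db")
    return cmd


# Placeholder file and collection names, for illustration only.
cmd = build_concoct_command("PROFILE.db", "CONTIGS.db",
                            collection_name="CONCOCT_50", num_clusters=50)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a system where anvi'o is installed.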

anvi-compute-completeness

A script to generate completeness info for a given list of splits

Usage

anvi-compute-completeness [-h] [--splits-of-interest FILE] -c
                          CONTIGS_DB [-e E-VALUE]
                          [--list-completeness-sources]
                          [--completeness-source NAME]

Parameters

optional arguments:

  --splits-of-interest FILE
                        A file with split names. There should be only one
                        column in the file, and each line should correspond to
                        a unique split name.
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -e E-VALUE, --min-e-value E-VALUE
                        Minimum significance score of an HMM find to be
                        considered as a valid hit. Default is 1e-15.
  --list-completeness-sources
                        Show available sources and exit.
  --completeness-source NAME
                        Single-copy gene source to use to estimate
                        completeness.
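
The --splits-of-interest file described above is a single column with one unique split name per line. A minimal Python sketch of writing such a file follows; the function name and the split names are made up for illustration, and real split names would come from your own contigs database.

```python
import os
import tempfile


def write_splits_of_interest(split_names, path):
    """Write one unique split name per line, preserving first-seen order."""
    unique_names = list(dict.fromkeys(split_names))
    with open(path, "w") as handle:
        for name in unique_names:
            handle.write(name + "\n")
    return unique_names


# Hypothetical split names; the duplicate is dropped before writing.
names = ["contig_1_split_00001", "contig_1_split_00002", "contig_1_split_00001"]
path = os.path.join(tempfile.mkdtemp(), "splits.txt")
written = write_splits_of_interest(names, path)
print(written)
```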

anvi-delete-collection

Remove a collection from a given profile database.

Usage

anvi-delete-collection [-h] -p PROFILE_DB [-C COLLECTION_NAME]
                       [--list-collections]

Parameters

optional arguments:

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  --list-collections    Show available collections and exit.

anvi-delete-hmms

Remove HMM hits from an anvi'o contigs database.

Usage

anvi-delete-hmms [-h] -c CONTIGS_DB [--hmm-source SOURCE NAME] [-l]
                 [--just-do-it]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  --hmm-source SOURCE NAME
                        Use a specific HMM source. You can use '--list-hmm-
                        sources' flag to see a list of available resources.
                        The default is 'None'.
  -l, --list-hmm-sources
                        List available HMM sources in the contigs database and
                        quit.
  --just-do-it          Don't bother me with questions or warnings, just do
                        it.

anvi-delete-misc-data

Remove stuff from additional data or order tables in pan or profile databases for items or layers

Usage

anvi-delete-misc-data [-h] -p PAN_OR_PROFILE_DB -t NAME
                      [--keys-to-remove KEYS_TO_REMOVE]
                      [--list-available-keys] [--just-do-it]

Parameters

optional arguments:

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database
  -t NAME, --target-data-table NAME
                        The target table is the table you are interested in
                        accessing. Currently it can be 'items','layers', or
                        'layer_orders'. Please see most up-to-date online
                        documentation for more information.
  --keys-to-remove KEYS_TO_REMOVE
                        A comma-separated list of data keys to remove from the
                        database. If you do not use this parameter, anvi'o
                        will simply remove everything from the target data
                        table immediately.
  --list-available-keys
                        Using this flag will list available data keys in the
                        target data table and quit without doing anything
                        else.
  --just-do-it          Don't bother me with questions or warnings, just do
                        it.

anvi-delete-state

Delete an anvi'o state from a pan or profile database.

Usage

anvi-delete-state [-h] -p PAN_OR_PROFILE_DB [-s STATE_NAME]
                  [--list-states]

Parameters

optional arguments:

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database
  -s STATE_NAME, --state STATE_NAME
                        The state name to ... delete :(
  --list-states         Show available states and exit.

anvi-display-contigs-stats

Start the anvi'o interactive interface for viewing or comparing contigs statistics

Usage

anvi-display-contigs-stats [-h] [--report-as-text] [-o FILE_PATH]
                           [-I IP_ADDR] [-P INT] [--browser-path PATH]
                           [--server-only]
                           CONTIG DATABASE(S) [CONTIG DATABASE(S) ...]

Parameters

positional arguments:

  CONTIG DATABASE(S)    Anvi'o contigs databases to display statistics for.
                        You can give multiple databases by separating them
                        with spaces.

REPORT CONFIGURATION: Specify what kind of output you want.

  --report-as-text      If you give this flag, anvi'o will not open a new
                        browser window to show contigs database statistics.
                        Instead, it will write all stats to a TAB-separated
                        file. You must also give --output-file with this flag,
                        otherwise anvi'o will complain.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

SERVER CONFIGURATION: For power users.

  -I IP_ADDR, --ip-address IP_ADDR
                        IP address for the HTTP server. The default ip address
                        (0.0.0.0) should work just fine for most.
  -P INT, --port-number INT
                        Port number to use for anvi'o services. If nothing is
                        declared, anvi'o will try to find a suitable port
                        number, starting from the default port number, 8080.
  --browser-path PATH   By default, anvi'o will use your default browser to
                        launch the interactive interface. If you would like to
                        use something else than your system default, you can
                        provide a full path for an alternative browser using
                        this parameter, and hope for the best. For instance we
                        are using this parameter to call Google's experimental
                        browser, Canary, which performs better with demanding
                        visualizations.
  --server-only         The default behavior is to start the local server, and
                        fire up a browser that connects to the server. If you
                        have other plans, and want to start the server without
                        calling the browser, this is the flag you need.
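
The port search described under --port-number can be pictured as a simple scan upward from 8080. The sketch below is not anvi'o's own implementation, only one plausible way to do it with the standard library; the function name, the loopback host, and the `max_tries` limit are assumptions made for the example.

```python
import socket


def find_available_port(start=8080, host="127.0.0.1", max_tries=100):
    """Return the first port at or above `start` that can be bound on `host`."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken or not permitted; try the next one
            return port
    raise RuntimeError("no available port found in the scanned range")


port = find_available_port()
print(port)
```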

anvi-display-pan

Start an anvi'o server to display a pan-genome

Usage

anvi-display-pan [-h] -p PAN_DB [-g GENOMES_STORAGE] [-d VIEW_DATA]
                 [-t NEWICK] [-V ADDITIONAL_VIEW]
                 [-A ADDITIONAL_LAYERS] [--view NAME] [--title NAME]
                 [--state-autoload NAME] [--collection-autoload NAME]
                 [--export-svg FILE_PATH] [--skip-init-functions]
                 [--show-views] [--dry-run] [--show-states]
                 [--list-collections] [--skip-auto-ordering]
                 [-I IP_ADDR] [-P INT] [--browser-path PATH]
                 [--read-only] [--server-only]

Parameters

INPUT FILES: Input files from the pangenome analysis.

  -p PAN_DB, --pan-db PAN_DB
                        Anvi'o pan database
  -g GENOMES_STORAGE, --genomes-storage GENOMES_STORAGE
                        Anvi'o genomes storage file

OPTIONAL INPUTS: Where the yay factor becomes a reality.

  -d VIEW_DATA, --view-data VIEW_DATA
                        A TAB-delimited file for view data
  -t NEWICK, --tree NEWICK
                        NEWICK formatted tree structure

ADDITIONAL STUFF: Parameters to provide additional layers, views, or layer data.

  -V ADDITIONAL_VIEW, --additional-view ADDITIONAL_VIEW
                        A TAB-delimited file for an additional view to be used
                        in the interface. This file should contain all split
                        names, and values for each of them in all samples.
                        Each column in this file must correspond to a sample
                        name. The content of this file will be called
                        'user_view', which will be available as a new item in
                        the 'views' combo box in the interface.
  -A ADDITIONAL_LAYERS, --additional-layers ADDITIONAL_LAYERS
                        A TAB-delimited file for additional layers for splits.
                        The first column of this file must be split names, and
                        the remaining columns should be unique attributes. The
                        file does not need to contain all split names, or
                        values for each split in every column. Anvi'o will try
                        to deal with missing data nicely. Each column in this
                        file will be visualized as a new layer in the tree.
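
To make the --additional-layers layout concrete, here is a small Python sketch that writes a TAB-delimited table with split names in the first column and one column per layer. The function name, the first-column header `split`, the layer names, and the choice to leave missing values as empty cells are all assumptions for illustration; the help text only promises that missing data is handled gracefully.

```python
import csv
import io


def write_additional_layers(rows, layer_names, handle, key_column="split"):
    """Write a TAB-delimited table: split names first, then one column per layer."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([key_column] + layer_names)
    for split_name, values in rows.items():
        # A layer with no value for this split is written as an empty cell.
        writer.writerow([split_name] + [values.get(name, "") for name in layer_names])


# Hypothetical splits and layers; s2 has no value for 'gc_group'.
rows = {
    "s1": {"gc_group": "high", "source": "lab"},
    "s2": {"source": "field"},
}
buffer = io.StringIO()
write_additional_layers(rows, ["gc_group", "source"], buffer)
print(buffer.getvalue())
```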

VISUALS RELATED: Parameters that give access to various adjustments regarding the interface.

  --view NAME           Start the interface with a pre-selected view. To see a
                        list of available views, use --show-views flag.
  --title NAME          Title for the interface. If you are working with a
                        RUNINFO dict, the title will be determined based on
                        information stored in that file. Regardless, you can
                        override that value using this parameter. If you are
                        not using an anvi'o RUNINFO dictionary, a meaningful
                        title will appear in the interface only if you define
                        one using this parameter.
  --state-autoload NAME
                        Automatically load a previously saved state and draw
                        the tree. To see a list of available states, use the
                        --show-states flag.
  --collection-autoload NAME
                        Automatically load a collection and draw tree. To see
                        a list of available collections, use --list-
                        collections flag.
  --export-svg FILE_PATH
                        The SVG output file path.

SWEET PARAMS OF CONVENIENCE: Parameters and flags that are not quite essential (but nice to have).

  --skip-init-functions
                        When declared, function calls for genes will not be
                        initialized (therefore will be missing from all
                        relevant interfaces or output files). The use of this
                        flag may reduce the memory footprint and processing
                        time for large datasets.
  --show-views          When declared, the program will show a list of
                        available views, and exit.
  --dry-run             Don't do anything real. Test everything, and stop
                        right before wherever the developer said 'well, this
                        is enough testing', and decided to print out results.
  --show-states         When declared the program will print all available
                        states and exit.
  --list-collections    Show available collections and exit.
  --skip-auto-ordering  When declared, the attempt to include automatically
                        generated orders of items based on additional data is
                        skipped. In case those buggers cause issues with your
                        data, and you still want to see your stuff and deal
                        with the other issue maybe later.

SERVER CONFIGURATION: For power users.

  -I IP_ADDR, --ip-address IP_ADDR
                        IP address for the HTTP server. The default ip address
                        (0.0.0.0) should work just fine for most.
  -P INT, --port-number INT
                        Port number to use for anvi'o services. If nothing is
                        declared, anvi'o will try to find a suitable port
                        number, starting from the default port number, 8080.
  --browser-path PATH   By default, anvi'o will use your default browser to
                        launch the interactive interface. If you would like to
                        use something else than your system default, you can
                        provide a full path for an alternative browser using
                        this parameter, and hope for the best. For instance we
                        are using this parameter to call Google's experimental
                        browser, Canary, which performs better with demanding
                        visualizations.
  --read-only           When the interactive interface is started with this
                        flag, all 'database write' operations will be
                        disabled.
  --server-only         The default behavior is to start the local server, and
                        fire up a browser that connects to the server. If you
                        have other plans, and want to start the server without
                        calling the browser, this is the flag you need.

anvi-experimental-organization

why yes we do stuff here.

Usage

anvi-experimental-organization [-h] [-p PROFILE_DB] -c CONTIGS_DB
                               [-i DIR_PATH] [-N NAME]
                               [--distance DISTANCE_METRIC]
                               [--linkage LINKAGE_METHOD]
                               [--skip-store-in-db] [-o FILE_PATH]
                               [--dry-run]
                               PATH

Parameters

positional arguments:

  PATH                  Config file for clustering of contigs. See
                        documentation for help.

optional arguments:

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -i DIR_PATH, --input-directory DIR_PATH
                        Input directory where the input files addressed from
                        the configuration file can be found (i.e., the profile
                        database, if PROFILE.db::TABLE notation is used in the
                        configuration file).
  -N NAME, --name NAME  The name to use when storing the resulting clustering
                        in the database. This name will appear in the
                        interactive interface and other relevant interfaces.
                        Please consider using a short and descriptive single-
                        word (if you do not do that you will make anvi'o
                        complain).
  --distance DISTANCE_METRIC
                        The distance metric for the hierarchical clustering.
                        If you do not use this flag, the distance metric you
                        defined in your clustering config file will be used.
                        If you have not defined one in your config file, then
                        the system default will be used, which is "euclidean".
  --linkage LINKAGE_METHOD
                        Same story with the `--distance`, except, the system
                        default for this one is ward.
  --skip-store-in-db    By default, analysis results are stored in the profile
                        database. Use this flag to skip that step.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --dry-run             Don't do anything real. Test everything, and stop
                        right before wherever the developer said 'well, this
                        is enough testing', and decided to print out results.
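
The fallback order described for --distance and --linkage (command line first, then the clustering config file, then the system default) can be sketched as below. The function and variable names are hypothetical; only the default values "euclidean" and "ward" come from the help text.

```python
# System defaults as stated in the help text above.
SYSTEM_DEFAULTS = {"distance": "euclidean", "linkage": "ward"}


def resolve_setting(name, cli_value=None, config_value=None):
    """Pick a clustering setting: command line, then config file, then default."""
    if cli_value is not None:
        return cli_value
    if config_value is not None:
        return config_value
    return SYSTEM_DEFAULTS[name]


# Nothing given anywhere: the system default applies.
print(resolve_setting("distance"))
# The config file defines a value, and the command line overrides it.
print(resolve_setting("linkage", cli_value="average", config_value="complete"))
```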

anvi-export-collection

Export a collection from an anvi'o database

Usage

anvi-export-collection [-h] -p PAN_OR_PROFILE_DB [-C COLLECTION_NAME]
                       [-O FILENAME_PREFIX] [--list-collections]
                       [--include-unbinned]

Parameters

optional arguments:

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -O FILENAME_PREFIX, --output-file-prefix FILENAME_PREFIX
                        A prefix to be used while naming the output files (no
                        file type extensions please; just a prefix).
  --list-collections    Show available collections and exit.
  --include-unbinned    When this flag is used, anvi'o will also store in the
                        output file the items that do not appear in any of
                        your bins. This new bin will be called
                        'UNBINNED_ITEMS_BIN'. Yes. The ugly name is
                        intentional.

anvi-export-contigs

Export contigs (or splits) from an anvi'o contigs database

Usage

anvi-export-contigs [-h] -c CONTIGS_DB [--splits-mode] -o FILE_PATH

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  --splits-mode         Export split sequences instead.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-export-functions

Export gene function calls from an anvi'o contigs database

Usage

anvi-export-functions [-h] -c CONTIGS_DB [-o FILE_PATH]
                      [--annotation-sources SOURCE NAME[S]] [-l]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --annotation-sources SOURCE NAME[S]
                        Get functional annotations for a specific list of
                        annotation sources. You can specify one or more
                        sources by separating them from each other with a
                        comma character (e.g., '--annotation-sources
                        source_1,source_2,source_3'). The default behavior is
                        to return everything.
  -l, --list-annotation-sources
                        List available sources for annotation in the contigs
                        database and quit.

anvi-export-gene-calls

Export gene calls from an anvi'o contigs database.

Usage

anvi-export-gene-calls [-h] -c CONTIGS_DB [-o FILE_PATH]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-export-gene-coverage-and-detection

Export gene coverage and detection data from

Usage

anvi-export-gene-coverage-and-detection [-h] -p PROFILE_DB -c
                                        CONTIGS_DB -O FILENAME_PREFIX

Parameters

optional arguments:

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -O FILENAME_PREFIX, --output-file-prefix FILENAME_PREFIX
                        A prefix to be used while naming the output files (no
                        file type extensions please; just a prefix).

anvi-export-locus

Search for a function or HMM hit, and for each matching gene get the sequence, including a block of genes around the match. The output will be written to a FASTA file (or multiple files; see the --separate-fasta option below). The headers of the sequences in the FASTA file hold some information about the gene.

Usage

anvi-export-locus [-h] -c CONTIGS_DB -n NUM_GENES [-s SEARCH_TERM]
                  [--gene-caller-ids GENE_CALLER_IDS]
                  [--delimiter CHAR] -O FILENAME_PREFIX
                  [--separate-fasta] [--use-hmm]
                  [--hmm-sources SOURCE NAME] [-l] [-W]
                  [--remove-partial-hits]

Parameters

Essential INPUT:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -n NUM_GENES, --num-genes NUM_GENES
                        For each match (to the function, or HMM that was
                        searched) a sequence which includes a block of genes
                        will be saved. The block could include either genes
                        only in the forward direction of the gene (defined
                        according to the direction of transcription of the
                        gene) or reverse or both. If you wish to get both
                        directions, use a comma (no spaces) to define the
                        block. For example, "-n 4,5" will give you four genes
                        before and five genes after, whereas "-n 5" will give
                        you five genes after (in addition to the gene that
                        matched). To get only genes preceding the match use
                        "-n 5,0". If the number of genes requested exceeds the
                        length of the contig, then the output will include the
                        sequence until the end of the contig.
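
The -n notation can be read as a pair (genes before, genes after). The following Python sketch mirrors the three examples given in the help text; the function name is hypothetical and this is only an interpretation of the documented behavior, not anvi'o's own parser.

```python
def parse_num_genes(spec):
    """Return (genes_before, genes_after) for a '-n' value such as '4,5' or '5'."""
    parts = spec.split(",")
    if len(parts) == 1:
        # A single number means genes after the match only.
        return 0, int(parts[0])
    if len(parts) == 2:
        return int(parts[0]), int(parts[1])
    raise ValueError("expected 'N' or 'N,M', got %r" % spec)


for spec in ("4,5", "5", "5,0"):
    print(spec, parse_num_genes(spec))
```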

Additional essential INPUT - OPTION 1: Search according to either HMM or functional annotations

  -s SEARCH_TERM, --search-term SEARCH_TERM
                        Search term.

Additional essential INPUT - OPTION 2: Search for specific gene caller IDs

  --gene-caller-ids GENE_CALLER_IDS
                        Gene caller ids. Multiple of them can be declared
                        separated by a delimiter (the default is a comma). If
                        you declare nothing, you may get everything. Or you
                        may get an error. Really depends on the situation.
                        Worth a try.
  --delimiter CHAR      The delimiter to parse multiple input terms. The
                        default is ','.

THE OUTPUT: Where the output should go. It will be one FASTA file with all matches, or one FASTA file per match (see --separate-fasta).

  -O FILENAME_PREFIX, --output-file-prefix FILENAME_PREFIX
                        A prefix to be used while naming the output files (no
                        file type extensions please; just a prefix).

ADDITIONAL STUFF: Flags and parameters you can set according to your need

  --separate-fasta      Split each match to a separate FASTA file.
  --use-hmm             Use HMM hits instead of functional annotations. If you
                        choose this option, you must also say which HMM source
                        to use.
  --hmm-sources SOURCE NAME
                        Get sequences for a specific list of HMM sources. You
                        can list one or more sources by separating them from
                        each other with a comma character (i.e., '--hmm-
                        sources source_1,source_2,source_3'). If you would
                        like to see a list of available sources in the contigs
                        database, run this program with '--list-hmm-sources'
                        flag.
  -l, --list-hmm-sources
                        List available HMM sources in the contigs database and
                        quit.
  -W, --overwrite-output-destinations
                        Overwrite if the output files and/or directories
                        exist.
  --remove-partial-hits
                        By default anvi'o will return hits even if they are
                        partial. Declaring this flag will make anvi'o filter
                        all hits that are partial. Partial hits are hits in
                        which you asked for n1 genes before and n2 genes after
                        the gene that matched the search criteria but the
                        search hits the end of the contig before finding the
                        number of genes that you asked.
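
What counts as a partial hit can be illustrated with a little index arithmetic. The sketch below assumes genes on a contig are numbered 0 to N-1 and ignores strand orientation for simplicity; the function name and this indexing scheme are assumptions for the example, not anvi'o internals.

```python
def locus_window(match_index, genes_before, genes_after, num_genes_in_contig):
    """Return (first, last, is_partial) gene indices for a locus on one contig."""
    wanted_first = match_index - genes_before
    wanted_last = match_index + genes_after
    # Clip the requested block to the genes that actually exist on the contig.
    first = max(wanted_first, 0)
    last = min(wanted_last, num_genes_in_contig - 1)
    is_partial = first != wanted_first or last != wanted_last
    return first, last, is_partial


# A contig with 10 genes: the first request fits, the second runs off the start.
print(locus_window(5, 2, 2, 10))
print(locus_window(1, 4, 5, 10))
```

With --remove-partial-hits, hits for which `is_partial` is true in this picture would be filtered out.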

anvi-export-misc-data

Export additional data or order tables in pan or profile databases for items or layers.

Usage

anvi-export-misc-data [-h] -p PAN_OR_PROFILE_DB -t NAME [-o FILE_PATH]

Parameters

optional arguments:

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database
  -t NAME, --target-data-table NAME
                        The target table is the table you are interested in
                        accessing. Currently it can be 'items','layers', or
                        'layer_orders'. Please see most up-to-date online
                        documentation for more information.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-export-splits-and-coverages

Export sequences and coverages across samples for splits or contigs

Usage

anvi-export-splits-and-coverages [-h] -p PROFILE_DB -c CONTIGS_DB
                                 [-o DIR_PATH] [-O FILENAME_PREFIX]
                                 [--splits-mode] [--report-contigs]

Parameters

optional arguments:

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -o DIR_PATH, --output-dir DIR_PATH
                        Directory path for output files
  -O FILENAME_PREFIX, --output-file-prefix FILENAME_PREFIX
                        A prefix to be used while naming the output files (no
                        file type extensions please; just a prefix).
  --splits-mode         Specify this flag if you would like to output
                        coverages of individual 'splits', rather than their
                        'parent' contig coverages.
  --report-contigs      By default this program reports sequences and their
                        coverages for 'splits'. By using this flag, you can
                        report contig sequences and coverages instead. For
                        obvious reasons, you can't use this flag with
                        `--splits-mode` flag.
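
The note that --report-contigs cannot be combined with --splits-mode describes a pair of mutually exclusive flags. One common way to express such a constraint in Python is an argparse mutually exclusive group, sketched below. This is not necessarily how anvi'o enforces the rule, and the program name `export-sketch` is a placeholder.

```python
import argparse

parser = argparse.ArgumentParser(prog="export-sketch")
group = parser.add_mutually_exclusive_group()
group.add_argument("--splits-mode", action="store_true")
group.add_argument("--report-contigs", action="store_true")

# Using one of the two flags is fine.
args = parser.parse_args(["--report-contigs"])
print(args.report_contigs, args.splits_mode)  # prints: True False
```

Passing both flags to this parser makes argparse print an error and exit.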

anvi-export-splits-taxonomy

Export taxonomy for splits found in an anvi'o contigs database

Usage

anvi-export-splits-taxonomy [-h] -c CONTIGS_DB -o FILE_PATH

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-export-state

Export an anvi'o state from a pan or profile database.

Usage

anvi-export-state [-h] -p PAN_OR_PROFILE_DB [-o FILE_PATH]
                  [-s STATE_NAME] [--list-states]

Parameters

optional arguments:

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  -s STATE_NAME, --state STATE_NAME
                        The state name to export.
  --list-states         Show available states and exit.

anvi-export-table

Export anvi'o database tables as TAB-delimited text files.

Usage

anvi-export-table [-h] [--table TABLE_NAME] [-l] [-f FIELDS]
                  [-o FILE_PATH]
                  DB

Parameters

positional arguments:

  DB                    Anvi'o database to read from.

optional arguments:

  --table TABLE_NAME    Table name to export.
  -l, --list            Gives a list of tables in a database and quits. If a
                        table is already declared, it instead lists all the
                        fields in that table, in case you would like to export
                        only a specific list of fields from the table using
                        the --fields parameter.
  -f FIELD(S), --fields FIELD(S)
                        Fields to report. Use the --list parameter with a
                        table name to see available fields. You can list fields
                        using this notation: --fields 'field_1, field_2, ...
                        field_N'.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
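
As a rough illustration of what exporting selected fields of a table as TAB-delimited text involves, here is a minimal Python sketch. It is not anvi'o's implementation: the helper `export_table`, the demo table `genes`, and its columns are all hypothetical, and it assumes the database file can be read with Python's standard `sqlite3` module.

```python
import csv
import io
import sqlite3


def export_table(conn, table, fields=None, out=None):
    """Write rows of `table` to `out` as TAB-delimited text (hypothetical helper)."""
    out = out if out is not None else io.StringIO()
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    available = [col[0] for col in cursor.description]
    # Mirror the --fields notation: 'field_1, field_2, ... field_N'
    wanted = [f.strip() for f in fields.split(",")] if fields else available
    missing = [f for f in wanted if f not in available]
    if missing:
        raise ValueError(f"Unknown field(s): {', '.join(missing)}")
    indices = [available.index(f) for f in wanted]
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(wanted)
    for row in cursor:
        writer.writerow([row[i] for i in indices])
    return out


# Demo with an in-memory database; the table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (gene_id INTEGER, contig TEXT, start INTEGER)")
conn.executemany("INSERT INTO genes VALUES (?, ?, ?)", [(1, "c1", 0), (2, "c1", 900)])
exported = export_table(conn, "genes", fields="gene_id, start").getvalue()
print(exported)
```

Listing the available column names first (here via `cursor.description`) is what makes it possible to reject a misspelled field before anything is written.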

anvi-gen-contigs-database

Generate a new anvi'o contigs database.

Usage

anvi-gen-contigs-database [-h] -f FASTA [-n PROJECT_NAME]
                          [-o DB_FILE_PATH] [--description TEXT_FILE]
                          [-L INT] [-K INT] [--skip-gene-calling]
                          [--external-gene-calls GENE-CALLS]
                          [--ignore-internal-stop-codons]
                          [--skip-mindful-splitting]

Parameters

MANDATORY INPUTS: Things you really need to provide to be in business.

  -f FASTA, --contigs-fasta FASTA
                        The FASTA file that contains reference sequences you
                        mapped your samples against. This could be a reference
                        genome, or contigs from your assembler. Contig names
                        in this file must match those in other input files.
                        If there is a problem anvi'o will gracefully complain
                        about it.
  -n PROJECT_NAME, --project-name PROJECT_NAME
                        Name of the project. Please choose a short but
                        descriptive name (so anvi'o can use it whenever she
                        needs to name an output file, or add a new table in a
                        database, or name her first born).

OPTIONAL INPUTS: Things you may want to tweak.

  -o DB_FILE_PATH, --output-db-path DB_FILE_PATH
                        Output file path for the new database.
  --description TEXT_FILE
                        A plain text file that contains some description about
                        the project. You can use Markdown syntax. The
                        description text will be rendered and shown in all
                        relevant interfaces, including the anvi'o interactive
                        interface, or anvi'o summary outputs.
  -L INT, --split-length INT
                        Splitting very large contigs into multiple pieces
                        improves the efficacy of the visualization step. The
                        default value is (20000). If you are not sure, we
                        advise you not to go below 10,000. The lower you go,
                        the more complicated the tree will be, and the more
                        time and computational resources the analysis will
                        take. Also, this is not a case of 'the smaller the
                        split size the more sensitive the results'. If you do
                        not want your contigs to be split, you can either
                        simply enter '0' or ANY OTHER negative integer (lots
                        of unnecessary freedom here, enjoy!).
  -K INT, --kmer-size INT
                        K-mer size for k-mer frequency calculations. The
                        default k-mer size for composition-based analyses is
                        4, historically. Although tetra-nucleotide frequencies
                        seem to offer the sweet spot of sensitivity,
                        information density, and manageable number of
                        dimensions for clustering approaches, you are welcome
                        to experiment (but maybe you should leave it as is for
                        your first set of analyses).
  --skip-mindful-splitting
                        By default, anvi'o attempts to avoid cutting through
                        proper gene calls when soft-splitting large contigs, to
                        make sure no single gene is broken into multiple splits.
                        This requires a careful examination of where genes
                        start and end, to find the best locations to split
                        contigs with respect to this information. So, when the
                        user asks for a split size of, say, 1,000, it serves
                        as a mere suggestion. When this flag is used, anvi'o
                        does what the user wants and creates splits at desired
                        lengths (although some functionality may become
                        unavailable for the projects that rely on a contigs
                        database that is initiated this way).
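
To make the `-K` option above more concrete, here is a minimal Python sketch of counting k-mer frequencies in a sequence. It is only an illustration: the function name `kmer_frequencies` is hypothetical, and the sketch counts k-mers on one strand as written, which may differ from how anvi'o computes its frequency tables.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Count every DNA k-mer in `sequence`; k-mers with non-ACGT characters are skipped."""
    sequence = sequence.upper()
    # Start every possible k-mer at zero so the vector always has 4**k dimensions.
    counts = Counter({"".join(p): 0 for p in product("ACGT", repeat=k)})
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return counts


freqs = kmer_frequencies("ACGTACGTAC")
print(len(freqs))     # 256 possible tetranucleotides
print(freqs["ACGT"])  # 2
```

With k = 4 the vector has 4^4 = 256 entries, while k = 6 would already give 4096, which is one way to see why larger k-mer sizes make clustering more demanding.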

GENES IN CONTIGS: Expert thingies.

  --skip-gene-calling   By default, generating an anvi'o contigs database
                        includes the identification of open reading frames in
                        contigs by running a bacterial gene caller. Declaring
                        this flag will by-pass that process. If you prefer,
                        you can later import your own gene calling results
                        into the database.
  --external-gene-calls GENE-CALLS
                        A TAB-delimited file to utilize external gene calls.
                        The file must have these columns: 'gene_callers_id' (a
                        unique integer for each gene call, starting from 1),
                        'contig' (the name of the contig on which the gene
                        call is found), 'start' (start position, integer),
                        'stop' (stop position, integer), 'direction' (the
                        direction of the gene open reading frame; can be 'f'
                        or 'r'), 'partial' (whether it is a complete gene
                        call, or a partial one; must be 1 for partial calls,
                        and 0 for complete calls), 'source' (the gene caller),
                        and 'version' (the version of the gene caller, e.g.,
                        v2.6.7 or v1.0). An example file can be found via the
                        URL https://goo.gl/TqCWT2
  --ignore-internal-stop-codons
                        This is only relevant when you have an external gene
                        calls file. If anvi'o figures out that your custom
                        gene calls result in amino acid sequences with stop
                        codons in the middle, it will complain about it. You
                        can use this flag to tell anvi'o not to check for
                        internal stop codons, EVEN THOUGH IT MEANS THERE IS
                        MOST LIKELY SOMETHING WRONG WITH YOUR EXTERNAL GENE
                        CALLS FILE. Anvi'o will understand that sometimes we
                        don't want to care, and will not judge you. Instead,
                        it will replace every stop codon residue in the amino
                        acid sequence with an 'X' character. Please let us
                        know if you used this and things failed, so we can
                        tell you that you really shouldn't have used it if you
                        didn't like failures in the first place (smiley).
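
The column layout described for `--external-gene-calls` can be sketched in Python as follows. This is a hedged example rather than an official template: the helper names `check_gene_calls` and `write_gene_calls`, the contig names, and the gene caller name are hypothetical, and the checks cover only the constraints stated in the help text above.

```python
import csv
import io

# Column names as listed in the --external-gene-calls help text.
COLUMNS = ["gene_callers_id", "contig", "start", "stop",
           "direction", "partial", "source", "version"]


def check_gene_calls(calls):
    """Raise ValueError if a call violates a constraint stated in the help text."""
    seen = set()
    for call in calls:
        gene_id = call["gene_callers_id"]
        if not isinstance(gene_id, int) or gene_id in seen:
            raise ValueError(f"gene_callers_id must be a unique integer: {gene_id!r}")
        seen.add(gene_id)
        if not all(isinstance(call[key], int) for key in ("start", "stop")):
            raise ValueError(f"start and stop must be integers (gene call {gene_id})")
        if call["direction"] not in ("f", "r"):
            raise ValueError(f"direction must be 'f' or 'r' (gene call {gene_id})")
        if call["partial"] not in (0, 1):
            raise ValueError(f"partial must be 0 or 1 (gene call {gene_id})")


def write_gene_calls(calls, handle):
    """Write gene calls to an open text handle as TAB-delimited rows."""
    check_gene_calls(calls)
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(calls)


# Hypothetical gene calls; contig names must match those in the FASTA file.
example_calls = [
    {"gene_callers_id": 1, "contig": "contig_01", "start": 0, "stop": 900,
     "direction": "f", "partial": 0, "source": "my_gene_caller", "version": "v1.0"},
    {"gene_callers_id": 2, "contig": "contig_01", "start": 950, "stop": 1400,
     "direction": "r", "partial": 1, "source": "my_gene_caller", "version": "v1.0"},
]

buffer = io.StringIO()
write_gene_calls(example_calls, buffer)
gene_calls_text = buffer.getvalue()
print(gene_calls_text)
```

To produce a real file, the same function can be called with a handle from `open("gene_calls.txt", "w", newline="")`, where the file name is again only an example.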

anvi-gen-gene-consensus-sequences

Collapse variability for a set of genes across samples

Usage

anvi-gen-gene-consensus-sequences [-h] -p PROFILE_DB -c CONTIGS_DB
                                  [--gene-caller-id GENE_CALLER_ID]
                                  [--genes-of-interest FILE]
                                  [--samples-of-interest FILE]
                                  [-o FILE_PATH] [--tab-delimited]
                                  [--engine ENGINE]

Parameters

DATABASES: Declaring relevant anvi'o databases. First things first.

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by
                        'anvi-gen-contigs-database'

FOCUS: What do we want? A consensus sequence for a gene, or a list of genes. From where do we want it? All samples, by default. When do we want it? Whenever it is convenient.

  --gene-caller-id GENE_CALLER_ID
                        A single gene id.
  --genes-of-interest FILE
                        A file with anvi'o gene caller IDs. There should be
                        only one column in the file, and each line should
                        correspond to a unique gene caller id (without a
                        column header).
  --samples-of-interest FILE
                        A file with samples names. There should be only one
                        column in the file, and each line should correspond to
                        a unique sample name (without a column header).

OUTPUT: Output file and output style

  -o FILE_PATH, --output-file FILE_PATH
                        The output file name. The boring default is
                        "genes.fa". You can change the output file format to a
                        TAB-delimited file using the flag `--tab-delimited`,
                        in which case please do not forget to change the file
                        name, too.
  --tab-delimited       Use the TAB-delimited format for the output file.

EXTRAS: Parameters that will help you to do a very precise analysis. If you declare nothing from this bunch, you will get "everything" to play with, which is not necessarily a good thing…

  --engine ENGINE       Variability engine. The default is 'NT'.

anvi-gen-genomes-storage

Create a genome storage from internal or external genomes for a pan genome analysis.

Usage

anvi-gen-genomes-storage [-h] [-e FILE_PATH] [-i FILE_PATH]
                         [--gene-caller GENE-CALLER] -o FILE_PATH

Parameters

EXTERNAL GENOMES: External genomes listed as anvi'o contigs databases. As in, you have one or more genomes say from NCBI you want to work with, and you created an anvi'o contigs database for each one of them.

  -e FILE_PATH, --external-genomes FILE_PATH
                        A two-column TAB-delimited flat text file that lists
                        anvi'o contigs databases. The first item in the header
                        line should read 'name', and the second should read
                        'contigs_db_path'. Each line in the file should
                        describe a single entry, where the first column is the
                        name of the genome (or MAG), and the second column is
                        the anvi'o contigs database generated for this genome.

INTERNAL GENOMES: Genome bins stored in anvi'o profile databases as collections.

  -i FILE_PATH, --internal-genomes FILE_PATH
                        A five-column TAB-delimited flat text file. The header
                        line must contain these columns: 'name', 'bin_id',
                        'collection_id', 'profile_db_path', 'contigs_db_path'.
                        Each line should list a single entry, where 'name' can
                        be any name to describe the anvi'o bin identified as
                        'bin_id' that is stored in a collection.
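
The two input files described for `--external-genomes` and `--internal-genomes` are plain TAB-delimited tables, so they can be generated with a few lines of Python. The sketch below is illustrative only: the helper `genomes_table` and every genome name, bin name, collection name, and database path in it are hypothetical placeholders.

```python
import csv
import io

# Header lines as given in the help text above.
EXTERNAL_HEADER = ["name", "contigs_db_path"]
INTERNAL_HEADER = ["name", "bin_id", "collection_id",
                   "profile_db_path", "contigs_db_path"]


def genomes_table(header, rows):
    """Return TAB-delimited text with `header` followed by one line per entry."""
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"Expected {len(header)} columns, got {len(row)}: {row!r}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# One line per genome: a name and the contigs database generated for it.
external_text = genomes_table(EXTERNAL_HEADER, [
    ("Genome_A", "genome_a-CONTIGS.db"),
    ("Genome_B", "genome_b-CONTIGS.db"),
])

# One line per bin: a name, the bin and its collection, and the two databases.
internal_text = genomes_table(INTERNAL_HEADER, [
    ("MAG_01", "Bin_1", "my_collection", "PROFILE.db", "CONTIGS.db"),
])

print(external_text)
print(internal_text)
```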

PRO STUFF: Things you may not have to change. But you never know (unless you read the help).

  --gene-caller GENE-CALLER
                        The gene caller to utilize. Anvi'o supports multiple
                        gene callers, and some operations (including this one)
                        require an explicit mention of which one to use. The
                        default is 'prodigal', but it will not be enough if
                        you were a rebel and have used
                        `--external-gene-calls` or something.

OUTPUT: Give it a nice name. Must end with '-GENOMES.db'. This is primarily due to the fact that there are other .db files used throughout anvi'o and it would be better to distinguish this very special file from them.

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-gen-network

Generate a Gephi network for functions based on non-normalized gene coverage values

Usage

anvi-gen-network [-h] -p PROFILE_DB -c CONTIGS_DB
                 [--annotation-source SOURCE NAME] [-l]

Parameters

optional arguments:

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by
                        'anvi-gen-contigs-database'
  --annotation-source SOURCE NAME
                        Get functional annotations for a specific annotation
                        source. You can use the flag '--list-annotation-
                        sources' to learn about what sources are available.
  -l, --list-annotation-sources
                        List available sources for annotation in the contigs
                        database and quit.

anvi-gen-phylogenomic-tree

Generate a phylogenomic tree from an alignment file.

Usage

anvi-gen-phylogenomic-tree [-h] -f FASTA -o FILE_PATH
                           [--program PROGRAM_NAME]

Parameters

INPUT FILES: Concatenated alignment files exported using anvi-export-pc-aligments

  -f FASTA, --fasta-file FASTA
                        A FASTA-formatted input file

OUTPUT FILE: The output file where the generated newick tree will be stored.

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

PROGRAM: The program that will be used for generating tree. Available options: default, fasttree

  --program PROGRAM_NAME
                        Program name.

anvi-gen-variability-matrix

Generate Variability Matrix

Usage

anvi-gen-variability-matrix [-h] -c CONTIGS_DB --splits-of-interest
                            FILE [--samples-of-interest FILE]
                            [--num-positions-from-each-split INT]
                            [-m INT] [-r RATIO] [-o FILE_PATH]
                            SUMMARY_DICT

Parameters

positional arguments:

  SUMMARY_DICT          Summary file

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by
                        'anvi-gen-contigs-database'
  --splits-of-interest FILE
                        A file with split names. There should be only one
                        column in the file, and each line should correspond to
                        a unique split name.
  --samples-of-interest FILE
                        A file with samples names. There should be only one
                        column in the file, and each line should correspond to
                        a unique sample name (without a column header).
  --num-positions-from-each-split INT
                        Each split may have one or more variable positions. By
                        default, anvi'o will report every SNV position found
                        in a given split. This parameter will help you to
                        define a cutoff for the maximum number of SNVs to be
                        reported from a split (if the number of SNVs is more
                        than the number you declare using this parameter, the
                        positions will be randomly subsampled).
  -m INT, --min-scatter INT
                        This one is tricky. If you have N samples in your
                        dataset, a given variable position x in one of your
                        splits can split your N samples into `t` groups based
                        on the identity of the variation they harbor at
                        position x. For instance, `t` would have been 1, if
                        all samples had the same type of variation at position
                        x (which would not be very interesting, because in
                        this case position x would have zero contribution to a
                        deeper understanding of how these samples differ based
                        on variability). When `t` > 1, it would mean that
                        identities at position x across samples do differ. But
                        how much scattering occurs based on position x when t
                        > 1? If t=2, how many samples ended in each group?
                        Obviously, even distribution of samples across groups
                        may tell us something different than uneven
                        distribution of samples across groups. So, this
                        parameter filters out any x if 'the number of samples
                        in the second largest group' (=scatter) is less than
                        -m. Here is an example: let's assume you have 7
                        samples. While 5 of those have AG, 2 of them have TC
                        at position x. This would mean scatter of x is 2. If
                        you set -m to 3, this position would not be reported
                        in your output matrix. The default value for -m is 0,
                        which means every `x` found in the database that
                        survived previous filtering criteria will be reported.
                        Naturally, -m cannot be more than half of the number
                        of samples. Please refer to the user documentation if
                        this is confusing.
  -r RATIO, --min-ratio-of-competings-nts RATIO
                        Minimum ratio of the competing nucleotides at a given
                        position. Default is 0.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
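
Since `--min-scatter` is the trickiest option above, a small worked example may help. The Python sketch below follows the definition in the help text (scatter is the number of samples in the second largest group, and a position is filtered out when its scatter is less than `-m`); the function names `scatter` and `passes_min_scatter` are hypothetical and this is not anvi'o's own code.

```python
from collections import Counter


def scatter(identities):
    """Number of samples in the second largest group (0 when there is one group)."""
    group_sizes = sorted(Counter(identities).values(), reverse=True)
    return group_sizes[1] if len(group_sizes) > 1 else 0


def passes_min_scatter(identities, min_scatter=0):
    """A position is kept unless its scatter is less than `min_scatter`."""
    return scatter(identities) >= min_scatter


# Seven samples: five carry AG and two carry TC at position x.
position_x = ["AG"] * 5 + ["TC"] * 2
print(scatter(position_x))                # 2
print(passes_min_scatter(position_x, 1))  # True: kept
print(passes_min_scatter(position_x, 3))  # False: filtered out

# All samples agree (t = 1), so the scatter is 0.
print(scatter(["AG"] * 7))                # 0
```

Because the second largest group can never hold more than half of the samples, a value of `-m` above N/2 would filter out every position.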

anvi-gen-variability-network

A program to generate a network description from an anvi'o variability profile.

Usage

anvi-gen-variability-network [-h] -i VARIABILITY_PROFILE
                             [-n NUM_POSITIONS] [-o FILE_PATH]

Parameters

optional arguments:

  -i VARIABILITY_PROFILE, --input-file VARIABILITY_PROFILE
                        The anvi'o variability profile. Please see
                        `anvi-gen-variability-profile` to generate one.
  -n NUM_POSITIONS, --max-num-unique-positions NUM_POSITIONS
                        Maximum number of unique positions to be used in the
                        network. This may be one way to avoid extremely large
                        network descriptions that would defeat the purpose of
                        a quick visualization. If there are more unique
                        positions in the variability profile, the program will
                        randomly select a subset of them to match
                        `--max-num-unique-positions`. The default is 0, which
                        means all positions should be reported. Remember that
                        the number of nodes in the network will also depend on
                        the number of samples described in the variability
                        profile.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
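
The subsampling behaviour described for `--max-num-unique-positions` can be sketched as below. This is an assumption-laden illustration, not the program's code: the function `subsample_positions` and its `seed` argument are hypothetical, and positions are represented as plain integers for simplicity.

```python
import random


def subsample_positions(unique_positions, max_num_unique_positions=0, seed=None):
    """Return all positions, or a random subset if there are more than the limit.

    A limit of 0 (the default) means that every position is kept.
    """
    positions = sorted(unique_positions)
    if max_num_unique_positions <= 0 or len(positions) <= max_num_unique_positions:
        return positions
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    return sorted(rng.sample(positions, max_num_unique_positions))


all_positions = list(range(100))
print(len(subsample_positions(all_positions)))              # 100
print(len(subsample_positions(all_positions, 10, seed=1)))  # 10
```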

anvi-gen-variability-profile

Extract information for variable positions

Usage

anvi-gen-variability-profile [-h] [--splits-of-interest FILE]
                             [-C COLLECTION_NAME] [-b BIN_NAME] -p
                             PROFILE_DB -c CONTIGS_DB [-o FILE_PATH]
                             [--samples-of-interest FILE]
                             [--quince-mode] [--include-contig-names]
                             [--include-split-names] [--engine ENGINE]
                             [--num-positions-from-each-split INT]
                             [-m INT]
                             [--min-coverage-in-each-sample INT]
                             [-r FLOAT] [-z FLOAT] [-j FLOAT]
                             [-a FLOAT] [-x NUM_SAMPLES]
                             [--genes-of-interest FILE]

Parameters

DATABASES: Declaring relevant anvi'o databases. First things first.

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by
                        'anvi-gen-contigs-database'

SPLITS: Declaring relevant splits for the analysis. There are two ways to do it. One, you can give a file path with split names, or, as an alternative, you can provide a collection id with a bin name.

  --splits-of-interest FILE
                        A file with split names. There should be only one
                        column in the file, and each line should correspond to
                        a unique split name.
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.

OUTPUT: Output file and output style

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --samples-of-interest FILE
                        A file with samples names. There should be only one
                        column in the file, and each line should correspond to
                        a unique sample name (without a column header).
  --quince-mode         The default behavior is to report base frequencies of
                        nucleotide positions only if there is any variation
                        reported during profiling (which by default uses some
                        heuristics to minimize the impact of error-driven
                        variation). So, if there are 10 samples, and a given
                        position has been reported as a variable site during
                        profiling in only one of those samples, no information
                        will be stored in the database for the remaining 9.
                        When this flag is used, we go back to
                        each sample, and report base frequencies for each
                        sample at this position even if they do not vary. It
                        will take considerably longer to report when this flag
                        is on, and the use of it will increase the file size
                        dramatically, however it is inevitable for some
                        statistical approaches (as well as for some beautiful
                        visualizations).
  --include-contig-names
                        Use this flag if you would like contig names for each
                        variable position to be included in the output file as
                        a column. By default, we do not include contig names
                        since they can practically double the output file size
                        without any actual benefit in most cases.
  --include-split-names
                        Use this flag if you would like split names for each
                        variable position to be included in the output file as
                        a column.
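
The difference that `--quince-mode` makes to the set of reported entries can be pictured with a short Python sketch. It only models which (sample, position) pairs end up in the output, under the reading of the help text given above; the function `entries_to_report` and the sample names are hypothetical, and the real program of course also reports the base frequencies themselves.

```python
def entries_to_report(variable_sites, samples, quince_mode=False):
    """Return the (sample, position) pairs that would appear in the output.

    `variable_sites` maps a sample name to the set of positions that were
    reported as variable in that sample during profiling.
    """
    if not quince_mode:
        # Default: only positions that vary in the sample itself are reported.
        return sorted((s, p) for s in samples for p in variable_sites.get(s, set()))
    # Quince mode: a position that varies in any sample is reported for all samples.
    all_positions = set().union(*(variable_sites.get(s, set()) for s in samples))
    return sorted((s, p) for s in samples for p in all_positions)


samples = [f"s{i:02d}" for i in range(1, 11)]  # ten samples
variable_sites = {"s01": {42}}                 # position 42 varies only in s01

print(len(entries_to_report(variable_sites, samples)))                    # 1
print(len(entries_to_report(variable_sites, samples, quince_mode=True)))  # 10
```

The jump from 1 to 10 entries for a single position is the reason the output grows so quickly when the flag is on.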

EXTRAS: Parameters that will help you to do a very precise analysis. If you declare nothing from this bunch, you will get "everything" to play with, which is not necessarily a good thing…

  --engine ENGINE       Variability engine. The default is 'NT'.
  --num-positions-from-each-split INT
                        Each split may have one or more variable positions. By
                        default, anvi'o will report every SNV position found
                        in a given split. This parameter will help you to
                        define a cutoff for the maximum number of SNVs to be
                        reported from a split (if the number of SNVs is more
                        than the number you declare using this parameter, the
                        positions will be randomly subsampled).
  -m INT, --min-scatter INT
                        This one is tricky. If you have N samples in your
                        dataset, a given variable position x in one of your
                        splits can split your N samples into `t` groups based
                        on the identity of the variation they harbor at
                        position x. For instance, `t` would have been 1, if
                        all samples had the same type of variation at position
                        x (which would not be very interesting, because in
                        this case position x would have zero contribution to a
                        deeper understanding of how these samples differ based
                        on variability). When `t` > 1, it would mean that
                        identities at position x across samples do differ. But
                        how much scattering occurs based on position x when t
                        > 1? If t=2, how many samples ended in each group?
                        Obviously, even distribution of samples across groups
                        may tell us something different than uneven
                        distribution of samples across groups. So, this
                        parameter filters out any x if 'the number of samples
                        in the second largest group' (=scatter) is less than
                        -m. Here is an example: let's assume you have 7
                        samples. While 5 of those have AG, 2 of them have TC
                        at position x. This would mean scatter of x is 2. If
                        you set -m to 3, this position would not be reported
                        in your output matrix. The default value for -m is 0,
                        which means every `x` found in the database that
                        survived previous filtering criteria will be reported.
                        Naturally, -m cannot be more than half of the number
                        of samples. Please refer to the user documentation if
                        this is confusing.
  --min-coverage-in-each-sample INT
                        Minimum coverage of a given variable nucleotide
                        position in all samples. If a nucleotide position is
                        covered less than this value even in one sample, it
                        will be removed from the analysis. Default is 0.
  -r FLOAT, --min-departure-from-reference FLOAT
                        Takes a value between 0 and 1, where 1 is maximum
                        divergence from the reference. Default is 0.000000.
                        The reference here is the observation that corresponds
                        to a given position in the mapped context.
  -z FLOAT, --max-departure-from-reference FLOAT
                        Similar to '--min-departure-from-reference', but
                        defines an upper limit for divergence. The default is
                        1.000000.
  -j FLOAT, --min-departure-from-consensus FLOAT
                        Takes a value between 0 and 1, where 1 is maximum
                        divergence from the consensus for a given position.
                        The default is 0.000000. The consensus is the most
                        frequent observation at a given position.
  -a FLOAT, --max-departure-from-consensus FLOAT
                        Similar to '--min-departure-from-consensus', but
                        defines an upper limit for divergence. The default is
                        1.000000.
  -x NUM_SAMPLES, --min-occurrence NUM_SAMPLES
                        Minimum number of samples in which a nucleotide
                        position should be reported as variable. Default is 1.
                        If you set it to 2, for instance, each eligible
                        variable position will be expected to appear in at
                        least two samples, which will reduce the impact of
                        stochastic, or unintelligible variable positions.
  --genes-of-interest FILE
                        A file with anvi'o gene caller IDs. There should be
                        only one column in the file, and each line should
                        correspond to a unique gene caller id (without a
                        column header).
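
The departure filters above take values between 0 and 1, but the help text does not spell out the formula. The Python sketch below assumes one natural reading: departure is the fraction of observations at a position that differ from the consensus (the most frequent observation) or from the reference. The function names and the example counts are hypothetical, and anvi'o's actual calculation may differ in detail.

```python
def departure_from_consensus(counts):
    """Fraction of observations that differ from the most frequent one (assumed definition)."""
    coverage = sum(counts.values())
    if coverage == 0:
        raise ValueError("position has no coverage")
    return 1.0 - max(counts.values()) / coverage


def departure_from_reference(counts, reference):
    """Fraction of observations that differ from the reference (assumed definition)."""
    coverage = sum(counts.values())
    if coverage == 0:
        raise ValueError("position has no coverage")
    return 1.0 - counts.get(reference, 0) / coverage


def within(value, minimum=0.0, maximum=1.0):
    """Apply a min/max pair of filters; the defaults keep everything."""
    return minimum <= value <= maximum


# Hypothetical nucleotide counts at one position in one sample.
counts = {"A": 6, "G": 2}
dfc = departure_from_consensus(counts)                 # 0.25
dfr = departure_from_reference(counts, reference="G")  # 0.75
print(dfc, dfr)
print(within(dfr, minimum=0.5))  # True
```

In this example the reference base is the minority, so the position departs strongly from the reference while staying close to its own consensus.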

anvi-get-aa-counts

Collects AA counts information from a contigs database for a given bin, set of contigs, or set of genes.

Usage

anvi-get-aa-counts [-h] -c CONTIGS_DB [-o FILE_PATH] [-p PROFILE_DB]
                   [-C COLLECTION_NAME] [-B FILE_PATH]
                   [--contigs-of-interest FILE]
                   [--gene-caller-ids GENE_CALLER_IDS]

Parameters

MANDATORY STUFF: You have to set the following two parameters, then you will select one set of parameters from the following optional sections. If you select nothing from those sets, AA counts for everything in the contigs database will be reported.

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by
                        'anvi-gen-contigs-database'
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

OPTIONAL PARAMS FOR BINS:

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -B FILE_PATH, --bin-ids-file FILE_PATH
                        Text file for bins (each line should be a unique bin
                        id).

OPTIONAL PARAMS FOR CONTIGS:

  --contigs-of-interest FILE
                        A file with contig names. There should be only one
                        column in the file, and each line should correspond to
                        a unique contig name.

OPTIONAL PARAMS FOR GENE CALLS:

  --gene-caller-ids GENE_CALLER_IDS
                        Gene caller ids. Multiple of them can be declared
                        separated by a delimiter (the default is a comma). If
                        you declare nothing, you may get everything. Or you
                        may get an error. Really depends on the situation.
                        Worth a try.

anvi-get-aa-frequencies

Frequencies of AA linkmers

Usage

anvi-get-aa-frequencies [-h] -i INPUT_BAM -c CONTIGS_DB
                        --gene-caller-id GENE_CALLER_ID
                        [--return-codon-frequencies-instead] -o
                        FILE_PATH

Parameters

optional arguments:

  -i INPUT_BAM, --input-file INPUT_BAM
                        Sorted and indexed BAM file to analyze.
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by
                        'anvi-gen-contigs-database'
  --gene-caller-id GENE_CALLER_ID
                        A single gene id.
  --return-codon-frequencies-instead
                        By default, anvi'o will return amino acid frequencies
                        here, however, you can ask for codon frequencies
                        instead, simply because you always need more data and
                        more stuff. You're lucky this time, but is there an
                        end to this? Will you ever be satisfied with what you
                        have? Anvi'o needs answers.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-get-aa-sequences-for-gene-calls

Get amino acid sequences from a contigs database for all gene calls.

Usage

anvi-get-aa-sequences-for-gene-calls [-h] -c CONTIGS_DB [-o FILE_PATH]
                                     [--gene-caller-ids GENE_CALLER_IDS]
                                     [--delimiter CHAR]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --gene-caller-ids GENE_CALLER_IDS
                        Gene caller ids. Multiple of them can be declared
                        separated by a delimiter (the default is a comma). If
                        you declare nothing, you may get everything. Or you
                        may get an error. Really depends on the situation.
                        Worth a try.
  --delimiter CHAR      The delimiter to parse multiple input terms. The
                        default is ','.
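
To make the interplay between `--gene-caller-ids` and `--delimiter` concrete, here is a minimal Python sketch of how such a delimited string could be parsed. It is not anvi'o's own code: the function name `parse_gene_caller_ids` is hypothetical, and the assumption that gene caller ids are integers is ours, not something stated in the help text above.

```python
def parse_gene_caller_ids(raw, delimiter=","):
    """Split a delimited string of gene caller ids into a list of integers.

    Hypothetical helper for illustration only. Assumes ids are integers.
    """
    if not raw:
        # Nothing declared: the caller decides whether that means "everything".
        return []

    ids = []
    for term in raw.split(delimiter):
        term = term.strip()
        if not term:
            continue  # tolerate trailing delimiters such as "4,8,"
        ids.append(int(term))
    return ids


print(parse_gene_caller_ids("4,8,15"))
print(parse_gene_caller_ids("4;8;15", delimiter=";"))
```

A non-default delimiter is mostly useful when the ids come from another tool that emits a different separator.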

anvi-get-dna-sequences-for-gene-calls

A script to get back sequences of a list of genes

Usage

anvi-get-dna-sequences-for-gene-calls [-h] -c CONTIGS_DB
                                      [--gene-caller-ids GENE_CALLER_IDS]
                                      -o FILE_PATH [--delimiter CHAR]
                                      [--report-extended-deflines]
                                      [--wrap WRAP] [--export-gff3]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  --gene-caller-ids GENE_CALLER_IDS
                        Gene caller ids. Multiple of them can be declared
                        separated by a delimiter (the default is a comma). If
                        you declare nothing, you may get everything. Or you
                        may get an error. Really depends on the situation.
                        Worth a try.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --delimiter CHAR      The delimiter to parse multiple input terms. The
                        default is ','.
  --report-extended-deflines
                        When declared, the deflines in the resulting FASTA
                        file will contain more information.
  --wrap WRAP           When to wrap sequences when storing them in a FASTA
                        file. The default is '120'. A value of '0' would be
                        equivalent to 'do not wrap'.
  --export-gff3         If this is true, the output file will be in GFF3
                        format.
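
The `--wrap` behavior described above (wrap at 120 characters by default, `0` meaning "do not wrap") can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the program's implementation; `format_fasta_entry` is a hypothetical name.

```python
def format_fasta_entry(defline, sequence, wrap=120):
    """Return one FASTA record, wrapping the sequence every `wrap` characters.

    A wrap value of 0 leaves the sequence on a single line.
    """
    if wrap < 0:
        raise ValueError("wrap must be 0 or a positive integer")

    if wrap == 0:
        body = sequence
    else:
        # Slice the sequence into consecutive chunks of at most `wrap` characters.
        body = "\n".join(sequence[i:i + wrap] for i in range(0, len(sequence), wrap))

    return f">{defline}\n{body}\n"


print(format_fasta_entry("gene_1", "ATGGCTAAAGGTTAA", wrap=6), end="")
print(format_fasta_entry("gene_1", "ATGGCTAAAGGTTAA", wrap=0), end="")
```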

anvi-get-sequences-for-gene-clusters

Do cool stuff with gene clusters in anvi'o pan genomes

Usage

anvi-get-sequences-for-gene-clusters [-h] -p PAN_DB
                                     [-g GENOMES_STORAGE]
                                     [-o FILE_PATH]
                                     [--report-DNA-sequences]
                                     [--gene-cluster-id GENE_CLUSTER_ID]
                                     [--gene-cluster-ids-file FILE_PATH]
                                     [-C COLLECTION_NAME] [-b BIN_NAME]
                                     [--min-num-genomes-gene-cluster-occurs INTEGER]
                                     [--max-num-genomes-gene-cluster-occurs INTEGER]
                                     [--min-num-genes-from-each-genome INTEGER]
                                     [--max-num-genes-from-each-genome INTEGER]
                                     [--max-num-gene-clusters-missing-from-genome INTEGER]
                                     [--add-into-items-additional-data-table NAME]
                                     [--list-collections] [--list-bins]
                                     [--concatenate-gene-clusters]
                                     [--align-with ALIGNER]
                                     [--list-aligners] [--just-do-it]

Parameters

INPUT FILES: Input files from the pangenome analysis.

  -p PAN_DB, --pan-db PAN_DB
                        Anvi'o pan database
  -g GENOMES_STORAGE, --genomes-storage GENOMES_STORAGE
                        Anvi'o genomes storage file

OUTPUT: You get to choose an output file name to report things. The default will be an ugly name. So, be explicit.

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --report-DNA-sequences
                        By default, this program reports amino acid sequences.
                        You can change that behavior and ask for DNA sequences
                        instead using this flag.

SELECTION: Which gene clusters should be reported. You can ask for a single gene cluster, or multiple ones listed in a file, or you can use a collection and bin name to list gene clusters of interest. If you give nothing, this program will export alignments for every single gene cluster found in the pan database (and this is called 'customer service').

  --gene-cluster-id GENE_CLUSTER_ID
                        Gene cluster ID you are interested in.
  --gene-cluster-ids-file FILE_PATH
                        Text file for gene clusters (each line should contain
                        a unique gene cluster id).
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.

ADVANCED FILTERS: If you are here you must be looking for ways to specify exactly what you want from that database of gene clusters. These filters will be applied to what your previous selections reported.

  --min-num-genomes-gene-cluster-occurs INTEGER
                        This filter will remove gene clusters from your
                        report. Let's assume you have 100 genomes in your pan
                        genome analysis. You can use this parameter if you
                        want to work only with gene clusters that occur in at
                        least X number of genomes. If you say '--min-num-
                        genomes-gene-cluster-occurs 90', each gene cluster in
                        the analysis will be required at least to appear in 90
                        genomes. If a gene cluster occurs in fewer than that number of
                        genomes, it simply will not be reported. This is
                        especially useful for phylogenomic analyses, where you
                        may want to only focus on gene clusters that are
                        prevalent across the set of genomes you wish to
                        analyze.
  --max-num-genomes-gene-cluster-occurs INTEGER
                        This filter will remove gene clusters from your
                        report. Let's assume you have 100 genomes in your pan
                        genome analysis. You can use this parameter if you
                        want to work only with gene clusters that occur in at
                        most X number of genomes. If you say '--max-num-
                        genomes-gene-cluster-occurs 1', you will get gene
                        clusters that are singletons. Combining this parameter
                        with --min-num-genomes-gene-cluster-occurs can give
                        you a very precise way to filter your gene clusters.
  --min-num-genes-from-each-genome INTEGER
                        This filter will remove gene clusters from your
                        report. If you say '--min-num-genes-from-each-genome
                        2', this filter will remove every gene cluster, to
                        which every genome in your analysis contributed less
                        than 2 genes. This can be useful to find out gene
                        clusters with many genes from many genomes (such as
                        conserved multi-copy genes within a clade).
  --max-num-genes-from-each-genome INTEGER
                        This filter will remove gene clusters from your
                        report. If you say '--max-num-genes-from-each-genome
                        1', every gene cluster that has more than one gene
                        from any genome that contributes to it will be removed
                        from your analysis. This could be useful to remove
                        gene clusters with paralogs from your report for
                        appropriate phylogenomic analyses. For instance, using
                        '--max-num-genes-from-each-genome 1' and '--min-num-
                        genomes-gene-cluster-occurs X' where X is the total
                        number of your genomes, would give you the single-copy
                        gene clusters in your pan genome.
  --max-num-gene-clusters-missing-from-genome INTEGER
                        This filter will remove genomes from your report. If
                        you have a list of gene cluster names, you can use
                        this parameter to omit any genome from your report if
                        it is missing more than a number of genes you desire.
                        For instance, if you have 100 genomes in your pan
                        genome, and you are interested in working only with
                        genomes that have all 5 specific gene clusters of your
                        choice, you can use '--max-num-gene-clusters-missing-
                        from-genome 4' to remove the genomes that are
                        missing more than 4 of those 5 gene clusters. This is
                        especially useful for phylogenomic analyses. A value
                        of 0 will remove any genome that is missing any of the
                        gene clusters.
  --add-into-items-additional-data-table NAME
                        If you use any of the filters, and would like to add
                        the resulting item names into the items additional
                        data table of your database, you can use this
                        parameter. You will need to give a name for these
                        results to be saved. If the given name is already in
                        the items additional data table, its contents will be
                        replaced with the new one. Then you can run anvi-
                        interactive or anvi-display-pan to 'see' the results
                        of your filters.
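
The filters above are easier to reason about on a toy example. The Python sketch below re-implements three of them on a small in-memory table (gene cluster id to number of genes per genome). It is only an illustration of the semantics described in the help text: the function name, the data layout, and the cluster and genome names are all hypothetical, and `--min-num-genes-from-each-genome` is left out on purpose.

```python
def filter_gene_clusters(clusters, min_num_genomes=None, max_num_genomes=None,
                         max_num_genes_from_each_genome=None):
    """Return the ids of gene clusters that survive the given filters.

    `clusters` maps a gene cluster id to a dict of {genome name: number of genes}.
    """
    kept = []
    for cluster_id, genes_per_genome in clusters.items():
        # Only genomes that actually contribute genes count as occurrences.
        counts = [n for n in genes_per_genome.values() if n > 0]
        num_genomes = len(counts)

        if min_num_genomes is not None and num_genomes < min_num_genomes:
            continue
        if max_num_genomes is not None and num_genomes > max_num_genomes:
            continue
        if (max_num_genes_from_each_genome is not None
                and any(n > max_num_genes_from_each_genome for n in counts)):
            continue

        kept.append(cluster_id)
    return kept


# Toy pan genome with three genomes: A, B and C.
toy = {
    "GC_0001": {"A": 1, "B": 1, "C": 1},  # single-copy core
    "GC_0002": {"A": 2, "B": 1, "C": 1},  # core, but with a paralog in A
    "GC_0003": {"A": 1},                  # singleton
    "GC_0004": {"A": 1, "B": 1},          # accessory
}

# Single-copy core: occurs in all 3 genomes, at most 1 gene from each genome.
print(filter_gene_clusters(toy, min_num_genomes=3, max_num_genes_from_each_genome=1))

# Singletons: occurs in at most 1 genome.
print(filter_gene_clusters(toy, max_num_genomes=1))
```

The first call mirrors combining `--max-num-genes-from-each-genome 1` with `--min-num-genomes-gene-cluster-occurs X`, where X is the total number of genomes.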

OTHER STUFF: Yes. Stuff that is not like the ones above.

  --list-collections    Show available collections and exit.
  --list-bins           List available bins in a collection and exit.

CONCATENATED OUTPUT: Concatenated output for phylogenomics.

  --concatenate-gene-clusters
                        Concatenate output gene clusters in the same order to
                        create a multi-gene alignment output that is suitable
                        for phylogenomic analyses.
  --align-with ALIGNER  The multiple sequence alignment program to use when
                        multiple sequence alignment is necessary. To see all
                        available options, use the flag `--list-aligners`.
  --list-aligners       Show available software for multiple sequence
                        alignment.

TOTALLY IRRELEVANT: Just in case you need it.

  --just-do-it          Don't bother me with questions or warnings, just do
                        it.

anvi-get-sequences-for-hmm-hits

Get sequences for HMM hits from many inputs.

Usage

anvi-get-sequences-for-hmm-hits [-h] [-c CONTIGS_DB] [-p PROFILE_DB]
                                [-C COLLECTION_NAME] [-b BIN_NAME]
                                [-B FILE_PATH] [-e FILE_PATH]
                                [--hmm-sources SOURCE NAME]
                                [--gene-names HMM HIT NAME] [-l] [-L]
                                [-o FILE_PATH] [--get-aa-sequences]
                                [--concatenate-genes]
                                [--max-num-genes-missing-from-bin INTEGER]
                                [--min-num-bins-gene-occurs INTEGER]
                                [--align-with ALIGNER]
                                [--separator STRING]
                                [--return-best-hit]

Parameters

INPUT OPTION #1: CONTIGS DB: There are multiple ways to access sequences. Your first option is to provide a contigs database, and call it a day. In this case the program will return everything from it.

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'

INPUT OPTION #2: CONTIGS DB + PROFILE DB: You can also work with anvi'o profile databases and collections stored in them. If you go this way, you still will need to provide a contigs database. If you just specify a collection name, you will get hits from every bin in it. You can also use the bin name or bin ids file parameters to specify your interest more precisely.

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.
  -B FILE_PATH, --bin-ids-file FILE_PATH
                        Text file for bins (each line should be a unique bin
                        id).

INPUT OPTION #3: EXTERNAL GENOMES FILE: If you have multiple contigs databases without any profile database, you can start with this one. In this case you are not supposed to provide a profile database or an individual contigs database. This is for people who want to use this just with a bunch of FASTA files with their genomes.

  -e FILE_PATH, --external-genomes FILE_PATH
                        A two-column TAB-delimited flat text file that lists
                        anvi'o contigs databases. The first item in the header
                        line should read 'name', and the second should read
                        'contigs_db_path'. Each line in the file should
                        describe a single entry, where the first column is the
                        name of the genome (or MAG), and the second column is
                        the anvi'o contigs database generated for this genome.
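
The external genomes file described above is simple enough to generate programmatically. The following Python sketch writes the two-column, TAB-delimited layout with the `name` and `contigs_db_path` header. The helper name `write_external_genomes`, the genome names, and the database paths are made up for illustration.

```python
import csv
import io


def write_external_genomes(entries, handle):
    """Write (name, contigs_db_path) pairs as a TAB-delimited external genomes file."""
    names = [name for name, _ in entries]
    if len(names) != len(set(names)):
        raise ValueError("genome names must be unique")

    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "contigs_db_path"])  # header expected by the help text
    for name, path in entries:
        writer.writerow([name, path])


# Hypothetical genomes and paths, written to an in-memory buffer for the demo.
buffer = io.StringIO()
write_external_genomes(
    [("Genome_A", "/path/to/Genome_A.db"), ("Genome_B", "/path/to/Genome_B.db")],
    buffer,
)
print(buffer.getvalue(), end="")
```

In practice you would pass an open file handle instead of the buffer and give the resulting path to `--external-genomes`.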

HMM STUFF: This is where you can specify an HMM source, and/or a list of genes to filter your results.

  --hmm-sources SOURCE NAME
                        Get sequences for a specific list of HMM sources. You
                        can list one or more sources by separating them from
                        each other with a comma character (i.e., '--hmm-
                        sources source_1,source_2,source_3'). If you would
                        like to see a list of available sources in the contigs
                        database, run this program with '--list-hmm-sources'
                        flag.
  --gene-names HMM HIT NAME
                        Get sequences only for a specific gene name. Each name
                        should be separated from each other by a comma
                        character. For instance, if you want to get back only
                        RecA and Ribosomal_L27, you can type '--gene-names
                        RecA,Ribosomal_L27', and you will get any and every
                        hit that matches these names in any source. If you
                        would like to see a list of available gene names, you
                        can use '--list-available-gene-names' flag.
  -l, --list-hmm-sources
                        List available HMM sources in the contigs database and
                        quit.
  -L, --list-available-gene-names
                        List available gene names in HMM sources selection and
                        quit.

THE OUTPUT: Where should the output go? It will be a FASTA file, and you better give it a nice name.

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

THE ALPHABET: The sequences are reported in DNA alphabet, but you can also get them translated just like all the other cool kids.

  --get-aa-sequences    Store amino acid sequences instead.

PHYLOGENOMICS? K!: If you want, you can get your sequences concatenated. In this case anvi'o will use muscle to align every homolog, and concatenate them in the order you specified using the gene-names argument. Each concatenated sequence will be separated from the other ones by the separator.

  --concatenate-genes   Concatenate output genes in the same order to create a
                        multi-gene alignment output that is suitable for
                        phylogenomic analyses.
  --max-num-genes-missing-from-bin INTEGER
                        This filter removes bins (or genomes) from your
                        analysis. If you have a list of gene names, you can
                        use this parameter to omit any bin (or external
                        genome) that is missing more than a number of genes
                        you desire. For instance, if you have 100 genome bins,
                        and you are interested in working with 5 ribosomal
                        proteins, you can use '--max-num-genes-missing-from-
                        bin 4' to remove the bins that are missing more
                        than 4 of those 5 genes. This is especially useful for
                        phylogenomic analyses. Parameter 0 will remove any bin
                        that is missing any of the genes.
  --min-num-bins-gene-occurs INTEGER
                        This filter removes genes from your analysis. Let's
                        assume you have 100 bins to get sequences for HMM
                        hits. If you want to work only with genes among all
                        the hits that occur in at least X number of bins, and
                        discard the rest of them, you can use this flag. If
                        you say '--min-num-bins-gene-occurs 90', each gene in
                        the analysis will be required at least to appear in 90
                        genomes. If a gene occurs in less than that number of
                        genomes, it simply will not be reported. This is
                        especially useful for phylogenomic analyses, where you
                        may want to only focus on genes that are prevalent
                        across the set of genomes you wish to analyze.
  --align-with ALIGNER  The multiple sequence alignment program to use when
                        multiple sequence alignment is necessary. To see all
                        available options, use the flag `--list-aligners`.
  --separator STRING    A word that will be used to separate concatenated gene
                        sequences from each other (IF you are using this
                        program with `--concatenate-genes` flag). The default
                        is "XXX" for amino acid sequences, and "NNN" for DNA
                        sequences.
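
As a rough picture of what concatenation with a separator produces, the Python sketch below joins per-gene alignments in a fixed gene order. It is not how anvi'o does it internally: the function name and data layout are hypothetical, and filling a missing gene with gap characters is our own assumption.

```python
def concatenate_genes(aligned, gene_names, separator="XXX"):
    """Concatenate per-gene alignments into one sequence per bin.

    `aligned` maps gene name -> {bin name: aligned sequence}. All sequences of
    a gene are assumed to share the same length, and every gene is assumed to
    have at least one sequence. A bin that lacks a gene gets a run of gaps
    (an assumption made for this sketch).
    """
    bins = sorted({bin_name for gene in gene_names for bin_name in aligned[gene]})

    concatenated = {}
    for bin_name in bins:
        parts = []
        for gene in gene_names:  # the gene order defines the concatenation order
            per_bin = aligned[gene]
            alignment_length = len(next(iter(per_bin.values())))
            parts.append(per_bin.get(bin_name, "-" * alignment_length))
        concatenated[bin_name] = separator.join(parts)
    return concatenated


# Hypothetical amino acid alignments for two genes and two bins.
aligned = {
    "RecA": {"Bin_1": "MK-L", "Bin_2": "MKAL"},
    "Ribosomal_L27": {"Bin_1": "AG"},
}

for bin_name, seq in concatenate_genes(aligned, ["RecA", "Ribosomal_L27"]).items():
    print(bin_name, seq)
```

For DNA output the same call would use `separator="NNN"`, matching the default described above.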

OPTIONAL: Everything is optional, but some options are more optional than others.

  --return-best-hit     A bin may contain more than one hit for a gene name in
                        a given HMM source. For instance, there may be
                        multiple RecA hits in a genome bin from Campbell et
                        al. Using this flag, anvi'o will go through all of the gene
                        names that appear multiple times, and remove all but
                        the one with the lowest e-value. Good for whenever you
                        really need to get only a single copy of single-copy
                        core genes from a genome bin.
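
The idea behind `--return-best-hit` (keep only the hit with the lowest e-value when a gene name appears more than once in a bin) can be sketched as follows. The record layout, field names, and values are hypothetical and chosen only for the example.

```python
def keep_best_hits(hits):
    """Keep one hit per (bin, source, gene name): the one with the lowest e-value.

    `hits` is an iterable of dicts with the keys 'bin', 'source', 'gene_name'
    and 'e_value' (a layout assumed for this sketch).
    """
    best = {}
    for hit in hits:
        key = (hit["bin"], hit["source"], hit["gene_name"])
        if key not in best or hit["e_value"] < best[key]["e_value"]:
            best[key] = hit
    return list(best.values())


# Two RecA hits in the same bin; only the stronger one should survive.
hits = [
    {"bin": "Bin_1", "source": "Campbell_et_al", "gene_name": "RecA", "e_value": 1e-20},
    {"bin": "Bin_1", "source": "Campbell_et_al", "gene_name": "RecA", "e_value": 1e-55},
    {"bin": "Bin_1", "source": "Campbell_et_al", "gene_name": "Ribosomal_L27", "e_value": 1e-30},
]

for hit in keep_best_hits(hits):
    print(hit["gene_name"], hit["e_value"])
```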

anvi-get-short-reads-from-bam

Get short reads back from a BAM file.

Usage

anvi-get-short-reads-from-bam [-h] -p PROFILE_DB -c CONTIGS_DB
                              [-C COLLECTION_NAME] [-b BIN_NAME]
                              [-B FILE_PATH] [-o FILE_PATH]
                              BAM FILE[S] [BAM FILE[S] ...]

Parameters

positional arguments:

  BAM FILE[S]           BAM file(s) to access to recover short reads

optional arguments:

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.
  -B FILE_PATH, --bin-ids-file FILE_PATH
                        Text file for bins (each line should be a unique bin
                        id).
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-get-short-reads-mapping-to-a-gene

Access reads in contigs and positions in a BAM file

Usage

anvi-get-short-reads-mapping-to-a-gene [-h] -i INPUT_BAM(S)
                                       [INPUT_BAM(S) ...] -c CONTIGS_DB
                                       --gene-caller-id GENE_CALLER_ID
                                       -o FILE_PATH
                                       [--leeway LEEWAY_NTs]

Parameters

optional arguments:

  -i INPUT_BAM(S) [INPUT_BAM(S) ...], --input-files INPUT_BAM(S) [INPUT_BAM(S) ...]
                        Sorted and indexed BAM files to analyze. It is
                        essential that all BAM files must be the result of
                        mappings against the same contigs.
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  --gene-caller-id GENE_CALLER_ID
                        A single gene id.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --leeway LEEWAY_NTs   The minimum number of nucleotides for a given short
                        read mapping into the gene context for it to be
                        reported. You must consider the length of your short
                        reads, as well as the length of the gene you are
                        targeting. The default is 100 nts.
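
One way to read `--leeway` is as a minimum overlap, in nucleotides, between a mapped read and the gene. The Python sketch below follows that reading. This interpretation is an assumption on our part, as are the function names and the half-open coordinate convention, so treat it as a thinking aid rather than a description of the program's logic.

```python
def overlap_length(read_start, read_end, gene_start, gene_end):
    """Number of positions shared by two half-open intervals [start, end)."""
    return max(0, min(read_end, gene_end) - max(read_start, gene_start))


def reads_to_report(reads, gene_start, gene_end, leeway=100):
    """Return names of reads that overlap the gene by at least `leeway` nts.

    `reads` is an iterable of (name, start, end) tuples on the same contig.
    """
    return [
        name
        for name, start, end in reads
        if overlap_length(start, end, gene_start, gene_end) >= leeway
    ]


# Hypothetical 150 nt reads around a gene spanning positions 1000 to 2000.
reads = [
    ("read_1", 900, 1050),   # 50 nts inside the gene
    ("read_2", 950, 1100),   # 100 nts inside the gene
    ("read_3", 1500, 1650),  # fully inside the gene
    ("read_4", 1950, 2100),  # 50 nts inside the gene
]

print(reads_to_report(reads, 1000, 2000, leeway=100))
```

Under this reading, a leeway larger than the read length or the gene length would exclude every read, which is why the help text asks you to consider both.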

anvi-get-split-coverages

Export splits and the coverage table from database

Usage

anvi-get-split-coverages [-h] -p PROFILE_DB [--split-name SPLIT_NAME]
                         [-c CONTIGS_DB] [-C COLLECTION_NAME]
                         [-b BIN_NAME] [-o FILE_PATH] [--list-splits]
                         [--list-collections] [--list-bins]

Parameters

ESSENTIAL ANVI'O DB: You need to provide a profile database.

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database

INPUT OPTION #1: SPLIT NAME: You want nothing but the coverage values in a single split. FINE.

  --split-name SPLIT_NAME
                        Split name.

INPUT OPTION #2: COLLECTION + BIN: You want nucleotide-level coverage values for all splits in a bin. FANCY.

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.

BORING STUFF: The output file and all.

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --list-splits         When declared, the program will list split names in
                        the profile database and quit.
  --list-collections    Show available collections and exit.
  --list-bins           List available bins in a collection and exit.

anvi-import-collection

Import an external binning result into anvi'o

Usage

anvi-import-collection [-h] [-c CONTIGS_DB] [-p PAN_OR_PROFILE_DB] -C
                       COLLECTION_NAME [--bins-info BINS_INFO]
                       [--contigs-mode]
                       TAB DELIMITED FILE

Parameters

positional arguments:

  TAB DELIMITED FILE    The input file that describes bin IDs for each split
                        or contig.

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  --bins-info BINS_INFO
                        Additional information for bins. The file must contain
                        three TAB-delimited columns, where the first one must
                        be a unique bin name, the second should be a 'source',
                        and the last one should be a 7 character HTML color
                        code (i.e., '#424242'). Source column must contain
                        information about the origin of the bin. If these bins
                        are automatically identified by a program like
                        CONCOCT, this column could contain the program name
                        and version. The source information will be associated
                        with the bin in various interfaces so in a sense it is
                        not *that* critical what it says there, but on the
                        other hand it is, because we should also think about
                        people who may end up having to work with what we put
                        together later.
  --contigs-mode        Use this flag if your binning was done on contigs
                        instead of splits. Please refer to the documentation
                        for help.
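
The `--bins-info` layout described above (unique bin name, source, 7 character HTML color code, separated by TABs) lends itself to a small validation step before import. The Python sketch below builds such lines and checks the two constraints spelled out in the help text. The helper name, bin names, and source strings are hypothetical, and the sketch writes no header line because the help text does not mention one.

```python
import re

# '#' followed by six hexadecimal digits, e.g. '#424242' (7 characters in total).
HTML_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def bins_info_lines(bins):
    """Return TAB-delimited lines for (bin name, source, color) triples."""
    seen = set()
    lines = []
    for name, source, color in bins:
        if name in seen:
            raise ValueError(f"duplicate bin name: {name}")
        if not HTML_COLOR.match(color):
            raise ValueError(f"not a 7 character HTML color code: {color}")
        seen.add(name)
        lines.append("\t".join([name, source, color]))
    return lines


# Hypothetical bins; the source column records where each bin came from.
for line in bins_info_lines([
    ("Bin_1", "CONCOCT", "#424242"),
    ("Bin_2", "manual_refinement", "#AA33CC"),
]):
    print(line)
```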

anvi-import-functions

Parse and store functional annotation of genes.

Usage

anvi-import-functions [-h] -c CONTIGS_DB [-p PARSER] -i FILE(S)
                      [FILE(S) ...] [--drop-previous-annotations]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -p PARSER, --parser PARSER
                        Parser to make sense of the input files (if you need
                        one). There is currently 1 parser readily available:
                        ['interproscan']. IT IS OK if you do not select a
                        parser if you have a standard, TAB-delimited input
                        file for functional annotation of genes. If this is
                        not like 2018 and everything is already outdated, you
                        should be able to go to this address and learn
                        everything you need like a boss:
                        http://merenlab.org/2016/06/18/importing-functions/
  -i FILE(S) [FILE(S) ...], --input-files FILE(S) [FILE(S) ...]
                        One or more input files should follow this parameter.
                        The way these files will be handled will depend on
                        which parser you selected (if you did select any).
  --drop-previous-annotations
                        Use this flag if you want anvi'o to remove ALL
                        previous functional annotations for your genes, and
                        then import the new data. The default behavior will
                        add any annotation source into the db incrementally
                        unless there are already annotations from this source.
                        In which case, it will first remove previous
                        annotations for that source only (i.e., if source X is
                        both in the db and in the incoming annotations data,
                        it will replace the content of source X in the db).

anvi-import-misc-data

Populate additional data or order tables in pan or profile databases for items or layers (the swiss army knife level stuff).

Usage

anvi-import-misc-data [-h] -p PAN_OR_PROFILE_DB -t NAME [--just-do-it]
                      TAB DELIMITED FILE

Parameters

positional arguments:

  TAB DELIMITED FILE    The input file that describes additional data for
                        layers or items. The expected format of this file
                        depends on the data table you will target. This can
                        feel complicated, but we promise it is not (you
                        probably have a PhD or are working on one, so trust us
                        when we say "it is not complicated"). You need to read
                        the online documentation if this is your first time
                        with this.

optional arguments:

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database
  -t NAME, --target-data-table NAME
                        The target table is the table you are interested in
                        accessing. Currently it can be 'items', 'layers', or
                        'layer_orders'. Please see most up-to-date online
                        documentation for more information.
  --just-do-it          Don't bother me with questions or warnings, just do
                        it.

anvi-import-state

Import an anvi'o state into a profile database.

Usage

anvi-import-state [-h] -p PAN_OR_PROFILE_DB -s STATE_FILE -n STATE_NAME

Parameters

optional arguments:

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database
  -s STATE_FILE, --state STATE_FILE
                        JSON serializable anvi'o state file.
  -n STATE_NAME, --name STATE_NAME
                        State name.

anvi-import-taxonomy

Import taxonomy information into an anvi'o contigs database.

Usage

anvi-import-taxonomy [-h] -c CONTIGS_DB [-p PARSER] -i FILE(S)
                     [FILE(S) ...]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -p PARSER, --parser PARSER
                        Parser to make sense of the input files. There are 2
                        parsers readily available: ['default_matrix',
                        'centrifuge']. It is OK if you do not select a parser,
                        but in that case there will be no additional contigs
                        available except the identification of single-copy
                        genes in your contigs for later use. Using a parser
                        will not prevent the analysis of single-copy genes,
                        but make anvi'o more powerful to help you make sense of
                        your results. Please see the documentation, or get in
                        touch with the developers if you have any questions
                        regarding parsers.
  -i FILE(S) [FILE(S) ...], --input-files FILE(S) [FILE(S) ...]
                        Input file(s) for selected parser. Each parser (except
                        "blank") requires input files to process that you
                        generate before running anvi'o. Please see the
                        documentation for details.

anvi-init-bam

Sort/Index BAM files

Usage

anvi-init-bam [-h] [-o FILE_PATH] BAM_FILE

Parameters

positional arguments:

  BAM_FILE              BAM file to analyze

optional arguments:

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-interactive

Start an anvi'o server for the interactive interface

Usage

anvi-interactive [-h] [-p PROFILE_DB] [-c CONTIGS_DB]
                 [-C COLLECTION_NAME] [--manual-mode] [-f FASTA]
                 [-d VIEW_DATA] [-t NEWICK] [--items-order FLAT_FILE]
                 [-V ADDITIONAL_VIEW] [-A ADDITIONAL_LAYERS]
                 [--view NAME] [--title NAME]
                 [--taxonomic-level {t_phylum,t_class,t_order,t_family,t_genus,t_species}]
                 [--split-hmm-layers] [--hide-outlier-SNVs]
                 [--state-autoload NAME] [--collection-autoload NAME]
                 [--export-svg FILE_PATH] [--gene-mode] [-b BIN_NAME]
                 [--show-views] [--skip-check-names] [-o DIR_PATH]
                 [--dry-run] [--show-states] [--list-collections]
                 [--skip-init-functions] [--skip-auto-ordering]
                 [--distance DISTANCE_METRIC]
                 [--linkage LINKAGE_METHOD] [-I IP_ADDR] [-P INT]
                 [--browser-path PATH] [--read-only] [--server-only]

Parameters

DEFAULT INPUTS: The interactive interface can be started with or without anvi'o databases. The default use assumes you have your profile and contigs databases; however, it is also possible to start the interface using ad hoc input files. See the 'MANUAL INPUTS' section for required parameters.

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        If you have a collection in your profile database, you
                        can use this flag to start the interactive interface
                        with a tree showing your bins in your collection,
                        instead of each split. This is very useful when you
                        have imported your external binning results into
                        anvi'o, and want to see the distribution of your bins
                        across samples. In these cases anvi'o will cluster
                        your bins based on multiple metrics. Because this
                        particular clustering will be done on the fly within
                        the anvi'o interactive class, you get to define a
                        distance metric and a linkage method using the
                        --distance and --linkage parameters if you want!

MANUAL INPUTS: Mandatory input parameters to start the interactive interface without anvi'o databases.

  --manual-mode         Using this flag, you can run the interactive interface
                        in an ad hoc manner using input files you curated
                        instead of standard output files generated by an
                        anvi'o run. In the manual mode you will be asked to
                        provide a profile database. In this mode a profile
                        database is only used to store 'state' of the
                        interactive interface so you can reload your visual
                        settings when you re-analyze the same files again. If
                        the profile database you provide does not exist,
                        anvi'o will create an empty one for you.
  -f FASTA, --fasta-file FASTA
                        A FASTA-formatted input file
  -d VIEW_DATA, --view-data VIEW_DATA
                        A TAB-delimited file for view data
  -t NEWICK, --tree NEWICK
                        NEWICK formatted tree structure
  --items-order FLAT_FILE
                        A flat file that contains the order of items you wish
                        to display using the interactive interface. You may
                        want to use this if you have a specific order of items
                        in your mind, and do not want to display a tree in the
                        middle (or simply you don't have one). The file format
                        is simple: each line should have an item name, and
                        there should be no header.
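
The file format described for --items-order is simple enough to generate from a script. Here is a minimal sketch, assuming item names are plain strings; the function name `write_items_order` is hypothetical and not part of anvi'o, and the duplicate check is an extra precaution rather than a documented requirement.

```python
def write_items_order(item_names, path):
    """Write one item name per line, with no header line."""
    seen = set()
    with open(path, "w") as handle:
        for name in item_names:
            name = name.strip()
            # Assumption: an empty or repeated name is a mistake worth
            # catching before the file reaches the interface.
            if not name or name in seen:
                raise ValueError("empty or duplicate item name: %r" % name)
            seen.add(name)
            handle.write(name + "\n")
```

Calling `write_items_order(["split_003", "split_001"], "items-order.txt")` would produce a two-line file that could then be passed to --items-order.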

ADDITIONAL STUFF: Parameters to provide additional layers, views, or layer data.

  -V ADDITIONAL_VIEW, --additional-view ADDITIONAL_VIEW
                        A TAB-delimited file for an additional view to be used
                        in the interface. This file should contain all
                        split names, and values for each of them in all
                        samples. Each column in this file must correspond to a
                        sample name. Content of this file will be called
                        'user_view', which will be available as a new item in
                        the 'views' combo box in the interface.
  -A ADDITIONAL_LAYERS, --additional-layers ADDITIONAL_LAYERS
                        A TAB-delimited file for additional layers for splits.
                        The first column of this file must be split names, and
                        the remaining columns should be unique attributes. The
                        file does not need to contain all split names, or
                        values for each split in every column. Anvi'o will try
                        to deal with missing data nicely. Each column in this
                        file will be visualized as a new layer in the tree.

GENE MODE: Gene mode related parameters.

  --gene-mode           Initiate the interactive interface in "gene mode". In
                        this mode, the items are genes (instead of splits of
                        contigs). The following views are available: detection
                        (the detection value of each gene in each sample). The
                        mean_coverage (the mean coverage of genes). The
                        non_outlier_mean_coverage (the mean coverage of the
                        non-outlier nucleotide positions of each gene in each
                        sample (median absolute deviation is used to remove
                        outliers per gene per sample)). The
                        non_outlier_coverage_std view (standard deviation of
                        the coverage of non-outlier positions of genes in
                        samples). You can also choose to order items and
                        layers according to each one of the aforementioned
                        views. In addition, all layer orders that are
                        available in the regular mode (i.e. the full mode
                        where you have contigs/splits) are also available in
                        "gene mode", so that, for example, you can choose to
                        order the layers according to "detection", and that
                        would be the order according to the detection values
                        of splits, whereas if you choose "genes_detections"
                        then the order of layers would be according to the
                        detection values of genes. Inspection and sequence
                        functionality are available (through the right-click
                        menu), except now sequences are of the specific gene.
                        Inspection now has two options available: "Inspect
                        Context", which brings you to the inspection page of
                        the split to which the gene belongs where the
                        inspected gene will be highlighted in yellow in the
                        bottom, and "Inspect Gene", which opens the inspection
                        page only for the gene and 100 nts around each side of
                        it (the purpose of this option is to make the
                        inspection page load faster if you only want to look
                        at the nucleotide coverage of a specific gene).
                        NOTICE: You can't store states or collections in "gene
                        mode". However, you still can make fake selections,
                        and create fake bins for your viewing convenience only
                        (smiley). Search options are available, and you can
                        even search for functions if you have them in your
                        contigs database. ANOTHER NOTICE: loading this mode
                        might take a while if your bin has many genes and
                        your profile database has many samples. This is
                        because the gene coverage stats are computed in an
                        ad hoc manner when you load this mode. We know this is
                        not ideal and we plan to improve that (along with
                        other things). If you have suggestions/complaints
                        regarding this mode please comment on this GitHub
                        issue: https://goo.gl/yHhRei. Please refer to the
                        online tutorial for more information.
  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.

VISUALS RELATED: Parameters that give access to various adjustments regarding the interface.

  --view NAME           Start the interface with a pre-selected view. To see a
                        list of available views, use --show-views flag.
  --title NAME          Title for the interface. If you are working with a
                        RUNINFO dict, the title will be determined based on
                        information stored in that file. Regardless, you can
                        override that value using this parameter. If you are
                        not using an anvi'o RUNINFO dictionary, a meaningful
                        title will appear in the interface only if you define
                        one using this parameter.
  --taxonomic-level {t_phylum,t_class,t_order,t_family,t_genus,t_species}
                        The taxonomic level to use. The default is 't_genus'.
                        Only relevant if the anvi'o contigs database contains
                        taxonomic annotations.
  --split-hmm-layers    When declared, this flag tells the interface to split
                        every gene found in HMM searches that were performed
                        against non-singlecopy gene HMM profiles into their
                        own layer. Please see the documentation for details.
  --hide-outlier-SNVs   During profiling, anvi'o marks positions of single-
                        nucleotide variations (SNVs) that originate from
                        places in contigs where coverage values are a bit
                        'sketchy'. If you would like to avoid SNVs in those
                        positions of splits in applicable projects you can use
                        this flag, and the interface would hide SNVs that are
                        marked as 'outlier' (although it is clearly the best
                        to see everything, no one will judge you if you end up
                        using this flag) (plus, there may or may not be some
                        historical data on this here:
                        https://github.com/meren/anvio/issues/309).
  --state-autoload NAME
                        Automatically load a previously saved state and draw
                        the tree. To see a list of available states, use the
                        --show-states flag.
  --collection-autoload NAME
                        Automatically load a collection and draw tree. To see
                        a list of available collections, use --list-
                        collections flag.
  --export-svg FILE_PATH
                        The SVG output file path.

SWEET PARAMS OF CONVENIENCE: Parameters and flags that are not quite essential (but nice to have).

  --show-views          When declared, the program will show a list of
                        available views, and exit.
  --skip-check-names    For debugging purposes. You should never really need
                        it.
  -o DIR_PATH, --output-dir DIR_PATH
                        Directory path for output files
  --dry-run             Don't do anything real. Test everything, and stop
                        right before wherever the developer said 'well, this
                        is enough testing', and decided to print out results.
  --show-states         When declared the program will print all available
                        states and exit.
  --list-collections    Show available collections and exit.
  --skip-init-functions
                        When declared, function calls for genes will not be
                        initialized (therefore will be missing from all
                        relevant interfaces or output files). The use of this
                        flag may reduce the memory footprint and processing
                        time for large datasets.
  --skip-auto-ordering  When declared, the attempt to include automatically
                        generated orders of items based on additional data is
                        skipped. In case those buggers cause issues with your
                        data, and you still want to see your stuff and deal
                        with the other issue maybe later.
  --distance DISTANCE_METRIC
                        The distance metric for the hierarchical clustering.
                        Only relevant if you are running the interactive
                        interface in "collection" mode. The default is
                        "euclidean".
  --linkage LINKAGE_METHOD
                        The linkage method for the hierarchical clustering.
                        Only relevant if you are running the interactive
                        interface in "collection" mode. The default is "ward".

SERVER CONFIGURATION: For power users.

  -I IP_ADDR, --ip-address IP_ADDR
                        IP address for the HTTP server. The default ip address
                        (0.0.0.0) should work just fine for most.
  -P INT, --port-number INT
                        Port number to use for anvi'o services. If nothing is
                        declared, anvi'o will try to find a suitable port
                        number, starting from the default port number, 8080.
  --browser-path PATH   By default, anvi'o will use your default browser to
                        launch the interactive interface. If you would like to
                        use something other than your system default, you can
                        provide a full path for an alternative browser using
                        this parameter, and hope for the best. For instance we
                        are using this parameter to call Google's experimental
                        browser, Canary, which performs better with demanding
                        visualizations.
  --read-only           When the interactive interface is started with this
                        flag, all 'database write' operations will be
                        disabled.
  --server-only         The default behavior is to start the local server, and
                        fire up a browser that connects to the server. If you
                        have other plans, and want to start the server without
                        calling the browser, this is the flag you need.
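
The port search described for -P ("try to find a suitable port number, starting from the default") can be pictured roughly as below. This is only a sketch of the general idea using the Python standard library, not anvi'o's own code; the function name `find_available_port`, the `attempts` limit, and the choice to probe on 127.0.0.1 are all assumptions made for illustration.

```python
import socket


def find_available_port(start=8080, host="127.0.0.1", attempts=10):
    """Return the first port at or above `start` that can be bound, or None."""
    limit = min(start + attempts, 65536)  # stay within the valid port range
    for port in range(start, limit):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                # Port is taken (or not permitted); try the next one.
                continue
            return port
    return None
```

A port found this way could be handed to -P explicitly. Note that another process may still grab the port between the probe and the actual server start, so the result is a hint rather than a guarantee.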

anvi-matrix-to-newick

Takes an observation matrix, returns a newick tree.

Usage

anvi-matrix-to-newick [-h] [-o FILE_PATH] [--transpose]
                      [--distance DISTANCE_METRIC]
                      [--linkage LINKAGE_METHOD]
                      PATH

Parameters

positional arguments:

  PATH                  Input matrix

optional arguments:

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --transpose           Transpose the input matrix file before clustering.
  --distance DISTANCE_METRIC
                        The distance metric for the hierarchical clustering.
                        The default distance metric is 'euclidean'. You can
                        find the full list of distance metrics either by
                        making a mistake (such as entering a non-existent
                        distance metric and making anvi'o upset), or by taking
                        a look at the documentation of the pdist function in
                        the scipy.spatial.distance module.
  --linkage LINKAGE_METHOD
                        The linkage method for the hierarchical clustering.
                        The default linkage method is 'ward', because that is
                        the best one. It really is. We talked to a lot of
                        people and they were all like 'this is the best one
                        available' and it is just all out there. Honestly it
                        is so good that we will build a wall around it and
                        make other linkage methods pay for it. But if you want
                        to see a full list of available ones you can check the
                        hierarchy.linkage function in the scipy.cluster module.
                        Up to you really. But then you can't use ward anymore,
                        and you would have to leave anvi'o right now.
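
To make the matrix-to-tree idea concrete, here is a small dependency-free sketch. It is not how anvi-matrix-to-newick is implemented (the help text above points to scipy for that). It swaps in plain average linkage instead of ward, uses Euclidean distance, and emits a topology-only Newick string without branch lengths. The function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matrix_to_newick(rows):
    """Cluster named rows with average linkage; return a Newick string.

    `rows` maps an item name to its list of observations.
    """
    if not rows:
        raise ValueError("the input matrix has no rows")

    # Each cluster is a pair: (newick text so far, names of its members).
    clusters = [(name, [name]) for name in rows]

    def average_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(euclidean(rows[a], rows[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the two closest clusters and merge them into one.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        left, right = clusters[i], clusters[j]
        merged = ("(%s,%s)" % (left[0], right[0]), left[1] + right[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return clusters[0][0] + ";"
```

For example, `matrix_to_newick({"a": [0, 0], "b": [0, 1], "c": [10, 10]})` returns `(c,(a,b));`, since a and b are merged first and c joins last.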

anvi-mcg-classifier

A program to classify genes according to coverage across multiple metagenomes

Usage

anvi-mcg-classifier [-h] -p PROFILE_DB -c CONTIGS_DB
                    [-O FILENAME_PREFIX] [-C COLLECTION_NAME]
                    [-b BIN_NAME] [-B FILE_PATH]
                    [--exclude-samples FILE] [--include-samples FILE]
                    [--gen-figures] [-W] [--alpha NUM]
                    [--outliers-threshold NUM] [--zeros-are-outliers]

Parameters

ESSENTIAL INPUTS: You must supply a merged profile db (along with a matching contigs db)

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'

ESSENTIAL OUTPUTS: The outputs of the algorithm are: an anvi'o additional layers format file with the classification information for genes, and an anvi'o samples information file with detection information per sample. In addition, when a profile database is given, gene-coverages and gene-detection tables are also saved. All files are created with the prefix provided by the user.

  -O FILENAME_PREFIX, --output-file-prefix FILENAME_PREFIX
                        A prefix to be used while naming the output files (no
                        file type extensions please; just a prefix).
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.

ADDITIONAL STUFF: Parameters to provide pre-existing additional layers, samples-information files, so that the outputs would be added to these files

  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.
  -B FILE_PATH, --bin-ids-file FILE_PATH
                        Text file for bins (each line should be a unique bin
                        id).
  --exclude-samples FILE
                        List of samples to exclude for the analysis.
  --include-samples FILE
                        List of samples to include for the analysis.
  --gen-figures         For those of you who wish to dig deeper, a collection
                        of figures could be created to allow you to get
                        insight into how the classification was generated.
                        This is especially useful to identify cases in which
                        you shouldn't trust the classification (for example
                        due to a large number of outliers). NOTICE: if you ask
                        anvi'o to generate these figures then it will
                        significantly extend the execution time. To learn
                        about which figures are created and what they mean,
                        contact your nearest anvi'o developer, because
                        currently it is a well-hidden secret.
  -W, --overwrite-output-destinations
                        Overwrite if the output files and/or directories
                        exist.

PARAMETERS: Parameters to determine cut-offs for the gene-classifier

  --alpha NUM, --genome-detection-uncertainty NUM
                        Determines the range of sample detection values that
                        are considered negative, ambiguous or positive. Min of
                        0 and smaller than 0.5, default of 0.25. For example,
                        with the default, samples with detection below
                        0.5-0.25 = 0.25 will be considered negative (i.e. do
                        not contain the genome), samples with detection
                        between 0.25 and 0.75 would be ambiguous (and hence
                        would not be used for the classification), and samples
                        with detection above 0.75 would be considered positive
                        (i.e. contain the genome).
  --outliers-threshold NUM
                        Threshold to use for the outlier detection. The
                        default value is 2.5. Absolute deviation around the
                        median is used. To read more about the method please
                        refer to: Boris Iglewicz and David Hoaglin (1993),
                        "Volume 16: How to Detect and Handle Outliers", The
                        ASQC Basic References in Quality Control: Statistical
                        Techniques, Edward F. Mykytka, Ph.D., Editor. Or to:
                        http://www.sciencedirect.com/science/article/pii/S0022103113000668
  --zeros-are-outliers  If you want all zero coverage positions to be treated
                        like outliers then use this flag. The reason to treat
                        zero coverage as outliers is because when mapping
                        reads to a reference we could get many zero positions
                        due to accessory genes. These positions then skew the
                        average values that we compute.
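
The cut-offs above boil down to a little arithmetic, which the following sketch tries to spell out. It is an illustration under stated assumptions, not the classifier's actual code. The function names are hypothetical, boundary values are assumed to count as ambiguous, and the outlier test uses the modified z-score from the Iglewicz and Hoaglin reference (0.6745 times the deviation from the median, divided by the median absolute deviation), which may differ in detail from what anvi'o does.

```python
from statistics import median


def classify_sample(detection, alpha=0.25):
    """Label a sample by how well the genome is detected in it."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be at least 0 and smaller than 0.5")
    if detection < 0.5 - alpha:
        return "negative"
    if detection > 0.5 + alpha:
        return "positive"
    # Assumption: values exactly on a boundary count as ambiguous.
    return "ambiguous"


def flag_outliers(coverages, threshold=2.5, zeros_are_outliers=False):
    """Return one boolean per position, True where the value is an outlier."""
    center = median(coverages)
    mad = median(abs(c - center) for c in coverages)
    flags = []
    for c in coverages:
        if zeros_are_outliers and c == 0:
            flags.append(True)
        elif mad == 0:
            # Assumption: with no spread at all, anything that differs
            # from the median is treated as an outlier.
            flags.append(c != center)
        else:
            # Modified z-score based on the median absolute deviation.
            flags.append(abs(0.6745 * (c - center) / mad) > threshold)
    return flags
```

With the defaults, `classify_sample(0.1)` is negative, `classify_sample(0.5)` is ambiguous, and `classify_sample(0.9)` is positive. For coverages such as `[0, 1, 2, 1, 0, 1, 2]`, no position is flagged unless `zeros_are_outliers=True`, in which case only the two zero positions are.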

anvi-merge

Merge multiple anvi'o profiles

Usage

anvi-merge [-h] -c CONTIGS_DB [-o DIR_PATH] [-S NAME]
           [--description TEXT_FILE] [--skip-hierarchical-clustering]
           [--enforce-hierarchical-clustering]
           [--distance DISTANCE_METRIC] [--linkage LINKAGE_METHOD]
           [--skip-concoct-binning] [-W]
           SINGLE_PROFILE(S) [SINGLE_PROFILE(S) ...]

Parameters

positional arguments:

  SINGLE_PROFILE(S)     Anvi'o single profiles to merge

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -o DIR_PATH, --output-dir DIR_PATH
                        Directory path for output files
  -S NAME, --sample-name NAME
                        It is important to set a sample name (using only ASCII
                        letters and digits and without spaces) that is unique
                        (considering all others). If you do not provide one,
                        anvi'o will try to make up one for you based on other
                        information, although you should never let the
                        software decide these things.
  --description TEXT_FILE
                        A plain text file that contains some description about
                        the project. You can use Markdown syntax. The
                        description text will be rendered and shown in all
                        relevant interfaces, including the anvi'o interactive
                        interface, or anvi'o summary outputs.
  --skip-hierarchical-clustering
                        If you are not planning to use the interactive
                        interface (or if you have other means to add a tree of
                        contigs in the database) you may skip the step where
                        hierarchical clustering of your items is performed
                        based on default clustering recipes matching to your
                        database type.
  --enforce-hierarchical-clustering
                        If you have more than 25,000 splits in your merged
                        profile, anvi-merge will automatically skip the
                        hierarchical clustering of splits (by setting --skip-
                        hierarchical-clustering flag on). This is due to the
                        fact that the computational time required for
                        hierarchical clustering grows much faster than
                        linearly with the number of items being clustered.
                        Based on our experience we decided that 25,000 splits
                        is about the maximum we should try. However, this is
                        not a theoretical limit, and you can override this
                        heuristic by using this flag, which would tell anvi'o
                        to attempt to cluster splits regardless.
  --distance DISTANCE_METRIC
                        The distance metric for the hierarchical clustering.
                        If you do not use this flag, the default distance
                        metric will be used for each clustering configuration
                        which is "euclidean".
  --linkage LINKAGE_METHOD
                        The same story with the `--distance`, except, the
                        system default for this one is ward.
  --skip-concoct-binning
                        Anvi'o uses CONCOCT (Alneberg et al.) by default for
                        unsupervised genome binning for merged runs. CONCOCT
                        results are stored in the profile database, which then
                        can be used from within appropriate interfaces (e.g.,
                        anvi-interactive, anvi-summarize, etc). Use this flag
                        if you would like to skip this step.
  -W, --overwrite-output-destinations
                        Overwrite if the output files and/or directories
                        exist.
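
The interplay between --skip-hierarchical-clustering, --enforce-hierarchical-clustering and the 25,000-split heuristic can be summarized as a small decision function. This is a sketch of the behavior as described in the help text above, not code taken from anvi-merge; the names are hypothetical, and letting the skip flag win when both flags are given is an assumption.

```python
MAX_SPLITS_FOR_CLUSTERING = 25000  # the threshold quoted in the help text


def will_cluster(num_splits, skip=False, enforce=False):
    """Decide whether hierarchical clustering of splits would be attempted."""
    if skip:
        # Assumption: an explicit request to skip takes precedence.
        return False
    if num_splits > MAX_SPLITS_FOR_CLUSTERING and not enforce:
        # Too many splits: clustering is skipped automatically.
        return False
    return True
```

So a merged profile with 30,000 splits would be clustered only when `enforce=True`, while one with 10,000 splits would be clustered unless `skip=True`.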

anvi-merge-bins

Merge a given set of bins in an anvi'o collection

Usage

anvi-merge-bins [-h] -p PAN_OR_PROFILE_DB [-C COLLECTION_NAME]
                [-b BIN NAMES] [-B BIN NAME] [--list-collections]
                [--list-bins]

Parameters

DB AND COLLECTION: Simple enough. This guy needs a pan or profile database and a collection name. You can get a list of available collections with another flag down below.

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.

BINS TO WORK WITH: Here you need to define a list of bin names to merge, and the new bin name for them to merge under. Your bin names should be comma-separated. Both 'name_1, name_2, name_3' and name_1,name_2,name_3 will work. Your new bin name better be a single word, meaningful name so anvi'o does not complain about it later.

  -b BIN NAMES, --bin-names-list BIN NAMES
                        Comma-separated list of bin names.
  -B BIN NAME, --new-bin-name BIN NAME
                        The new bin name.
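
Since both 'name_1, name_2, name_3' and name_1,name_2,name_3 are accepted, the list is presumably split on commas with surrounding whitespace ignored. A minimal sketch of that kind of parsing follows; the function names are hypothetical, and the single-word check only reflects the advice given above, not a rule verified against anvi'o.

```python
def parse_bin_names(raw):
    """Split a comma-separated list of bin names, ignoring stray spaces."""
    names = [name.strip() for name in raw.split(",")]
    # Drop empty entries left behind by trailing or doubled commas.
    return [name for name in names if name]


def is_single_word(name):
    """True if the name contains exactly one whitespace-free token."""
    return len(name.split()) == 1 and name == name.strip()
```

Both spellings from the paragraph above yield the same list, `['name_1', 'name_2', 'name_3']`.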

SWEET FLAGS OF CONVENIENCE: We gotchu.

  --list-collections    Show available collections and exit.
  --list-bins           List available bins in a collection and exit.

anvi-meta-pan-genome

Convert a pangenome into a metapangenome.

Usage

anvi-meta-pan-genome [-h] -p PAN_DB [-g GENOMES_STORAGE] [-i FILE]
                     [--fraction-of-median-coverage FLOAT]
                     [--min-detection FLOAT]

Parameters

PANGENOME: Files for the pangenome.

  -p PAN_DB, --pan-db PAN_DB
                        Anvi'o pan database
  -g GENOMES_STORAGE, --genomes-storage GENOMES_STORAGE
                        Anvi'o genomes storage file

METAGENOME: Genome bins stored in anvi'o profile databases as collections.

  -i FILE, --internal-genomes FILE
                        A four-column TAB-delimited flat text file. This file
                        should be identical to the internal genomes file you
                        used for your pangenomics analysis. Anvi'o will use
                        this file to find all profile and contigs databases
                        that contain the information for each gene and genome
                        across metagenomes.

CRITERION FOR DETECTION: This is tricky. What we want to do is to identify genes that are occurring uniformly across samples.

  --fraction-of-median-coverage FLOAT
                        The value set here will be used to remove a gene if
                        its total coverage across environments is less than
                        the median coverage of all genes multiplied by this
                        value. The default is 0.25, which means, if the median
                        total coverage of all genes across all samples is
                        100X, then, a gene with a total coverage of less than
                        25X across all samples will be assumed not a part of
                        the 'environmental core'.
  --min-detection FLOAT
                        For this entire thing to work, the genome you are
                        focusing on should be detected in at least one
                        metagenome. If that is not the case, it would mean
                        that you do not have any sample that represents the
                        niche for this organism (or you do not have enough
                        depth of coverage) to investigate the detection of
                        genes in the environment. By default, this script
                        requires at least '0.5' of the genome to be detected
                        in at least one metagenome. This parameter allows you
                        to change that. 0 would mean no detection test
                        required, 1 would mean the entire genome must be
                        detected.
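
The two criteria above are easier to follow with numbers. The sketch below applies them to plain dictionaries; it is an illustration of the arithmetic described in the help text, not the program's implementation, and the function names and input layout (gene name to total coverage, sample name to detection) are assumptions.

```python
from statistics import median


def environmental_core(total_coverage, fraction=0.25):
    """Keep genes whose total coverage reaches fraction * median coverage."""
    cutoff = median(total_coverage.values()) * fraction
    # A gene is removed if its total coverage is less than the cutoff.
    return {gene for gene, cov in total_coverage.items() if cov >= cutoff}


def genome_is_detected(detection_per_sample, min_detection=0.5):
    """True if at least one metagenome detects enough of the genome."""
    return any(d >= min_detection for d in detection_per_sample.values())
```

With total coverages of 100, 120, 80, 20 and 100 for five genes, the median is 100 and the cutoff is 25, so only the gene at 20 falls outside the environmental core.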

anvi-migrate-db

Usage

anvi-migrate-db [-h] [--just-do-it] [-t VERSION] DATABASE

Parameters

positional arguments:

  DATABASE              Anvi'o database for migration

optional arguments:

  --just-do-it          Do not bother me with warnings
  -t VERSION, --target-version VERSION
                        Anvi'o will stop upgrading your database when it
                        reaches this version.

anvi-oligotype-linkmers

Takes an anvi'o linkmers report, generates an oligotyping output

Usage

anvi-oligotype-linkmers [-h] -i LINKMER_REPORT -o DIR_PATH

Parameters

optional arguments:

  -i LINKMER_REPORT, --input-file LINKMER_REPORT
                        Output file of `anvi-report-linkmers`.
  -o DIR_PATH, --output-dir DIR_PATH
                        Directory path for output files

anvi-pan-genome

A DIAMOND and MCL-based anvi'o pangenome workflow. You provide genomes from anywhere (whether they are external genomes, or anvi'o genome bins in collections), and it gives you back a pangenome analysis.

Usage

anvi-pan-genome [-h] -g GENOMES_STORAGE [-G GENOME_NAMES]
                [--skip-alignments] [--align-with ALIGNER]
                [--exclude-partial-gene-calls] [--use-ncbi-blast]
                [--minbit MINBIT] [--mcl-inflation INFLATION]
                [--min-occurrence NUM_OCCURRENCE]
                [--min-percent-identity PERCENT] [--sensitive]
                [-n PROJECT_NAME] [--description TEXT_FILE]
                [-o DIR_PATH] [-W] [-T NUM_THREADS]
                [--skip-hierarchical-clustering]
                [--enforce-hierarchical-clustering]
                [--distance DISTANCE_METRIC] [--linkage LINKAGE_METHOD]

Parameters

GENOMES: The very fancy genomes storage file. This file is generated by the program anvi-gen-genomes-storage. Please see the online tutorial on the pangenomic workflow if you don't know how to generate one.

  -g GENOMES_STORAGE, --genomes-storage GENOMES_STORAGE
                        Anvi'o genomes storage file
  -G GENOME_NAMES, --genome-names GENOME_NAMES
                        Genome names to 'focus'. You can use this parameter to
                        limit the genomes included in your analysis. You can
                        provide these names as a comma-separated list of
                        names, or you can put them in a file, where you have a
                        single genome name in each line, and provide the file
                        path.
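
The two accepted forms of `--genome-names` (a comma-separated list, or a file with one name per line) could be handled along these lines. This is a minimal Python sketch; `parse_genome_names` is a hypothetical helper, not the function anvi'o uses.

```python
import os


def parse_genome_names(value):
    """Accept a comma-separated list of names, or a path to a file with one name per line."""
    if os.path.isfile(value):
        # File form: a single genome name on each line.
        with open(value) as handle:
            names = [line.strip() for line in handle]
    else:
        # List form: names separated by commas.
        names = [name.strip() for name in value.split(",")]
    # Drop empty entries such as blank lines or trailing commas.
    return [name for name in names if name]


if __name__ == "__main__":
    print(parse_genome_names("Genome_A, Genome_B,Genome_C"))
```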

PARAMETERS: Important stuff Tom never pays attention to (but you should).

  --skip-alignments     By default, anvi'o attempts to align amino acid
                        sequences in each gene cluster using multiple sequence
                        alignment via muscle. You can use this flag to skip
                        that step and be upset later.
  --align-with ALIGNER  The multiple sequence alignment program to use when
                        multiple sequence alignment is necessary. To see all
                        available options, use the flag `--list-aligners`.
  --exclude-partial-gene-calls
                        By default, anvi'o includes all partial gene calls
                        in the analysis, which, in some cases, may inflate
                        the number of gene clusters identified and introduce
                        extra heterogeneity within those gene clusters. Using
                        this flag, you can request anvi'o to exclude partial
                        gene calls from the analysis (whether a gene call is
                        partial or not is information that comes directly
                        from the gene caller used to identify genes during the
                        generation of the contigs database).
  --use-ncbi-blast      This program uses DIAMOND by default; however, if you
                        like, you can use good ol' blastp from NCBI instead.
  --minbit MINBIT       The minimum minbit value. The minbit heuristic
                        provides a means to set a threshold to eliminate weak
                        matches between two amino acid sequences. We learned
                        it from ITEP (Benedict MN et al,
                        doi:10.1186/1471-2164-15-8), which is a comprehensive
                        analysis workflow for pangenomes, and decided to use
                        it in the anvi'o pangenomic workflow as well. Briefly,
                        if you have two amino acid sequences, 'A' and 'B', the
                        minbit is defined as 'BITSCORE(A, B) /
                        MIN(BITSCORE(A, A), BITSCORE(B, B))'. So the minbit
                        score between two sequences goes to 1 if they are very
                        similar over the entire length of the 'shorter' amino
                        acid sequence, and goes to 0 if (1) they match over a
                        very short stretch compared even to the length of the
                        shorter amino acid sequence or (2) the sequence
                        identity of the match is low. The default is 0.5.
  --mcl-inflation INFLATION
                        MCL inflation parameter, that defines the sensitivity
                        of the algorithm during the identification of the gene
                        clusters. More information on this parameter and its
                        effect on cluster granularity is here:
                        (http://micans.org/mcl/man/mclfaq.html#faq7.2). The
                        default is 2.
  --min-occurrence NUM_OCCURRENCE
                        Do you not want singletons? You don't? Well, this
                        parameter will help you get rid of them (along with
                        doubletons, if you want). Anvi'o will remove gene
                        clusters that occur fewer times than the number you
                        set using this parameter from the analysis. The
                        default is 1, which means everything will be kept. If
                        you want to remove singletons, set it to 2; if you
                        want to remove doubletons as well, set it to 3, and so
                        on.
  --min-percent-identity PERCENT
                        Minimum percent identity between the two amino acid
                        sequences for them to have an edge for MCL analysis.
                        This value will be used to filter hits from DIAMOND
                        search results. Because percent identity is not a
                        predictor of a good match (since it does not
                        communicate many other important factors such as the
                        alignment length between the two sequences and its
                        proportion to the entire length of those involved), we
                        suggest you rely on the 'minbit' parameter. But you know
                        what? Maybe you shouldn't listen to anyone, and
                        experiment on your own! The default is 0 percent.
  --sensitive           DIAMOND sensitivity. With this flag you can instruct
                        DIAMOND to be 'sensitive', rather than 'fast' during
                        the search. It is likely the search will take
                        remarkably longer. But, hey, if you are doing it for
                        your final analysis, maybe it should take longer and
                        be more accurate. This flag is only relevant if you
                        are running DIAMOND.
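
To make the filtering parameters above a little more concrete, here is a small Python sketch of the minbit formula quoted in the `--minbit` help text, together with toy versions of the `--min-percent-identity` and `--min-occurrence` filters. The function names and the dictionary layout of a hit (`query`, `subject`, `bitscore`, `pident`) are assumptions made for illustration, and counting an occurrence as one entry in a cluster's list is a guess; none of this is anvi'o's own code.

```python
def minbit(bitscore_ab, bitscore_aa, bitscore_bb):
    """BITSCORE(A, B) / MIN(BITSCORE(A, A), BITSCORE(B, B))"""
    return bitscore_ab / min(bitscore_aa, bitscore_bb)


def keep_hit(hit, self_scores, min_minbit=0.5, min_percent_identity=0.0):
    """Decide whether a search hit would become an edge for MCL.

    `hit` is a dict with hypothetical keys: query, subject, bitscore, pident.
    `self_scores` maps each sequence id to the bit score of its hit to itself.
    """
    if hit["pident"] < min_percent_identity:
        return False
    score = minbit(hit["bitscore"],
                   self_scores[hit["query"]],
                   self_scores[hit["subject"]])
    return score >= min_minbit


def filter_by_occurrence(gene_clusters, min_occurrence=1):
    """Drop gene clusters with fewer than `min_occurrence` entries.

    `gene_clusters` maps a cluster name to a list of entries; what exactly
    counts as an occurrence in anvi'o is an assumption of this sketch.
    """
    return {name: entries for name, entries in gene_clusters.items()
            if len(entries) >= min_occurrence}


if __name__ == "__main__":
    self_scores = {"A": 100.0, "B": 200.0}
    hit = {"query": "A", "subject": "B", "bitscore": 50.0, "pident": 62.0}
    print(minbit(50.0, 100.0, 200.0), keep_hit(hit, self_scores))
```

Note that the denominator uses the smaller of the two self-hit scores, which is why the score is judged against the 'shorter' sequence.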

OTHERS: Sweet parameters of convenience.

  -n PROJECT_NAME, --project-name PROJECT_NAME
                        Name of the project. Please choose a short but
                        descriptive name (so anvi'o can use it whenever she
                        needs to name an output file, or add a new table in a
                        database, or name her first born).
  --description TEXT_FILE
                        A plain text file that contains some description about
                        the project. You can use Markdown syntax. The
                        description text will be rendered and shown in all
                        relevant interfaces, including the anvi'o interactive
                        interface, or anvi'o summary outputs.
  -o DIR_PATH, --output-dir DIR_PATH
                        Directory path for output files
  -W, --overwrite-output-destinations
                        Overwrite if the output files and/or directories
                        exist.
  -T NUM_THREADS, --num-threads NUM_THREADS
                        Maximum number of threads to use for multithreading
                        whenever possible. Very conservatively, the default is
                        1. It is a good idea to not exceed the number of CPUs
                        / cores on your system. Plus, please be careful with
                        this option if you are running your commands on a SGE
                        --if you are clusterizing your runs, and asking for
                        multiple threads to use, you may deplete your
                        resources very fast.

ORGANIZING GENE CLUSTERS: These are things that will change the clustering dendrogram of your gene clusters.

  --skip-hierarchical-clustering
                        Anvi'o attempts to generate a hierarchical clustering
                        of your gene clusters once it identifies them so you
                        can use `anvi-display-pan` to play with it. But if you
                        want to skip this step, this is your flag.
  --enforce-hierarchical-clustering
                        If you want anvi'o to try to generate a hierarchical
                        clustering of your gene clusters even if the number of
                        gene clusters exceeds its suggested limit for
                        hierarchical clustering, you can use this flag to
                        enforce it. Are you a rebel of some sort? Or did
                        computers make you upset? Express your anger towards
                        the machine using this flag.
  --distance DISTANCE_METRIC
                        The distance metric for the clustering of gene
                        clusters. If you do not use this flag, the default
                        distance metric will be used for each clustering
                        configuration which is "euclidean".
  --linkage LINKAGE_METHOD
                        The same story with the `--distance`, except, the
                        system default for this one is ward.

anvi-profile

Main entry point for Post-Assembly Metagenomics Pipeline

metagenomics profile_db contigs_db bam variability clustering

Usage

anvi-profile [-h] [-i INPUT_BAM] [-c CONTIGS_DB] [--blank-profile]
             [-o DIR_PATH] [-W] [-S NAME] [--report-variability-full]
             [--skip-SNV-profiling] [--profile-AA-frequencies]
             [--description TEXT_FILE] [--cluster-contigs]
             [--skip-hierarchical-clustering]
             [--distance DISTANCE_METRIC] [--linkage LINKAGE_METHOD]
             [-M INT] [-X INT] [-V INT] [--list-contigs]
             [--contigs-of-interest FILE] [-T NUM_THREADS]
             [--queue-size INT] [--write-buffer-size INT]

Parameters

INPUTS: There are two possible inputs for the anvi'o profiler. You must declare one of these two.

  -i INPUT_BAM, --input-file INPUT_BAM
                        Sorted and indexed BAM file to analyze. Takes a long
                        time depending on the length of the file and
                        parameters used for profiling.
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by
                        'anvi-gen-contigs-database'
  --blank-profile       If you only have contig sequences, but no mapping data
                        (e.g., you found a genome and would like to take a
                        look at it), this flag will come in very handy. After
                        creating a contigs database for your contigs, you can
                        create a blank anvi'o profile database to use anvi'o
                        interactive interface with that contigs database
                        without any mapping data.

EXTRAS: Things that are not mandatory, but can be useful if/when declared.

  -o DIR_PATH, --output-dir DIR_PATH
                        Directory path for output files
  -W, --overwrite-output-destinations
                        Overwrite if the output files and/or directories
                        exist.
  -S NAME, --sample-name NAME
                        It is important to set a sample name (using only ASCII
                        letters and digits and without spaces) that is unique
                        (considering all others). If you do not provide one,
                        anvi'o will try to make up one for you based on other
                        information, although you should never let the
                        software decide these things.
  --report-variability-full
                        One of the things anvi-profile does is to store
                        information about variable nucleotide positions.
                        Usually it does not report every variable position,
                        since not every variable position is genuine
                        variation. Say, if you have 1,000X coverage, and all
                        nucleotides at that position are Ts and only one of
                        them is a C, the confidence of that C being a real
                        variation is quite low. Anvi'o has a simple algorithm
                        in place to reduce the impact of noise. However, using
                        this flag you can disable it and ask the profiler to
                        report every single variation (which may result in
                        very large output files and millions of reports, but
                        you are the boss). Do not forget to take a look at the
                        '--min-coverage-for-variability' parameter.
  --skip-SNV-profiling  By default, anvi'o characterizes single-nucleotide
                        variation in each sample. The use of this flag will
                        instruct profiler to skip that step. Please remember
                        that parameters and flags must be identical between
                        different profiles using the same contigs database for
                        them to merge properly.
  --profile-AA-frequencies
                        Anvi'o can characterize linkmer frequencies for AA
                        distribution in genes in contigs during profiling.
                        However, due to its computational complexity, this
                        feature is by default off. Using this flag you can go
                        against the authority, and make anvi'o do it. Please
                        remember that this functionality is available only if
                        gene calls are present in the contigs database.
  --description TEXT_FILE
                        A plain text file that contains some description about
                        the project. You can use Markdown syntax. The
                        description text will be rendered and shown in all
                        relevant interfaces, including the anvi'o interactive
                        interface, or anvi'o summary outputs.

HIERARCHICAL CLUSTERING: Do you want your splits to be clustered? Yes? No? Maybe? Remember: by default, anvi-profile will not perform hierarchical clustering on your splits; but if you use the --blank-profile flag, it will try. You can skip that by using the --skip-hierarchical-clustering flag.

  --cluster-contigs     Single profiles are rarely used for genome binning or
                        visualization, and since the clustering step increases
                        the profiling runtime for no good reason, the default
                        behavior is to not cluster contigs for individual
                        runs. However, if you are planning to do binning on
                        one sample, you must use this flag to tell anvi'o to
                        run cluster configurations for single runs on your
                        sample.
  --skip-hierarchical-clustering
                        If you are not planning to use the interactive
                        interface (or if you have other means to add a tree of
                        contigs in the database) you may skip the step where
                        hierarchical clustering of your items is performed
                        based on default clustering recipes matching your
                        database type.
  --distance DISTANCE_METRIC
                        The distance metric for the hierarchical clustering.
                        Only relevant if you are using the `--cluster-contigs`
                        flag. The default is "euclidean".
  --linkage LINKAGE_METHOD
                        The linkage method for the hierarchical clustering.
                        Just like the distance metric this is only relevant if
                        you are using it with `--cluster-contigs` flag. The
                        default is "ward".

NUMBERS: Defaults of these parameters will impact your analysis. You can always come back to them and update your profiles, but it is important to make sure defaults are reasonable for your sample.

  -M INT, --min-contig-length INT
                        Minimum length of contigs in a BAM file to analyze.
                        The minimum length should be long enough for
                        tetra-nucleotide frequency analysis to be meaningful.
                        There is no way to define a golden number of minimum
                        length that would be applicable to genomes found in
                        all environments, but we chose the default to be 2500,
                        and have been happy with it. You are welcome to
                        experiment, but we advise never going below 1,000. You
                        also should remember that the lower you go, the more
                        time it will take to analyze all contigs. You can use
                        the --list-contigs flag to get an idea of how many
                        contigs would be discarded for a given M.
  -X INT, --min-mean-coverage INT
                        Minimum mean coverage for contigs to be kept in the
                        analysis. The default value is 0, which is in your
                        best interest if you are going to profile multiple BAM
                        files which are then going to be merged for a
                        cross-sectional or time series analysis. Do not change
                        it if you are not sure this is what you want to do.
  -V INT, --min-coverage-for-variability INT
                        Minimum coverage of a nucleotide position to be
                        subjected to SNV profiling. By default, anvi'o will not
                        attempt to make sense of variation in a given
                        nucleotide position if it is covered less than 10X.
                        You can change that minimum using this parameter.
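
The three thresholds in this group (`-M`, `-X`, `-V`) can be read as simple filters. The Python sketch below applies them to toy data, using the defaults stated above (2500, 0, and 10). The function names and the input layout are hypothetical, and treating each threshold as inclusive ("at least") is an assumption of the sketch rather than a statement about the profiler's internals.

```python
def contigs_to_profile(contigs, min_contig_length=2500, min_mean_coverage=0):
    """Return the names of contigs that pass the -M and -X thresholds.

    `contigs` maps a contig name to (length, list of coverage values).
    """
    kept = []
    for name, (length, coverage) in contigs.items():
        mean_coverage = sum(coverage) / len(coverage) if coverage else 0
        if length >= min_contig_length and mean_coverage >= min_mean_coverage:
            kept.append(name)
    return kept


def positions_for_snv_profiling(coverage, min_coverage_for_variability=10):
    """Return the positions covered well enough to be considered for SNVs."""
    return [position for position, value in enumerate(coverage)
            if value >= min_coverage_for_variability]


if __name__ == "__main__":
    # Toy data: coverage lists are shortened and do not match contig lengths.
    toy = {"contig_1": (3000, [12, 8, 15]), "contig_2": (900, [30, 30])}
    print(contigs_to_profile(toy))
    print(positions_for_snv_profiling([12, 8, 15]))
```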

CONTIGS: Sweet parameters of convenience

  --list-contigs        When declared, the program will list contigs in the
                        BAM file and exit gracefully without any further
                        analysis.
  --contigs-of-interest FILE
                        It is possible to analyze only a group of contigs from
                        a given BAM file. If you provide a text file, in which
                        every contig of interest is listed line by line, the
                        profiler would focus only on those contigs in the BAM
                        file and ignore the rest. This can be used for
                        debugging purposes, or to focus on a particular group
                        of contigs that were identified as relevant during the
                        interactive analysis.

PERFORMANCE: Performance settings for profiler

  -T NUM_THREADS, --num-threads NUM_THREADS
                        Maximum number of threads to use for multithreading
                        whenever possible. Very conservatively, the default is
                        1. It is a good idea to not exceed the number of CPUs
                        / cores on your system. Plus, please be careful with
                        this option if you are running your commands on a SGE
                        --if you are clusterizing your runs, and asking for
                        multiple threads to use, you may deplete your
                        resources very fast.
  --queue-size INT      The queue size for worker threads to store data to
                        communicate to the main thread. The default is set by
                        the class based on the number of threads. If you have
                        *any* hesitation about whether you know what you are
                        doing, you should not change this value.
  --write-buffer-size INT
                        How many items should be kept in memory before they
                        are written to the disk. The default is 500. The
                        larger the buffer size, the less frequently the
                        program will access the disk, yet the more memory will
                        be consumed since the processed items will be cleared
                        from memory only after they are written to the disk.
                        The default buffer size will likely work for most
                        cases, but if you have very large contigs, you may
                        need to decrease this value. Please keep an eye on the
                        memory usage output to make sure the memory use never
                        exceeds the size of the physical memory.
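
The trade-off behind `--write-buffer-size` (fewer disk writes versus more memory held) can be sketched with a tiny batching class. This is an illustrative Python example under stated assumptions: `BufferedWriter` and its `write_batch` callback are hypothetical names, and the profiler's real implementation may differ.

```python
class BufferedWriter:
    """Collect items in memory and hand them to a writer in batches."""

    def __init__(self, write_batch, buffer_size=500):
        self.write_batch = write_batch  # callable that persists a list of items
        self.buffer_size = buffer_size
        self.buffer = []

    def add(self, item):
        self.buffer.append(item)
        # Only touch the disk once enough items have accumulated.
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write_batch(self.buffer)
            # Items leave memory only after they have been written.
            self.buffer = []


if __name__ == "__main__":
    batches = []
    writer = BufferedWriter(batches.append, buffer_size=3)
    for item in range(7):
        writer.add(item)
    writer.flush()  # write whatever is left at the end
    print(batches)
```

A larger `buffer_size` means fewer calls to `write_batch`, but every buffered item stays in memory until its batch is written, which is why very large contigs may call for a smaller value.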

anvi-push

Push stuff to an anvi'server

Usage

anvi-push [-h] --user USERNAME [--api-url API_URL] -n PROJECT_NAME
          [-t NEWICK] [--items-order FLAT_FILE] [-f FASTA]
          [-d VIEW_DATA] [-A ADDITIONAL_LAYERS] [-s STATE]
          [--description TEXT_FILE] [--bins BINS_DATA]
          [--bins-info BINS_INFO] [-D FILE] [-R FILE]
          [--delete-if-exists]

Parameters

SERVER DETAILS: Details of how to access an anvi'server instance.

  --user USERNAME       The user for an anvi'server.
  --api-url API_URL     Anvi'server URL

PROJECT DETAILS: What to send to the server

  -n PROJECT_NAME, --project-name PROJECT_NAME
                        Name of the project. Please choose a short but
                        descriptive name (so anvi'o can use it whenever she
                        needs to name an output file, or add a new table in a
                        database, or name her first born).
  -t NEWICK, --tree NEWICK
                        NEWICK formatted tree structure
  --items-order FLAT_FILE
                        A flat file that contains the order of items you wish
                        to display using the interactive interface. You may
                        want to use this if you have a specific order of items
                        in your mind, and do not want to display a tree in the
                        middle (or simply you don't have one). The file format
                        is simple: each line should have an item name, and
                        there should be no header.
  -f FASTA, --fasta-file FASTA
                        A FASTA-formatted input file
  -d VIEW_DATA, --view-data VIEW_DATA
                        A TAB-delimited file for view data
  -A ADDITIONAL_LAYERS, --additional-layers ADDITIONAL_LAYERS
                        A TAB-delimited file for additional layers for splits.
                        The first column of this file must be split names, and
                        the remaining columns should be unique attributes. The
                        file does not need to contain all split names, or
                        values for each split in every column. Anvi'o will try
                        to deal with missing data nicely. Each column in this
                        file will be visualized as a new layer in the tree.
  -s STATE, --state STATE
                        State file. You can export states from a database
                        using the anvi-export-state program.
  --description TEXT_FILE
                        A plain text file that contains some description about
                        the project. You can use Markdown syntax. The
                        description text will be rendered and shown in all
                        relevant interfaces, including the anvi'o interactive
                        interface, or anvi'o summary outputs.
  --bins BINS_DATA      Tab-delimited file. The first column contains tree
                        leaves (gene clusters, splits, contigs, etc.) and the
                        second column contains the bin they belong to.
  --bins-info BINS_INFO
                        Additional information for bins. The file must contain
                        three TAB-delimited columns, where the first one must
                        be a unique bin name, the second should be a 'source',
                        and the last one should be a 7 character HTML color
                        code (e.g., '#424242'). Source column must contain
                        information about the origin of the bin. If these bins
                        are automatically identified by a program like
                        CONCOCT, this column could contain the program name
                        and version. The source information will be associated
                        with the bin in various interfaces so in a sense it is
                        not *that* critical what it says there, but on the
                        other hand it is, because we should also think about
                        people who may end up having to work with what we put
                        together later.
  -D FILE, --layers-information-file FILE
                        A TAB-delimited file with information about layers in
                        your dataset. Each row in this file must correspond to
                        a sample name. Each column must contain a unique
                        attribute. Please refer to the documentation to learn
                        more about the structure and purpose of this file.
  -R FILE, --layers-order-file FILE
                        A TAB-delimited file with three columns: 'attribute',
                        'basic', 'newick'. For each attribute, the order of
                        samples must be defined either in the 'basic' form or
                        via a 'newick'-formatted tree structure that
                        describes the organization of each sample. Anvi'o will
                        look for a comma-separated list of sample names for
                        the 'basic' form. Please refer to the online docs for
                        more info. Also you shouldn't hesitate to try to find
                        the right file format until you get it working. There
                        are stringent checks on this file, and you will not
                        break anything while trying!
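
As an illustration of the `--bins-info` format described above (three TAB-delimited columns: a unique bin name, a source, and a 7 character HTML color code), the following Python sketch parses and checks such a file before it is pushed. `parse_bins_info` is a hypothetical helper written for this page; it is not the check the server performs.

```python
import re

# A 7 character HTML color code: '#' followed by six hexadecimal digits.
HTML_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def parse_bins_info(lines):
    """Parse 'bin name<TAB>source<TAB>color' lines into a dict, checking each row."""
    bins = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 TAB-delimited columns, got {len(fields)}")
        name, source, color = fields
        if name in bins:
            raise ValueError(f"line {number}: bin name '{name}' is not unique")
        if not HTML_COLOR.match(color):
            raise ValueError(f"line {number}: '{color}' is not a 7 character HTML color code")
        bins[name] = {"source": source, "color": color}
    return bins


if __name__ == "__main__":
    print(parse_bins_info(["Bin_1\tCONCOCT\t#424242\n", "Bin_2\tmanual\t#AA0000\n"]))
```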

RISKY CLICKS: As the name suggests!

  --delete-if-exists    Be bold (at your own risk), and delete if exists.

anvi-refine

Start the anvi'o interactive interface for refining

Usage

anvi-refine [-h] -p PROFILE_DB -c CONTIGS_DB [-C COLLECTION_NAME]
            [-b BIN_NAME] [-B FILE_PATH] [-V ADDITIONAL_VIEW]
            [-A ADDITIONAL_LAYERS] [--split-hmm-layers]
            [--taxonomic-level {t_phylum,t_class,t_order,t_family,t_genus,t_species}]
            [--hide-outlier-SNVs] [--title NAME]
            [--export-svg FILE_PATH] [--dry-run]
            [--skip-init-functions] [--skip-auto-ordering] [-I IP_ADDR]
            [-P INT] [--browser-path PATH] [--read-only]
            [--server-only]

Parameters

DEFAULT INPUTS: The interactive interface can be started with or without anvi'o databases. The default use assumes you have your profile and contigs databases; however, it is also possible to start the interface using ad-hoc input files. See the 'MANUAL INPUT' section for another set of parameters that are mutually exclusive with databases.

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by
                        'anvi-gen-contigs-database'

REFINE-SPECIFICS: Parameters that are essential to the refinement process.

  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.
  -B FILE_PATH, --bin-ids-file FILE_PATH
                        Text file for bins (each line should be a unique bin
                        id).

ADDITIONAL STUFF: Parameters to provide additional layers, views, or layer data.

  -V ADDITIONAL_VIEW, --additional-view ADDITIONAL_VIEW
                        A TAB-delimited file for an additional view to be used
                        in the interface. This file should contain all
                        split names, and values for each of them in all
                        samples. Each column in this file must correspond to a
                        sample name. Content of this file will be called
                        'user_view', which will be available as a new item in
                        the 'views' combo box in the interface.
  -A ADDITIONAL_LAYERS, --additional-layers ADDITIONAL_LAYERS
                        A TAB-delimited file for additional layers for splits.
                        The first column of this file must be split names, and
                        the remaining columns should be unique attributes. The
                        file does not need to contain all split names, or
                        values for each split in every column. Anvi'o will try
                        to deal with missing data nicely. Each column in this
                        file will be visualized as a new layer in the tree.
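
To show what an additional layers file with missing data might look like in practice, here is a minimal Python sketch that reads one. The column names in the demo (`split`, `GC_content`, `taxon`) and the choice to represent a missing value as `None` are assumptions for illustration; how anvi'o itself handles missing cells is not described here.

```python
import csv
import io


def read_additional_layers(handle):
    """Read a TAB-delimited layers file into {split name: {attribute: value}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    attributes = header[1:]  # the first column holds split names
    if len(set(attributes)) != len(attributes):
        raise ValueError("attribute columns must be unique")
    layers = {}
    for row in reader:
        if not row:
            continue
        split_name, values = row[0], row[1:]
        # Pad short rows so every attribute has an entry.
        values = values + [""] * (len(attributes) - len(values))
        layers[split_name] = {attribute: (value if value != "" else None)
                              for attribute, value in zip(attributes, values)}
    return layers


if __name__ == "__main__":
    # Hypothetical file content; the second and third rows have missing cells.
    demo = io.StringIO("split\tGC_content\ttaxon\n"
                       "split_1\t0.41\tA\n"
                       "split_2\t\tB\n"
                       "split_3\t0.52\n")
    print(read_additional_layers(demo))
```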

VISUALS RELATED: Parameters that give access to various adjustments regarding the interface.

  --split-hmm-layers    When declared, this flag tells the interface to split
                        every gene found in HMM searches that were performed
                        against non-singlecopy gene HMM profiles into their
                        own layer. Please see the documentation for details.
  --taxonomic-level {t_phylum,t_class,t_order,t_family,t_genus,t_species}
                        The taxonomic level to use. The default is 't_genus'.
                        Only relevant if the anvi'o contigs database contains
                        taxonomic annotations.
  --hide-outlier-SNVs   During profiling, anvi'o marks positions of single-
                        nucleotide variations (SNVs) that originate from
                        places in contigs where coverage values are a bit
                        'sketchy'. If you would like to avoid SNVs in those
                        positions of splits in applicable projects you can use
                        this flag, and the interface would hide SNVs that are
                        marked as 'outlier' (although it is clearly the best
                        to see everything, no one will judge you if you end up
                        using this flag) (plus, there may or may not be some
                        historical data on this here:
                        https://github.com/meren/anvio/issues/309).
  --title NAME          Title for the interface. If you are working with a
                        RUNINFO dict, the title will be determined based on
                        information stored in that file. Regardless, you can
                        override that value using this parameter. If you are
                        not using an anvi'o RUNINFO dictionary, a meaningful
                        title will appear in the interface only if you define
                        one using this parameter.
  --export-svg FILE_PATH
                        The SVG output file path.

SWEET PARAMS OF CONVENIENCE: Parameters and flags that are not quite essential (but nice to have).

  --dry-run             Don't do anything real. Test everything, and stop
                        right before wherever the developer said 'well, this
                        is enough testing', and decided to print out results.
  --skip-init-functions
                        When declared, function calls for genes will not be
                        initialized (therefore will be missing from all
                        relevant interfaces or output files). The use of this
                        flag may reduce the memory footprint and processing
                        time for large datasets.
  --skip-auto-ordering  When declared, the attempt to include automatically
                        generated orders of items based on additional data is
                        skipped. Use this in case those buggers cause issues
                        with your data, and you still want to see your stuff
                        and deal with the other issue later.

SERVER CONFIGURATION: For power users.

  -I IP_ADDR, --ip-address IP_ADDR
                        IP address for the HTTP server. The default IP address
                        (0.0.0.0) should work just fine for most.
  -P INT, --port-number INT
                        Port number to use for anvi'o services. If nothing is
                        declared, anvi'o will try to find a suitable port
                        number, starting from the default port number, 8080.
  --browser-path PATH   By default, anvi'o will use your default browser to
                        launch the interactive interface. If you would like to
                        use something other than your system default, you can
                        provide a full path for an alternative browser using
                        this parameter, and hope for the best. For instance we
                        are using this parameter to call Google's experimental
                        browser, Canary, which performs better with demanding
                        visualizations.
  --read-only           When the interactive interface is started with this
                        flag, all 'database write' operations will be
                        disabled.
  --server-only         The default behavior is to start the local server, and
                        fire up a browser that connects to the server. If you
                        have other plans, and want to start the server without
                        calling the browser, this is the flag you need.
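
The port search described under --port-number (start at the default, 8080, and move on until a usable port turns up) could be sketched as follows. This is a minimal illustration and not anvi'o's own code: the function names, the 100-port search window, and the bind-based check are all assumptions made for the example.

```python
import socket
from typing import Callable


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def find_port(start: int = 8080, attempts: int = 100,
              is_free: Callable[[int], bool] = port_is_free) -> int:
    """Return the first free port in [start, start + attempts).

    Both the search window and the check are hypothetical choices;
    `is_free` is injectable so the search logic can be tested offline.
    """
    for port in range(start, start + attempts):
        if is_free(port):
            return port
    raise RuntimeError(
        f"no free port between {start} and {start + attempts - 1}")
```

Calling find_port() with no arguments probes the local machine; passing -P / --port-number skips any such search because the port is then fixed by the user.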

anvi-rename-bins

Rename all bins in a given collection (so they have pretty names).

Usage

anvi-rename-bins [-h] -c CONTIGS_DB -p PROFILE_DB
                 [--collection-to-read COLLECTION_TO_READ]
                 [--collection-to-write COLLECTION_TO_WRITE]
                 [--prefix PREFIX] [--report-file REPORT_FILE_PATH]
                 [--list-collections] [--dry-run] [--call-MAGs]
                 [--min-completion-for-MAG [0-100]]
                 [--max-redundancy-for-MAG [0-100]]
                 [--size-for-MAG 0.1-10 Mbp]

Parameters

DEFAULT INPUTS: Standard stuff

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs-database'
  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  --collection-to-read COLLECTION_TO_READ
                        Collection name to read from. Anvi'o will not
                        overwrite an existing collection; instead, it will
                        create a copy of your collection with new bin names.
  --collection-to-write COLLECTION_TO_WRITE
                        The new collection name. Give it a nice, fancy name.

OUTPUT AND TESTING: a.k.a, sweet parameters of convenience

  --prefix PREFIX       Prefix for the bin names. Must be a single word,
                        composed of letters and digits. The use of the
                        underscore character is OK, but that's about it (fine,
                        the use of the dash character is OK, too but no
                        more!). If the prefix is 'PREFIX', each bin will be
                        renamed as 'PREFIX_XXX_00001, PREFIX_XXX_00002', and
                        so on, in the order of percent completion minus
                        percent redundancy (what we call, 'substantive
                        completion'). The 'XXX' part will either be 'Bin', or
                        'MAG', depending on other parameters you use. Keep
                        reading.
  --report-file REPORT_FILE_PATH
                        This file will report each name change event, so you
                        can trace back the original names of renamed bins
                        later.
  --list-collections    Show available collections and exit.
  --dry-run             When used, this flag does NOT update the profile
                        database; it just creates the report file so you can
                        view how things will be renamed.

MAG OPTIONS: If you want to call some bins 'MAGs' because you are so cool

  --call-MAGs           By default, this program renames your bins as
                        'PREFIX_Bin_00001', 'PREFIX_Bin_00002' and so on. If
                        you use this flag, it will name the ones that meet the
                        criteria described by MAG-related flags as
                        'PREFIX_MAG_00001', 'PREFIX_MAG_00002', and so on. The
                        ones that do not get to be named as MAGs will remain
                        as bins.
  --min-completion-for-MAG [0-100]
                        If --call-MAGs flag is used, call any bin a 'MAG' if
                        its completion estimate is above this (the default
                        is 70), and the redundancy estimate is less than
                        --max-redundancy-for-MAG.
  --max-redundancy-for-MAG [0-100]
                        If --call-MAGs flag is used, call any bin a 'MAG' if
                        its redundancy estimate is below this (the default
                        is 10) and the completion estimate is above --min-
                        completion-for-MAG.
  --size-for-MAG 0.1-10 Mbp
                        If --call-MAGs flag is used, call any bin a 'MAG' if
                        its redundancy estimate is less than --max-
                        redundancy-for-MAG, and the size is larger than this
                        (the default is 2 Mbp), regardless of the completion.
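
The naming and MAG rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's implementation: the Bin class, the descending sort, the single running counter shared by 'Bin' and 'MAG' names, and the strict inequalities are all guesses made for the example. Only the defaults (70, 10, 2 Mbp) and the 'substantive completion' ordering come from the help text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Bin:
    """Hypothetical container for the estimates the help text refers to."""
    name: str
    completion: float   # percent completion
    redundancy: float   # percent redundancy
    size_mbp: float     # total length in Mbp


def is_mag(b: Bin, min_completion: float = 70.0,
           max_redundancy: float = 10.0, size_for_mag: float = 2.0) -> bool:
    """A bin qualifies if redundancy is low AND it is complete enough or large enough."""
    if b.redundancy >= max_redundancy:
        return False
    return b.completion > min_completion or b.size_mbp > size_for_mag


def rename_bins(bins: List[Bin], prefix: str,
                call_mags: bool = False) -> Dict[str, str]:
    """Map old bin names to new ones, ordered by completion minus redundancy.

    Assumptions: highest substantive completion comes first, and one counter
    is shared by 'Bin' and 'MAG' names.
    """
    ordered = sorted(bins, key=lambda b: b.completion - b.redundancy,
                     reverse=True)
    renamed = {}
    for index, b in enumerate(ordered, start=1):
        kind = "MAG" if call_mags and is_mag(b) else "Bin"
        renamed[b.name] = f"{prefix}_{kind}_{index:05d}"
    return renamed
```

With --dry-run, the real program writes the equivalent of this mapping to the report file without touching the profile database.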

anvi-report-linkmers

Access reads in contigs and positions in a BAM file

Usage

anvi-report-linkmers [-h] -i INPUT_BAM(S) [INPUT_BAM(S) ...]
                     --contigs-and-positions CONTIGS_AND_POS
                     [--only-complete-links] -o FILE_PATH
                     [--list-contigs]

Parameters

optional arguments:

  -i INPUT_BAM(S) [INPUT_BAM(S) ...], --input-files INPUT_BAM(S) [INPUT_BAM(S) ...]
                        Sorted and indexed BAM files to analyze. It is
                        essential that all BAM files be the result of
                        mappings against the same contigs.
  --contigs-and-positions CONTIGS_AND_POS
                        This is the file where you list the contigs, and
                        nucleotide positions you are interested in. This is
                        supposed to be a TAB-delimited file with two columns.
                        In each line, the first column should be the contig
                        name, and the second column should be the comma-
                        separated list of integers for nucleotide positions.
  --only-complete-links
                        When declared, only reads that cover all positions
                        will be reported. It is necessary to use this flag if
                        you want to perform oligotyping-like analyses on
                        matching reads.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --list-contigs        When declared, the program will list contigs in the
                        BAM file and exit gracefully without any further
                        analysis.
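
As an illustration of the --contigs-and-positions format described above (two TAB-delimited columns: a contig name, then a comma-separated list of integer positions), here is a small parser sketch. The function name is hypothetical, and the example assumes the file has no header line, which the help text does not specify.

```python
import csv
import io
from typing import Dict, List, TextIO


def parse_contigs_and_positions(handle: TextIO) -> Dict[str, List[int]]:
    """Read 'contig<TAB>pos1,pos2,...' lines into {contig: [positions]}."""
    result = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 columns, found {len(row)}")
        contig, positions = row
        result[contig] = [int(p) for p in positions.split(",")]
    return result


# A two-line example file, built in memory for demonstration.
example = io.StringIO("contig_1\t10,25,31\ncontig_2\t7\n")
targets = parse_contigs_and_positions(example)
```

Here `targets` maps each contig name to its list of positions, which is the shape a downstream step would need to look up reads covering those positions.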

anvi-run-hmms

This program deals with populating tables that store HMM hits in an anvi'o contigs database.

Usage

anvi-run-hmms [-h] -c CONTIGS_DB [-H HMM PROFILE PATH]
              [-I HMM PROFILE NAME] [-T NUM_THREADS]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs-database'
  -H HMM PROFILE PATH, --hmm-profile-dir HMM PROFILE PATH
                        Using this parameter you can specify a directory
                        path that contains an HMM profile. This way you can
                        run HMM profiles that are not included in anvi'o. See
                        the online documentation to find out about the
                        specifics of this directory structure.
  -I HMM PROFILE NAME, --installed-hmm-profile HMM PROFILE NAME
                        When you run this program without any parameter, it
                        runs all 3 HMM profiles installed on your system. If
                        you want only a specific one to run, you can select it
                        by using this parameter. These are the currently
                        available ones: "Rinke_et_al" (type: singlecopy),
                        "Campbell_et_al" (type: singlecopy), "Ribosomal_RNAs"
                        (type: Ribosomal_RNAs).
  -T NUM_THREADS, --num-threads NUM_THREADS
                        Maximum number of threads to use for multithreading
                        whenever possible. Very conservatively, the default is
                        1. It is a good idea to not exceed the number of CPUs
                        / cores on your system. Plus, please be careful with
                        this option if you are running your commands on an SGE
                        --if you are clusterizing your runs, and asking for
                        multiple threads to use, you may deplete your
                        resources very fast.

anvi-run-ncbi-cogs

Run NCBI COGs on stuff.

Usage

anvi-run-ncbi-cogs [-h] -c CONTIGS_DB [--cog-data-dir COG_DATA_DIR]
                   [-T NUM_THREADS] [--sensitive]
                   [--temporary-dir-path PATH] [--search-with PROGRAM]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs-database'
  --cog-data-dir COG_DATA_DIR
                        The directory path for your COG setup. Anvi'o will try
                        to use the default path if you do not specify
                        anything.
  -T NUM_THREADS, --num-threads NUM_THREADS
                        Maximum number of threads to use for multithreading
                        whenever possible. Very conservatively, the default is
                        1. It is a good idea to not exceed the number of CPUs
                        / cores on your system. Plus, please be careful with
                        this option if you are running your commands on an SGE
                        --if you are clusterizing your runs, and asking for
                        multiple threads to use, you may deplete your
                        resources very fast.
  --sensitive           DIAMOND sensitivity. With this flag you can instruct
                        DIAMOND to be 'sensitive', rather than 'fast' during
                        the search. It is likely the search will take
                        remarkably longer. But, hey, if you are doing it for
                        your final analysis, maybe it should take longer and
                        be more accurate. This flag is only relevant if you
                        are running DIAMOND.
  --temporary-dir-path PATH
                        If you don't provide anything here, this program will
                        come up with a temporary directory path by itself to
                        store intermediate files, and clean it later. If you
                        want to have full control over this, you can use this
                        flag to define one.
  --search-with PROGRAM
                        What program to use for database searching. The
                        default is 'diamond', but you can also use NCBI's
                        'blastp' if you like.

anvi-saavs-and-protein-structures-summary

Generate a static web site for SAAVs and protein structures.

Usage

anvi-saavs-and-protein-structures-summary [-h] [-c CONTIGS_DB]
                                          [--genes GENES]
                                          [--samples SAMPLES] -i
                                          DIR_PATH -o DIR_PATH
                                          [--soft-link-images]
                                          [--perspectives PERSPECTIVES]

Parameters

CONTIGS DB: If you provide a contigs database, anvi'o will find out about functions and other properties of genes using the contigs database. This is supposed to be the contigs database you used to generate the variability profile for this project like 2 years ago. Yeah. Time goes by :/

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs-database'

WHAT SHOULD BE PROCESSED: By default, anvi'o will learn about the genes and samples you have from the input data directory. If you want to override that information (e.g., to work with a smaller set of genes or samples), you can come up with your own files.

  --genes GENES         Genes file.
  --samples SAMPLES     Samples file.

INPUT/OUTPUT: Read from here, write to there.

  -i DIR_PATH, --input-dir DIR_PATH
                        Directory path for input files
  -o DIR_PATH, --output-dir DIR_PATH
                        Directory path for output files

ADDITIONAL STUFF: Little conveniences.

  --soft-link-images    By default, your images will be copied into the output
                        directory to create a fully self-contained output (so
                        you can send it to your colleagues and they would have
                        everything they need to browse the output).
                        Alternatively you can use this flag to avoid copying
                        images in the output directory (it would make the
                        output less portable, but it would take less time and
                        space to generate it).
  --perspectives PERSPECTIVES
                        By default anvi'o will use each perspective found in
                        the data directory to create an HTML output. Using
                        this parameter you can limit perspectives to the ones
                        you are interested in by defining them as a comma-
                        separated list. If you make a mistake anvi'o will tell
                        you what are the available perspectives, so don't
                        worry.

anvi-search-functions

Search functions in an anvi'o contigs database or genomes storage. Basically, this program searches for one or more search terms you define in functional annotations of genes in an anvi'o contigs database, and generates multiple reports. The simpler report (which is also the default one) simply tells you which contigs contain genes with functions matching the search terms you used. This file is only useful to quickly highlight matching contigs in the interface by providing it to anvi-interactive with the --additional-layer parameter. You can also request a much more comprehensive report, which gives you anything you might need to know, including the matching gene caller id, functional annotation source, and full function name for each hit and search term.

Usage

anvi-search-functions [-h] [-c CONTIGS_DB] [-p PAN_DB]
                      [-g GENOMES_STORAGE] --search-terms SEARCH_TERMS
                      [--delimiter CHAR]
                      [--annotation-sources SOURCE NAME[S]] [-l]
                      [-o FILE_PATH] [--full-report FILE_NAME]
                      [--include-sequences] [--verbose]

Parameters

SEARCH IN: Relevant source databases

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs-database'
  -p PAN_DB, --pan-db PAN_DB
                        Anvi'o pan database
  -g GENOMES_STORAGE, --genomes-storage GENOMES_STORAGE
                        Anvi'o genomes storage file

SEARCH FOR: Relevant terms

  --search-terms SEARCH_TERMS
                        Search terms. Multiple of them can be declared
                        separated by a delimiter (the default is a comma).
  --delimiter CHAR      The delimiter to parse multiple input terms. The
                        default is ','.
  --annotation-sources SOURCE NAME[S]
                        Get functional annotations for a specific list of
                        annotation sources. You can specify one or more
                        sources by separating them from each other with a
                        comma character (e.g., '--annotation-sources
                        source_1,source_2,source_3'). The default behavior is
                        to return everything.
  -l, --list-annotation-sources
                        List available sources for annotation in the contigs
                        database and quit.

REPORT: Anvi'o can report the hits in multiple ways. The output file will be a very simple 2-column TAB-delimited output that is compatible with anvi'o additional data format (so you can give it to anvi-interactive to see which splits contained genes that matched your search terms). You can also ask anvi'o to generate a full report that contains much more, and much more helpful, information about each hit. Optionally, you can even ask for the gene sequences to appear in this report.

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  --full-report FILE_NAME
                        Optional output file with a fuller description of
                        findings.
  --include-sequences   Include sequences in the report.
  --verbose             Be verbose, print more messages whenever possible.
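
To make the roles of --search-terms and --delimiter concrete, the sketch below splits a raw term string and checks which terms occur in a function annotation. It is only an illustration: both function names are hypothetical, and the case-insensitive substring match is an assumption about matching behavior that the help text does not spell out.

```python
from typing import List


def split_terms(raw: str, delimiter: str = ",") -> List[str]:
    """Split the --search-terms value on the delimiter, dropping empty items."""
    return [term.strip() for term in raw.split(delimiter) if term.strip()]


def matching_terms(function_text: str, terms: List[str]) -> List[str]:
    """Return the terms found in an annotation (assumed: case-insensitive substring match)."""
    lowered = function_text.lower()
    return [term for term in terms if term.lower() in lowered]


terms = split_terms("kinase, transporter")
hits = matching_terms("ABC transporter permease", terms)
```

A different delimiter is useful when a search term itself contains a comma, for example split_terms("DNA polymerase III, alpha|kinase", delimiter="|").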

anvi-self-test

A script for anvi'o to test itself

Usage

anvi-self-test [-h] [--suite SUITE]

Parameters

optional arguments:

  --suite SUITE  Suite of tests to execute. By default this program will
                 execute a full suite of example anvi'o commands to ensure
                 your installation is ready to run all scenarios anvi'o
                 developers could think of. Alternatively you can choose a
                 specific test to run. Here is a full list of available
                 options: mini, full, pangenomics, alons-classifier.

anvi-setup-ncbi-cogs

Download COG data from the NCBI

Usage

anvi-setup-ncbi-cogs [-h] [--cog-data-dir COG_DATA_DIR] [--reset]
                     [--just-do-it] [-T NUM_THREADS]

Parameters

optional arguments:

  --cog-data-dir COG_DATA_DIR
                        The directory for COG data to be stored. If you leave
                        it as is without specifying anything, the default
                        destination for the data directory will be used to set
                        things up. The advantage of it is that everyone will
                        be using a single data directory, but then you may
                        need superuser privileges to do it. Using this
                        parameter you can choose the location of the data
                        directory somewhere you like. However, when it is time
                        to run COGs, you will need to remember that path and
                        provide it to the program.
  --reset               This program by default attempts to use previously
                        downloaded files in your COGs data directory if there
                        are any. If something is wrong for some reason you can
                        use this to tell anvi'o to remove everything, and
                        start over.
  --just-do-it          Don't bother me with questions or warnings, just do
                        it.
  -T NUM_THREADS, --num-threads NUM_THREADS
                        Maximum number of threads to use for multithreading
                        whenever possible. Very conservatively, the default is
                        1. It is a good idea to not exceed the number of CPUs
                        / cores on your system. Plus, please be careful with
                        this option if you are running your commands on an SGE
                        --if you are clusterizing your runs, and asking for
                        multiple threads to use, you may deplete your
                        resources very fast.

anvi-show-collections-and-bins

A script to display collections stored in an anvi'o profile or pan database.

Usage

anvi-show-collections-and-bins [-h] -p PAN_OR_PROFILE_DB

Parameters

optional arguments:

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database

anvi-show-misc-data

Show all misc data keys in all misc data tables

Usage

anvi-show-misc-data [-h] -p PAN_OR_PROFILE_DB

Parameters

optional arguments:

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database

anvi-split

Split an anvi'o profile into smaller profiles. This is usually great when you want to share a subset of an anvi'o profile. You give this guy an anvi'o profile database, a contigs database, and a collection id, and it gives you back directories of profiles for each bin that can be treated as individual anvi'o profiles.

Usage

anvi-split [-h] -p PROFILE_DB -c CONTIGS_DB [-C COLLECTION_NAME]
           [-b BIN_NAME] [-o DIR_PATH] [--list-collections]
           [--skip-hierarchical-clustering] [--compress-auxiliary-data]
           [--enforce-hierarchical-clustering]
           [--distance DISTANCE_METRIC] [--linkage LINKAGE_METHOD]

Parameters

DATABASES: Declaring relevant anvi'o databases. First things first.

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs-database'

COLLECTION: You should provide a valid collection name. If you do not provide bin names, the program will generate an output for each bin in your collection separately.

  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.

OUTPUT: Where do we want the resulting split profiles to be stored.

  -o DIR_PATH, --output-dir DIR_PATH
                        Directory path for output files

EXTRAS: Stuff that you rarely need, but you really really need when the time comes. The following parameters will apply to each of the resulting anvi'o profiles that will be split from the mother anvi'o profile.

  --list-collections    Show available collections and exit.
  --skip-hierarchical-clustering
                        If you are not planning to use the interactive
                        interface (or if you have other means to add a tree of
                        contigs in the database) you may skip the step where
                        hierarchical clustering of your items is performed
                        based on default clustering recipes matching to your
                        database type.
  --compress-auxiliary-data
                        When declared, the auxiliary data file in the
                        resulting output will be compressed. This saves space,
                        but it takes long. Also, if you are planning to
                        compress the entire output later using GZIP, it is
                        not even worth doing. But you are the boss!
  --enforce-hierarchical-clustering
                        If you have more than 25,000 splits in your merged
                        profile, anvi-merge will automatically skip the
                        hierarchical clustering of splits (by setting --skip-
                        hierarchical-clustering flag on). This is due to the
                        fact that computational time required for hierarchical
                        clustering increases steeply with the number of
                        items being clustered. Based on our experience we
                        decided that 25,000 splits is about the maximum we
                        should try. However, this is not a theoretical limit,
                        and you can override this heuristic by using this
                        flag, which would tell anvi'o to attempt to cluster
                        splits regardless.
  --distance DISTANCE_METRIC
                        The distance metric for the hierarchical clustering.
                        If you do not use this flag, the default distance
                        metric will be used for each clustering configuration
                        which is "euclidean".
  --linkage LINKAGE_METHOD
                        The same story with the `--distance`, except, the
                        system default for this one is ward.

anvi-summarize

Summarize an anvi'o collection. Fun stuff.

Usage

anvi-summarize [-h] -p PAN_OR_PROFILE_DB [-c CONTIGS_DB]
               [-g GENOMES_STORAGE] [--init-gene-coverages]
               [-C COLLECTION_NAME] [-o DIR_PATH] [--list-collections]
               [--taxonomic-level {t_phylum,t_class,t_order,t_family,t_genus,t_species}]
               [--cog-data-dir COG_DATA_DIR] [--quick-summary]
               [--report-aa-seqs-for-gene-calls]

Parameters

PROFILE: The profile. It could be a standard or pan profile database.

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database

PROFILE TYPE SPECIFIC PARAMETERS: If you are summarizing a collection stored in a standard anvi'o profile, you will need a contigs database to go with it. If you are working with a pan profile, then you will need to provide a genomes storage. Don't worry too much, because anvi'o will warn you gently if you make a mistake.

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs-database'
  -g GENOMES_STORAGE, --genomes-storage GENOMES_STORAGE
                        Anvi'o genomes storage file

STANDARD PROFILE SPECIFIC PARAMS: Parameters that are only relevant to standard profile summaries (declaring or not declaring them will not change anything if you are summarizing a pan profile).

  --init-gene-coverages
                        Initialize gene coverage and detection data. This is a
                        very computationally expensive step, but it is
                        necessary when you need gene level coverage data. The
                        reason this is very computationally expensive is
                        because anvi'o computes gene coverages by going back
                        to actual coverage values of each gene to average
                        them, instead of using contig average coverage values,
                        for extreme accuracy.
  --report-aa-seqs-for-gene-calls
                        You can use this flag if you would like to find
                        translated DNA sequences for your gene calls in the
                        genes output file.
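
The difference that --init-gene-coverages makes (averaging the actual coverage values under each gene rather than reusing the contig average) can be illustrated with a toy example. The function, the coordinate convention (0-based, half-open), and the coverage values below are all made up for this sketch and do not reflect how anvi'o stores its data.

```python
from statistics import mean
from typing import Sequence


def gene_mean_coverage(contig_coverage: Sequence[float],
                       start: int, stop: int) -> float:
    """Average the per-nucleotide coverage values between start and stop.

    Coordinates are assumed to be 0-based and half-open: [start, stop).
    """
    if not 0 <= start < stop <= len(contig_coverage):
        raise ValueError("gene coordinates fall outside the contig")
    return mean(contig_coverage[start:stop])


# Toy contig: the first half is covered 10X, the second half 100X.
contig_coverage = [10.0] * 50 + [100.0] * 50

contig_average = mean(contig_coverage)                    # 55.0
gene_average = gene_mean_coverage(contig_coverage, 50, 100)  # 100.0
```

A gene that sits entirely in the second half of this toy contig has a mean coverage of 100, while the contig average of 55 would understate it by almost half. Reading every gene's own values is what makes the step expensive.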

COMMONS: Common parameters for both pan and standard profile summaries.

  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -o DIR_PATH, --output-dir DIR_PATH
                        Directory path for output files
  --list-collections    Show available collections and exit.
  --taxonomic-level {t_phylum,t_class,t_order,t_family,t_genus,t_species}
                        The taxonomic level to use. The default is 't_genus'.
                        Only relevant if the anvi'o contigs database contains
                        taxonomic annotations.
  --cog-data-dir COG_DATA_DIR
                        The directory path for your COG setup. Anvi'o will try
                        to use the default path if you do not specify
                        anything.
  --quick-summary       When declared the summary output will be generated as
                        quickly as possible, with minimum amount of essential
                        information about bins.

anvi-update-db-description

Update the description in an anvi'o database

Usage

anvi-update-db-description [-h] --description TEXT_FILE DB

Parameters

positional arguments:

  DB                    An anvi'o database.

optional arguments:

  --description TEXT_FILE
                        A plain text file that contains some description about
                        the project. You can use Markdown syntax. The
                        description text will be rendered and shown in all
                        relevant interfaces, including the anvi'o interactive
                        interface, or anvi'o summary outputs.

anvi-script-add-default-collection

A script to add a 'DEFAULT' collection in an anvi'o pan or profile database with a bin named 'EVERYTHING' that describes all items available in the profile database.

Usage

anvi-script-add-default-collection [-h] -p PAN_OR_PROFILE_DB

Parameters

optional arguments:

  -p PAN_OR_PROFILE_DB, --pan-or-profile-db PAN_OR_PROFILE_DB
                        Anvi'o pan or profile database

anvi-script-checkm-tree-to-interactive

Reformat FASTA file (remove contigs based on length, or based on a given list of deflines, and/or generate an output with simpler names)

Usage

anvi-script-checkm-tree-to-interactive [-h] -t CHECKM TREE -o DIRECTORY

Parameters

optional arguments:

  -t CHECKM TREE, --tree CHECKM TREE
                        Tree file generated by CheckM.
  -o DIRECTORY, --output-dir DIRECTORY
                        The directory in which output files will be stored.

anvi-script-filter-fasta-by-blast

Filter FASTA file according to BLAST table (remove sequences with bad BLAST alignment).

Usage

anvi-script-filter-fasta-by-blast [-h] [-f FASTA] [-o FILE_PATH] -b
                                  BLAST_OUTPUT -s OUTFMT -t THRESHOLD

Parameters

optional arguments:

  -f FASTA, --fasta-file FASTA
                        A FASTA-formatted input file
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
  -b BLAST_OUTPUT, --blast-output BLAST_OUTPUT
                        BLAST table generated with blastp. `-outfmt 6` as the
                        output format is assumed.
  -s OUTFMT, --outfmt OUTFMT
                        Specify the column ordering of your BLAST report. We
                        add the following parameter to our BLAST searches so
                        the output report contains the `qlen` field, which is
                        not included by default: `-outfmt '6 qseqid sseqid
                        pident length mismatch gapopen qstart qend sstart send
                        evalue bitscore qlen slen'`. You may have used a
                        different `-outfmt` parameter, and you should use this
                        parameter to explicitly define the column names in
                        your output file. For instance, if you had used the
                        parameter mentioned above, then the correct version of
                        this parameter would be: "qseqid sseqid pident length
                        mismatch gapopen qstart qend sstart send evalue
                        bitscore qlen slen". Regardless of the BLAST output
                        format, your columns MUST contain the following
                        fields for this program to work properly:
                        'qseqid', 'bitscore', 'length', 'qlen', and 'pident'.
  -t THRESHOLD, --threshold THRESHOLD
                        What `proper_pident` threshold do you want to use for
                        filtering out sequences whose top bit-score matches
                        have `proper_pident`s less than this threshold? We
                        have defined `proper_pident` to be the percentage of
                        the query amino acids that both aligned to and were
                        identical to the corresponding matched amino acid.
                        Note that the `pident` parameter output by BLAST does
                        not include regions of the query sequence unaligned to
                        the matched sequence, whereas `proper_pident` does.
                        For example, a sequence that's only half aligned by a
                        match but with 100% identity at matched regions has a
                        `pident` of 100 but a `proper_pident` of 50. The
                        default is 30.0%.
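
The relationship between `pident` and `proper_pident` described above can be illustrated with a short Python sketch. It is not the program's actual implementation: the function names are hypothetical, and it assumes `proper_pident` can be approximated as `pident * length / qlen`, which treats every aligned column as a query residue and ignores gaps.

```python
# Column order taken from the -outfmt string quoted in the help text above.
DEFAULT_OUTFMT = ("qseqid sseqid pident length mismatch gapopen qstart qend "
                  "sstart send evalue bitscore qlen slen")
REQUIRED = {"qseqid", "bitscore", "length", "qlen", "pident"}


def parse_blast_line(line, outfmt=DEFAULT_OUTFMT):
    """Map one TAB-delimited BLAST row onto the column names in `outfmt`."""
    columns = outfmt.split()
    missing = REQUIRED - set(columns)
    if missing:
        raise ValueError("outfmt lacks required columns: " + ", ".join(sorted(missing)))
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(columns):
        raise ValueError("row has %d fields, expected %d" % (len(fields), len(columns)))
    return dict(zip(columns, fields))


def proper_pident(hit):
    """Approximate percent of ALL query residues that are aligned and identical.

    Assumption: identical residues ~= pident / 100 * length, so dividing by
    the full query length (qlen) gives pident * length / qlen.
    """
    return float(hit["pident"]) * int(hit["length"]) / int(hit["qlen"])


def passes_threshold(hit, threshold=30.0):
    """Keep a hit unless its proper_pident is less than the threshold."""
    return proper_pident(hit) >= threshold


# A query of 100 residues, half of it aligned at 100% identity.
row = "q1\ts1\t100.0\t50\t0\t0\t1\t50\t1\t50\t1e-20\t100\t100\t80"
hit = parse_blast_line(row)
print(hit["pident"], proper_pident(hit))  # prints: 100.0 50.0
```

This reproduces the example from the help text: a `pident` of 100 for a half-aligned query corresponds to a `proper_pident` of 50.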

anvi-script-gen-CPR-classifier

Train a classifier for CPR prediction

Usage

anvi-script-gen-CPR-classifier [-h] [-o OUTPUT] matrix

Parameters

positional arguments:

  matrix                TAB-delimited matrix of CPR genome names, classes, and
                        presence/absence of single-copy genes. Headers of the
                        first two rows should be "genome", and "class". The
                        rest of the rows should be single-copy genes.

optional arguments:

  -o OUTPUT, --output OUTPUT
                        Output file name for the classifier.

anvi-script-gen-distribution-of-genes-in-a-bin

Quantify the detection of genes in genomes in metagenomes to identify the environmental core. This is a helper script for the anvi'o metapangenomic workflow.

Usage

anvi-script-gen-distribution-of-genes-in-a-bin [-h] -c CONTIGS_DB
                                               [-p PROFILE_DB]
                                               [-C COLLECTION_NAME]
                                               [-b BIN_NAME]
                                               [--min-detection FLOAT]
                                               [--fraction-of-median-coverage FLOAT]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.
  --min-detection FLOAT
                        For this entire thing to work, the genome you are
                        focusing on should be detected in at least one
                        metagenome. If that is not the case, it would mean
                        that you do not have any sample that represents the
                        niche for this organism (or you do not have enough
                        depth of coverage) to investigate the detection of
                        genes in the environment. By default, this script
                        requires at least '0.5' of the genome to be detected
                        in at least one metagenome. This parameter allows you
                        to change that. 0 would mean no detection test
                        required, 1 would mean the entire genome must be
                        detected.
  --fraction-of-median-coverage FLOAT
                        The value set here will be used to remove a gene if
                        its total coverage across environments is less than
                        the median coverage of all genes multiplied by this
                        value. The default is 0.25, which means, if the median
                        total coverage of all genes across all samples is
                        100X, then, a gene with a total coverage of less than
                        25X across all samples will be assumed not a part of
                        the 'environmental core'.
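
The two filters described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the help text, not the script's own code: the function names and the input dictionaries are hypothetical, and the comparisons are assumed to be inclusive at the threshold.

```python
from statistics import median


def genome_is_detected(detection_per_sample, min_detection=0.5):
    """True if at least one metagenome detects >= min_detection of the genome."""
    return any(d >= min_detection for d in detection_per_sample.values())


def environmental_core(total_coverage, fraction_of_median_coverage=0.25):
    """Return genes whose total coverage is not below median * fraction.

    `total_coverage` maps a gene id to its coverage summed across all samples.
    """
    threshold = median(total_coverage.values()) * fraction_of_median_coverage
    return {gene for gene, cov in total_coverage.items() if cov >= threshold}


# Median total coverage is 100X, so the default cutoff is 25X.
coverages = {"g1": 100, "g2": 120, "g3": 80, "g4": 20, "g5": 100}
print(sorted(environmental_core(coverages)))  # prints: ['g1', 'g2', 'g3', 'g5']
print(genome_is_detected({"sample_A": 0.1, "sample_B": 0.7}))  # prints: True
```

Gene `g4`, with 20X total coverage, falls under the 25X cutoff and is left out of the environmental core, matching the 100X example in the parameter description.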

anvi-script-gen-short-reads

Generate short reads from contigs

Usage

anvi-script-gen-short-reads [-h] [--output-file-path FILE_PATH]
                            CONFIG_FILE

Parameters

positional arguments:

  CONFIG_FILE           Configuration file

optional arguments:

  --output-file-path FILE_PATH
                        Output FASTA file path

anvi-script-gen-vignette

Generate a vignette for anvi'o programs

Usage

anvi-script-gen-vignette [-h] [-p PROGRAM_NAMES_TO_FOCUS]
                         [-o FILE_PATH]

Parameters

optional arguments:

  -p PROGRAM_NAMES_TO_FOCUS, --program-names-to-focus PROGRAM_NAMES_TO_FOCUS
                        Comma-separated list of program names to focus on.
                        Mostly for debugging purposes.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-script-gen_stats_for_single_copy_genes.py

A simple script to generate info from search tables

Usage

anvi-script-gen_stats_for_single_copy_genes.py [-h] [--list-sources]
                                               [--source SOURCE]
                                               CONTIGS_DB

Parameters

positional arguments:

  CONTIGS_DB       Contigs database to read from.

optional arguments:

  --list-sources   Show available single-copy gene search results and exit.
  --source SOURCE  Source to focus on. If none declared, all single-copy gene
                   sources are going to be listed.

anvi-script-gene-clusters-to-gene-calls

Export gene caller ids per genome for a given gene cluster bin. For details, see the pangenomic workflow and an example usage at the URL http://merenlab.org/2015/11/14/pangenomics/

Usage

anvi-script-gene-clusters-to-gene-calls [-h] [-p PROFILE_DB]
                                        [-C COLLECTION_NAME]
                                        [--list-collections]
                                        gene_clusters

Parameters

positional arguments:

  gene_clusters         gene-clusters.txt file (one of the key output files of
                        anvi-pan-genome)

optional arguments:

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  --list-collections    Show available collections and exit.

anvi-script-get-collection-info

Provides information about each bin in a given collection.

Usage

anvi-script-get-collection-info [-h] -c CONTIGS_DB [-p PROFILE_DB]
                                [-C COLLECTION_NAME]
                                [--list-collections] [-o FILE_PATH]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  --list-collections    Show available collections and exit.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-script-get-collections-as-tab-delimited-matrix.py

A simple script to generate info from search tables

Usage

anvi-script-get-collections-as-tab-delimited-matrix.py
[-h] [-p PROFILE_DB] [-c CONTIGS_DB] [-o FILE_PATH]

Parameters

optional arguments:

  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

anvi-script-get-prot-sequences.py

A simple script to print out sequences for a given list of contigs database protein ids

Usage

anvi-script-get-prot-sequences.py [-h] CONTIGS_DB PROT_IDs

Parameters

positional arguments:

  CONTIGS_DB  Contigs database to read from.
  PROT_IDs    Protein IDs.

anvi-script-itep-to-data-txt

A simple script to convert ITEP output into data.txt

Usage

anvi-script-itep-to-data-txt [-h] [-o OUTPUT_FILE] ITEP_CLUSTERS

Parameters

positional arguments:

  ITEP_CLUSTERS         This is the ITEP output file you should find in
                        "flatclusters" directory

optional arguments:

  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Where to store the information. The default is
                        "data.txt".

anvi-script-merge-collections

Generate an additional data file from multiple collections.

Usage

anvi-script-merge-collections [-h] -c CONTIGS_DB -i FILE(S) [FILE(S) ...]
                              -o OUTPUT_FILE

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -i FILE(S) [FILE(S) ...], --input-files FILE(S) [FILE(S) ...]
                        Input file(s). TAB-delimited input files should have
                        two columns, where the first column holds the contig
                        name, and the second one the bin id. This is the
                        standard output of the program anvi-export-collection.
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Output file name.
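
As a rough illustration of the two-column input format described above, the following Python sketch combines several collections into a single TAB-delimited table. The layout of the merged table (one row per contig, one column per collection, empty cells for missing contigs) is an assumption made for this example, and `merge_collections` is a hypothetical helper, not part of anvi'o.

```python
import csv
import io


def merge_collections(named_handles):
    """Merge 'contig<TAB>bin' files into one table keyed by contig name.

    `named_handles` maps a collection name to an open file-like object.
    """
    names = list(named_handles)
    merged = {}
    for name, handle in named_handles.items():
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            contig, bin_id = row[0], row[1]
            merged.setdefault(contig, {})[name] = bin_id

    lines = ["\t".join(["contig"] + names)]
    for contig in sorted(merged):
        # Assumption: a contig absent from a collection gets an empty cell.
        cells = [merged[contig].get(name, "") for name in names]
        lines.append("\t".join([contig] + cells))
    return "\n".join(lines) + "\n"


concoct = io.StringIO("contig_1\tBin_1\ncontig_2\tBin_2\n")
manual = io.StringIO("contig_1\tBin_A\n")
print(merge_collections({"CONCOCT": concoct, "MANUAL": manual}), end="")
```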

anvi-script-predict-CPR-genomes

Screen for genomes to find likely members of CPR

Usage

anvi-script-predict-CPR-genomes [-h] -c CONTIGS_DB [-p PROFILE_DB]
                                [-C COLLECTION_NAME]
                                [--list-collections]
                                [--report-only-cpr]
                                [--min-genome-size MIN_GENOME_SIZE]
                                [--min-percent-completion MIN_PERCENT_COMPLETION]
                                [--max-percent-redundancy MAX_PERCENT_REDUNDANCY]
                                [--min-class-probability MIN_CLASS_PROBABILITY]
                                [-o FILE_PATH]
                                classifier_object

Parameters

positional arguments:

  classifier_object     Model output generated by anvi-script-gen-CPR-
                        classifier

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  -p PROFILE_DB, --profile-db PROFILE_DB
                        Anvi'o profile database
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  --list-collections    Show available collections and exit.
  --report-only-cpr     Include only bins that look like CPR genomes.
  --min-genome-size MIN_GENOME_SIZE
                        Minimum genome size to consider for CPR in Mbp.
                        Default is 0.500000
  --min-percent-completion MIN_PERCENT_COMPLETION
                        Minimum percent completion estimate based on anvi'o
                        default single-copy gene collections. Default is 50
  --max-percent-redundancy MAX_PERCENT_REDUNDANCY
                        Maximum percent redundancy of single-copy genes in an
                        anvi'o bin, or a genome to consider for
                        classification. The default is 30
  --min-class-probability MIN_CLASS_PROBABILITY
                        If the classification confidence is below this don't
                        bother. Default is 75.
  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.
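
The default thresholds listed above can be read as a two-stage screen: first decide whether a bin or genome is worth classifying, then decide whether the classifier's call is confident enough to report. The Python sketch below only restates those defaults. The function names are hypothetical, the boundaries are assumed to be inclusive, and the classifier itself is not modeled.

```python
def eligible_for_classification(genome_size_mbp, percent_completion,
                                percent_redundancy,
                                min_genome_size=0.5,
                                min_percent_completion=50,
                                max_percent_redundancy=30):
    """Apply the size, completion, and redundancy defaults from the help text."""
    return (genome_size_mbp >= min_genome_size
            and percent_completion >= min_percent_completion
            and percent_redundancy <= max_percent_redundancy)


def confident_call(class_probability, min_class_probability=75):
    """True if the classification confidence is not below the cutoff."""
    return class_probability >= min_class_probability


print(eligible_for_classification(0.8, 70, 10))  # prints: True
print(eligible_for_classification(0.3, 90, 5))   # prints: False (genome too small)
print(confident_call(60))                        # prints: False
```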

anvi-script-reformat-fasta

Reformat FASTA file (remove contigs based on length, or based on a given list of deflines, and/or generate an output with simpler names)

Usage

anvi-script-reformat-fasta [-h] [-l MIN_LENGTH] [-i TXT FILE]
                           [-I TXT FILE] -o FASTA FILE
                           [--simplify-names] [--prefix PREFIX]
                           [-r REPORT FILE]
                           contigs_fasta

Parameters

positional arguments:

  contigs_fasta

optional arguments:

  -l MIN_LENGTH, --min-len MIN_LENGTH
                        Minimum length of contigs to keep (contigs shorter
                        than this value will not be included in the output
                        file). The default is 0, so nothing will be removed if
                        you do not declare a minimum size.
  -i TXT FILE, --exclude-ids TXT FILE
                        IDs to remove from the FASTA file. You cannot provide
                        both --keep-ids and --exclude-ids.
  -I TXT FILE, --keep-ids TXT FILE
                        If provided, all IDs not in this file will be excluded
                        from the reformatted FASTA file. Any additional
                        filters (such as --min-len) will still be applied to
                        the IDs in this file. You cannot provide both
                        --exclude-ids and --keep-ids.
  -o FASTA FILE, --output-file FASTA FILE
                        Output file path.
  --simplify-names      Edit deflines to make sure contigs have simple
                        names.
  --prefix PREFIX       Use this parameter if you would like to add a prefix
                        to your contig names while simplifying them. The
                        prefix must be a single word (you can use the
                        underscore character, but nothing more!).
  -r REPORT FILE, --report-file REPORT FILE
                        Report file path. When you run this program with
                        `--simplify-names` flag, all changes to deflines will
                        be reported in this file in case you need to go back
                        to this information later. It is not mandatory to
                        declare one, but it is a very good idea to have it.
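
To make the interaction between `--min-len`, `--simplify-names`, `--prefix`, and the report file more concrete, here is a minimal Python sketch. It is not the program's source: `reformat` is a hypothetical function, and the simplified naming scheme (a prefix, or `c` when none is given, followed by a zero-padded counter) is an assumption chosen for illustration.

```python
import re


def reformat(records, min_len=0, simplify_names=False, prefix=None):
    """Filter (defline, sequence) pairs by length and optionally rename them.

    Returns (kept_records, report), where report lists (new_name, old_defline)
    pairs so that the original deflines can be recovered later.
    """
    if prefix is not None and not re.fullmatch(r"\w+", prefix):
        raise ValueError("prefix must be a single word (underscores allowed)")

    kept, report = [], []
    for defline, seq in records:
        if len(seq) < min_len:
            continue  # contigs shorter than min_len are dropped
        if simplify_names:
            # Assumed naming scheme, for illustration only.
            new_name = "%s_%012d" % (prefix or "c", len(kept) + 1)
            report.append((new_name, defline))
            defline = new_name
        kept.append((defline, seq))
    return kept, report


records = [("contig 1 len=8", "ACGTACGT"), ("contig 2 len=3", "ACG")]
kept, report = reformat(records, min_len=5, simplify_names=True, prefix="S01")
print(kept)    # prints: [('S01_000000000001', 'ACGTACGT')]
print(report)  # prints: [('S01_000000000001', 'contig 1 len=8')]
```

Keeping the report is what lets you map simplified names back to the original deflines, which is why the help text recommends declaring a report file.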

anvi-script-run-eggnog-mapper

Run eggnog-mapper on a contigs database, and store results

Usage

anvi-script-run-eggnog-mapper [-h] -c CONTIGS_DB
                              [--cog-data-dir COG_DATA_DIR]
                              [-T NUM_THREADS]
                              [--drop-previous-annotations]
                              [--annotation EMAPPER_ANNOTATION_FILE]
                              [--use-version EMAPPER_VERSION]

Parameters

optional arguments:

  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs'
  --cog-data-dir COG_DATA_DIR
                        The directory path for your COG setup if you did not
                        use the default directory.
  -T NUM_THREADS, --num-threads NUM_THREADS
                        Maximum number of threads to use for multithreading
                        whenever possible. Very conservatively, the default is
                        1. It is a good idea to not exceed the number of CPUs
                        / cores on your system. Plus, please be careful with
                        this option if you are running your commands on an SGE
                        cluster: if you are clusterizing your runs and asking
                        for multiple threads, you may deplete your resources
                        very fast.
  --drop-previous-annotations
                        When declared, previous annotations in the database
                        will be dropped.
  --annotation EMAPPER_ANNOTATION_FILE
                        If you have an annotation file from a previous run,
                        you can call this program to import the contents of
                        that file into the database instead of a run from
                        scratch. In that case, you must also use the `--use-
                        version` parameter to clarify which parser version
                        should be used to parse it.
  --use-version EMAPPER_VERSION
                        The version of eggnog-mapper that generated the
                        annotation file.

anvi-script-snvs-to-interactive

Take the output of anvi-gen-variability-profile and prepare an output for the interactive interface

Usage

anvi-script-snvs-to-interactive [-h]
                                [--min-departure-from-consensus FLOAT]
                                [--max-departure-from-consensus FLOAT]
                                -o DIR_PATH
                                profile

Parameters

positional arguments:

  profile               The output file generated by anvi-gen-variability-
                        profile

optional arguments:

  --min-departure-from-consensus FLOAT
                        Minimum departure from consensus at a given variable
                        nucleotide position. The default is 0.00.
  --max-departure-from-consensus FLOAT
                        Maximum departure from consensus at a given variable
                        nucleotide position. The default is 0.99.
  -o DIR_PATH, --output-dir DIR_PATH
                        Directory path for output files
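
The two departure-from-consensus bounds act as a simple range filter on variable positions. The Python sketch below shows that idea under stated assumptions: `filter_by_departure` is a hypothetical helper, each row is a dictionary with a `departure_from_consensus` key, and both bounds are treated as inclusive.

```python
def filter_by_departure(rows, min_departure=0.0, max_departure=0.99):
    """Keep rows whose departure from consensus lies within [min, max]."""
    kept = []
    for row in rows:
        departure = float(row["departure_from_consensus"])
        if min_departure <= departure <= max_departure:
            kept.append(row)
    return kept


rows = [
    {"pos": "10", "departure_from_consensus": "0.05"},
    {"pos": "42", "departure_from_consensus": "0.50"},
    {"pos": "77", "departure_from_consensus": "1.00"},
]
print([r["pos"] for r in filter_by_departure(rows)])                     # prints: ['10', '42']
print([r["pos"] for r in filter_by_departure(rows, min_departure=0.1)])  # prints: ['42']
```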