SYNOPSIS
    metacache merge ... -taxonomy [OPTION]...
    metacache merge -taxonomy [OPTION]... ...

DESCRIPTION
    This mode classifies reads by merging the results of multiple,
    independent queries. These might have been obtained by querying one
    database with different parameters, or by querying different databases
    with different reference sequences or build options.

    IMPORTANT: In order to be mergeable, independent queries need to be run
    with options:
        -tophits -queryids -lowest species
    and must NOT be run with options that suppress or alter the default
    output, e.g.: -no-map, -no-summary, -separator, etc.

    Possible Use Case:
    If your system does not have enough memory for one large database, you
    can split up the set of reference genomes into several databases and
    query these in succession. The results of these independent query runs
    can then be merged to obtain a classification based on the whole set of
    genomes.

REQUIRED PARAMETERS
    ...               MetaCache result files.
                      If directory names are given, they will be searched
                      for sequence files (at most 10 levels deep).
                      IMPORTANT: Result files must have been produced with:
                          -tophits -queryids -lowest species
                      and must NOT have been produced with options that
                      suppress or alter the default output, e.g.:
                      -no-map, -no-summary, -separator, etc.

    -taxonomy         directory with taxonomic hierarchy data
                      (see NCBI's taxonomic data files)

MERGING RESULTS OUTPUT
                      Redirect output to a file. If not specified, output
                      will be written to stdout.
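The split-and-merge use case described above can be sketched as a small dry-run script that only prints the command lines it would run. This is a minimal sketch, not part of MetaCache: the `metacache query <database> <reads>` form, the shell redirection into result files, and all database, read, and directory names are assumptions made for illustration. Only the required and forbidden option names are taken from the text above, and the forbidden list is not exhaustive (the text ends it with "etc.").

```python
import shlex

# Options every independent query run needs so that its results can be merged.
REQUIRED_QUERY_OPTIONS = ["-tophits", "-queryids", "-lowest", "species"]

# Options named above as altering the default output (list is not exhaustive).
FORBIDDEN_QUERY_OPTIONS = {"-no-map", "-no-summary", "-separator"}


def query_command(database, reads, extra_options=()):
    """Build one query command line (the query syntax is an assumption)."""
    bad = FORBIDDEN_QUERY_OPTIONS.intersection(extra_options)
    if bad:
        raise ValueError(f"not allowed for mergeable results: {sorted(bad)}")
    return ["metacache", "query", database, reads,
            *REQUIRED_QUERY_OPTIONS, *extra_options]


def merge_command(result_files, taxonomy_dir):
    """Build the merge command line: result files first, then -taxonomy."""
    return ["metacache", "merge", *result_files, "-taxonomy", taxonomy_dir]


# Hypothetical partial databases, each built from a subset of the genomes.
databases = ["refs_part1", "refs_part2"]
result_files = [f"{db}.results.txt" for db in databases]

# Dry run: print the commands instead of executing them.
for db, res in zip(databases, result_files):
    print(shlex.join(query_command(db, "reads.fa")), ">", res)
print(shlex.join(merge_command(result_files, "ncbi_taxonomy")))
```

Checking the forbidden options before any query is started avoids discovering at merge time that a long query run produced results that cannot be merged.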

CLASSIFICATION
    -lowest           Do not classify on ranks below the given rank.
                      (Valid values: sequence, form, variety, subspecies,
                      species, subgenus, genus, subtribe, tribe, subfamily,
                      family, suborder, order, subclass, class, subphylum,
                      phylum, subkingdom, kingdom, domain)
                      default: sequence

    -highest          Do not classify on ranks above the given rank.
                      (Valid values: sequence, form, variety, subspecies,
                      species, subgenus, genus, subtribe, tribe, subfamily,
                      family, suborder, order, subclass, class, subphylum,
                      phylum, subkingdom, kingdom, domain)
                      default: domain

    -hitmin           Sets the classification threshold to t.
                      A read will not be classified if fewer than t features
                      from the database match. Higher values will increase
                      precision at the expense of sensitivity.
                      default: 0

    -hitdiff          Sets the classification threshold to t.
                      A read will not be classified if fewer than t features
                      from the database match. Higher values will increase
                      precision at the expense of sensitivity.
                      default: 0

    -maxcand <#>      Maximum number of reference taxon candidates to
                      consider for each query. A large value can
                      significantly decrease the querying speed!
                      default: 2

    -cov-percentile   Remove the p-th percentile of hit reference sequences
                      with the lowest coverage. Classification is done using
                      only the remaining reference sequences. This can help
                      to reduce false positives, especially when your input
                      data has a high sequencing coverage.
                      This feature decreases the querying speed!
                      default: off

GENERAL OUTPUT FORMATTING
    -silent|-verbose  information level during build:
                      silent => none / verbose => most detailed
                      default: neither => only errors/important info

    -no-summary       Don't show result summary & mapping statistics at the
                      end of the mapping output.
                      default: off

    -no-query-params  Don't show query settings at the beginning of the
                      mapping output.
                      default: off

    -no-err           Suppress all error messages.
                      default: off

CLASSIFICATION RESULT FORMATTING
    -no-map           Don't report classification for each individual query
                      sequence; show summaries only (useful for quick tests).
                      default: off

    -mapped-only      Don't list unclassified reads/read pairs.
                      default: off

    -taxids           Print taxon ids in addition to taxon names.
                      default: off

    -taxids-only      Print taxon ids instead of taxon names.
                      default: off

    -omit-ranks       Do not print taxon rank names.
                      default: off

    -separate-cols    Prints *all* mapping information (rank, taxon name,
                      taxon ids) in separate columns (see option
                      '-separator').
                      default: off

    -separator        Sets the string that separates output columns.
                      default: '\t|\t'

    -comment          Sets the string that precedes comment (non-mapping)
                      lines.
                      default: '# '

    -queryids         Show a unique id for each query.
                      Note that in paired-end mode a query is a pair of two
                      read sequences. This option will always be activated
                      if option '-hits-per-ref' is given.
                      default: off

    -lineage          Report complete lineage for per-read classification,
                      starting with the lowest rank found/allowed and ending
                      with the highest rank allowed. See also options
                      '-lowest' and '-highest'.
                      default: off

ANALYSIS

ANALYSIS: ABUNDANCES
    -abundances       Show absolute and relative abundance of each taxon.
                      If a valid filename is given, the list will be written
                      to this file.
                      default: off

    -abundance-per    Show absolute and relative abundances for each taxon
                      on one specific rank. Classifications on higher ranks
                      will be estimated by distributing them down according
                      to the relative abundances of classifications on or
                      below the given rank.
                      (Valid values: sequence, form, variety, subspecies,
                      species, subgenus, genus, subtribe, tribe, subfamily,
                      family, suborder, order, subclass, class, subphylum,
                      phylum, subkingdom, kingdom, domain)
                      If '-abundances' was given with a filename, this list
                      will be printed to the same file.
                      default: off

ANALYSIS: RAW DATABASE HITS
    -tophits          For each query, print top feature hits in database.
                      default: off

    -allhits          For each query, print all feature hits in database.
                      default: off

    -locations        Show locations in candidate reference sequences.
                      Activates option '-tophits'.
                      default: off

    -hits-per-ref     Shows a list of all hits for each reference sequence.
                      If this condensed list is all you need, you should
                      deactivate the per-read mapping output with '-no-map'.
                      If a valid filename is given after '-hits-per-ref',
                      the list will be written to a separate file.
                      Option '-queryids' will be activated and the lowest
                      classification rank will be set to 'sequence'.
                      default: off

ANALYSIS: ALIGNMENTS
    -align            Show semi-global alignment to best candidate reference
                      sequence. Original files of reference sequences must
                      be available.
                      This feature decreases the querying speed!
                      default: off

ADVANCED: GROUND TRUTH BASED EVALUATION
    -ground-truth     Report correct query taxa if known.
                      Queries need to have either a 'taxid|' entry in their
                      header or a sequence id that is also present in the
                      database.
                      This feature decreases the querying speed!
                      default: off

    -precision        Report precision & sensitivity by comparing query taxa
                      (ground truth) and mapped taxa.
                      Queries need to have either a 'taxid|' entry in their
                      header or a sequence id that is also found in the
                      database.
                      This feature decreases the querying speed!
                      default: off

    -taxon-coverage   Report true/false positives and true/false negatives.
                      This option turns on '-precision', so ground truth
                      data needs to be available.
                      This feature decreases the querying speed!
                      default: off

ADVANCED: CUSTOM QUERY SKETCHING (SUBSAMPLING)
    -kmerlen          number of nucleotides/characters in a k-mer
                      default: determined by database

    -sketchlen        number of features (k-mer hashes) per sampling window
                      default: determined by database

    -winlen           number of letters in each sampling window
                      default: determined by database

    -winstride        distance between window starting positions
                      default: determined by database

ADVANCED: PERFORMANCE TUNING / TESTING
    -threads <#>      Sets the maximum number of parallel threads to use.
                      default (on this machine): 8

    -batch-size <#>   Process <#> many queries (reads or read pairs) per
                      thread at once.
                      default (on this machine): 4096

    -query-limit <#>  Classify at max. <#> queries (reads or read pairs) per
                      input file.
                      default: 9223372036854775807
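
The estimation described for '-abundance-per' (distributing classifications on higher ranks down to the chosen rank in proportion to the counts already found there) can be illustrated with a toy calculation. This is only a sketch of the idea under simplifying assumptions, not MetaCache's implementation: the function name, the data layout, the taxon names, and the read counts are all hypothetical, and only a single parent level is handled.

```python
def distribute_down(rank_counts, higher_counts):
    """Estimate per-taxon counts on one rank (illustrative only).

    rank_counts:   {taxon: reads classified on or below the chosen rank}
    higher_counts: {parent: (reads classified at the parent itself,
                             [child taxa on the chosen rank])}
    """
    estimated = {taxon: float(count) for taxon, count in rank_counts.items()}
    for parent, (reads, children) in higher_counts.items():
        total = sum(rank_counts[child] for child in children)
        if total == 0:
            continue  # no proportions available; leave these reads undistributed
        for child in children:
            # Each child receives a share proportional to its own count.
            estimated[child] += reads * rank_counts[child] / total
    return estimated


# Hypothetical example: 20 reads could only be assigned to the genus.
species_counts = {"species A": 60, "species B": 20}
genus_counts = {"genus G": (20, ["species A", "species B"])}

estimate = distribute_down(species_counts, genus_counts)
print(estimate)  # {'species A': 75.0, 'species B': 25.0}
```

In this example the 20 genus-level reads are split 3:1, matching the 60:20 ratio of the species-level counts, so the estimated species abundances still sum to the 100 classified reads.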