SYNOPSIS metacache modify ... [OPTION]... metacache modify [OPTION]... ... DESCRIPTION Add reference sequence and/or taxonomic information to an existing database. REQUIRED PARAMETERS database file name; A MetaCache database contains taxonomic information and min-hash signatures of reference sequences (complete genomes, scaffolds, contigs, ...). ... FASTA or FASTQ files containing genomic sequences (complete genomes, scaffolds, contigs, ...) that shall beused as representatives of an organism/taxon. If directory names are given, they will be searched for sequence files (at most 10 levels deep). BASIC OPTIONS -taxonomy directory with taxonomic hierarchy data (see NCBI's taxonomic data files) -taxpostmap Files with sequence to taxon id mappings that are used as alternative source in a post processing step. default: 'nucl_(gb|wgs|est|gss).accession2taxid' -silent|-verbose information level during build: silent => none / verbose => most detailed default: neither => only errors/important info ADVANCED OPTIONS -reset-taxa Attempts to re-rank all sequences after the main build phase using '.accession2taxid' files. This will reset the taxon id of a reference sequence even if a taxon id could be obtained from other sources during the build phase. default: off -max-locations-per-feature <#> maximum number of reference sequence locations to be stored per feature; If the value is too high it will significantly impact querying speed. Note that an upper hard limit is always imposed by the data type used for the hash table bucket size (set with compilation macro '-DMC_LOCATION_LIST_SIZE_TYPE'). default: 254 -remove-overpopulated-features Removes all features that have reached the maximum allowed amount of locations per feature. This can improve querying speed and can be used to remove non-discriminative features. default: off Not available in the GPU version. -remove-ambig-features Removes all features that have more distinct reference sequence on the given taxonomic rank than set by '-max-ambig-per-feature'. This can decrease the database size significantly at the expense of sensitivity. Note that the lower the given taxonomic rank is, the more pronounced the effect will be. Valid values: sequence, form, variety, subspecies, species, subgenus, genus, subtribe, tribe, subfamily, family, suborder, order, subclass, class, subphylum, phylum, subkingdom, kingdom, domain default: off Not available in the GPU version. -max-ambig-per-feature <#> Maximum number of allowed different reference sequence taxa per feature if option '-remove-ambig-features' is used. Not available in the GPU version. -max-load-fac maximum hash table load factor; This can be used to trade off larger memory consumption for speed and vice versa. A lower load factor will improve speed, a larger one will improve memory efficiency. default: 0.800000 Not available in the GPU version. EXAMPLES Add reference sequence 'penicillium.fna' to database 'fungi' metacache modify fungi penicillium.fna Add taxonomic information from NCBI to database 'myBacteria' download_ncbi_taxonomy myTaxo metacache modify myBacteria -taxonomy myTaxo