Command-line options

Options recognized by all commands

General options (common to all commands):
  -v, --verbose    Print more detailed information about the progress and
                   results of certain commands
  -q, --quiet      Suppress all messages except errors and important warnings
      --help-vars  List and explain all variables/functions available

Input (common to all commands):
      --fmt <FMT>        Input format, only needed if it cannot be guessed from
                         the extension (e.g. if reading from STDIN). 'fasta' is
                         assumed as default (can be configured with ST_FORMAT).
                         Possible choices: fasta (default), fastq
                         (fastq-illumina, fastq-solexa), csv or tsv.
                         Compression: <format>.<compression> (.gz, .bz2 or .lz4)
                         [env: ST_FORMAT=]
      --fa               FASTA input. Short for '--fmt fasta'
      --fq               FASTQ input. Short for '--fmt fastq'
      --fq-illumina      FASTQ input in legacy Illumina 1.3-1.7 format (alias to
                         --fmt fastq-illumina)
      --fields <FIELDS>  Delimited text fields: 'id,seq,desc' (in order) or
                         'id:2,desc:6,seq:9' (col. num.) or
                         'id:ID,seq:Sequence,desc:Comment' (names in header)
                         [default: 'id,seq,desc']
      --delim <CHAR>     TSV/CSV delimiter. Defaults: '\t' for tsv/txt and ','
                         for csv
      --header           Specify if CSV file has a header. Auto-enabled if a
                         'field:column name' mapping is provided with --fields,
                         --csv or --tsv
      --csv <FIELDS>     CSV input. Short for '--fmt csv --fields <fields>'
      --tsv <FIELDS>     TSV input. Short for '--fmt tsv --fields <fields>'
      --qual <FILE>      Path to QUAL file with quality scores (Roche 454 style)
      --seqtype <TYPE>   Sequence type; relevant for the `find` and `revcomp`
                         commands, as well as the variables/functions
                         `seq_revcomp`, `seqhash_rev` and `seqhash_both`
                         (default: auto-detected based on the first sequence)
                         [possible values: dna, rna, protein, other]
  [INPUT]...             Input file(s), multiple possible (use '-' for STDIN)
                         [default: -]
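
The three `--fields` notations described above (positional, column numbers, header names) can be illustrated with a small Python sketch. It is not the tool's own parser: `parse_fields` and `needs_header` are hypothetical helper names, and the handling of whitespace around items is an assumption.

```python
def parse_fields(spec):
    """Map each field name to a 1-based column number or a header name.

    Hypothetical helper for illustration; not the tool's real parser.
    """
    mapping = {}
    for position, item in enumerate(spec.split(","), start=1):
        name, sep, column = item.strip().partition(":")
        if not sep:
            # 'id,seq,desc': fields are taken in the order given
            mapping[name] = position
        elif column.isdigit():
            # 'id:2,desc:6,seq:9': explicit column numbers
            mapping[name] = int(column)
        else:
            # 'id:ID,seq:Sequence': column names from the header row
            mapping[name] = column
    return mapping


def needs_header(mapping):
    """A header row is required as soon as any column is given by name."""
    return any(isinstance(column, str) for column in mapping.values())


print(parse_fields("id,seq,desc"))
print(parse_fields("id:2,desc:6,seq:9"))
print(needs_header(parse_fields("id:ID,seq:Sequence,desc:Comment")))
```

The `needs_header` check mirrors the note on `--header`, which is auto-enabled once a 'field:column name' mapping is given.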

Output (common to all commands):
  -o, --output <FILE>       Write output to <file> instead of STDOUT [Default:
                            STDOUT (-)]
      --append              Append sequences to the end if the output file(s)
                            already exist instead of replacing the content.
                            When writing to standard output (the default if
                            `-o/--output` is not specified), this option has
                            no effect
      --to <FORMAT>         Output format and compression. See --fmt. Only
                            needed if not guessed from the extension (default:
                            input format)
      --wrap <WIDTH>        Wrap FASTA sequences to maximum <width> characters
      --out-delim <DELIM>   TSV/CSV delimiter. Defaults: '\t' for tsv/txt and
                            ',' for csv
      --outfields <FIELDS>  Comma delimited list of CSV/TSV fields, which can be
                            variables/functions or contain
                            {variables}/{expressions}. [default: input fields or
                            'id,desc,seq']
      --to-fa               FASTA output. Short for: '--to fasta'
      --to-fq               FASTQ output. Short for: '--to fastq'
      --to-csv <FIELDS>     CSV output with comma delimited list of fields,
                            which can be variables/functions or contain
                            variables/expressions. Short for '--to csv
                            --outfields <f>'
      --to-tsv <FIELDS>     TSV output with comma delimited list of fields,
                            which can be variables/functions or contain
                            variables/expressions. Short for '--to tsv
                            --outfields <f>'
      --compr-level <L>     Level for compressed output. 1-9 for GZIP/BZIP2
                            (default=6) and 1-16 for LZ4 (default=0). 1-22 for
                            Zstandard (default=3 or 0)
      --qual-out <FILE>     Path to QUAL output file with quality scores

FASTA/Q header attributes (all commands):
  -a, --attr <KEY=VALUE>   Add an attribute in the form name=value to
                           FASTA/FASTQ headers or replace their value if the
                           given name already exists (multiple -a key=value
                           arguments possible). The default output format is:
                           '>id some description key1=value1 key2=value2'. Use
                           --attr-fmt to change it
  -A, --attr-append <K=V>  Append one or multiple attributes in the form
                           name=value to FASTA/FASTQ headers. Compared to
                           `-a/--attr`, existing attributes in headers are NOT
                           replaced. This will result in a duplicate entry if
                           the given attribute name already exists
      --attr-fmt <FMT>     Expected format of sequence header attributes, which
                           is also used for writing new attributes to headers
                           (using -a/--attr). The words 'key' and 'value' must
                           always be present, and 'value' must follow after
                           'key'. Example: ';key=value'. If the delimiter before
                           the key is not a space, attributes are appended to
                           the ID (part before the first space) instead of the
                           end of the header [env: ST_ATTR_FORMAT=]
                           [default: " key=value"]
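
The difference between `-a/--attr` (replace or add) and `-A/--attr-append` (always add) can be sketched in Python for the default ' key=value' format. This is only an illustration under stated assumptions: `set_attr` and `append_attr` are hypothetical names, headers are given without the leading '>', and only the first matching attribute is replaced.

```python
import re


def set_attr(header, key, value):
    """Mimic -a/--attr: replace an existing key=value, otherwise append it."""
    # Assumption: attributes follow a single space, as in ' key=value'
    pattern = re.compile(r"(?<= )" + re.escape(key) + r"=\S*")
    replacement = f"{key}={value}"
    if pattern.search(header):
        return pattern.sub(lambda match: replacement, header, count=1)
    return f"{header} {replacement}"


def append_attr(header, key, value):
    """Mimic -A/--attr-append: always append, even if the key exists."""
    return f"{header} {key}={value}"


print(set_attr("id some description key1=value1", "key2", "value2"))
print(set_attr("id a=1 b=2", "a", "9"))
print(append_attr("id a=1", "a", "2"))
```

The last call produces a duplicate `a=` entry, which is the behaviour the `-A/--attr-append` description warns about.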

Associated metadata (all commands):
  -m, --meta <FILE>        Delimited text file path (or '-' for STDIN)
                           containing associated metadata, accessed using the
                           `meta(field)` function, or `meta(field, file-num)` in
                           case of multiple metadata files (supplied like this:
                           -m file1 -m file2 ...)
      --meta-delim <CHAR>  Metadata column delimiter. Inferred from the file
                           extension if possible: '.csv' is interpreted as
                           comma(,)-delimited, '.tsv'/'.txt' or other (unknown)
                           extensions are assumed to be tab-delimited [default:
                           "\t"]
      --meta-header        Specify if the first row of the metadata file(s)
                           contains column names. Automatically enabled if
                           non-numeric field names are used, e.g.
                           'meta(fieldname)'
      --meta-idcol <NUM>   Column number containing the sequence record IDs
                           [default: 1]
      --dup-ids            Specify if the sequence input is expected to contain
                           duplicate IDs. Without this flag, there may be an
                           error (`meta` and `has_meta` functions), whereas
                           `opt_meta` may wrongly return missing values
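
How the metadata options fit together can be sketched in Python: the delimiter is inferred from the extension, and rows are indexed by the ID column. This is a rough illustration, not the tool's implementation. `infer_meta_delim` and `load_meta` are hypothetical names, and the case-insensitive extension match is an assumption.

```python
import csv
import io
from pathlib import Path


def infer_meta_delim(path):
    """'.csv' means comma; '.tsv', '.txt' and unknown extensions mean tab."""
    return "," if Path(path).suffix.lower() == ".csv" else "\t"


def load_meta(handle, delim="\t", id_col=1, header=False):
    """Index metadata rows by record ID (id_col is 1-based, like --meta-idcol)."""
    reader = csv.reader(handle, delimiter=delim)
    names = next(reader) if header else None
    table = {}
    for row in reader:
        table[row[id_col - 1]] = row
    return names, table


# Usage with an in-memory file standing in for 'meta.csv'
names, table = load_meta(
    io.StringIO("id,depth\nseq1,10\nseq2,7\n"),
    delim=infer_meta_delim("meta.csv"),
    header=True,
)
print(names, table["seq2"])
```

A dictionary keyed by ID keeps only one row per ID, so this sketch sidesteps the duplicate-ID situation that `--dup-ids` addresses.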

Expressions/scripts (all commands):
      --js-init <CODE>  JavaScript code to execute during initialization (e.g.
                        for defining global variables used later during
                        parsing). Either a plain string or
                        'file:path/to/file.js'

Advanced (all commands):
      --max-read-mem <SIZE>  Buffer size limit for the internal FASTA/FASTQ
                             reader. Larger sequence records will cause an
                             error. Note that some commands such as 'sort',
                             'unique' and 'sample' still use more memory and
                             have their own additional memory limit setting.
                             Either a plain number (bytes) or a number with a
                             unit (K, M, G, T) based on powers of 2
                             [default: 1G]
  -T, --read-thread          Read from a different thread. Enabled with
                             compressed input
  -W, --write-thread         Write in a different thread. Enabled with
                             compressed output
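
The size notation accepted by `--max-read-mem` (plain bytes, or a number with a K/M/G/T unit based on powers of 2) can be made concrete with a minimal Python sketch. `parse_size` is a hypothetical helper. Accepting lowercase units and restricting values to whole numbers are assumptions, not documented behaviour.

```python
# Units are powers of 2, as stated in the option description
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Convert '512', '4K' or '1G' into a number of bytes."""
    text = text.strip()
    unit = text[-1:].upper()
    if unit in UNITS:
        return int(text[:-1]) * UNITS[unit]
    return int(text)


print(parse_size("1G"))
print(parse_size("4K"))
print(parse_size("512"))
```

Under this reading, the default of 1G corresponds to 2**30 bytes.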