
Variables/functions

Seqtool offers many variables/functions providing information about the sequence records or the results of some commands.

Types of variables/functions

The following variable categories are provided:

Complete reference

👉 Full reference of variables/functions provided by all commands (see command documentation for those provided by individual commands).

Use in seqtool commands

Variables/functions are usually written in curly braces: {variable}, although this is optional in some cases (see below).

Simple example

The following command recodes IDs to seq_1, seq_2, seq_3 etc. using the num variable:

st set -i seq_{num} seqs.fasta > renamed.fasta
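To make the substitution concrete, here is a minimal sketch in plain shell and awk that mimics the effect of the command above. It is not how seqtool works internally. The function name rename_ids is hypothetical, and the sketch assumes that num is a 1-based counter over records in input order and that only the ID (the header text up to the first space) is replaced.

```shell
# Hypothetical helper: mimics the effect of `st set -i seq_{num}` on FASTA read from stdin.
# Assumptions: num starts at 1 and follows input order; the description after the ID is kept.
rename_ids() {
  awk '
    /^>/ { n++; sub(/^>[^ ]*/, ">seq_" n); print; next }  # header: swap the ID for seq_<counter>
    { print }                                              # sequence line: pass through unchanged
  '
}

printf '>a desc\nACGT\n>b\nTTGA\n' | rename_ids
```

With the two-record input shown, the headers become >seq_1 desc and >seq_2, while the sequence lines are unchanged.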

Grouping / categorization

The sort, unique and count commands use variables/functions for grouping/categorization.

The keys can be a single variable/function (without braces) or composed of text with multiple variables/functions, e.g.: {id}_{desc} (braces required).

The following command sorts sequences by length:

st sort seqlen input.fasta > length_sorted.fasta

Setting/editing header attributes

Variables/functions are needed for composing header attributes (-a/--attr argument):

st find PATTERN input.fasta -a rng='{match_range}' > with_range.fasta
>id1 rng=3:10
SEQUENCE
>id2 rng=5:12
SEQUENCE
(...)

Ranges (trim/mask)

The trim and mask commands accept ranges or even lists of ranges in the form of variables.

In this command, we trim the sequence using start and end coordinates stored in separate attributes:

>id1 start=3 end=10
SEQUENCE
(...)
st trim -e 'attr(start):attr(end)' input.fasta > trimmed.fasta

Alternatively, we use a range stored as a whole in a single header attribute (such as the rng attribute written by the find example above):

st trim -e 'attr(rng)' input.fasta > trimmed.fasta

The handling of multiple ranges is documented in a sequence masking example.
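As an illustration of how 'attr(start):attr(end)' resolves to a range for a header such as >id1 start=3 end=10, the following sketch extracts key=value attributes with standard shell tools. The helper get_attr is hypothetical and only assumes the space-separated key=value layout shown in the headers above; it does not reproduce seqtool's own parsing.

```shell
# Hypothetical helper: prints the value of attribute $1 from a FASTA header line on stdin.
# Assumption: attributes are space-separated key=value pairs, as in ">id1 start=3 end=10".
get_attr() {
  # Put each space-separated token on its own line, then pick the one whose key matches.
  tr ' ' '\n' | awk -F= -v key="$1" '$1 == key { print $2; exit }'
}

header='>id1 start=3 end=10'
start=$(printf '%s\n' "$header" | get_attr start)
end=$(printf '%s\n' "$header" | get_attr end)
echo "$start:$end"   # prints 3:10
```

The resulting string 3:10 has the same start:end form as the rng attribute in the find example.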

Delimited text output

Variables/functions are also used to define the content of delimited text files.

This example searches for the sequence ID prefix (everything before the first dot) using a regular expression, and returns the ID, the matched text and the sequence as TSV:

st find -ir '[^.]+' seqs.fasta --to-tsv 'id,match,seq' > out.tsv

out.tsv

seq1.suffix123  seq1    SEQUENCE
seq2.suffix_abc seq2    SEQUENCE
...

As with sort/unique/count keys, {braces} are not needed unless a field is composed of mixed text and/or other variables (more details below).
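To show what the pattern [^.]+ in the find example selects, here is a small awk sketch that applies the same regular expression to plain ID strings. The function name id_prefix is hypothetical; the sketch only illustrates the regex and prints two tab-separated columns (ID and match), not the sequence column.

```shell
# Hypothetical helper: for each ID on stdin, print the ID and the first match of [^.]+
# (the leftmost run of non-dot characters), separated by a tab.
id_prefix() {
  awk '{ if (match($0, /[^.]+/)) print $0 "\t" substr($0, RSTART, RLENGTH) }'
}

printf 'seq1.suffix123\nseq2.suffix_abc\n' | id_prefix
```

For the two IDs shown, the second column contains seq1 and seq2, matching the first two columns of out.tsv above.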

Expressions

Expressions can be used wherever variables/functions are allowed. They must always be written in {braces} (exception: filter expressions).

Example: calculating the fraction of ambiguous bases for each sequence:

st stat '{ 1 - charcount("ATGC")/seqlen }'
id1 1
id2 0.99
id3 0.95
id4 1
...
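The arithmetic in this expression can be checked by hand with a short awk sketch. The helper ambig_fraction is hypothetical and works on a bare sequence string; it assumes that charcount("ATGC") counts exact (uppercase) occurrences of the four listed characters, which may differ from seqtool's actual behaviour.

```shell
# Hypothetical helper: prints 1 - (number of A/T/G/C characters) / (sequence length).
# Assumption: only uppercase A, T, G and C count as unambiguous bases.
ambig_fraction() {
  awk -v s="$1" 'BEGIN {
    len = length(s)                    # total sequence length
    unambig = gsub(/[ATGC]/, "", s)    # gsub returns the number of characters replaced
    print 1 - unambig / len
  }'
}

ambig_fraction ATGCNNNN   # 4 of 8 bases are A/T/G/C, so this prints 0.5
```

A sequence consisting only of A, T, G and C gives 0, and a sequence without any of them gives 1.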

Use of braces

The braced {variable} notation is always necessary...

  • when setting/composing attributes with -a/--attr key=value
  • if variables/functions are mixed with plain text and/or other variables
  • in set, output paths in split, text replacements in find (--repl)
  • with JavaScript expressions

The braces can optionally be omitted if only a single variable/function is used as...

  • sort, unique and count key, e.g.: st sort seq input.fasta
  • range bound in trim, mask, e.g.: st trim 'attr(start):' input.fasta
  • delimited text field, e.g.: st pass input.fasta --to-tsv id,desc,seq