Variables/functions
Seqtool offers many variables/functions providing information about the sequence records or the results of some commands.
Types of variables/functions
The following variable categories are provided:
- General properties of sequence records: sequence header (id, desc), the sequence (seq, upper_seq, ...), input file names/paths (filename, path, ...), etc.
- Sequence statistics such as the GC content (gc_percent), etc.
- Access to key=value attributes in sequence headers (attr(name), ...). More on attributes here.
- Integration of metadata from delimited text files (meta(field), ...). More on metadata here.
- Some commands provide the results of some calculations in the form of variables/functions (find, unique, sort, split).
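To make these categories more concrete, here is a rough Python sketch that derives a few comparable values from a single FASTA record. It is not seqtool's implementation: the helper name record_variables is hypothetical, and it assumes that the ID ends at the first space, that attributes are space-separated key=value tokens in the description, and that gc_percent is the share of G and C among all characters of the sequence.

```python
def record_variables(header: str, seq: str) -> dict:
    """Hypothetical helper: derive seqtool-like variables from one FASTA record.

    Assumptions (not taken from seqtool itself):
    - the ID is everything before the first space, the rest is the description
    - attributes are space-separated key=value tokens in the description
    - gc_percent = (G + C) / total length * 100, ignoring case
    """
    id_, _, desc = header.partition(" ")
    upper_seq = seq.upper()
    gc_count = upper_seq.count("G") + upper_seq.count("C")
    gc_percent = 100.0 * gc_count / len(seq) if seq else 0.0
    # Collect key=value tokens, splitting each at its first '=' only
    attrs = dict(tok.split("=", 1) for tok in desc.split() if "=" in tok)
    return {
        "id": id_,
        "desc": desc,
        "seq": seq,
        "upper_seq": upper_seq,
        "gc_percent": gc_percent,
        "attrs": attrs,
    }


if __name__ == "__main__":
    v = record_variables("seq1 sample A size=3", "acGTtg")
    print(v["id"], v["gc_percent"], v["attrs"])
```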
Complete reference
👉 Full reference of variables/functions provided by all commands (see command documentation for those provided by individual commands).
Use in seqtool commands
Variables/functions are usually written in curly braces, e.g. {variable}, although this is optional in some cases (see below).
Simple example
The following command recodes IDs to seq_1, seq_2, seq_3, etc. using the num variable:
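The renaming itself is simple enough to sketch in plain Python. This only illustrates what a seq_{num} template produces, assuming num counts records from 1 in input order; records are modeled as (id, seq) tuples, and the helper name renumber is hypothetical.

```python
def renumber(records):
    """Replace each ID with seq_<num>; num starts at 1 and follows input order."""
    return [(f"seq_{num}", seq) for num, (_old_id, seq) in enumerate(records, start=1)]


if __name__ == "__main__":
    for new_id, seq in renumber([("abc", "ACGT"), ("xyz", "GG"), ("q", "T")]):
        print(f">{new_id}\n{seq}")
```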
Grouping / categorization
The sort, unique and count commands use variables/functions for grouping/categorization.
The keys can be a single variable/function (without braces) or composed of text with multiple variables/functions, e.g. {id}_{desc} (braces required).
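Conceptually, a key is a value computed for each record, which is then used for sorting, deduplication or counting. The Python sketch below is an analogy, not seqtool's code: it sorts by a single value (the sequence length) and counts and deduplicates records by a composed key equivalent to {id}_{desc}. The record dictionaries are made up for illustration.

```python
from collections import Counter

records = [
    {"id": "a", "desc": "x", "seq": "ACGTAC"},
    {"id": "b", "desc": "y", "seq": "AC"},
    {"id": "a", "desc": "x", "seq": "ACG"},
]

# Single-value key: sort records by sequence length (shortest first)
by_length = sorted(records, key=lambda r: len(r["seq"]))

# Composed key: text plus two variables, analogous to {id}_{desc}
counts = Counter(f"{r['id']}_{r['desc']}" for r in records)

# One way to deduplicate: keep the first record seen for each composed key
unique = {}
for r in records:
    unique.setdefault(f"{r['id']}_{r['desc']}", r)

if __name__ == "__main__":
    print([r["seq"] for r in by_length])
    print(counts)
    print([r["seq"] for r in unique.values()])
```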
The following command sorts sequences by length:
Setting/editing header attributes
Variables/functions are needed for composing header attributes (-a/--attr argument):
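As a rough illustration of what composing an attribute amounts to, the Python sketch below appends or replaces a key=value token in a FASTA header, with the value computed from the record. The helper set_attr and the attribute name len are hypothetical, and the sketch assumes space-separated key=value attributes following the ID; seqtool's actual attribute handling may differ.

```python
def set_attr(header: str, key: str, value) -> str:
    """Hypothetical helper: set key=value in a FASTA header (without the '>').

    An existing attribute with the same key is replaced in place,
    otherwise the new attribute is appended at the end. The first token
    (the ID) is never treated as an attribute.
    """
    tokens = header.split(" ")
    prefix = f"{key}="
    for i in range(1, len(tokens)):
        if tokens[i].startswith(prefix):
            tokens[i] = f"{prefix}{value}"
            return " ".join(tokens)
    tokens.append(f"{prefix}{value}")
    return " ".join(tokens)


if __name__ == "__main__":
    seq = "ACGTN"
    print(set_attr("seq1 some description", "len", len(seq)))
    print(set_attr("seq1 len=3 x=1", "len", len(seq)))
```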
Ranges (trim/mask)
The trim and mask commands accept ranges or even lists of ranges in the form of variables.
In this command, we trim the sequence using start and end coordinates stored in separate attributes:
Alternatively, we can use a range stored as a whole in the sequence header (as in the example above).
The handling of multiple ranges is documented in a sequence masking example.
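The coordinate handling can be sketched in Python. The sketch assumes 1-based, end-inclusive coordinates and soft-masking (lowercase) for mask; both are common conventions for sequence tools but are assumptions here, as are the attribute names start and end and the helper names parse_attrs, trim and mask. Unlike trim, mask accepts a list of ranges.

```python
def parse_attrs(header: str) -> dict:
    """Collect space-separated key=value tokens after the ID (assumed format)."""
    return dict(tok.split("=", 1) for tok in header.split()[1:] if "=" in tok)


def trim(seq: str, start: int, end: int) -> str:
    """Keep positions start..end (1-based, end-inclusive)."""
    if start < 1 or end < start or end > len(seq):
        raise ValueError(f"invalid range {start}..{end} for length {len(seq)}")
    return seq[start - 1:end]


def mask(seq: str, ranges) -> str:
    """Lowercase every (start, end) range, using the same convention as trim."""
    chars = list(seq)
    for start, end in ranges:
        for i in range(start - 1, end):
            chars[i] = chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    attrs = parse_attrs("seq1 start=3 end=6")
    seq = "ACGTACGT"
    print(trim(seq, int(attrs["start"]), int(attrs["end"])))
    print(mask(seq, [(1, 2), (5, 6)]))
```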
Delimited text output
Variables/functions are also used to define the content of delimited text files.
This example searches for a sequence ID prefix (everything before a dot) using a regular expression and returns the matched text as TSV:
As with sort/unique/count keys, {braces} are not needed unless a field is composed of mixed text and/or multiple variables (more details below).
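The same idea can be sketched in plain Python with the re and csv modules: a regular expression extracts the ID prefix, and each record becomes one tab-separated row. The pattern ^[^.]+ (everything up to the first dot), the two-column layout and the helper name prefix_tsv are assumptions chosen for illustration.

```python
import csv
import io
import re

# Everything from the start of the ID up to (not including) the first dot
PREFIX = re.compile(r"^[^.]+")


def prefix_tsv(ids) -> str:
    """Return TSV text with one row per ID: the full ID and its prefix."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for seq_id in ids:
        m = PREFIX.match(seq_id)
        # No match (e.g. an ID starting with '.') gives an empty field
        writer.writerow([seq_id, m.group(0) if m else ""])
    return buf.getvalue()


if __name__ == "__main__":
    print(prefix_tsv(["ABC123.1", "XYZ9.2", "nodot"]), end="")
```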
Expressions
Expressions can be used wherever variables/functions are allowed. They must always be written in {braces} (exception: filter expressions).
Example: calculating the fraction of ambiguous bases for each sequence:
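The calculation itself can be sketched in Python. Here, ambiguous bases are assumed to be the IUPAC nucleotide ambiguity codes (N, R, Y, S, W, K, M, B, D, H, V), counted case-insensitively and divided by the total sequence length (so gap characters count towards the length); the exact definition used in the seqtool example may differ, and the helper name ambiguous_fraction is hypothetical.

```python
# IUPAC nucleotide ambiguity codes (everything except A, C, G, T/U)
AMBIGUOUS = frozenset("NRYSWKMBDHV")


def ambiguous_fraction(seq: str) -> float:
    """Fraction of positions holding an IUPAC ambiguity code (0.0 for empty input)."""
    if not seq:
        return 0.0
    n_ambig = sum(1 for base in seq.upper() if base in AMBIGUOUS)
    return n_ambig / len(seq)


if __name__ == "__main__":
    print(ambiguous_fraction("ACGTNNry"))
```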
Use of braces
The braced {variable} notation is always necessary...
- when setting/composing attributes with -a/--attr key=value
- if variables/functions are mixed with plain text and/or other variables
- in set, output paths in split, and text replacements in find (--repl)
- with JavaScript expressions
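These rules can be summarized with a simplified Python model of how a single field or key is evaluated: a field without braces must name exactly one variable, while a braced template may mix text and several variables. The helper render and the variable values are hypothetical, and function calls such as attr(name) as well as JavaScript expressions are not modeled.

```python
import re

# Matches {name} placeholders; only plain variable names are supported here
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render(field: str, variables: dict) -> str:
    """Evaluate a field: a bare variable name, or text with {braced} variables."""
    if "{" not in field:
        # No braces: the whole field must be a single variable name
        return str(variables[field])
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), field)


if __name__ == "__main__":
    v = {"id": "seq1", "desc": "sample", "num": 3}
    for field in ["id", "{id}_{desc}", "seq_{num}"]:
        print(field, "->", render(field, v))
```

In this model, a mixed field such as id_desc written without braces is looked up as one (non-existent) variable name and fails, which mirrors why braces are required as soon as text and variables are combined.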
The braces can optionally be omitted if only a single variable/function is used as...