Metadata from delimited files
In all seqtool commands, it is possible to integrate external metadata from delimited text files created manually or using another program.
Files are specified using the -m/--meta option and accessed using the functions meta(column), opt_meta(column) (with missing data) or has_meta(column) (to check if the metadata is present).
The column argument is either the number or the header name of the column.
See also the variable reference and the detailed description of command-line options.
By default, files are assumed to be tab-delimited, and the first column should contain the ID.
However, this can be changed with --meta-delim and --id-col.
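As a rough sketch of how these two options might be combined, the script below builds a small comma-delimited file whose IDs sit in the second column and passes it to st set. The file contents, the column names (genus, id) and the sequence IDs are hypothetical, and the argument forms (a literal delimiter character and a 1-based column number) are assumptions that should be checked against the command-line reference. The st call is skipped if the program is not installed.

```shell
# Hypothetical comma-delimited metadata: genus in column 1, sequence ID in column 2
printf 'genus,id\nPinus,seq1\nQuercus,seq2\n' > genus.csv

# Hypothetical FASTA input with matching IDs
printf '>seq1\nACGT\n>seq2\nTTGA\n' > input.fasta

# Assumed argument forms: literal delimiter and 1-based ID column
if command -v st >/dev/null 2>&1; then
  st set -m genus.csv --meta-delim ',' --id-col 2 -d '{meta(genus)}' input.fasta > with_genus.fasta
fi
```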
Examples
Consider this list containing taxonomic information about sequences (genus.tsv):
The genus name can be added to the FASTA header using this command:
st set --meta genus.tsv --desc '{meta(genus)}' input.fasta > with_genus.fasta
# short:
st set -m genus.tsv -d '{meta(genus)}' input.fasta > with_genus.fasta
If any of the sequence IDs is not found in the metadata, there will be an error.
If missing data is expected, use opt_meta instead.
Missing entries are undefined.
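A minimal sketch of the opt_meta variant, adapted from the st set command above: the only change is the function name. The sample files are hypothetical (seq2 deliberately has no metadata entry), and the st call is skipped if the program is not installed. No output is shown here because the exact rendering of a missing value is not reproduced in this section.

```shell
# Hypothetical tab-delimited metadata with a header row; seq2 is intentionally absent
printf 'id\tgenus\nseq1\tPinus\n' > genus.tsv

# Hypothetical FASTA input containing one ID without metadata
printf '>seq1\nACGT\n>seq2\nTTGA\n' > input.fasta

# Same command as before, with opt_meta in place of meta to tolerate the missing entry
if command -v st >/dev/null 2>&1; then
  st set -m genus.tsv -d '{opt_meta(genus)}' input.fasta > with_genus.fasta
fi
```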
Filtering by ID
Sometimes it is necessary to select all sequence records present in a list of sequence IDs. This can easily be achieved using this command:
Multiple metadata sources
Several sources can be used in the same command with -m file1 -m file2 -m file3...:
st filter -m source1.txt -m source2.txt 'meta("column", 1) == "value" && has_meta(2)' seqs.fasta > in_list.fasta
Sources are referenced using meta(column, file_number) or has_meta(file_number); see also the variable reference.