trim¶
Trim sequences on the left and/or right (single range) or extract and concatenate several ranges
Usage: st trim [OPTIONS] <RANGES> [INPUT]...
Options:
-h, --help Print help
'Trim' command options:
-e, --exclusive Exclusive trim range: excludes start and end positions from
the output sequence. In the case of unbounded ranges
(`start:` or `:end`), the range still extends to the
complete end or the start of the sequence
-0, --zero-based Interpret range as 0-based, with the end not included
<RANGES> Range(s) in the form 'start:end' or 'start:' or ':end', Multiple
ranges can be supplied as comma-delimited list:
'start:end,start2:end2', etc. The start/end positions can be
defined by variables/functions (start_var:end_var), or
variables/functions may return the whole range (e.g. stored as
header attribute 'attr(range)'), or even a list of ranges (e.g.
'attr(range_list)'). *Note* that with the FASTA format, multiple
trim ranges must be in order (from left to right) and cannot
overlap
The trim ranges always include the start and end coordinates unless -0
is
specified.
Coordinates can be negative to indicate an offset from the end.
See explanation of ranges for more details.
Example: primer trimming¶
Assuming the length of a primer is 20bp, we can remove it like this:
However, primers may sometimes be incomplete or shifted, which is why tools such as Cutadapt are usually used to trim primers. Seqtool can do primer matching using the find command, followed by primer trimming with trim. The following example trims forward and reverse primers, storing the positions as attributes in sequence headers:
st find FWDPRIMER -f -R 0.1 -a fwd_end={match_end} input.fasta |
st find REVPRIMER -f -R 0.1 -a rev_start={match_start} |
st trim --exclusive '{attr(fwd_end)}:{attr(rev_start)}' > primer_trimmed.fasta
The intermediate output before the trim command may look like this:
>id f_end=15 r_start=24 BEFOREFWDPRIMERSEQUENCEREVPRIMERAFTER
primer_trimmed.fasta
:
>id f_end=15 r_start=-15
SEQUENCE
Note:
-e/--exclusive
excludes the last base of the forward primer and the first base of the reverse primer.
Using coordinates from BED files¶
The following is equivalent to
bedtools getfasta
(note that the BED format is 0-based, thus the -0
option):
Instead of -0
we could also use an expression to calculate start + 1
,
matching the standard range coordinate system used by seqtool.
Multiple ranges¶
The trim command can also concatenate multiple parts of the sequence:
More¶
This page lists examples with execution times compared to other tools.