filter
Keep/exclude sequences based on different properties with a mathematical (JavaScript) expression
Usage: st filter [OPTIONS] <EXPRESSION> [INPUT]...
Options:
  -h, --help  Print help
'Filter' command options:
  -d, --dropped <FILE>  Output file for sequences that were removed by
                        filtering. The format is auto-recognized from the
                        extension
  <EXPRESSION>          Filter expression
Examples
Removing sequences shorter than 100 bp:
Removing DNA sequences with more than 10% of ambiguous bases:
Quick and easy way to select certain sequences:
st filter "id == 'id1' " input.fasta > filtered.fasta
st filter "['id1', 'id2', 'id3'].includes(id)" input.fasta > filtered.fasta
Note: this may not be the most efficient approach; consider using a text file with an ID list instead.
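The length and ambiguous-base criteria named at the start of this section can be illustrated outside of seqtool. The following is a minimal Python sketch of the logic only, not seqtool code: the function names and default thresholds are hypothetical, and it assumes that any character other than A, C, G or T counts as ambiguous.

```python
def long_enough(seq: str, min_len: int = 100) -> bool:
    """Keep sequences that are at least min_len bases long."""
    return len(seq) >= min_len


def few_ambiguous(seq: str, max_fraction: float = 0.1) -> bool:
    """Keep sequences whose fraction of ambiguous bases is at most max_fraction.

    Assumption: everything except A, C, G, T (case-insensitive) is ambiguous.
    """
    if not seq:
        return False  # an empty sequence has no defined fraction; drop it
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return ambiguous / len(seq) <= max_fraction


if __name__ == "__main__":
    print(long_enough("ACGT" * 30))        # 120 bases
    print(few_ambiguous("ACGTNNNNAC"))     # 4 of 10 bases are ambiguous
```

A filter expression plays the same role as these predicates: it is evaluated once per record, and the record is kept when the result is true.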
Quality filtering
The exp_err statistics variable represents the total expected number of errors in a sequence, as estimated from the quality scores. By default, the Sanger / Illumina 1.8+ format (with ASCII offset 33) is assumed. See here for more information.
This example removes sequences with more than one expected error. The output is the same as that of fastq_filter in USEARCH or VSEARCH.
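For reference, the expected number of errors is conventionally the sum of the per-base error probabilities, where a Phred score Q corresponds to an error probability of 10^(-Q/10). The sketch below shows that calculation in plain Python under the offset-33 assumption stated above. It is an illustration of the formula, not seqtool's implementation, and the function names are hypothetical.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum the per-base error probabilities encoded in a FASTQ quality string."""
    total = 0.0
    for char in quality:
        phred = ord(char) - offset       # decode the Phred score Q
        total += 10 ** (-phred / 10)     # error probability P = 10^(-Q/10)
    return total


def keep_read(quality: str, max_expected_errors: float = 1.0) -> bool:
    """Keep reads with at most max_expected_errors expected errors."""
    return expected_errors(quality) <= max_expected_errors


if __name__ == "__main__":
    # With offset 33, 'I' encodes Q40 and '5' encodes Q20.
    print(expected_errors("IIII5"))
    print(keep_read("IIII5"))
```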
Normalization according to sequence length is easily possible with
a math formula (corresponding to -fastq_maxee_rate in USEARCH).
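The length normalization amounts to dividing the expected number of errors by the sequence length, which gives the expected errors per base. A minimal Python sketch of that arithmetic follows. It assumes offset-33 quality strings, and the 0.002 threshold and function names are hypothetical values chosen only for illustration.

```python
def expected_error_rate(quality: str, offset: int = 33) -> float:
    """Expected errors per base: sum of error probabilities divided by length."""
    if not quality:
        raise ValueError("quality string must not be empty")
    exp_err = sum(10 ** (-(ord(char) - offset) / 10) for char in quality)
    return exp_err / len(quality)


def keep_read_by_rate(quality: str, max_rate: float = 0.002) -> bool:
    """Keep reads whose expected errors per base do not exceed max_rate."""
    return expected_error_rate(quality) <= max_rate


if __name__ == "__main__":
    print(keep_read_by_rate("I" * 10))   # ten Q40 bases
    print(keep_read_by_rate("5" * 10))   # ten Q20 bases
```

Normalizing by length avoids penalizing long reads, which accumulate more expected errors in total even when their per-base quality is the same.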
More
This page lists examples with execution times compared to other tools.