split
Distribute sequences into multiple files based on variables/functions or advanced expressions specified in the output path (-o/--output).
See --help and --help-vars for more information.
In contrast to other commands, the output argument (-o) of the 'split' command can contain variables/functions that determine the file path for each sequence.
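Conceptually, the output path acts as a template that is evaluated once per sequence, and all sequences that yield the same path end up in the same file. The Python sketch below illustrates only this idea. The `{name}` placeholder syntax, the per-record attribute dictionaries and the helpers `render_path`/`group_by_path` are hypothetical and do not reflect seqtool's actual variable syntax or implementation.

```python
import re
from collections import defaultdict

# Hypothetical placeholder syntax: "{name}" is replaced by the record's
# attribute value. seqtool's real variables/functions use their own syntax;
# this only illustrates "one output path per sequence".
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_path(template, attrs):
    """Fill every {name} placeholder in template with attrs[name]."""
    return PLACEHOLDER.sub(lambda m: str(attrs[m.group(1)]), template)


def group_by_path(records, template):
    """Map each rendered output path to the IDs of the records that would be
    written there, keeping the input order within each file."""
    groups = defaultdict(list)
    for rec_id, attrs in records:
        groups[render_path(template, attrs)].append(rec_id)
    return dict(groups)


records = [
    ("seq1", {"group": 1}),
    ("seq2", {"group": 2}),
    ("seq3", {"group": 1}),
]
print(group_by_path(records, "group_{group}.fa"))
# {'group_1.fa': ['seq1', 'seq3'], 'group_2.fa': ['seq2']}
```

Because the path is recomputed for every record, sequences sharing a value are collected in one file regardless of where they occur in the input.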
Usage: st split [OPTIONS] [INPUT]...
Options:
-h, --help Print help
'Split' command options:
-n, --num-seqs <N> Split into chunks of <N> sequences and writes each
chunk to a separate file with a numbered suffix. The
output path is: '{filestem}_{chunk}.{default_ext}',
e.g. 'input_name_1.fasta'. Change with `-o/--output`
-p, --parents Automatically create all parent directories of the
output path
-c, --counts <COUNTS> Write a tab-separated list of file path + record count
to the given file (or STDOUT if `-` is specified)
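The -c/--counts option is described as writing a tab-separated list of file path and record count. The following Python sketch shows how such a list can be derived from the output path assigned to each record. It assumes one line per output file with the path in the first column, listed in the order the files first appear; the exact ordering and layout of seqtool's output are assumptions here, and `counts_tsv` is a hypothetical helper.

```python
from collections import Counter


def counts_tsv(paths):
    """Build 'path<TAB>count' lines, one per output file.

    Assumption: files are listed in the order in which their path first
    appears (Counter preserves first-insertion order in Python 3.7+)."""
    counts = Counter(paths)
    return "".join(f"{path}\t{n}\n" for path, n in counts.items())


# One entry per written record: the output file it was assigned to.
assigned = ["input_1.fasta", "input_1.fasta", "input_2.fasta"]
print(counts_tsv(assigned), end="")
# input_1.fasta<TAB>2
# input_2.fasta<TAB>1
```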
Imagine this FASTA file (input.fa):
This will create the files group_1.fa and group_2.fa. In more complicated scenarios, variables may be combined to create nested subfolders of any complexity.
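Combining two variables in the output path yields one subfolder per value of the first and one file per value of the second. The Python sketch below mimics this with a hypothetical layout `<sample>/group_<group>.fa`; the record tuples, field names and the `write_split` helper are assumptions for illustration only. The `make_parents` flag imitates what -p/--parents is documented to do (create all parent directories of the output path). In this sketch, disabling it makes writing into a not-yet-existing folder fail with `FileNotFoundError`.

```python
import tempfile
from pathlib import Path


def write_split(records, out_dir, make_parents=True):
    """Append each (sample, group, header, seq) record to
    out_dir/<sample>/group_<group>.fa and return the relative paths written.

    Simplified: the file is reopened in append mode for every record."""
    base = Path(out_dir)
    written = set()
    for sample, group, header, seq in records:
        path = base / sample / f"group_{group}.fa"
        if make_parents:
            # Like -p/--parents: create missing parent directories.
            path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a") as fh:
            fh.write(f">{header}\n{seq}\n")
        written.add(path.relative_to(base).as_posix())
    return sorted(written)


records = [
    ("sampleA", 1, "id1", "ACGT"),
    ("sampleA", 2, "id2", "GGCC"),
    ("sampleB", 1, "id3", "TTAA"),
]
with tempfile.TemporaryDirectory() as tmp:
    print(write_split(records, tmp))
# ['sampleA/group_1.fa', 'sampleA/group_2.fa', 'sampleB/group_1.fa']
```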
An example of demultiplexing sequences by forward primer can be found in the documentation of the find command.
Variables available in the split command
Example
Split input into chunks of 1000 sequences, which will be named outdir/file_1.fq, outdir/file_2.fq, etc.:
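The chunk numbering follows from simple integer arithmetic: the sequence at 0-based position i goes into chunk i // 1000 + 1, so every file holds 1000 sequences except possibly the last, which holds the remainder. The Python sketch below computes the resulting file names and sizes for a hypothetical input of 2500 sequences. `chunk_paths` is a hypothetical helper, and `{chunk}` is filled with Python's `str.format`, not by seqtool.

```python
def chunk_paths(n_records, chunk_size, template="outdir/file_{chunk}.fq"):
    """Return (path, count) for each chunk when n_records sequences are
    split into chunks of chunk_size; chunks are numbered starting at 1."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    result = []
    for start in range(0, n_records, chunk_size):
        chunk = start // chunk_size + 1          # 1-based chunk number
        count = min(chunk_size, n_records - start)  # last chunk may be smaller
        result.append((template.format(chunk=chunk), count))
    return result


print(chunk_paths(2500, 1000))
# [('outdir/file_1.fq', 1000), ('outdir/file_2.fq', 1000), ('outdir/file_3.fq', 500)]
```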