split¶
Distribute sequences into multiple files based on a variable/function or advanced expression
In contrast to other commands, the output argument (-o/--output) of the
'split' command can contain variables and advanced expressions to determine the
file path for each sequence. However, the output format will not be
automatically determined from file extensions containing variables.
Usage: st split [OPTIONS] [INPUT]...
Options:
-h, --help Print help
'Split' command options:
-n, --num-seqs <N> Split into chunks of <N> sequences and writes each
chunk to a separate file with a numbered suffix. The
output path is: '{filestem}_{chunk}.{default_ext}',
e.g. 'input_name_1.fasta'. Change with `-o/--output`
-p, --parents Automatically create all parent directories of the
output path
-c, --counts <COUNTS> Write a tab-separated list of file path + record count
to the given file (or STDOUT if `-` is specified)
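To make the default naming scheme of `-n/--num-seqs` and the `-c/--counts` listing more concrete, here is a minimal Python sketch. It is not how `st` is implemented; it only emulates the '{filestem}_{chunk}.{default_ext}' pattern described above. The function names `chunk_paths` and `counts_table` are hypothetical, and the sketch assumes that chunk numbering starts at 1, that the last chunk holds the remaining records, and that the default extension is taken from the input file name.

```python
from pathlib import Path


def chunk_paths(input_path, num_records, n, default_ext=None):
    """Return (output_path, record_count) pairs for chunks of n records.

    Illustration only: mirrors the '{filestem}_{chunk}.{default_ext}' pattern
    from the help text. Assumes numbering starts at 1 and that the extension
    comes from the input file name unless default_ext is given.
    """
    p = Path(input_path)
    ext = default_ext or p.suffix.lstrip(".")
    pairs = []
    chunk = 1
    remaining = num_records
    while remaining > 0:
        size = min(n, remaining)  # assumed: the last chunk may be smaller
        pairs.append((f"{p.stem}_{chunk}.{ext}", size))
        remaining -= size
        chunk += 1
    return pairs


def counts_table(pairs):
    """Format the pairs as a tab-separated 'file path + record count' list."""
    return "\n".join(f"{path}\t{count}" for path, count in pairs)


if __name__ == "__main__":
    # 2500 records split into chunks of 1000 records each
    print(counts_table(chunk_paths("input_name.fasta", 2500, 1000)))
```

Under these assumptions, 2500 records from 'input_name.fasta' with chunks of 1000 would be listed as 'input_name_1.fasta', 'input_name_2.fasta' and 'input_name_3.fasta' holding 1000, 1000 and 500 records, respectively.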
Imagine this FASTA file (input.fa):
This will create the files group_1.fa and group_2.fa. In more
complicated scenarios, variables may be combined for creating nested subfolders
of any complexity.
An example of de-multiplexing sequences by forward primer is found in the documentation of the find command.
Variables available in the split command¶
Example¶
Split input into chunks of 1000 sequences, which will be named outdir/file_1.fq, outdir/file_2.fq, etc.: