Comparison of tools
The following lists show the execution time, memory footprint and CPU usage of seqtool v0.4.0-beta on a selection of tasks, compared with other tools (FASTX-Toolkit, Seqtk, SeqKit, VSEARCH, USEARCH and Cutadapt, plus gzip and pigz for compression).
Details on the approach are found here. The input file is a FASTQ or FASTA file containing 2.6 M reads (Illumina MiSeq, 300 bp). The comparison was run on a Ryzen 4750U CPU with frequency boost disabled, writing files to RAM instead of the disk.
The fastest and the most memory-efficient commands are marked, followed by a factor (e.g. 1.3x) that indicates how many times faster they are, or how many times less memory they use, than the second-ranked command. Where alternative commands were benchmarked, their timings are listed with the task.
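The factor shown next to a winning command appears to be the runner-up's value divided by the best value, which can be checked against the listed numbers. A minimal Python sketch using the timings of the GZIP compression task from this page; the function name `advantage_factor` is made up for illustration and is not part of seqtool or its benchmark scripts:

```python
def advantage_factor(values):
    """Return (best_name, factor) for a dict of name -> measurement.

    Lower is better (seconds or MiB). The factor is the second-best
    value divided by the best value.
    """
    ranked = sorted(values.items(), key=lambda item: item[1])
    (best_name, best), (_, second) = ranked[0], ranked[1]
    return best_name, second / best


# Timings in seconds, copied from the GZIP compression task on this page
gzip_times = {
    "SeqKit": 30.3,
    "seqtool": 55.8,
    "seqtool | gzip": 159.1,
    "gzip directly": 158.6,
    "pigz directly (4 threads)": 39.0,
}

name, factor = advantage_factor(gzip_times)
print(f"{name} is fastest ({factor:.1f}x)")  # SeqKit is fastest (1.3x)
```

The same calculation applies to the memory column, with MiB values instead of seconds.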
pass
| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Do nothing, just read and write FASTA | 2.6 s | 7.1 MiB (lowest, 2.53x) | |
| Convert FASTQ to FASTA | 3.1 s (fastest, 1.0x) | 7.1 MiB | FASTX-Toolkit 287.9 s; Seqtk 4.3 s; SeqKit 3.1 s |
| Convert FASTQ quality scores | 7.4 s (fastest, 1.8x) | 7.0 MiB | VSEARCH 12.9 s; SeqKit 48.8 s |
| Write compressed FASTQ files in GZIP format | 55.8 s | 27.5 MiB | SeqKit 30.3 s (fastest, 1.3x); seqtool piped into gzip 159.1 s; gzip directly 158.6 s; pigz directly (4 threads) 39.0 s |
| Write compressed FASTQ files in Zstandard format | 15.5 s, 114% CPU | 11.0 MiB (lowest, 3.52x) | |
| Write compressed FASTQ files in Lz4 format | 9.4 s (fastest, 1.1x), 116% CPU | 27.6 MiB | |
count
| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Count the number of FASTQ sequences in the input | 0.6 s (fastest, 1.2x) | 7.1 MiB | |
| Count the number of FASTQ sequences, grouped by GC content (in 10% intervals) | 4.2 s (fastest, 1.6x) | 7.4 MiB (lowest, 11.66x) | st with math expression 7.0 s |
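To make the GC-content grouping concrete, here is a minimal Python sketch of counting sequences per 10% GC interval. It is an illustration only, not seqtool's implementation; the helper names and the choice to place 100% GC into the top interval are assumptions.

```python
from collections import Counter


def gc_bin(seq, width=10):
    """Lower bound (in %) of the GC-content interval that `seq` falls into."""
    if not seq:
        return 0
    gc = sum(base in "GCgc" for base in seq)
    percent = 100 * gc / len(seq)
    # Assumption: 100% GC goes into the last interval instead of a new one
    return min(int(percent // width) * width, 100 - width)


def count_by_gc(seqs, width=10):
    """Count sequences per GC interval; keys are interval lower bounds."""
    return Counter(gc_bin(s, width) for s in seqs)


counts = count_by_gc(["ACGT", "GGCC", "ATAT", "GCAT"])
for lower in sorted(counts):
    print(f"{lower}-{lower + 10}%: {counts[lower]}")
```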
sort
unique
| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Remove duplicate sequences using sequence hashes. This is more memory-efficient and usually faster than keeping the whole sequence around. | 4.2 s | 117.1 MiB (lowest, 1.54x) | |
| Remove duplicate sequences using sequence hashes (case-insensitive). | 4.3 s (fastest, 1.4x) | 117.2 MiB | VSEARCH 12.1 s; SeqKit 6.2 s |
| Remove duplicate sequences that are exactly identical (case-insensitive), comparing full sequences instead of hashes (requires more memory). VSEARCH additionally treats 'T' and 'U' in the same way (seqtool doesn't). | 5.4 s (fastest, 2.5x) | 729.0 MiB (lowest, 1.85x) | seqtool (sorted by sequence) 13.5 s; VSEARCH 15.8 s |
| Remove duplicate sequences (exact mode) with a memory limit of ~50 MiB | 19.5 s | 56.6 MiB | |
| Remove duplicate sequences, checking both strands | 7.5 s (fastest, 2.0x) | 117.1 MiB (lowest, 2.51x) | |
| Remove duplicate sequences, appending USEARCH/VSEARCH-style abundance annotations to the headers: >id;size=NN | 9.3 s (fastest, 1.7x) | 1606.2 MiB | VSEARCH 16.1 s |
| De-replicate both by sequence and record ID (the part before the first space in the header). The given benchmark actually has unique sequence IDs, so the result is the same as de-replication by sequence. | 7.5 s (fastest, 2.3x) | 1090.6 MiB (lowest, 1.25x) | VSEARCH 17.7 s |
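The memory difference between hash-based and exact de-duplication follows from what has to be remembered per distinct sequence: a short digest versus the full sequence. The sketch below illustrates the hash-based variant in Python. It is not seqtool's implementation; the choice of BLAKE2b with an 8-byte digest is an assumption made for illustration, and with such a short digest two different sequences could in rare cases collide and be treated as duplicates.

```python
import hashlib


def dedup_by_hash(records, ignore_case=False):
    """Yield the first (id, sequence) record for every distinct sequence.

    Only a fixed-size digest is stored per distinct sequence, so memory
    use does not grow with read length.
    """
    seen = set()
    for rec_id, seq in records:
        key = seq.upper() if ignore_case else seq
        # 8-byte digest: an illustrative choice, not seqtool's hash function
        digest = hashlib.blake2b(key.encode("ascii"), digest_size=8).digest()
        if digest not in seen:
            seen.add(digest)
            yield rec_id, seq


records = [("a", "ACGT"), ("b", "acgt"), ("c", "ACGT"), ("d", "TTTT")]
print([rec_id for rec_id, _ in dedup_by_hash(records)])
print([rec_id for rec_id, _ in dedup_by_hash(records, ignore_case=True)])
```

Storing `key` itself in `seen` instead of the digest would give the exact variant, at the cost of keeping every distinct sequence in memory.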
filter
| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Filter sequences by length | 5.4 s | 7.2 MiB | Seqtk 6.5 s; SeqKit 4.1 s (fastest, 1.3x) |
| Filter sequences by the total expected error as calculated from the quality scores | 27.9 s | 7.2 MiB | VSEARCH 32.9 s; USEARCH 16.0 s (fastest, 1.7x) |
| Select records from a large set of sequences given a list of 1000 sequence IDs | 1.6 s | 7.9 MiB | VSEARCH 28.1 s; SeqKit 1.0 s (fastest, 1.6x) |
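The total expected error of a read is commonly defined as the sum of the per-base error probabilities, p = 10^(-Q/10), derived from the Phred quality scores. A minimal Python sketch of such a filter, assuming Phred+33 encoded quality strings; the function names and the threshold `max_ee` are hypothetical and do not correspond to options of any of the benchmarked tools:

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for a FASTQ quality string.

    Assumes ASCII-encoded Phred scores with the given offset (Phred+33).
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filter(quality, max_ee=1.0):
    """Keep a read if its total expected error does not exceed `max_ee`."""
    return expected_errors(quality) <= max_ee


# '+' encodes Q10 (p = 0.1) and '5' encodes Q20 (p = 0.01) in Phred+33
print(round(expected_errors("+5"), 4))
print(passes_filter("IIII"))     # four Q40 bases
print(passes_filter("+" * 20))   # twenty Q10 bases
```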
sample
| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Random subsampling to 1000 sequences | 0.5 s (fastest, 1.4x) | 7.2 MiB | VSEARCH 4.3 s; Seqtk 0.8 s; SeqKit 11.5 s |
| Random subsampling to ~10% of sequences | 0.8 s (fastest, 2.2x) | 7.1 MiB | |
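The two subsampling tasks differ in kind: one returns a fixed number of records, the other keeps each record with a given probability, so the output size is only approximate. The Python sketch below shows one common way to do each in a single pass (reservoir sampling and independent coin flips). Whether any of the benchmarked tools work this way is not stated here, so treat it as an illustration with made-up function names.

```python
import random


def sample_fixed(records, n, seed=None):
    """Reservoir sampling (Algorithm R): return n records chosen uniformly.

    Returns fewer than n records if the input is shorter than n.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir


def sample_fraction(records, p, seed=None):
    """Keep each record independently with probability p (about p * N records)."""
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < p]


fixed = sample_fixed(range(10_000), 1000, seed=1)
approx = sample_fraction(range(10_000), 0.1, seed=1)
print(len(fixed), len(approx))
```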
find
| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Find the forward primer location in the input reads with up to 4 mismatches | 21.3 s | 7.4 MiB (lowest, 1.00x) | st (4 threads) 6.0 s (fastest, 3.5x); st (max. mismatches = 2) 21.1 s; st (max. mismatches = 8) 26.7 s |
| Find and trim the forward primer up to an error rate (edit distance) of 20%, discarding unmatched reads. Note: Unlike Cutadapt, seqtool currently does not offer ungapped alignments (--no-indels). | 16.9 s (fastest, 4.0x), 120% CPU | 7.4 MiB (lowest, 2.83x) | Cutadapt 67.1 s |
| Find and trim the forward primer in parallel using 4 threads (cores). | 4.9 s (fastest, 3.7x), 448% CPU | 17.8 MiB (lowest, 2.22x) | Cutadapt 18.1 s |
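Searching for a primer "up to an error rate (edit distance) of 20%" means allowing substitutions, insertions and deletions, with the number of allowed edits derived from the primer length. The Python sketch below uses the classic dynamic-programming approach for approximate substring matching, in which a hit may start anywhere in the read. It is a simplified illustration, not the algorithm used by seqtool or Cutadapt; the function name and the rounding of the edit limit are assumptions.

```python
def best_primer_hit(read, primer, max_error_rate=0.2):
    """Return (end, distance) of the best approximate primer hit, or None.

    `end` is the 0-based exclusive end of the hit, so read[end:] is the
    read with the primer (and everything before it) trimmed off.
    """
    max_dist = int(max_error_rate * len(primer))  # assumption: round down
    # prev[i]: edit distance between primer[:i] and the best-matching
    # substring of the read that ends at the previous read position
    prev = list(range(len(primer) + 1))
    best = None
    for pos, base in enumerate(read, start=1):
        cur = [0]  # a hit may start at any read position at no cost
        for i, p in enumerate(primer, start=1):
            cur.append(min(
                prev[i - 1] + (p != base),  # match or substitution
                prev[i] + 1,                # extra base in the read
                cur[i - 1] + 1,             # primer base missing from the read
            ))
        if cur[-1] <= max_dist and (best is None or cur[-1] < best[1]):
            best = (pos, cur[-1])
        prev = cur
    return best


read = "TTTACGTACGTAA"
hit = best_primer_hit(read, "ACGAACG")  # 7 bp primer, so at most 1 edit
if hit is not None:
    end, dist = hit
    print(end, dist, read[end:])
```

Reads for which the function returns `None` would be the ones discarded as unmatched.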
replace
| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Convert DNA to RNA using the replace command | 10.1 s | 7.2 MiB | st find 14.3 s; SeqKit 4.8 s (fastest, 2.1x); FASTX-Toolkit 283.5 s |
| Convert DNA to RNA using 4 threads | 2.7 s (fastest, 3.1x), 418% CPU | 9.0 MiB (lowest, 2.74x) | |
trim
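| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Trim the leading 99 bp from the sequences | 2.8 s (fastest, 16.0x) | 7.4 MiB (lowest, 170.10x) | SeqKit (creates FASTA index) 44.8 s |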
upper
| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Convert sequences to uppercase | 3.0 s (fastest, 1.4x) | 7.4 MiB | |
revcomp
| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Reverse complement sequences | 6.0 s | 7.2 MiB | Seqtk 5.3 s (fastest, 1.1x); VSEARCH 7.7 s; SeqKit 7.8 s |
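For readers unfamiliar with the operation, reverse complementing means complementing every base and reversing the result. A minimal Python sketch for plain DNA; IUPAC ambiguity codes and 'U' are left untranslated here, which is a simplification of this example rather than a statement about how the benchmarked tools behave:

```python
# Translation table for the four DNA bases, preserving case
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(revcomp("ATGC"))    # GCAT
print(revcomp("AACGTT"))  # AACGTT (its own reverse complement)
```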
concat
| Task | seqtool time | seqtool memory | Alternatives (time) |
| --- | --- | --- | --- |
| Concatenate sequences, adding an NNNNN spacer in between | 9.9 s (fastest, 2.1x) | 7.4 MiB | |