Explanation of ranges

Ranges in seqtool are used or produced by commands like trim, find, mask, and slice.

In a nutshell

  1. Ranges in the form start:end include both the start and the end position, unless 0-based coordinates are configured.
  2. Negative coordinates (e.g. -5:-1) are offsets from the sequence end.
  3. Unbounded ranges (start: or :end) include everything from start to the sequence end, or from the sequence start to end, respectively. "undefined" is equivalent to a missing coordinate.
  4. With exclusive ranges, the start and end positions themselves are not included in the range.

Overview

Ranges look like this: start:end. The start and end positions are always part of the range, unless you explicitly switch to 0-based coordinates.

It is also possible to use negative numbers: -1 references the last character in the sequence, -2 the second last, and so on.

                   | <—————————————> | 
sequence:       A   T   G   C   A   T   G   C
base number:    1   2   3   4   5   6   7   8
from end:      -8  -7  -6  -5  -4  -3  -2  -1

The following commands all trim sequences to the range marked above (positions 2 to 6), resulting in the same output:

st trim '2:6' input.fasta
st trim '-7:-3' input.fasta
st trim '2:-3' input.fasta
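
To make the coordinate arithmetic concrete, here is a minimal Python sketch of how 1-based, inclusive ranges with negative coordinates could be resolved. It is only a model of the behaviour described above, not seqtool's implementation; the helper names resolve and trim_inclusive are hypothetical, and out-of-range coordinates and a coordinate of 0 are not handled.

```python
def resolve(coord, seq_len):
    """Map a 1-based coordinate to a 1-based position.

    Negative coordinates count from the end: -1 is the last position.
    (Hypothetical helper; a coordinate of 0 is not handled here.)
    """
    return coord if coord > 0 else seq_len + coord + 1


def trim_inclusive(seq, start, end):
    """Return the 1-based range start:end, including both bounds."""
    first = resolve(start, len(seq))
    last = resolve(end, len(seq))
    # Python slices are 0-based with an exclusive end, hence first - 1.
    return seq[first - 1:last]


seq = "ATGCATGC"
print(trim_inclusive(seq, 2, 6))    # TGCAT
print(trim_inclusive(seq, -7, -3))  # TGCAT
print(trim_inclusive(seq, 2, -3))   # TGCAT
```

All three calls select positions 2 to 6, mirroring the three st trim commands above.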

Empty ranges

Ranges of zero length are only possible if the start is greater than the end (e.g. 5:4). seqtool interprets all ranges where start > end as empty.

0-based ranges are an exception: in this mode, 5:5 also results in an empty range.
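
The difference between the two modes comes down to how the range length is computed. The following sketch assumes positive coordinates only; range_length is a hypothetical helper, and treating 0-based ranges with start > end as empty is an assumption borrowed from Python slicing, not something stated above.

```python
def range_length(start, end, zero_based=False):
    """Length of start:end for positive coordinates (hypothetical helper)."""
    if zero_based:
        # 0-based: the end is not part of the range.
        return max(0, end - start)
    # 1-based: both bounds are included; start > end counts as empty.
    return max(0, end - start + 1)


print(range_length(5, 4))                   # 0  (empty, start > end)
print(range_length(5, 5))                   # 1  (a single position)
print(range_length(5, 5, zero_based=True))  # 0  (empty in 0-based mode)
```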

Unbounded ranges: start: or :end

The start or end position can be omitted. The range then includes the whole sequence up to, or starting from, the given position.

No end

The following retains all positions from 5 to the end:

st trim '5:' input.fasta
st trim '-4:' input.fasta
                               | <——————————>
sequence:       A   T   G   C   A   T   G   C 
base number:    1   2   3   4   5   6   7   8 
from end:      -8  -7  -6  -5  -4  -3  -2  -1 

The sequence ends at position 8, so 5: is equivalent to 5:8 or 5:-1.

However, if sequence lengths differ, only 5: or 5:-1 will include everything after position 5, while 5:8 would still only return these fixed positions:

ATGCATGC
ATGCATGCMORE

⚠️ 5: is equivalent to 5:-1 here, but results can differ with exclusive ranges. In most cases, prefer the unbounded start: form, which always includes everything up to the sequence end.
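
The effect of differing sequence lengths can be modelled in a few lines of Python. This is a sketch under the assumption that a missing bound is represented as None; trim_unbounded is a hypothetical name and not part of seqtool.

```python
def trim_unbounded(seq, start=None, end=None):
    """1-based inclusive trimming where None stands for a missing bound."""
    n = len(seq)

    def resolve(coord, default):
        if coord is None:
            return default  # missing bound: extend to the sequence start/end
        return coord if coord > 0 else n + coord + 1

    first = resolve(start, 1)
    last = resolve(end, n)
    return seq[first - 1:last]


for seq in ["ATGCATGC", "ATGCATGCMORE"]:
    # Columns: '5:', '5:-1' and '5:8'
    print(trim_unbounded(seq, 5), trim_unbounded(seq, 5, -1), trim_unbounded(seq, 5, 8))
# ATGC ATGC ATGC
# ATGCMORE ATGCMORE ATGC
```

Only the fixed range 5:8 cuts off the longer sequence; 5: and 5:-1 follow the actual sequence length.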

No start

It is also possible to omit the start position to return all positions up to a given position:

st trim ':3' input.fasta
ATGCATGC
ATGCATGCMORE

⚠️ Again, 1:3 is equivalent to :3, but only if not using exclusive ranges.

No bounds at all

The following will retain the whole sequence, resulting in no trimming at all:

st trim ":" input.fasta
ATGCATGC
ATGCATGCMORE

undefined

undefined is a special keyword that is equivalent to a missing coordinate. Thus, undefined:undefined is equivalent to the unbounded range :.

undefined may be returned by functions such as opt_attr() and opt_meta().
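
As an illustration of this equivalence, the sketch below parses a simple start:end string and maps both an empty coordinate and the undefined keyword to None. parse_range is a hypothetical helper that only covers plain integer coordinates; it is not seqtool's parser.

```python
def parse_range(text):
    """Parse 'start:end' into a (start, end) tuple; None means unbounded."""
    start, sep, end = text.partition(":")
    if not sep:
        raise ValueError(f"not a range (missing ':'): {text!r}")

    def parse_coord(value):
        value = value.strip()
        # An empty coordinate and the 'undefined' keyword mean the same thing.
        return None if value in ("", "undefined") else int(value)

    return parse_coord(start), parse_coord(end)


print(parse_range("undefined:undefined"))  # (None, None)
print(parse_range(":"))                    # (None, None)
print(parse_range("5:undefined"))          # (5, None)
print(parse_range("-7:-3"))                # (-7, -3)
```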

Exclusive ranges (-e/--exclusive)

The trim and mask commands also accept an -e/--exclusive argument that excludes start and end coordinates from the range.

The following commands trim to positions 3-5, without the range bounds 2 and 6 themselves.

st trim -e '2:6' input.fasta
st trim -e '-7:-3' input.fasta
                       | <——————> |
sequence:       A   T   G   C   A   T   G   C
base number:    1   2   3   4   5   6   7   8
from end:      -8  -7  -6  -5  -4  -3  -2  -1

Unbounded ranges are an important corner case. If a bound is missing, nothing is excluded on that side: the range still extends to the sequence start or end, just as it would without -e/--exclusive:

st trim -e '5:' input.fasta
st trim -e '-4:' input.fasta
                                   | <——————>
sequence:       A   T   G   C   A   T   G   C 
base number:    1   2   3   4   5   6   7   8 
from end:      -8  -7  -6  -5  -4  -3  -2  -1 
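
The exclusive behaviour, including the corner case for missing bounds, can be modelled as follows. This is a sketch with the hypothetical helper trim_exclusive, where None stands for a missing bound; it is not seqtool's implementation.

```python
def trim_exclusive(seq, start=None, end=None):
    """Model of 1-based trimming with -e/--exclusive (hypothetical helper)."""
    n = len(seq)

    def resolve(coord):
        return coord if coord > 0 else n + coord + 1

    # A given bound is excluded itself; a missing bound excludes nothing.
    first = 1 if start is None else resolve(start) + 1
    last = n if end is None else resolve(end) - 1
    return seq[first - 1:last]


seq = "ATGCATGC"
print(trim_exclusive(seq, 2, 6))    # GCA (positions 3-5)
print(trim_exclusive(seq, -7, -3))  # GCA
print(trim_exclusive(seq, 5))       # TGC (positions 6-8)
print(trim_exclusive(seq, 5, -1))   # TG  (position 8 is excluded as well)
```

The last two calls show why 5: and 5:-1 are no longer interchangeable once ranges are exclusive.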

0-based coordinates (-0)

If you prefer the 0-based ranges common in many programming languages, specify the -0 argument. They are less intuitive, but make it easier to obtain empty slices (e.g. st trim -0 1:1).

The range indices start at 0 instead of 1, and the range end is not included in the slice. Negative indices are also possible and work exactly as in Python.

                   | <—————————————> | 
sequence:       A   T   G   C   A   T   G   C
base number:    1   2   3   4   5   6   7   8
0-based start:  0   1   2   3   4   5   6   7
from end:      -8  -7  -6  -5  -4  -3  -2  -1
st trim -0 '1:6' input.fasta
st trim -0 '-7:-2' input.fasta
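
Since 0-based ranges follow Python's slicing rules, the two commands above can be checked directly with Python string slices. The variable names below are only for illustration.

```python
seq = "ATGCATGC"

# Same coordinates as in the two commands above
positive = seq[1:6]
negative = seq[-7:-2]
print(positive, negative)  # TGCAT TGCAT

# Start equal to end gives an empty slice
empty = seq[1:1]
print(repr(empty))  # ''

# Converting the 1-based inclusive range 2:6 to 0-based: subtract 1 from the start only
start_1based, end_1based = 2, 6
converted = seq[start_1based - 1:end_1based]
print(converted)  # TGCAT
```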