R bindings to Sassy through R’s native C API. Results are returned as data frames with 0-based, half-open coordinates and CIGAR strings.
Install from r-universe:
install.packages(
"Rsassy",
repos = c("https://sounkou-bioinfo.r-universe.dev", "https://cloud.r-project.org")
)

Source installs require Cargo/rustc >= 1.91 and xz. Rust crates are
vendored in src/rust/vendor.tar.xz for offline package builds. On
Linux, macOS, and Windows, Rsassy installs multiple backend libraries
when possible: scalar, AVX2, and AVX-512 on x86_64; scalar and NEON on
arm64. The webR/WebAssembly build uses wasm SIMD128; see the browser
demo. Rsassy selects the
best installed backend supported by the current CPU/runtime when the
backend is first loaded.
library(Rsassy)
sassy_search(list("ATCGATCG"), list("GGGGATCGATCGTTTT"), k = 1, alphabet = "dna")
#> <sassy_matches> 3 matches
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar
#> 0 0 2 10 0 8 1 - 7=1X
#> 0 0 4 12 0 8 0 + 8=
#> 0 0 6 14 0 8 1 - 1=1X6=

The result is a sassy_matches data frame with pattern_idx,
text_idx, text_start, text_end, pattern_start, pattern_end,
cost, strand, and cigar. Coordinates are 0-based and half-open.
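
Because R's own string functions are 1-based and inclusive, it can help to see the conversion spelled out. The following is a minimal base-R sketch; `extract_region()` is a hypothetical helper written for this illustration and is not part of Rsassy.

```r
# Convert a 0-based, half-open interval [start0, end0) into the
# 1-based, inclusive indices that substr() expects.
# extract_region() is a hypothetical helper, not an Rsassy function.
extract_region <- function(text, start0, end0) {
  substr(text, start0 + 1L, end0)
}

text <- "GGGGATCGATCGTTTT"

# Row with text_start = 4, text_end = 12 (the exact "+" match).
plus_hit <- extract_region(text, 4L, 12L)
plus_hit
#> [1] "ATCGATCG"

# Row with text_start = 2, text_end = 10: the bases in text direction.
minus_hit <- extract_region(text, 2L, 10L)
minus_hit
#> [1] "GGATCGAT"

# The interval length is simply end - start.
nchar(plus_hit) == 12L - 4L
#> [1] TRUE
```

Only `start` needs the `+ 1` shift: a half-open 0-based end already equals the 1-based inclusive end.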
Set match_region = TRUE when you also want the matched sequence. For
strand == "-", match_region is reverse-complemented so it is in the
same direction as the input pattern and CIGAR.
region_matches <- sassy_search(
list("ATCGATCG"),
list("GGGGATCGATCGTTTT"),
k = 1,
alphabet = "dna",
match_region = TRUE
)
region_matches
#> <sassy_matches> 3 matches
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar match_region
#> 0 0 2 10 0 8 1 - 7=1X ATCGATCC
#> 0 0 4 12 0 8 0 + 8= ATCGATCG
#> 0 0 6 14 0 8 1 - 1=1X6= AACGATCG

The print method can color match_region with simple ANSI escape
sequences, following the upstream Sassy CLI sassy grep alignment
legend: green for matching characters, orange for mismatches,
blue for inserted text characters, and red gaps for pattern characters
absent from the text. Coloring is off by default and is meant for
ANSI-capable interactive terminals.
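
The reverse-strand rows in the example output can be checked by hand. The sketch below uses base R only; `revcomp()` and `substitution_cigar()` are hypothetical helpers for illustration, not Rsassy functions, and the CIGAR helper only covers equal-length alignments without insertions or deletions.

```r
# Hypothetical helper: reverse-complement a plain ACGT string.
revcomp <- function(x) {
  bases <- rev(strsplit(x, "", fixed = TRUE)[[1]])
  chartr("ACGT", "TGCA", paste(bases, collapse = ""))
}

# Hypothetical helper: run-length encode per-base match ("=") and
# mismatch ("X") operations. Substitution-only, equal-length inputs.
substitution_cigar <- function(region, pattern) {
  stopifnot(nchar(region) == nchar(pattern))
  r <- strsplit(region, "", fixed = TRUE)[[1]]
  p <- strsplit(pattern, "", fixed = TRUE)[[1]]
  runs <- rle(ifelse(r == p, "=", "X"))
  paste0(runs$lengths, runs$values, collapse = "")
}

text <- "GGGGATCGATCGTTTT"
pattern <- "ATCGATCG"

# 0-based [2, 10) on the "-" strand, flipped into pattern direction.
region <- revcomp(substr(text, 3, 10))
region
#> [1] "ATCGATCC"

substitution_cigar(region, pattern)
#> [1] "7=1X"
```

Both values agree with the first row of the example output: the region and the CIGAR are reported in the direction of the input pattern.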
Reuse a searcher when making repeated calls:
searcher <- sassy_searcher("dna")
sassy_searcher_search(searcher, list("ATCGATCG"), list("GGGGATCGATCGTTTT"), k = 1)
#> <sassy_matches> 3 matches
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar
#> 0 0 2 10 0 8 1 - 7=1X
#> 0 0 4 12 0 8 0 + 8=
#> 0 0 6 14 0 8 1 - 1=1X6=

List inputs search every pattern against every text. Each element can be
a raw vector or a character scalar, which also leaves room for
ALTREP-backed batches as lists. For larger batches, use threads > 1.
sassy_search(
list("ATG", "TTT"),
list("CCCCATGCCCCTTT"),
k = 1,
alphabet = "iupac",
rc = FALSE,
strategy = "encoded_patterns"
)
#> <sassy_matches> 2 matches
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar
#> 0 0 4 7 0 3 0 + 3=
#> 1 0 11 14 0 3 0 + 3=

strategy = "encoded_patterns" (alias "v2") is the R equivalent of
CLI --v2 for many equal-length short patterns. batch_patterns and
encoded_patterns use Sassy’s multi-pattern encoding, which in sassy
0.2.1 is implemented for IUPAC and equal
byte-length patterns. The default strategy = "pairwise" is the general
path for other alphabets and mixed pattern lengths.
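
One way to apply the rule above before calling the search is to test the patterns up front. This is a minimal base-R sketch under the constraints stated here (IUPAC alphabet, equal byte length); `byte_length()` and `choose_strategy()` are hypothetical helpers, not part of Rsassy.

```r
# Hypothetical helper: byte length of a raw vector or character scalar.
byte_length <- function(x) {
  if (is.raw(x)) length(x) else nchar(x, type = "bytes")
}

# Hypothetical helper: pick "encoded_patterns" only when the patterns
# meet the stated requirements, otherwise fall back to "pairwise".
choose_strategy <- function(patterns, alphabet) {
  lens <- vapply(patterns, byte_length, integer(1))
  same_length <- length(unique(lens)) == 1L
  if (identical(alphabet, "iupac") && same_length) {
    "encoded_patterns"
  } else {
    "pairwise"
  }
}

choose_strategy(list("ATG", "TTT"), "iupac")
#> [1] "encoded_patterns"
choose_strategy(list("ATG", "TTTT"), "iupac")
#> [1] "pairwise"
choose_strategy(list("ATG", charToRaw("TTT")), "dna")
#> [1] "pairwise"
```

The returned string could then be passed as the strategy argument of sassy_search().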
CLI-compatible orientation is available with sam = TRUE. This formats
reverse-strand match_region and cigar in text direction, matching
the sassy --sam output of the upstream Sassy CLI.
sassy_search(
list("ACGA"),
list("TTTCGTTT"),
k = 0,
alphabet = "dna",
match_region = TRUE,
sam = TRUE
)
#> <sassy_matches> 1 match
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar match_region
#> 0 0 2 6 0 4 0 - 4= TCGT

Chunked FASTA/FASTQ iteration is available with sassy_fastx_iter() and
sassy_fastx_next(). Batches expose record IDs as an ALTREP character
vector and sequences as a list of raw ALTREP slices over immutable
native buffers, so search can consume file records without first
materializing sequence strings in R.
fq <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "!!!!"), fq, useBytes = TRUE)
it <- sassy_fastx_iter(fq, batch_records = 1)
batch <- sassy_fastx_next(it)
sassy_search(list("ACG"), batch$seq, k = 0, alphabet = "dna", rc = FALSE, text_id = batch$id)
#> <sassy_matches> 1 match
#> pattern_idx text_idx text_id text_start text_end pattern_start pattern_end cost strand cigar
#> 0 0 r1 0 3 0 3 0 + 3=

CRISPR guide search is available for in-memory sequences with
sassy_crispr(). Guides include the PAM suffix; by default the PAM must
match exactly under IUPAC matching.
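
To make "match exactly under IUPAC matching" concrete, here is a base-R sketch of position-wise IUPAC compatibility. It is an illustration only, not Rsassy's implementation: `iupac_compatible()` is a hypothetical helper, it assumes the text contains only plain A/C/G/T, and it treats the last three guide characters as the PAM because that is the case for the NGG example.

```r
# IUPAC nucleotide codes and the bases each one stands for.
iupac <- c(
  A = "A", C = "C", G = "G", T = "T",
  R = "AG", Y = "CT", S = "CG", W = "AT", K = "GT", M = "AC",
  B = "CGT", D = "AGT", H = "ACT", V = "ACG", N = "ACGT"
)

# Hypothetical helper: for each position, TRUE when the text base is
# one of the bases allowed by the guide's IUPAC code.
iupac_compatible <- function(guide, site) {
  g <- strsplit(guide, "", fixed = TRUE)[[1]]
  s <- strsplit(site, "", fixed = TRUE)[[1]]
  stopifnot(length(g) == length(s))
  mapply(
    function(code, base) grepl(base, iupac[[code]], fixed = TRUE),
    g, s,
    USE.NAMES = FALSE
  )
}

guide <- "ACGTNGG"
site <- substr("TTTACGTAGGTTT", 4, 10)  # 0-based [3, 10)
site
#> [1] "ACGTAGG"

compatible <- iupac_compatible(guide, site)

# PAM positions (assumed here to be the last three) must all match.
all(tail(compatible, 3))
#> [1] TRUE
```
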
sassy_crispr(list("ACGTNGG"), list("TTTACGTAGGTTT"), k = 0, rc = FALSE)
#> guide cost strand start end match_region cigar
#> 1 ACGTNGG 0 + 3 10 ACGTAGG 7=

For file-oriented colored grep, FASTA/FASTQ filtering, and large
command-line pipelines, use the upstream Sassy CLI directly.
Inspect the installed build:
sassy_features()
#> <sassy_features>
#> dispatch: dynamic
#> selected backend: avx2
#> installed backends: scalar, avx2, avx512
#> supported backends: scalar, avx2
#> CPU: avx2=yes avx512f=no neon=no
#> Rust backend: avx2 (native_simd=yes)

Backend loading is one-shot per R process. If you need to benchmark or
debug a specific backend, call sassy_set_backend() before the first
native Rsassy call in a fresh Rscript process. See
vignette("backend-selection", package = "Rsassy") for the details.
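
The relationship between installed, supported, and selected backends in the sassy_features() output can be pictured with a small base-R sketch. This is not Rsassy's dispatch code: `pick_backend()` is a hypothetical helper, and the preference order is an assumption about what "best" means.

```r
# Assumed preference order, widest SIMD first. Rsassy's real ordering
# may differ; this vector exists only for illustration.
preference <- c("avx512", "avx2", "neon", "scalar")

# Hypothetical helper: first preferred backend that is both installed
# and supported by the current CPU/runtime.
pick_backend <- function(installed, supported) {
  usable <- preference[preference %in% intersect(installed, supported)]
  if (length(usable) == 0L) stop("no usable backend")
  usable[[1]]
}

# Values taken from the sassy_features() output above.
selected <- pick_backend(
  installed = c("scalar", "avx2", "avx512"),
  supported = c("scalar", "avx2")
)
selected
#> [1] "avx2"
```

Here avx512 is installed but not supported by the CPU, so the sketch falls through to avx2, the same backend the example output reports as selected.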
Common development commands from the repository root:
make vendor-rust # refresh src/rust/vendor.tar.xz after Rust dependency changes
make rd # regenerate NAMESPACE and man/*.Rd from R/search.R
make readme # regenerate README.md from README.Rmd
make install # install the package locally
make test # run tinytest tests
make check # build and run R CMD check
make reports # render committed conformance/performance markdown reports
make clean # remove generated build artifacts

make check uses a CRAN-safe default of two Cargo build jobs. Use
make check-fast or make CARGO_JOBS=10 check for local multithreaded
Cargo builds.