fixed

Compare literal bytes in the string. This is very fast, but not usually what you want for non-ASCII character sets.

coll

Compare strings respecting standard collation rules.

regex

The default. Uses ICU regular expressions.

boundary

Match boundaries between characters, lines, sentences, or words.
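As a minimal sketch of what "the default" means for regex: a bare pattern string passed to a stringr function behaves the same as that string wrapped in regex(). The vector x and the variable names below are illustrative only.

```r
library(stringr)

x <- c("abb", "a.b")

# A bare string is treated as a regular expression, so these two calls agree
default_result <- str_detect(x, "a.b")
explicit_result <- str_detect(x, regex("a.b"))

identical(default_result, explicit_result)
#> [1] TRUE
```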

fixed(pattern, ignore_case = FALSE)

coll(pattern, ignore_case = FALSE, locale = "en", ...)

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE,
  dotall = FALSE, ...)

boundary(type = c("character", "line_break", "sentence", "word"),
  skip_word_none = NA, ...)

Arguments

pattern

Pattern whose matching behaviour is to be modified.

ignore_case

Should case differences be ignored in the match?

locale

Locale to use for comparisons. See stri_locale_list() for all possible options. Defaults to "en" (English) to ensure that the default collation is consistent across platforms.

...

Other less frequently used arguments passed on to stri_opts_collator(), stri_opts_regex(), or stri_opts_brkiter().

multiline

If TRUE, ^ and $ match the beginning and end of each line. If FALSE, the default, they only match the start and end of the input.

comments

If TRUE, white space and comments beginning with # are ignored. Escape literal spaces with a backslash (written "\\ " in an R string).

dotall

If TRUE, . will also match line terminators.

type

Boundary type to detect.

skip_word_none

Ignore "words" that don't contain any letters or numbers, i.e. punctuation. The default, NA, will skip such "words" only when splitting on word boundaries.
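The comments argument is easiest to understand with a pattern spread over several lines. The following is a small sketch; the year-month pattern and the variable names are made up for illustration.

```r
library(stringr)

# With comments = TRUE, white space and "# ..." comments in the pattern
# are ignored, so each piece of the pattern can be annotated.
year_month <- regex("
  (\\d{4})  # four-digit year
  -         # literal dash
  (\\d{2})  # two-digit month
  ", comments = TRUE)

year_month_hits <- str_detect(c("2024-05", "2024 05"), year_month)
year_month_hits
#> [1]  TRUE FALSE

# An unescaped space is dropped from the pattern; an escaped one is literal
unescaped_hits <- str_detect(c("ab", "a b"), regex("a b", comments = TRUE))
escaped_hits <- str_detect(c("ab", "a b"), regex("a\\ b", comments = TRUE))
```

Here unescaped_hits is TRUE only for "ab" and escaped_hits is TRUE only for "a b", which is why literal spaces must be escaped in this mode.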

Examples

pattern <- "a.b"
strings <- c("abb", "a.b")
str_detect(strings, pattern)
#> [1] TRUE TRUE
str_detect(strings, fixed(pattern))
#> [1] FALSE  TRUE
str_detect(strings, coll(pattern))
#> [1] FALSE  TRUE

# coll() is useful for locale-aware case-insensitive matching
i <- c("I", "\u0130", "i")
i
#> [1] "I" "İ" "i"
str_detect(i, fixed("i", TRUE))
#> [1]  TRUE FALSE  TRUE
str_detect(i, coll("i", TRUE))
#> [1]  TRUE FALSE  TRUE
str_detect(i, coll("i", TRUE, locale = "tr"))
#> [1] FALSE  TRUE  TRUE

# Word boundaries
# (note the three consecutive spaces between "are" and "some")
words <- c("These are   some words.")
str_count(words, boundary("word"))
#> [1] 4
str_split(words, " ")[[1]]
#> [1] "These"  "are"    ""       ""       "some"   "words."
str_split(words, boundary("word"))[[1]]
#> [1] "These" "are"   "some"  "words"

# Regular expression variations
str_extract_all("The Cat in the Hat", "[a-z]+")
#> [[1]]
#> [1] "he"  "at"  "in"  "the" "at"
#>
str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE))
#> [[1]]
#> [1] "The" "Cat" "in"  "the" "Hat"
#>
str_extract_all("a\nb\nc", "^.")
#> [[1]]
#> [1] "a"
#>
str_extract_all("a\nb\nc", regex("^.", multiline = TRUE))
#> [[1]]
#> [1] "a" "b" "c"
#>
str_extract_all("a\nb\nc", "a.")
#> [[1]]
#> character(0)
#>
str_extract_all("a\nb\nc", regex("a.", dotall = TRUE))
#> [[1]]
#> [1] "a\n"
#>
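
The examples above only exercise boundary("word"). As a rough sketch of two of the other type values, the snippet below counts user-perceived characters and sentences; the input strings and variable names are illustrative, not taken from the package.

```r
library(stringr)

# "e" followed by a combining acute accent: two code points, but a single
# user-perceived character, so there is one "character" boundary segment
accented <- "e\u0301"
n_chars <- str_count(accented, boundary("character"))
n_chars
#> [1] 1

# Sentence boundaries
text <- "This is a sentence. And another one!"
n_sentences <- str_count(text, boundary("sentence"))
n_sentences
#> [1] 2

# Splitting on the same boundaries returns the sentences themselves
sentences <- str_split(text, boundary("sentence"))[[1]]
```

Pieces returned by splitting on sentence boundaries may keep their trailing white space, so str_trim() can be useful afterwards.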