Modifier functions control the meaning of the pattern argument to
stringr functions:
- boundary(): Match boundaries between things.
- coll(): Compare strings using standard Unicode collation rules.
- fixed(): Compare literal bytes.
- regex() (the default): Use ICU regular expressions.
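The difference between comparing literal bytes and comparing under collation rules shows up with strings that look identical but are encoded differently. The following is a minimal sketch, assuming stringr is installed; the variable names are illustrative only.

```r
library(stringr)

# Two canonically equivalent spellings of "á"
a1 <- "\u00e1"   # one precomposed code point
a2 <- "a\u0301"  # "a" followed by a combining acute accent

byte_match <- str_detect(a1, fixed(a2))  # compares the underlying bytes
coll_match <- str_detect(a1, coll(a2))   # compares under Unicode collation rules

byte_match
#> [1] FALSE
coll_match
#> [1] TRUE
```

fixed() is typically the faster choice when the data is plain ASCII or already normalised; coll() is the safer one when text comes from mixed sources.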
Usage
fixed(pattern, ignore_case = FALSE)
coll(pattern, ignore_case = FALSE, locale = "en", ...)
regex(
pattern,
ignore_case = FALSE,
multiline = FALSE,
comments = FALSE,
dotall = FALSE,
...
)
boundary(
type = c("character", "line_break", "sentence", "word"),
skip_word_none = NA,
...
)
Arguments
- pattern
Pattern to modify behaviour.
- ignore_case
Should case differences be ignored in the match? For
fixed(), this uses a simple algorithm that assumes a one-to-one mapping between upper and lower case letters.
- locale
Locale to use for comparisons. See
stringi::stri_locale_list() for all possible options. Defaults to "en" (English) to ensure that default behaviour is consistent across platforms.
- ...
Other less frequently used arguments passed on to
stringi::stri_opts_collator(), stringi::stri_opts_regex(), or stringi::stri_opts_brkiter().
- multiline
If
TRUE, ^ and $ match the beginning and end of each line. If FALSE, the default, they only match the start and end of the input.
- comments
If
TRUE, white space and comments beginning with # are ignored. Escape literal spaces with "\\ " (a backslash followed by a space).
- dotall
If
TRUE, . will also match line terminators.
- type
Boundary type to detect.
character: Every character is a boundary.
line_break: Boundaries are places where it is acceptable to have a line break in the current locale.
sentence: The beginnings and ends of sentences are boundaries, using intelligent rules to avoid counting abbreviations.
word: The beginnings and ends of words are boundaries.
- skip_word_none
Ignore "words" that don't contain any letters or numbers, i.e. punctuation. The default,
NA, skips such "words" only when splitting on word boundaries.
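As an illustration of the comments argument, the sketch below writes a pattern across several lines with inline comments, then shows how escaped and unescaped spaces behave. The pattern and the names year_month, loose, and strict are hypothetical, chosen only for this example.

```r
library(stringr)

# Hypothetical pattern for "YYYY-MM" strings, laid out with comments
year_month <- regex("
  (\\d{4})  # four-digit year
  -         # literal dash
  (\\d{2})  # two-digit month
", comments = TRUE)

parts <- str_match("2024-05", year_month)
parts[1, ]
#> [1] "2024-05" "2024"    "05"

# An unescaped space is dropped from the pattern; an escaped one is kept
loose  <- str_detect(c("ab", "a b"), regex("a b", comments = TRUE))
strict <- str_detect(c("ab", "a b"), regex("a\\ b", comments = TRUE))
loose
#> [1]  TRUE FALSE
strict
#> [1] FALSE  TRUE
```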
Examples
library(stringr)

pattern <- "a.b"
strings <- c("abb", "a.b")
str_detect(strings, pattern)
#> [1] TRUE TRUE
str_detect(strings, fixed(pattern))
#> [1] FALSE TRUE
str_detect(strings, coll(pattern))
#> [1] FALSE TRUE
# coll() is useful for locale-aware case-insensitive matching
i <- c("I", "\u0130", "i")
i
#> [1] "I" "İ" "i"
str_detect(i, fixed("i", TRUE))
#> [1] TRUE FALSE TRUE
str_detect(i, coll("i", TRUE))
#> [1] TRUE FALSE TRUE
str_detect(i, coll("i", TRUE, locale = "tr"))
#> [1] FALSE TRUE TRUE
# Word boundaries
words <- c("These are   some words.")
str_count(words, boundary("word"))
#> [1] 4
str_split(words, " ")[[1]]
#> [1] "These" "are" "" "" "some" "words."
str_split(words, boundary("word"))[[1]]
#> [1] "These" "are" "some" "words"
# Regular expression variations
str_extract_all("The Cat in the Hat", "[a-z]+")
#> [[1]]
#> [1] "he" "at" "in" "the" "at"
#>
str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE))
#> [[1]]
#> [1] "The" "Cat" "in" "the" "Hat"
#>
str_extract_all("a\nb\nc", "^.")
#> [[1]]
#> [1] "a"
#>
str_extract_all("a\nb\nc", regex("^.", multiline = TRUE))
#> [[1]]
#> [1] "a" "b" "c"
#>
str_extract_all("a\nb\nc", "a.")
#> [[1]]
#> character(0)
#>
str_extract_all("a\nb\nc", regex("a.", dotall = TRUE))
#> [[1]]
#> [1] "a\n"
#>
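The examples above only use boundary("word"). The sketch below, which assumes an arbitrary sample string x, counts character and sentence boundaries and shows the effect of skip_word_none when extracting words.

```r
library(stringr)

x <- "Hello, world! How are you?"

# Counting by other boundary types
n_chars     <- str_count(x, boundary("character"))
n_sentences <- str_count(x, boundary("sentence"))
n_sentences
#> [1] 2

# With the default skip_word_none = NA, punctuation and spaces are skipped
words_only <- str_extract_all(x, boundary("word"))[[1]]
words_only
#> [1] "Hello" "world" "How"   "are"   "you"

# skip_word_none = FALSE also keeps the pieces between the words
all_pieces <- str_extract_all(x, boundary("word", skip_word_none = FALSE))[[1]]
length(all_pieces) > length(words_only)
#> [1] TRUE
```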
