str_extract() extracts the first complete match from each string;
str_extract_all() extracts all matches from each string.
Arguments
- string
Input vector. Either a character vector, or something coercible to one.
- pattern
Pattern to look for.
The default interpretation is a regular expression, as described in
vignette("regular-expressions"). Use regex() for finer control of the matching behaviour.
Match a fixed string (i.e. by comparing only bytes), using fixed(). This is fast, but approximate. Generally, for matching human text, you'll want coll(), which respects character matching rules for the specified locale.
Match character, word, line and sentence boundaries with boundary(). An empty pattern, "", is equivalent to boundary("character").
- group
If supplied, instead of returning the complete match, will return the matched text from the specified capturing group.
- simplify
A boolean.
FALSE (the default): returns a list of character vectors.
TRUE: returns a character matrix.
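The different interpretations of pattern described above can be compared side by side. The following is a minimal sketch, assuming stringr is installed; the vector `x` and the result names are made up for illustration.

```r
library(stringr)

# Hypothetical input, chosen so that "." and letter case matter
x <- c("Apples x4", "a.b", "bag of flour")

# Default: the pattern is a regular expression, so "." matches any character
as_regex <- str_extract(x, ".")

# fixed(): the pattern is compared literally, so "." only matches a full stop
as_fixed <- str_extract(x, fixed("."))

# regex() gives finer control, here case-insensitive matching
no_case <- str_extract(x, regex("apples", ignore_case = TRUE))

as_regex
as_fixed
no_case
```

Strings without a match yield NA, which is why as_fixed and no_case contain missing values for most elements.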
Value
str_extract(): a character vector the same length as string/pattern.
str_extract_all(): a list of character vectors the same length as string/pattern.
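The shape of the two return values can be checked directly. This is a small sketch with a made-up input vector `fruit`; it illustrates the description above rather than adding to it.

```r
library(stringr)

# Hypothetical input with one, zero and two numbers per string
fruit <- c("apple 1", "pear", "kiwi 22 33")

# One element per input string; NA where nothing matches
first <- str_extract(fruit, "\\d+")

# One list element per input string; character(0) where nothing matches
all_matches <- str_extract_all(fruit, "\\d+")

length(first)         # same length as the input
lengths(all_matches)  # number of matches found in each string

# A single string is recycled against several patterns
per_pattern <- str_extract("a1b", c("[a-z]", "\\d"))
```

When string has length one and pattern has several elements, as in the last call, the result has one element per pattern.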
See also
str_match() to extract matched groups;
stringi::stri_extract() for the underlying implementation.
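To see how the group argument relates to str_match(), the sketch below extracts the same capturing group both ways. It assumes a stringr version that supports group (1.5.0 or later); the variable names are illustrative only.

```r
library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
pattern <- "([a-z]+) of ([a-z]+)"

# str_match() returns a character matrix: column 1 holds the complete
# match, and the following columns hold one capturing group each
m <- str_match(shopping_list, pattern)

# group = 2 asks str_extract() for the second capturing group only
second_group <- str_extract(shopping_list, pattern, group = 2)

dim(m)
second_group
```

Use str_match() when several groups are needed at once, and str_extract() with group when a single group is enough.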
Examples
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_extract(shopping_list, "\\d")
#> [1] "4" NA NA "2"
str_extract(shopping_list, "[a-z]+")
#> [1] "apples" "bag" "bag" "milk"
str_extract(shopping_list, "[a-z]{1,4}")
#> [1] "appl" "bag" "bag" "milk"
str_extract(shopping_list, "\\b[a-z]{1,4}\\b")
#> [1] NA "bag" "bag" "milk"
str_extract(shopping_list, "([a-z]+) of ([a-z]+)")
#> [1] NA "bag of flour" "bag of sugar" NA
str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 1)
#> [1] NA "bag" "bag" NA
str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 2)
#> [1] NA "flour" "sugar" NA
# Extract all matches
str_extract_all(shopping_list, "[a-z]+")
#> [[1]]
#> [1] "apples" "x"
#>
#> [[2]]
#> [1] "bag" "of" "flour"
#>
#> [[3]]
#> [1] "bag" "of" "sugar"
#>
#> [[4]]
#> [1] "milk" "x"
#>
str_extract_all(shopping_list, "\\b[a-z]+\\b")
#> [[1]]
#> [1] "apples"
#>
#> [[2]]
#> [1] "bag" "of" "flour"
#>
#> [[3]]
#> [1] "bag" "of" "sugar"
#>
#> [[4]]
#> [1] "milk"
#>
str_extract_all(shopping_list, "\\d")
#> [[1]]
#> [1] "4"
#>
#> [[2]]
#> character(0)
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "2"
#>
# Simplify results into character matrix
str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE)
#>      [,1]     [,2] [,3]
#> [1,] "apples" ""   ""
#> [2,] "bag"    "of" "flour"
#> [3,] "bag"    "of" "sugar"
#> [4,] "milk"   ""   ""
str_extract_all(shopping_list, "\\d", simplify = TRUE)
#>      [,1]
#> [1,] "4"
#> [2,] ""
#> [3,] ""
#> [4,] "2"
# Extract all words
str_extract_all("This is, surprisingly, a sentence.", boundary("word"))
#> [[1]]
#> [1] "This" "is" "surprisingly" "a" "sentence"
#>
