stringr functions now consistently implement the tidyverse recycling rules (#372). Overall this is a fairly minor change as stringi was already very close to the tidyverse rules. There are only two major changes:
Only vectors of length 1 are recycled: previously, str_detect(letters, c("x", "y")) worked, but it now errors.
str_c() ignores NULLs, rather than treating them as length 0 vectors.
Additionally, many more non-vectorised arguments now throw errors, rather than warnings, if supplied a vector.
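As a rough illustration of the recycling rules described above, the sketch below assumes a stringr release that includes this change (1.5.0 or later) and that str_c() is the function that drops NULLs. The variable names are made up for the example.

```r
library(stringr)

# A length-1 pattern is recycled against every element of the input.
one_pattern <- str_detect(c("apple", "pear", "kiwi"), "p")

# Lengths 26 and 2 cannot be recycled against each other, so this call is
# expected to error; tryCatch() turns the error into a marker value.
mismatch <- tryCatch(
  str_detect(letters, c("x", "y")),
  error = function(e) "error"
)

# NULL inputs are ignored instead of being treated as length 0 vectors.
joined <- str_c("a", NULL, "b")
```

If you relied on the old behaviour, repeating the shorter vector explicitly with rep() should keep the call working.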
str_view() will use ANSI colouring if available (#370). This works in more places than HTML widgets and requires fewer dependencies.
str_view() also no longer requires a pattern so you can use it to display strings with special characters. It now highlights whitespace characters apart from space since otherwise they are often confusing.
vignette("from-base") by @sastoudt provides a comprehensive comparison between base R functions and their stringr equivalents. It’s designed to help you move to stringr if you’re already familiar with base R string functions (#266).
stringr is now licensed as MIT (#351).
Better error message if you supply a non-string pattern (#378).
str_view() and str_view_all() now handle NA values more gracefully (#217). I’ve also tweaked the sizing policy so hopefully it should work better in notebooks, while preserving the existing behaviour in knit documents (#232).
During package build, you may see Error : object ‘ignore.case’ is not exported by 'namespace:stringr'. This is because the long deprecated str_join(), ignore.case() and perl() have now been removed.
In str_replace(), replacement can now be a function that is called once for each match and whose return value is used to replace the match.
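A minimal sketch of a function-valued replacement, assuming str_replace() is the entry point. The helper shout() is hypothetical and only upper-cases whatever it is given.

```r
library(stringr)

# Hypothetical helper: receives the matched text and returns its replacement.
shout <- function(match) toupper(match)

# Only the first match of "[ad]" in each string is passed to shout().
replaced <- str_replace(c("abc", "def"), "[ad]", shout)
```

The function should be vectorised over its input, as toupper() is, so it behaves the same whether it receives one match or several.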
A new vignette (vignette("regular-expressions")) describes the details of the regular expressions supported by stringr. The main vignette (vignette("stringr")) has been updated to give a high-level overview of the package.
Add sample datasets: fruit, words and sentences.
fixed(), regex(), and coll() now throw an error if you use them with anything other than a plain string (#60). I’ve clarified that the replacement for perl() is regex(), not regexp().
boundary() has improved defaults when splitting on non-word boundaries (#58, @lmullen).
str_extract() and str_extract_all() now work with boundary(). This is particularly useful if you want to extract logical constructs like words or sentences.
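For instance, extracting words with boundary() might look like the sketch below. The sample text is made up, and the default word-boundary options are assumed to skip whitespace and punctuation.

```r
library(stringr)

text <- "The quick brown fox. It jumped!"

# All word tokens in the string.
words <- str_extract_all(text, boundary("word"))[[1]]

# Only the first word token.
first_word <- str_extract(text, boundary("word"))
```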
str_extract_all() respects the simplify argument when used with fixed() matches.
stringr is now powered by stringi instead of base R regular expressions. This improves Unicode support and makes most operations considerably faster. If you find stringr inadequate for your string processing needs, I highly recommend looking at stringi in more detail.
stringr gains a vignette, currently a straightforward update of the article that appeared in the R Journal.
str_c() now returns a zero length vector if any of its inputs are zero length vectors. This is consistent with all other functions, and standard R recycling rules. Similarly, using str_c("x", NA) now yields NA. If you want "xNA", use str_replace_na() on the inputs.
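The three cases can be sketched as follows. The variable names are illustrative only.

```r
library(stringr)

# A missing value in any input makes the result missing.
with_na <- str_c("x", NA)

# A zero length input makes the whole result zero length.
with_empty <- str_c("x", character())

# Converting NA to the string "NA" first restores the literal join.
explicit <- str_c("x", str_replace_na(NA))
```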
str_replace_all() gains a convenient syntax for applying multiple pairs of pattern and replacement to the same vector:
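A small sketch of the named-vector form, where each name is a pattern and each value is its replacement. The input strings are made up for illustration.

```r
library(stringr)

input <- c("abc", "def")

# Every pattern/replacement pair is applied to each element of the input.
swapped <- str_replace_all(input, c("[ad]" = "!", "[cf]" = "?"))
```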
str_conv() to convert strings from specified encoding to UTF-8.
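As a hedged example, the single byte 0xB1 is the plus-minus sign in ISO-8859-1, so converting it should give the UTF-8 character U+00B1. The encoding name is assumed to be one that the underlying ICU library recognises.

```r
library(stringr)

# One raw byte (177 = 0xB1) with no declared encoding.
x <- rawToChar(as.raw(177))

# Interpret the byte as ISO-8859-1 and convert it to UTF-8.
plus_minus <- str_conv(x, "ISO-8859-1")
```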
boundary() allows you to count, locate and split by character, word, line and sentence boundaries.
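A brief sketch of counting and splitting with boundary(), assuming the default options for word boundaries, which skip whitespace and punctuation.

```r
library(stringr)

sentence <- "This is a short sentence."

# Count word tokens and individual characters.
n_words <- str_count(sentence, boundary("word"))
n_chars <- str_count(sentence, boundary("character"))

# Split the sentence into its word tokens.
pieces <- str_split(sentence, boundary("word"))[[1]]
```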
The documentation got a lot of love, and very similar functions (e.g. first and all variants) are now documented together. This should hopefully make it easier to locate the function you need.
ignore.case(x) has been deprecated in favour of fixed|regex|coll(x, ignore.case = TRUE), perl(x) has been deprecated in favour of regex(x).
str_join() is deprecated, please use str_c() instead.
fixed path in str_wrap example so it works for more R installations.
remove dependency on plyr
Zero input to str_split_fixed returns 0 row matrix with n columns
new modifier perl that switches to Perl regular expressions
str_match now uses new base function regmatches to extract matches - this should hopefully be faster than my previous pure R algorithm
new str_wrap function which gives strwrap output in a more convenient format
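To make the difference concrete: base strwrap() returns one element per output line, while str_wrap() returns a single string with embedded newlines. The text and width below are arbitrary.

```r
library(stringr)

text <- "The quick brown fox jumps over the lazy dog"

# Base R: a character vector with one element per wrapped line.
base_lines <- strwrap(text, width = 15)

# stringr: one string, with lines separated by "\n".
wrapped <- str_wrap(text, width = 15)
cat(wrapped, "\n")
```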
new word function extracts words from a string given a user-defined separator (thanks to suggestion by David Cooper)
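A short sketch of word(), using the argument names from current stringr releases (start, end, sep). The inputs are made up.

```r
library(stringr)

# Second word, using the default separator (a single space).
second <- word("The quick brown fox", 2)

# From the second word through the last one.
rest <- word("The quick brown fox", 2, -1)

# A user-defined separator, matched literally.
middle <- word("a-b-c", 2, sep = fixed("-"))
```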
str_locate now returns consistent type when matching empty string (thanks to Stavros Macrakis)
str_count counts number of matches in a string.
str_trim receives performance tweaks - for large vectors this should give at least a two order of magnitude speed up
str_length returns NA for invalid multibyte strings
fix small bug in internal