A wrapper around paste() that applies some simple cleanup appropriate for prose sentences. It:

  1. trims leading and trailing whitespace

  2. collapses runs of whitespace into a single space

  3. appends a period (.) if there is no terminal punctuation mark (., ?, or !)

  4. removes spaces preceding punctuation characters: .?!,;:

  5. collapses sequences of punctuation marks (.?!,;:), possibly separated by spaces, into a single punctuation mark. The first punctuation mark of the sequence is used, with priority given to terminal punctuation marks (.?!) if present

  6. ensures that each of .?!,;: is followed by a space or end-of-string, except when one of .,: is followed by a digit, in which case the mark is treated as a decimal point, number separator, or time delimiter

  7. capitalizes the first letter of each sentence (start-of-string or following a .?!)
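
The steps above can be approximated with a handful of base R regular expressions. The following is a minimal sketch, not the package's actual implementation: the name sentence_sketch and every pattern in it are assumptions made for illustration, and edge cases such as quotes, parentheses, or abbreviations are not handled.

```r
# Hypothetical re-implementation of the seven steps using base R only.
sentence_sketch <- function(...) {
  x <- paste(...)
  # Steps 1-2: trim the ends and collapse runs of whitespace.
  x <- gsub("\\s+", " ", trimws(x))
  # Step 3: append a period when there is no terminal mark.
  x <- ifelse(grepl("[.?!]$", x), x, paste0(x, "."))
  # Step 4: drop spaces that precede a punctuation mark.
  x <- gsub(" +([.?!,;:])", "\\1", x)
  # Step 5: collapse runs of marks. Runs containing a terminal mark keep
  # the first terminal mark; remaining runs keep their first mark.
  x <- gsub("[,;:]*([.?!])[.?!,;:]*", "\\1", x)
  x <- gsub("([,;:])[,;:]+", "\\1", x)
  # Step 6: add a space after a mark that is not followed by one,
  # unless the mark is one of .,: and a digit follows.
  x <- gsub("([?!;](?=\\S)|[.,:](?=[^\\s\\d]))", "\\1 ", x, perl = TRUE)
  # Step 7: capitalize the first letter of each sentence.
  gsub("(^|[.?!] )([a-z])", "\\1\\U\\2", x, perl = TRUE)
}

sentence_sketch("whitespace:added ,or removed ; like this.and this")
#> [1] "Whitespace: added, or removed; like this. And this."
```

Step 3 runs before step 5 in this sketch so that an input ending in a non-terminal mark, such as "a time!.?,;", first gains a period and then has the whole run collapsed to its first terminal mark.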

Usage

sentence(...)

Arguments

...

Arguments passed on to paste().
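
Because the arguments are handed to paste(), it may help to recall how paste() itself combines them. The snippet below uses only base R and does not call sentence(); the variable name pasted is chosen here for illustration.

```r
# paste() joins its arguments with a single space by default and
# recycles length-one arguments against longer vectors.
pasted <- paste("The", c("first", "second"), "letter is", letters[1:2])
cat(pasted, sep = "\n")
#> The first letter is a
#> The second letter is b
```

The result has one element per recycled position, which is the raw text that the cleaning steps would then operate on.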

Examples

# helper that prints the input next to the cleaned output
compare <- function(x) cat(sprintf(' in: "%s"\nout: "%s"\n', x, sentence(x)))
compare("capitalized and period added")
#>  in: "capitalized and period added"
#> out: "Capitalized and period added."
compare("whitespace:added ,or removed ; like this.and this")
#>  in: "whitespace:added ,or removed ; like this.and this"
#> out: "Whitespace: added, or removed; like this. And this."
compare("periods and commas in numbers like 1,234.567 are fine !")
#>  in: "periods and commas in numbers like 1,234.567 are fine !"
#> out: "Periods and commas in numbers like 1,234.567 are fine!"
compare("colons can be punctuation or time : 12:00 !")
#>  in: "colons can be punctuation or time : 12:00 !"
#> out: "Colons can be punctuation or time: 12:00!"
compare("only one punctuation mark at a time!.?,;")
#>  in: "only one punctuation mark at a time!.?,;"
#> out: "Only one punctuation mark at a time!"
compare("The first mark ,; is kept;,,with priority for terminal marks ;,.")
#>  in: "The first mark ,; is kept;,,with priority for terminal marks ;,."
#> out: "The first mark, is kept; with priority for terminal marks."
# vectorized like paste()
sentence(
  "The", c("first", "second", "third"), "letter is", letters[1:3],
  parens("uppercase:", sngl_quote(LETTERS[1:3])), "."
)
#> [1] "The first letter is a (uppercase: 'A')."
#> [2] "The second letter is b (uppercase: 'B')."
#> [3] "The third letter is c (uppercase: 'C')."