A wrapper around `paste` that does some simple cleaning appropriate for prose sentences. It:

- trims leading and trailing whitespace
- collapses runs of whitespace into a single space
- appends a period (`.`) if there is no terminal punctuation mark (`.`, `?`, or `!`)
- removes spaces preceding punctuation characters: `.?!,;:`
- collapses sequences of punctuation marks (`.?!,;:`), possibly separated by spaces, into a single punctuation mark. The first punctuation mark of the sequence is used, with priority given to terminal punctuation marks (`.?!`) if present
- makes sure a space or end-of-string follows every one of `.?!,;:`, with an exception for the special case of `.,:` followed by a digit, indicating the punctuation is a decimal period, number separator, or time delimiter
- capitalizes the first letter of each sentence (start-of-string or following one of `.?!`)
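The rules above can be approximated with a handful of base R regular expressions. The following is only a minimal sketch under that assumption: `sentence_sketch()` is a hypothetical name introduced here for illustration, it is not the package's actual implementation, and it may differ from `sentence()` on edge cases not covered by the listed rules.

```r
# Hypothetical re-implementation for illustration only; not the package's code.
sentence_sketch <- function(...) {
  x <- paste(...)
  # Trim both ends, then collapse runs of whitespace into a single space.
  x <- gsub("\\s+", " ", trimws(x))
  # Append a period when the string does not end in a terminal mark.
  needs_period <- !grepl("[.?!]$", x)
  x[needs_period] <- paste0(x[needs_period], ".")
  # Remove spaces that precede a punctuation mark.
  x <- gsub(" +([.?!,;:])", "\\1", x)
  # A run containing a terminal mark collapses to its first terminal mark.
  x <- gsub("[,;:]*([.?!])[.?!,;:]*", "\\1", x)
  # Any remaining run has no terminal mark and collapses to its first mark.
  x <- gsub("([,;:])[,;:]+", "\\1", x)
  # Add a space after a mark that is directly followed by another character,
  # except for . , : followed by a digit (decimals, separators, times).
  x <- gsub("([?!;])(?=[^ ])", "\\1 ", x, perl = TRUE)
  x <- gsub("([.,:])(?=[^ 0-9])", "\\1 ", x, perl = TRUE)
  # Capitalize the first letter of the string and of each new sentence.
  gsub("(^|[.?!] )([a-z])", "\\1\\U\\2", x, perl = TRUE)
}

sentence_sketch("whitespace:added ,or removed ; like this.and this")
#> [1] "Whitespace: added, or removed; like this. And this."
sentence_sketch("the", c("first", "second"), "item")
#> [1] "The first item." "The second item."
```

The period is appended before punctuation runs are collapsed, so a trailing non-terminal mark such as `;` is absorbed into the added period instead of being left in front of it.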
sentence(...)
| Argument | Description |
|---|---|
| `...` | passed on to `paste` |
    compare <- function(x) cat(sprintf(' in: "%s"\nout: "%s"\n', x, sentence(x)))

    compare("capitalized and period added")
    #>  in: "capitalized and period added"
    #> out: "Capitalized and period added."

    compare("whitespace:added ,or removed ; like this.and this")
    #>  in: "whitespace:added ,or removed ; like this.and this"
    #> out: "Whitespace: added, or removed; like this. And this."

    compare("periods and commas in numbers like 1,234.567 are fine !")
    #>  in: "periods and commas in numbers like 1,234.567 are fine !"
    #> out: "Periods and commas in numbers like 1,234.567 are fine!"

    compare("colons can be punctuation or time : 12:00 !")
    #>  in: "colons can be punctuation or time : 12:00 !"
    #> out: "Colons can be punctuation or time: 12:00!"

    compare("only one punctuation mark at a time!.?,;")
    #>  in: "only one punctuation mark at a time!.?,;"
    #> out: "Only one punctuation mark at a time!"

    compare("The first mark ,; is kept;,,with priority for terminal marks ;,.")
    #>  in: "The first mark ,; is kept;,,with priority for terminal marks ;,."
    #> out: "The first mark, is kept; with priority for terminal marks."

    # vectorized like paste()
    sentence(
      "The", c("first", "second", "third"), "letter is", letters[1:3],
      parens("uppercase:", sngl_quote(LETTERS[1:3])), "."
    )
    #> [1] "The first letter is a (uppercase: 'A')."
    #> [2] "The second letter is b (uppercase: 'B')."
    #> [3] "The third letter is c (uppercase: 'C')."