Title: | Utilities for Handling Strings and Text |
---|---|
Description: | Utilities for handling character vectors that store human-readable text (either plain or with markup, such as HTML or LaTeX). The package provides, in particular, functions that help with the preparation of plain-text reports, e.g. for expanding and aligning strings that form the lines of such reports. The package also provides generic functions for transforming R objects to HTML and to plain text. |
Authors: | Enrico Schumann [aut, cre] |
Maintainer: | Enrico Schumann <[email protected]> |
License: | GPL-3 |
Version: | 0.4-2 |
Built: | 2024-11-21 02:48:06 UTC |
Source: | https://github.com/enricoschumann/textutils |
The package comprises a number of functions that help with manipulating character strings.
For more information and a complete list of functions, use ‘library(help = "textutils")’.
Enrico Schumann [aut, cre] (<https://orcid.org/0000-0001-7601-6576>)
Create a LaTeX-table.
btable(x, unit = "cm", before = "", after = "", raise = "0.2ex", height = "1ex", ...)
x |
numeric: the numbers for which the barplot is to be created |
unit |
character: a valid TeX unit |
before |
character |
after |
character |
raise |
character |
height |
character |
... |
more arguments |
Creates a barplot table.
character
Enrico Schumann
## see vignette
Create a LaTeX-table.
dctable(x, unitlength = "1 cm", width = 5, y.offset = 0.07, circle.size = 0.1, xlim, na.rm = FALSE)
x |
numeric: the numbers for which the dotchart is to be created |
unitlength |
character |
width |
numeric |
y.offset |
numeric |
circle.size |
numeric |
xlim |
character |
na.rm |
logical |
Creates a dotchart table.
This function is currently very experimental.
character
Enrico Schumann
Cleveland, W. S. (1985) The Elements of Graphing Data. Wadsworth.
## see vignette
Light-weight template filling: replace placeholders in a string by values.
fill_in(s, ..., delim = c("{", "}"), replace.NA = TRUE)
s |
character |
... |
typically name/value pairs. See Examples. |
delim |
characters |
replace.NA |
logical: if TRUE, |
A light-weight replacement function.
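The placeholder mechanism can be sketched in base R. The helper fill_sketch below is hypothetical and is not the package's implementation; it assumes named placeholders with the default delimiters and ignores delim and replace.NA.

```r
## Hypothetical helper: replace "{name}" placeholders by named values.
fill_sketch <- function(s, ...) {
    vals <- list(...)
    for (nm in names(vals)) {
        ## fixed = TRUE: treat the placeholder as a literal string,
        ## not as a regular expression
        s <- gsub(paste0("{", nm, "}"), vals[[nm]], s, fixed = TRUE)
    }
    s
}

filled <- fill_sketch("{one} meets {other}", one = "Peter", other = "Paul")
filled
## [1] "Peter meets Paul"
```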
character
Enrico Schumann
template <- "{1} meets {2}"
fill_in(template, "Peter", "Paul")
## "Peter meets Paul"

template <- "{one} meets {other}"
fill_in(template, one = "Peter", other = "Paul")
## "Peter meets Paul"

## handling missing values
fill_in("{name}: {score}", name = "Peter", score = NA)
## [1] "Peter: NA"
fill_in("{name}: {score}", name = "Peter", score = NA, replace.NA = ".")
## [1] "Peter: ."
Read lines and convert into appropriate vector or data frame.
here(s, drop = TRUE, guess.type = TRUE, sep = NULL, header = TRUE, stringsAsFactors = FALSE, trim = TRUE, ...)
s |
a string |
drop |
logical: drop empty first and last element |
guess.type |
logical |
sep |
NULL or character |
header |
logical |
stringsAsFactors |
logical |
trim |
logical: trim whitespace? |
... |
named arguments to be passed to |
Experimental. (Notably, the function's name may change.)
The function reads a (typically multi-line) string and treats each line as one element of a vector or, if sep is specified, a data.frame.
If sep is not specified, here calls type.convert on the input s.
If sep is specified, the input s is fed to read.table. Additional arguments may be passed through ....
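The two code paths described above can be approximated with base R alone. This is only a sketch under stated assumptions, not the package's code: it drops all empty lines (not just the first and last), and the arguments passed to read.table are chosen for illustration.

```r
## Without a separator: split into lines, then let type.convert
## guess the type.
s <- "\n1\n2\n3\n4\n"
lines <- trimws(strsplit(s, "\n", fixed = TRUE)[[1]])
lines <- lines[nzchar(lines)]   # simplification: drops ALL empty lines
v <- type.convert(lines, as.is = TRUE)
v
## [1] 1 2 3 4

## With a separator: hand the text to read.table.
df <- read.table(text = "letter, number\nx, 1\ny, 2\nz, 3",
                 sep = ",", header = TRUE,
                 strip.white = TRUE, stringsAsFactors = FALSE)
str(df)
```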
a vector or, if sep is specified, a data.frame
Enrico Schumann
https://rosettacode.org/wiki/Here_document
(note that R supports multi-line strings, so in a way it has built-in support for here documents as defined on that website)
## numbers
here("
1
2
3
4
")

## character
here("
Al
Bob
Carl
David
")

## data frame
here("
letter, number
x, 1
y, 2
z, 3", sep = ",")
Decode and encode HTML entities.
HTMLdecode(x, named = TRUE, hex = TRUE, decimal = TRUE)
HTMLencode(x, use.iconv = FALSE, encode.only = NULL)
HTMLrm(x, ...)
x |
|
use.iconv |
logical. Should conversion via |
named |
logical: replace named character references? |
hex |
logical: replace hexadecimal character references? |
decimal |
logical: replace decimal character references? |
encode.only |
character |
... |
other arguments |
HTMLdecode replaces named, hexadecimal and decimal character references as defined by HTML5 (see References) with characters. The resulting character vector is marked as UTF-8 (see Encoding).
HTMLencode replaces UTF-8-encoded substrings with HTML5 named entities (a.k.a. “named character references”). A semicolon ‘;’ will not be replaced by the entity ‘&semi;’. Other than that, however, HTMLencode is quite thorough in its job: it will replace all characters for which named entities exist, even ‘,’ or ‘?’. You can restrict the characters to be replaced by specifying encode.only.
HTMLrm removes HTML tags. All content between style and head tags is removed, as are comments. Note that each element of x is considered a single HTML document; so for multiline documents, paste/collapse the document.
character
Enrico Schumann
https://www.w3.org/TR/html5/syntax.html#named-character-references
https://html.spec.whatwg.org/multipage/syntax.html#character-references
HTMLdecode(c("Max &amp; Moritz", "4 &lt; 9"))
## [1] "Max & Moritz" "4 < 9"

HTMLencode(c("Max & Moritz", "4 < 9"))
## [1] "Max &amp; Moritz" "4 &lt; 9"

HTMLencode("Max, Moritz & more")
## [1] "Max&comma; Moritz &amp; more"

HTMLencode("Max, Moritz & more", encode.only = c("&", "<", ">"))
## [1] "Max, Moritz &amp; more"

HTMLrm("before <a href='http://enricoschumann.net'>LINK</a> after")
## [1] "before http://enricoschumann.net after"
Insert elements into a vector.
insert(x, values, before.index)
x |
a vector |
values |
elements to insert |
before.index |
numeric: before which positions of the original vector to insert the new elements |
Inserts elements into a vector.
A vector with values inserted. If either values or before.index is of length zero, the original vector is returned.
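One way to picture "insert before position" semantics is with fractional sort keys. The function insert_sketch below is a hypothetical base-R illustration, not the package's implementation; it assumes values is recycled to the length of before.index.

```r
## Hypothetical helper: insert `values` before the given positions
## of the ORIGINAL vector.
insert_sketch <- function(x, values, before.index) {
    if (!length(values) || !length(before.index))
        return(x)
    values <- rep_len(values, length(before.index))
    ## original elements get keys 1, 2, 3, ...; an element inserted
    ## before position i gets key i - 0.5 and so sorts just ahead of it
    keys <- c(seq_along(x), before.index - 0.5)
    c(x, values)[order(keys)]
}

res <- insert_sketch(letters[1:5], "Z", c(2, 5))
res
## [1] "a" "Z" "b" "c" "d" "Z" "e"
```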
Enrico Schumann
x <- letters[1:5]
## [1] "a" "b" "c" "d" "e"

insert(x, values = "Z", c(2, 5))
## [1] "a" "Z" "b" "c" "d" "Z" "e"
Create a LaTeX-rule, including colours.
latexrule(x, y, col = NULL, x.unit = "cm", y.unit = "cm", noindent = FALSE)
x |
numeric |
y |
numeric |
col |
character |
x.unit |
character |
y.unit |
character |
noindent |
logical |
Experimental. Create LaTeX code that produces rules.
character
Enrico Schumann
## see vignette
Remove a repeated pattern in a character vector.
rmrp(s, pattern, ...)
s |
a character vector |
pattern |
a regular expression |
... |
arguments passed to |
rmrp removes a repeated pattern in a character vector (e.g. repeated blank lines).
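The idea can be sketched in base R: find the elements that match the pattern and drop every match that directly follows another match. This sketch assumes that the first element of each run is kept; it is an illustration, not the package's code.

```r
s <- c("* Header", "", " ", "", "** Subheader")

m <- grepl("^ *$", s)                      # which elements match?
prev.m <- c(FALSE, head(m, -1))            # did the previous element match?
collapsed <- s[!(m & prev.m)]              # drop repeats, keep first of a run
collapsed
## [1] "* Header"     ""             "** Subheader"
```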
a character vector
Enrico Schumann
## remove repeated blanks from vector
s <- c("* Header", "", " ","", "** Subheader")
rmrp(s, "^ *$")
Create character vectors of white space.
spaces(n)
n |
integer |
The function creates a character vector of white-space strings. Such vectors are useful, for instance, for padding character vectors.
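For comparison, base R's strrep produces the same kind of vector, and the padding use case might look as follows. This is a sketch with made-up labels; it does not call the package.

```r
## white-space strings of length 0, 1, 2, 3
pad <- strrep(" ", 0:3)
nchar(pad)
## [1] 0 1 2 3

## right-pad labels to a common width of 3
labels <- c("abc", "ab", "a", "")
padded <- paste0(labels, strrep(" ", 3 - nchar(labels)))
nchar(padded)
## [1] 3 3 3 3
```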
character
Enrico Schumann
spaces(0:3)
Expand strings to a fixed ‘length’ (in the sense of nchar).
strexp(s, after, width, fill = " ", at)
s |
a character vector |
after |
character: a pattern, to be passed to |
width |
integer |
fill |
character |
at |
integer |
strexp inserts blanks into the elements of a character vector such that all elements have the same width (i.e. nchar). Note that it will (currently) not contract a string, only expand it.
a character vector
Enrico Schumann
## expand to width 10, but keep two initial blanks
s <- c(" A 1", " B 2")
strexp(s, after = " +[^ ]+ +", width = 10)
Encode special characters for TeX/LaTeX.
TeXencode(s)
s |
character |
Probably incomplete.
character
Enrico Schumann
Donald E. Knuth. The TeXbook. Addison Wesley, 1986 (with corrections made in 1996).
Leslie Lamport. LaTeX: A Document Preparation System. Addison Wesley, 1994.
TeXencode("Peter & Paul") ## [1] "Peter \& Paul"
Translates units of measurement known to TeX and LaTeX.
TeXunits(from, to, from.unit = NULL)
from |
Typically character, such as |
to |
character |
from.unit |
character |
Available units are centimetre (cm), inch (in), point (pt), pica (pc), big point (bp), millimetre (mm), Didot point (dd) and Cicero (cc).
See Chapter 10 of the TeXbook for details.
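The conversion arithmetic behind these units can be sketched with a lookup table that expresses every unit in points, using the ratios given in the TeXbook (1 in = 72.27 pt = 72 bp = 2.54 cm, 1 pc = 12 pt, 1157 dd = 1238 pt, 1 cc = 12 dd). The names pt.per.unit and convert_sketch are hypothetical; this is not the package's implementation.

```r
## points per unit
pt.per.unit <- c("pt" = 1,
                 "pc" = 12,
                 "in" = 72.27,
                 "bp" = 72.27 / 72,
                 "cm" = 72.27 / 2.54,
                 "mm" = 72.27 / 25.4,
                 "dd" = 1238 / 1157,
                 "cc" = 12 * 1238 / 1157)

## Hypothetical helper: convert `value` (numeric) from one unit
## to one or several target units.
convert_sketch <- function(value, from, to) {
    unname(value * pt.per.unit[[from]] / pt.per.unit[to])
}

res <- convert_sketch(1, "in", c("in", "mm", "pt"))
res
## [1]  1.00 25.40 72.27
```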
numeric
Enrico Schumann
Donald E. Knuth. The TeXbook. Addison Wesley, 1986 (with corrections made in 1996).
TeXunits("1in", c("in", "mm", "pt", "in"))
TeXunits(c("1in", "2in"), "cm")
Convert strings to title case.
title_case(s, strict = FALSE, ignore = NULL)
s |
a character vector |
strict |
logical: if TRUE, only the first letter of each word is uppercase |
ignore |
character |
Set string in title case.
a character vector
Enrico Schumann
title_case("text mining")
Convert an R object to an HTML snippet.
toHTML(x, ...)

## S3 method for class 'data.frame'
toHTML(x, ..., row.names = FALSE, col.names = TRUE,
       class.handlers = list(), col.handlers = list(),
       replace.NA = NULL, td.id = FALSE)
x |
an object |
... |
arguments passed to methods |
row.names |
logical |
col.names |
logical |
class.handlers |
a list of named functions |
col.handlers |
a list of named functions |
replace.NA |
|
td.id |
logical |
There exist toHTML methods in several packages, e.g. in tools or XML. Package R2HTML has a HTML generic.
The ‘semantics’ of this function may differ from other implementations: the function is expected to take an arbitrary R object and return an HTML snippet that can be placed in reports, i.e. the function works in the same spirit as toLatex. By contrast, the purpose of toHTML in tools is to provide a whole HTML document.
The data.frame method has two handlers arguments: these may store helper functions for formatting columns, either of a specific name (col.handlers) or of a specific class (class.handlers). The functions in col.handlers are applied first; and the affected columns are not touched by class.handlers. See Examples.
If td.id is TRUE, all data cells in the table (i.e. td elements) gain an id-attribute of the form td_<row>_<col>.
a character vector
Enrico Schumann
x <- data.frame(a = 1:3, b = rnorm(3))
cat(toHTML(x,
           col.handlers = list(b = function(x) round(x, 1)),
           class.handlers = list(integer = function(x) 100*x)))
## [ pretty-printed... ]
## <tr> <th>a</th> <th>b</th> </tr>
## <tr> <td>100</td><td>-2.3</td> </tr>
## <tr> <td>200</td><td>-0.1</td> </tr>
## <tr> <td>300</td><td>-2.8</td> </tr>
Convert data frames to character vector in LaTeX markup.
## S3 method for class 'data.frame'
toLatex(object, row.names = FALSE,
        col.handlers = list(), class.handlers = list(),
        eol = "\\\\", ...)
object |
|
row.names |
include the row names as the first column |
col.handlers |
a list of named functions |
class.handlers |
a list of named functions |
eol |
character: the line ending; may be a vector of length greater than one |
... |
other arguments |
A method for toLatex that converts data frames into LaTeX markup. Any formatting to be applied must be specified as a function and passed with col.handlers and class.handlers. col.handlers take precedence over class.handlers.
character
Enrico Schumann
df <- data.frame(letter = letters[1:5],
                 number = runif(5),
                 stringsAsFactors = FALSE)

toLatex(df,
        col.handlers = list(letter = toupper),
        class.handlers = list(numeric = function(x) format(x, digits = 4)),
        eol = "\\[1ex]")

cat(toLatex(df,
            col.handlers = list(letter = toupper),
            class.handlers = list(numeric = function(x) format(x, digits = 4)),
            eol = "\\[1ex]"),
    sep = "\n")
Converts an R object into a text representation.
toText(x, ...)

## Default S3 method:
toText(x, ...)
x |
an object |
... |
arguments passed to methods |
A generic function. Methods are expected to coerce a given object to lines of human-readable text that can be used, for instance, for reports. The purpose of toText is not to store data in a form that can be read and understood by R; for that, see dput or dump.
The print method is essentially equivalent to cat(x, sep = "\n").
There is no restriction on encoding, so plain text does not necessarily mean ASCII. But current methods do not map into markup-representations.
A character vector (lines of text), possibly with a class attribute text.
Enrico Schumann
toText(c("a", "b", "c"))
cat(toHTML(toText(c("a", "b", "c"))))
Remove leading and/or trailing white space from character vectors.
trim(s, leading = TRUE, trailing = TRUE, perl = TRUE, ...)
s |
a character vector |
leading |
logical |
trailing |
logical |
perl |
logical |
... |
arguments passed to |
trim removes leading and trailing space, which is defined through the (Perl) regular expression \s.
The base package has a function trimws these days, so you may not actually need the function (any more).
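For reference, the base-R alternative mentioned above might be used as follows. The sketch uses only trimws with its default definition of white space (space, tab, carriage return, newline), which may differ slightly from \s.

```r
s <- c("\t 2 2\n \t", " ab ")

both <- trimws(s)                  # leading and trailing (the default)
both
## [1] "2 2" "ab"

left <- trimws(s, which = "left")  # leading white space only
```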
a character vector
Enrico Schumann
s <- c("\t 2 2\n \t", " ab ")
trim(s)
Vertically align character vectors.
valign(s, align = "|", insert.at = "<>", replace = TRUE, fixed = TRUE)
s |
a character vector |
align |
a regular expression |
insert.at |
a regular expression |
replace |
logical |
fixed |
logical |
The function expands the elements of a character vector in such a way that the elements are vertically aligned, which can be handy when generating reports. See Examples.
a character vector
Enrico Schumann
s <- c("Player 1 <>| 100",
       "another player <>| 999999")
cat(paste(s, collapse = "\n"))
## Player 1 <>| 100
## another player <>| 999999

cat(paste(valign(s), collapse = "\n"))
## Player 1 100
## another player 999999