R doesn't have great string or set manipulation functions, but you can accomplish a lot by combining factors with apply(). For instance, I had a data frame column in which each value was three two-character tokens concatenated together (e.g. 'abdqbk', 'cbdabb'), representing a set of three sounds played in order. I needed to recode this column as two factors: one indicating which set of sounds had been used in a particular trial, and another indicating the order in which they had been played. Solving the problem turned out to be an interesting illustration of how your tools constrain your thinking, and in this case the result was fairly elegant (IMHO). If I had been working in Python, I probably would have written a couple of loops over the strings, one to find the unique sets of symbols and another to assign a unique value to each permutation.
permutations <- function(stimnames, toklen=2) {
  # assume all stims have the same length name but otherwise stay flexible
  stimlen <- nchar(as.character(stimnames[1]))
  split.points <- rep(toklen, stimlen/toklen)
  # one row per stimulus, one column per token (strmsplit() is defined below)
  toks <- strmsplit(stimnames, split.points)
  # sort tokens and use factor to compute unique sets
  sets <- factor(apply(toks, 1, function(x) {paste(sort(x), collapse='')}))
  # within each set, number the distinct orderings 1, 2, ...
  perms <- lapply(split(stimnames, sets),
                  function(x) {as.numeric(factor(as.character(x)))})
  data.frame(sets, perms=unsplit(perms, sets))
}
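The key trick in permutations() is that sorting a stimulus's tokens gives a key that is identical for every ordering of the same three sounds. The following is a minimal, self-contained sketch of the same recoding under a couple of stated substitutions: the helper name recode_stims() and the example data are hypothetical, base R's substring() stands in for strmsplit(), and ave() stands in for the split()/unsplit() pair. It assumes, like the original, that every name has the same length.

```r
# Hypothetical helper (not from the original post): recode stimulus names made
# of equal-length tokens into a set factor and a within-set order number
recode_stims <- function(stims, toklen = 2) {
  stims <- as.character(stims)
  # token start positions, assuming every name has the same length
  starts <- seq(1, nchar(stims[1]), by = toklen)
  # one row per stimulus, one column per token (substring() replaces strmsplit())
  toks <- do.call(rbind, lapply(stims, substring, starts, starts + toklen - 1))
  # sorting the tokens gives a key shared by every ordering of the same sounds
  sets <- factor(apply(toks, 1, function(x) paste(sort(x), collapse = "")))
  # within each set, number the distinct orderings 1, 2, ... (alphabetical order);
  # ave() applies FUN to each group of indices and reassembles the results
  perms <- ave(seq_along(stims), sets,
               FUN = function(i) as.integer(factor(stims[i])))
  data.frame(sets = sets, perms = perms)
}

# Made-up example built from the tokens mentioned in the text
res <- recode_stims(c("abdqbk", "dqbkab", "cbdabb", "abdqbk", "bbcbda"))
# expected: sets abbkdq, abbkdq, bbcbda, abbkdq, bbcbda; perms 1, 2, 2, 1, 1
```

Using ave() keeps the per-group results in the original row order automatically, which is the job unsplit() does in the version above.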
strmsplit() is another little bit of apply() magic that splits a string at a set of fixed cut points. The code is from One R Tip A Day.
strmsplit <- function(s, pos){
  # split a string into multiple bits based on cut points
  # e.g. strmsplit('st378akbkzk', c(5,2,2,2)) returns c('st378','ak','bk','zk')
  # from http://onertipaday.blogspot.com/2007/06/string-manipulation-insert-delim.html
  start <- head(cumsum(c(1, pos)), -1)  # start of each piece (drop the last cumulative sum)
  sel <- cbind(start=start, end=start + pos - 1)
  # substr() is vectorized over s, so each row of sel yields one column of the result
  apply(sel, 1, function(x) substr(s, x[1], x[2]))
}
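permutations() passes a whole vector of names to strmsplit() and then applies over rows, so it depends on the shape of strmsplit()'s result. The following short check (the definition is repeated so the block runs on its own; the input strings are the examples from the text) shows that a vector input yields one row per string and one column per token, while a single string collapses to a plain vector. That collapse means permutations() would fail on a single stimulus.

```r
# strmsplit() as defined above, repeated so this block is self-contained
strmsplit <- function(s, pos) {
  start <- head(cumsum(c(1, pos)), -1)
  sel <- cbind(start = start, end = start + pos - 1)
  apply(sel, 1, function(x) substr(s, x[1], x[2]))
}

toks <- strmsplit(c("abdqbk", "cbdabb"), c(2, 2, 2))
# toks is a 2 x 3 character matrix:
#   row 1: "ab" "dq" "bk"
#   row 2: "cb" "da" "bb"

one <- strmsplit("abdqbk", c(2, 2, 2))
# with a single string apply() simplifies to a vector: "ab" "dq" "bk" (no dim)
```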
I make almost all my graphics in R using ggplot2. It is excellent for exploratory analysis because you can easily map variables to any aesthetic dimension of a plot. For instance, you can map one variable to the size of the points in a scatter plot and another to their color, which gives the sort of balloon plot you see in National Geographic showing the relative number and proportion of single women in New York or wherever.
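As a small illustration of mapping variables to aesthetics, here is a sketch of a balloon-style scatter plot. It uses the mtcars data that ships with R, and the choice of variables is arbitrary and purely illustrative: horsepower is mapped to point size and cylinder count to colour, on top of the usual x and y.

```r
library(ggplot2)

# Illustrative only: any data frame works; mtcars is built into R
p <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp, colour = factor(cyl))) +
  geom_point(alpha = 0.6) +   # partial transparency so overlapping balloons stay visible
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       size = "Horsepower", colour = "Cylinders")
```

Printing p (or just typing p at the console) draws the plot. Each extra aesthetic gets its own legend without any further work.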
Producing a specific plot layout for production graphics is a little troublesome. ggplot2 draws with the grid graphics package, whose layout system is somewhat complicated. The ggplot2 manual explains some of how to control plot layout, but it's useful to define the following two functions to get MATLAB-like control over subplots.
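library(grid)  # viewport(), grid.layout(), grid.newpage(), pushViewport()
# viewport occupying column(s) x and row(s) y of the current layout
subplot <- function(x, y) viewport(layout.pos.col=x, layout.pos.row=y)
# start a new page divided into y rows by x columns
vplayout <- function(x, y) {
  grid.newpage()
  pushViewport(viewport(layout=grid.layout(y, x)))
}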
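Note that both helpers take the column first and the row second (x, then y), which is the reverse of grid.layout(nrow, ncol). The following pure-grid sketch (no ggplot2 needed; the helpers are repeated so it runs on its own, and the output file name is made up) draws a 2 x 2 layout in which one viewport spans the whole top row, because passing a vector such as 1:2 makes the viewport occupy those columns.

```r
library(grid)

# The two helpers from above, repeated so this block is self-contained
subplot <- function(x, y) viewport(layout.pos.col = x, layout.pos.row = y)
vplayout <- function(x, y) {
  grid.newpage()
  pushViewport(viewport(layout = grid.layout(y, x)))
}

out <- file.path(tempdir(), "layout-demo.pdf")  # hypothetical output file
pdf(out, width = 6, height = 4)

vplayout(2, 2)  # x = 2 columns, y = 2 rows

# Top row: a single viewport spanning both columns
pushViewport(subplot(1:2, 1))
grid.rect(gp = gpar(fill = "grey90"))
grid.text("subplot(1:2, 1)")
upViewport()  # back to the layout viewport

# Bottom row: one viewport per column
for (col in 1:2) {
  pushViewport(subplot(col, 2))
  grid.rect()
  grid.text(sprintf("subplot(%d, 2)", col))
  upViewport()
}

invisible(dev.off())
```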
To use these functions, call vplayout() to set up the plot grid, then pass a subplot() viewport as the vp argument of print() to control which cells of the grid each plot occupies. For instance, the following code divides the page into one row of four columns and draws the plot across the first three, i.e. 3/4 of the horizontal space:
vplayout(4,1)                     # one row of four equal columns
p <- ggplot() + ...
print(p, vp=subplot(c(1,2,3),1))  # draw p across columns 1-3 of row 1
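For a complete version of the same idea, here is a sketch that fills the remaining quarter of the page as well. The two plots are arbitrary examples built from the built-in mtcars data, the helpers are repeated so the block runs on its own, and the output file name is hypothetical.

```r
library(ggplot2)
library(grid)

subplot <- function(x, y) viewport(layout.pos.col = x, layout.pos.row = y)
vplayout <- function(x, y) {
  grid.newpage()
  pushViewport(viewport(layout = grid.layout(y, x)))
}

# Illustrative plots; substitute your own
main <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
side <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar() + labs(x = "Cylinders")

out <- file.path(tempdir(), "subplots.pdf")  # hypothetical output file
pdf(out, width = 8, height = 3)
vplayout(4, 1)                     # one row, four equal columns
print(main, vp = subplot(1:3, 1))  # main plot spans columns 1-3
print(side, vp = subplot(4, 1))    # side plot fills column 4
invisible(dev.off())
```

Because print() is given a viewport, it draws into that cell instead of starting a new page, so several plots can share one page.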