r - Melt and dcast based on the name of the original data frame column

Question

I'm having a hard time reshaping a data frame for use with error bar plots. I want to combine all the columns with central-tendency data and, separately, all the columns with error data.

I start with a data frame with a column for the independent variable, and then two columns for each measured parameter: one for the average value, and one for the error, as you'd typically format a spreadsheet with this kind of data. The initial data frame looks like this:

df<-data.frame(
  indep=1:3, 
  Amean=runif(3), 
  Aerr=rnorm(3), 
  Bmean=runif(3), 
  Berr=rnorm(3)
)

I'd like to use melt and dcast to get it into a form that looks like this:

df.cast<-data.frame(
  indep=rep(1:3, 2), 
  series=c(rep("A", 3), rep("B", 3)), 
  means=runif(6), 
  errs=rnorm(6)
)

So that I can then feed it to ggplot like this:

qplot(data=df.cast, x=indep, y=means, ymin=means-errs, ymax=means+errs, 
      col=series, geom="errorbar")

I've been trying to melt and then recast using expressions like this:

df.melt<-melt(df, id.vars="indep")
dcast(df.melt, 
  indep~(variable=="Amean"|variable=="Bmean") + (variable=="Aerr"|variable=="Berr")
)

but these return a dataframe with funny boolean columns.

I could manually make two dataframes (one for the mean values, one for the errors), melt them separately, and recombine, but surely there must be a more elegant way?

score 3 · Accepted Answer

I would do it like this:

# Melt the data

library(reshape2)  # provides melt() and dcast()
mdf <- melt(df, id.vars="indep")

# Separate the series from the statistic with regular expressions

mdf$series <- gsub("([A-Z]).*", "\\1", mdf$variable)
mdf$stat <- gsub("[A-Z](.*)", "\\1", mdf$variable)

# Cast the data (after dropping the original melt variable)

cdf <- dcast(mdf[, -2], indep+series ~ stat)

# Plot

qplot(data=cdf, x=indep, y=mean, ymin=mean-err, ymax=mean+err, 
    colour=series, geom="errorbar")
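
For reference, here is a self-contained sketch of the steps above, assuming the reshape2 and ggplot2 packages are installed. The fixed seed, the explicit value.var argument, and the ggplot() call are additions for illustration, not part of the original answer. The ggplot() form is shown because qplot() is deprecated as of ggplot2 3.4.0.

```r
library(reshape2)  # melt(), dcast()
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(indep = 1:3, Amean = runif(3), Aerr = rnorm(3),
                 Bmean = runif(3), Berr = rnorm(3))

# Long format: one row per (indep, original column name)
mdf <- melt(df, id.vars = "indep")

# Split e.g. "Amean" into series "A" and statistic "mean"
mdf$series <- sub("^([A-Z]).*$", "\\1", mdf$variable)
mdf$stat   <- sub("^[A-Z](.*)$", "\\1", mdf$variable)

# One row per (indep, series), one column per statistic
cdf <- dcast(mdf, indep + series ~ stat, value.var = "value")

# Error bars built with ggplot() rather than qplot()
p <- ggplot(cdf, aes(x = indep, ymin = mean - err, ymax = mean + err,
                     colour = series)) +
  geom_errorbar(width = 0.2)
```

Because the example errors come from rnorm(), some of them can be negative, in which case ymin ends up above ymax. Real error values would normally be non-negative.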


score 2 · Accepted Answer

You can do this with reshape in base R:

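library(ggplot2)  # provides qplot()

# One list element per output column, in the same order as v.names, so the
# pairing does not depend on how reshape() splits a flat index vector
df.cast <- reshape(df, direction = 'long', timevar = 'series', times = c('A', 'B'),
  varying = list(c('Amean', 'Bmean'), c('Aerr', 'Berr')), v.names = c('mean', 'err'))
qplot(data = df.cast, x = indep, y = mean, ymin = mean - err, ymax = mean + err, 
  colour = series, geom = "errorbar")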
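
With reshape() it is worth checking that each long column really holds the values of the wide columns it is named after. Below is a minimal base-R sketch of such a check. The fixed seed and the name `long` are assumptions for illustration, and the wide columns are listed explicitly per output variable so the pairing is unambiguous.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(indep = 1:3, Amean = runif(3), Aerr = rnorm(3),
                 Bmean = runif(3), Berr = rnorm(3))

# varying: one list element per output column, in the same order as v.names
long <- reshape(df, direction = "long",
  varying = list(c("Amean", "Bmean"), c("Aerr", "Berr")),
  v.names = c("mean", "err"),
  timevar = "series", times = c("A", "B"))

# Each long column should match the wide columns it was built from
stopifnot(
  isTRUE(all.equal(long$mean[long$series == "A"], df$Amean)),
  isTRUE(all.equal(long$mean[long$series == "B"], df$Bmean)),
  isTRUE(all.equal(long$err[long$series == "A"], df$Aerr)),
  isTRUE(all.equal(long$err[long$series == "B"], df$Berr))
)
```

reshape() also adds an id column and row names such as "1.A". Neither affects the plot.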
