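G. Grothendieck's answer is much better, but since I had already started working on a solution, here it is. It sticks to base R and involves a long lapply call. Assuming your data is named "mydata":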
First, split the "String" column at the & character:
temp1 <- strsplit(mydata$String, "&")
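To make the shape of temp1 concrete, here is a minimal sketch that runs the same split on a single string taken from the sample data. The names s and pieces are only illustrative and are not part of the solution.

```r
# One string from the sample data, split at "&" into "name=values" pieces
s <- "LocationID=123,345&TimeID=456,321"
pieces <- strsplit(s, "&")

# strsplit() returns a list with one character vector per input string
pieces[[1]]

# Assumption about your data: if the String column were a factor,
# strsplit() would need a character vector first, for example:
# temp1 <- strsplit(as.character(mydata$String), "&")
```

Each element of temp1 is therefore a character vector of "name=values" pieces, which is what the next step takes apart.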
Second, here is a long anonymous function used inside lapply. I've commented the steps so you can see what is happening.
temp2 <- do.call(
  "rbind",
  lapply(seq_along(temp1), function(x) {
    # Set the pattern we're going to look for
    pattern <- "(.*)=(.*)"
    # Extract names and values
    Name <- gsub(pattern, "\\1", temp1[[x]])
    Measure <- gsub(pattern, "\\2", temp1[[x]])
    # Split the Measure value, and create a data.frame
    Output <- lapply(strsplit(Measure, ","), function(x)
      data.frame(as.numeric(x)))
    names(Output) <- Name # Add the names back to the list
    Output <- do.call(rbind, Output) # rbind the sub-lists
    # Move the rownames to a column
    Output$Param <- gsub("(.*)\\.[0-9]+", "\\1", rownames(Output))
    rownames(Output) <- NULL # Clean up the rownames
    names(Output)[1] <- "Measure" # Rename the measure variable
    # Make a nice dataframe with your original data too.
    data.frame(ID = mydata[x, "ID"], Output, Value = mydata[x, "Value"])
  }))
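To see what the pattern does in isolation, the extraction step can be tried on the pieces of a single row. This is only a sketch of that one step; the names piece and values are illustrative and do not appear in the solution itself.

```r
# The "name=values" pieces for one row of the sample data
piece <- c("LocationID=123,345", "TimeID=456,321")

# Same pattern as in the anonymous function: group 1 is the text
# before "=", group 2 is the text after it
pattern <- "(.*)=(.*)"
Name <- gsub(pattern, "\\1", piece)
Measure <- gsub(pattern, "\\2", piece)

# Split each comma-separated value string and convert to numeric
values <- lapply(strsplit(Measure, ","), as.numeric)
names(values) <- Name
values
```

The full function does the same thing, but wraps each numeric vector in a data.frame so that the pieces can be stacked with rbind and the parameter names recovered from the row names.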
The result, temp2, looks like this:
temp2
#    ID Measure      Param Value
# 1   1     123 LocationID   100
# 2   1     321 LocationID   100
# 3   1     345 LocationID   100
# 4   1     456     TimeID   100
# 5   1     321     TimeID   100
# 6   1     789     TimeID   100
# 7   1      12     TypeID   100
# 8   1      32     TypeID   100
# 9   2     123 LocationID    50
# 10  2     345 LocationID    50
# 11  2     456     TimeID    50
# 12  2     321     TimeID    50
# 13  3     123 LocationID   120
# 14  3     321 LocationID   120
# 15  3     345 LocationID   120
# 16  3      32     TypeID   120
So, now we can easily use aggregate on temp2 to get this:
aggregate(Value ~ Param + Measure, temp2, sum)
#        Param Measure Value
# 1     TypeID      12   100
# 2     TypeID      32   220
# 3 LocationID     123   270
# 4 LocationID     321   220
# 5     TimeID     321   150
# 6 LocationID     345   270
# 7     TimeID     456   150
# 8     TimeID     789   100
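As a small self-contained check of what the formula Value ~ Param + Measure does, the sketch below runs the same aggregate call on five rows copied from temp2. The data frame name toy is illustrative only.

```r
# Five rows copied from temp2: LocationID/123 for IDs 1 to 3,
# and TimeID/321 for IDs 1 and 2
toy <- data.frame(
  Param   = c("LocationID", "LocationID", "LocationID", "TimeID", "TimeID"),
  Measure = c(123, 123, 123, 321, 321),
  Value   = c(100, 50, 120, 100, 50)
)

# Sum Value within each Param/Measure combination that occurs in the data
totals <- aggregate(Value ~ Param + Measure, data = toy, FUN = sum)
totals
```

The two totals, 270 for LocationID 123 and 150 for TimeID 321, agree with the corresponding rows of the full output above.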
For convenience, here is the dput of the first few rows of your data:
mydata <- structure(list(ID = 1:3,
String = c("LocationID=123,321,345&TimeID=456,321,789&TypeID=12,32",
"LocationID=123,345&TimeID=456,321",
"LocationID=123,321,345&TypeID=32"),
Value = c(100L, 50L, 120L)),
.Names = c("ID", "String", "Value"),
row.names = c(NA, -3L),
class = "data.frame")