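(Sorry for the odd title, but I couldn't think of a short way to express this.)
Since I managed to oversimplify my problem in my last question, this time I'll give you the actual problem.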
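The data frame provided has the columns "usr", "usrMsgCnt" and "isRefound", where usr is a name, usrMsgCnt is a number, and isRefound is binary (0 or 1).
I want to add a new column whose value is computed for each row as follows:
usrMsgCnt / (the number of rows where usr equals this row's usr and isRefound equals 1)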
For the first row of the sample data, the new value would be:
9 / 5, where 5 is given by length(data$usr[data$usr=="Jan.Schrader" & data$isRefound==1])
Given the size of the original data set, looping over the rows is not an option.
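To pin down the definition, here is a naive per-row sketch on a tiny made-up data frame. The toy values and the column name `ratio` are hypothetical and only for illustration; they are not taken from the real data. It rescans the whole data frame once per row, which is exactly the cost I'd like to avoid on the full data set.

```r
# Toy data with hypothetical values (not from the real data set)
toy <- data.frame(
  usr       = c("a", "a", "a", "b", "b"),
  usrMsgCnt = c(6L, 6L, 6L, 4L, 4L),
  isRefound = c(1L, 1L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# For each row, count the rows with the same usr and isRefound == 1.
# This compares against every row again for each row, so it scales badly.
refoundCnt <- vapply(
  toy$usr,
  function(u) sum(toy$usr == u & toy$isRefound == 1),
  integer(1),
  USE.NAMES = FALSE
)

# New column: usrMsgCnt divided by that per-user count
# ("ratio" is a placeholder name)
toy$ratio <- toy$usrMsgCnt / refoundCnt
```

One thing this makes visible: a usr with no isRefound == 1 rows gets a count of 0, so the division yields Inf (or NaN when usrMsgCnt is also 0).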
Here is the dput output for a small part of the data:
structure(list(usr = structure(c(21L, 21L, 21L, 21L, 6L, 5L,
6L, 6L, 6L, 21L, 20L, 21L, 6L, 20L, 21L, 21L, 21L, 6L, 6L, 6L
), .Label = c("alsmith", "Amanda.Coles", "Andrew.Coles", "babsimieth",
"Bernd.Ludwig", "Bernhard.Schiemann", "bfueck", "Bram.Ridder",
"brian.tripney", "carlosgardeazabal", "christine.elsweiler",
"cmfinner", "daniel.goncalves", "david", "de56", "eko.ma", "freundlu",
"gmcphail", "ian.ferguson", "Ian.Ruthven", "Jan.Schrader", "jearmour",
"jyang", "Laura.Schnall", "Marc.Roper", "marek.maleika", "Martin.Hacker",
"martin.scholz", "maziminke", "mclanger", "Michael.Cashmore",
"morgan.harvey", "mrussell", "msherrif", "murray.wood", "Nadine.Mahrholz",
"noam.ascher", "pburns", "Peter.Gregory", "raina", "robertnm",
"ronald.teijeira", "ronaldtf", "sbenus", "starmstr", "steve.neely",
"Sven.Friedemann", "tinchen"), class = "factor"), usrMsgCnt = c(9L,
9L, 9L, 9L, 5L, 0L, 5L, 5L, 5L, 9L, 0L, 9L, 5L, 0L, 9L, 9L, 9L,
37L, 37L, 37L), isRefound = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L,
1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L)), .Names = c("usr",
"usrMsgCnt", "isRefound"), row.names = c(NA, 20L), class = "data.frame")