I am trying to apply a function to each element of a data frame. A simple example of such a data frame is:
> accts
ACCOUNT DATE
1 2008-03-01
2 2009-06-17
3 2008-07-02
4 2009-03-15
What I need to do is look at each row of this data frame and then find that account in a larger data frame, such as the following:
> trans
ACCOUNT_NUM TRAN_DATE
1 2008-02-02
2 2008-04-02
3 2008-03-16
3 2009-08-22
3 2008-05-05
6 2010-11-03
7 2008-09-18
4 2009-10-14
4 2009-01-15
10 2011-07-06
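For each row in the 'accts' data frame, I need to get the record in the 'trans' data frame that belongs to the same account and whose 'TRAN_DATE' is closest to, but earlier than, 'DATE'. I tried using the apply function: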
tranDateVector <- apply(accts, 2, getTranDate)
getTranDate <- function(x)
{
    tranDate <- subset(trans$TRAN_DATE, with(trans, ACCOUNT_NUM == x[1] & TRAN_DATE < x[2]))
    dataDiff <- x[2] - tranDate
    tranDate <- unique(date[which(dateDiff == min(dateDiff))])
    return(tranDate)
}
accts <- cbind(accts, tranDateVector)
When I run my mini example, I get the following error:
Error in charToDate(x) :
character string is not in a standard unambiguous format
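A possible explanation for this error, offered as a guess rather than a confirmed diagnosis: `apply()` converts the data frame to a matrix first, and because `DATE` is not numeric the result is a character matrix. With `MARGIN = 2` the function also receives whole columns rather than rows, so `x[2]` would be the string "2", and comparing that with a Date forces R to parse it as a date. The sketch below rebuilds the small 'accts' example to show both effects.

```r
# Rebuild the small example from the post.
accts <- data.frame(
  ACCOUNT = 1:4,
  DATE = as.Date(c("2008-03-01", "2009-06-17", "2008-07-02", "2009-03-15"))
)

# apply() works on as.matrix(accts); a Date column forces a character matrix.
m <- as.matrix(accts)
is.character(m)

# MARGIN = 2 walks over columns, MARGIN = 1 walks over rows.
by_col <- apply(accts, 2, function(x) length(x))  # one result per column
by_row <- apply(accts, 1, function(x) length(x))  # one result per row

# Comparing a Date with a string that is not a date is expected to fail
# while R tries to parse the string; the message is captured, not raised.
cmp <- tryCatch(as.Date("2008-02-02") < "2",
                error = function(e) conditionMessage(e))
```

If this reading is right, passing `MARGIN = 1` alone would not be enough, since each row would still arrive as a character vector and the dates would need to be converted back with `as.Date()`.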
However, when I run my full version, I get a different error, which I traced to this line:
subset(trans$TRAN_DATE, with(trans, ACCOUNT_NUM == x[1] & TRAN_DATE < x[2]))
If I set x to the third row of my 'accts' data frame, so that:
x
ACCOUNT DATE
3 3 2008-07-02
and run the 'subset' line of the code, I get the following error, which matches the one I get with my regular code:
> subset(trans$TRAN_DATE, with(trans, ACCOUNT_NUM == x[1] & TRAN_DATE < x[2]))
Error in eval(expr, envir, enclos) :
dims [product 1] do not match the length of object [10]
In addition: Warning message:
In eval(expr, envir, enclos) :
Incompatible methods ("Ops.Date", "Ops.data.frame") for "<"
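The warning about `Ops.Date` and `Ops.data.frame` suggests that `x[2]` is still a one-column data frame here, because single brackets on a data frame keep the data frame wrapper. A minimal sketch, assuming `x` is the one-row data frame shown above, that extracts the values with `$` so both sides of each comparison are plain vectors. It uses `max()` to pick the latest earlier date, which is a substitution for the difference-based logic in `getTranDate`, not a copy of it.

```r
# Data from the post; x stands in for the third row of 'accts'.
trans <- data.frame(
  ACCOUNT_NUM = c(1, 2, 3, 3, 3, 6, 7, 4, 4, 10),
  TRAN_DATE = as.Date(c("2008-02-02", "2008-04-02", "2008-03-16", "2009-08-22",
                        "2008-05-05", "2010-11-03", "2008-09-18", "2009-10-14",
                        "2009-01-15", "2011-07-06"))
)
x <- data.frame(ACCOUNT = 3L, DATE = as.Date("2008-07-02"))

class(x[2])    # single brackets return a data frame
class(x$DATE)  # $ (or [[ ]]) returns the Date column itself

# With vectors on both sides, the comparisons are element-wise.
earlier <- trans$TRAN_DATE[trans$ACCOUNT_NUM == x$ACCOUNT &
                           trans$TRAN_DATE < x$DATE]

# The latest transaction before DATE is the closest one that precedes it.
latest_before <- max(earlier)
```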
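Thanks for your help.
(The following information was added after the answer above was provided, because I realized there is a complication.)
I just realized that the function has some additional constraints to take into account, which make the problem more complicated. The 'accts' data frame has two different statuses: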
> accts <- data.frame(
+ ACCOUNT = 1:4,
+ DATE = as.Date(c("2008-03-01", "2009-06-17",
+ "2008-07-02", "2009-03-15")),
+ STATUS = c("new", "old", "new", "old"))
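In the 'accts' frame, a record is classified as either old or new. If an account is 'new', it needs to meet the conditions specified earlier, but it can also only be matched to records in 'trans' that are marked 'revised'. Likewise, 'old' accounts can only be compared with the 'orig' records in 'trans':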
> trans <- data.frame(
+ ACCOUNT_NUM = c(1,2,3,3,3,6,7,4,4,10),
+ TRAN_DATE = as.Date(c("2008-02-02", "2008-04-02",
+ "2008-03-16", "2009-08-22",
+ "2008-05-05", "2010-11-03",
+ "2008-09-18", "2009-10-14",
+ "2009-01-15", "2011-07-06")),
+ BALANCE = c("orig", "orig", "orig", "orig", "revised", "orig", "revised", "revised", "revised", "orig"))
I tried adapting your code to this situation as follows:
library(plyr)
adply(accts, 1, transform,
      TRAN_DATE = {
          if(STATUS == "old")
          {
              data <- subset(trans, ACCOUNT_NUM == ACCOUNT &
                             TRAN_DATE < DATE & BALANCE == "orig")
          }else{
              data <- subset(trans, ACCOUNT_NUM == ACCOUNT &
                             TRAN_DATE < DATE & BALANCE == "revised")
          }
          tail(data$TRAN_DATE, 1) })
I get the following error from this code:
Error in data.frame(list(ACCOUNT = 1L, DATE = 13939, STATUS = 1L), BALANCE = list( :
arguments imply differing number of rows: 1, 0
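The "differing number of rows: 1, 0" message looks consistent with some accounts having no matching transaction at all, so the lookup returns a zero-length result for a one-row input. In the sample data, account 1 is 'new' but has only an 'orig' transaction, for instance. The following is a sketch in base R rather than plyr, under that assumption; `balance_for` and `latest_match` are hypothetical names introduced for illustration, and `max()` replaces `tail(..., 1)` so the result does not depend on row order.

```r
accts <- data.frame(
  ACCOUNT = 1:4,
  DATE = as.Date(c("2008-03-01", "2009-06-17", "2008-07-02", "2009-03-15")),
  STATUS = c("new", "old", "new", "old"),
  stringsAsFactors = FALSE
)
trans <- data.frame(
  ACCOUNT_NUM = c(1, 2, 3, 3, 3, 6, 7, 4, 4, 10),
  TRAN_DATE = as.Date(c("2008-02-02", "2008-04-02", "2008-03-16", "2009-08-22",
                        "2008-05-05", "2010-11-03", "2008-09-18", "2009-10-14",
                        "2009-01-15", "2011-07-06")),
  BALANCE = c("orig", "orig", "orig", "orig", "revised",
              "orig", "revised", "revised", "revised", "orig"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: which BALANCE value each STATUS may be matched against.
balance_for <- c(new = "revised", old = "orig")

# Hypothetical helper: latest qualifying TRAN_DATE before `date`, or NA.
latest_match <- function(account, date, status) {
  hits <- trans$TRAN_DATE[trans$ACCOUNT_NUM == account &
                          trans$TRAN_DATE < date &
                          trans$BALANCE == balance_for[[status]]]
  # Always return exactly one value so every row of accts gets an entry.
  if (length(hits) == 0) as.Date(NA) else max(hits)
}

# Loop over row indices; do.call(c, ...) keeps the Date class.
accts$TRAN_DATE <- do.call(c, lapply(seq_len(nrow(accts)), function(i) {
  latest_match(accts$ACCOUNT[i], accts$DATE[i], accts$STATUS[i])
}))
```

Returning `NA` for unmatched accounts is a design choice made here for illustration; dropping those rows instead would be equally defensible depending on what the result is used for.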
I apologize for not specifying this requirement in my original post; I did not realize it would cause a problem.