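RStudio crashed when I tried to use dcast (from the reshape2 package) to reshape a particular data frame. I found that the crash was actually happening in R itself, so I ran my casting code in R.app and got the type of error this site is named for: Error: segfault from C stack overflow. With help from Google and SO, I learned that this is a memory access error.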
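I've gotten this far, but I don't know where to go from here. I can't provide a truly reproducible example, because my data frame has about 558,000 rows and the problem doesn't occur on small toy examples. For example, dcast works fine even on a 50,000-row subset of my data. Is there a specific row of data that is causing the problem? If so, can anyone suggest what features to look for that might cause the type of error I'm getting?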
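Here is a subset of the data frame I'm casting from (with fake values for some variables), followed by the casting call I'm using. I've also included this small data sample as dput output below, in case it's helpful to have. The real data set has about 700 values of prog, 15 values of prog1, and 5 values of fa.type.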
  id        term   yr    nslds acad.lev    prog            prog1 fa.type amount
1  1   Fall 2009 2010 Graduate Graduate  loan 1      Other Loans    Loan   5000
2  1 Spring 2010 2010 Graduate Graduate  loan 1      Other Loans    Loan   5000
3  2   Fall 2009 2010 Graduate Graduate  loan 2    Stafford Loan    Loan   8781
4  2 Spring 2010 2010 Graduate Graduate  loan 2    Stafford Loan    Loan   8781
5  3   Fall 2007 2008 Graduate Graduate  loan 3    Stafford Loan    Loan   4250
6  3   Fall 2007 2008 Graduate Graduate grant 1 University Grant   Grant   1707
library(reshape2)  # provides dcast
fa.wide = dcast(fa, id + term + yr + nslds + acad.lev ~ prog1 + fa.type, value.var = "amount", fun.aggregate = sum)
fa = structure(list(id = c(1, 1, 2, 2, 3, 3), term = structure(c(7L,
8L, 7L, 8L, 1L, 1L), .Label = c("Fall 2007", "Spring 2008", "Summer 2008",
"Fall 2008", "Spring 2009", "Summer 2009", "Fall 2009", "Spring 2010",
"Summer 2010", "Fall 2010", "Spring 2011", "Summer 2011", "Fall 2011",
"Spring 2012", "Summer 2012", "Fall 2012", "Spring 2013"), class = c("ordered",
"factor")), yr = c(2010L, 2010L, 2010L, 2010L, 2008L, 2008L),
nslds = structure(c(7L, 7L, 7L, 7L, 7L, 7L), .Label = c("1st Year, Never Attended",
"1st Year, Previously Attended", "2nd Year", "3rd Year",
"4th Year", "5th Year+", "Graduate"), class = c("ordered",
"factor")), acad.lev = structure(c(6L, 6L, 6L, 6L, 6L, 6L
), .Label = c("Freshman", "Sophomore", "Junior", "Senior",
"PB Undergrad", "Graduate"), class = c("ordered", "factor"
)), prog = c("loan 1", "loan 1", "loan 2", "loan 2", "loan 3",
"grant 1"), prog1 = c("Other Loans", "Other Loans", "Stafford Loan",
"Stafford Loan", "Stafford Loan", "University Grant"), fa.type = structure(c(3L,
3L, 3L, 3L, 3L, 2L), .Label = c("Athletic", "Grant", "Loan",
"Scholarship", "Waiver", "Work/Study"), class = "factor"),
amount = c(5000, 5000, 8781, 8781, 4250, 1707)), .Names = c("id",
"term", "yr", "nslds", "acad.lev", "prog", "prog1", "fa.type",
"amount"), row.names = c(NA, 6L), class = "data.frame")
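As a rough check on how large the wide result would be, the number of distinct combinations on each side of the cast formula can be counted in base R. This is only a sketch: toy and n_combos are hypothetical names, the toy values mirror a few columns of the sample above, and it assumes dcast's default drop = TRUE, under which only combinations that actually occur in the data are kept.

```r
# Toy stand-in for the real data frame (values are made up for illustration).
toy <- data.frame(
  id      = c(1, 1, 2, 2, 3, 3),
  term    = c("Fall 2009", "Spring 2010", "Fall 2009",
              "Spring 2010", "Fall 2007", "Fall 2007"),
  prog1   = c("Other Loans", "Other Loans", "Stafford Loan",
              "Stafford Loan", "Stafford Loan", "University Grant"),
  fa.type = c("Loan", "Loan", "Loan", "Loan", "Loan", "Grant"),
  amount  = c(5000, 5000, 8781, 8781, 4250, 1707),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count the distinct combinations of the given columns.
n_combos <- function(df, cols) nrow(unique(df[, cols, drop = FALSE]))

n_rows  <- n_combos(toy, c("id", "term"))        # left-hand side: rows of the wide result
n_cols  <- n_combos(toy, c("prog1", "fa.type"))  # right-hand side: value columns
n_cells <- n_rows * n_cols                       # cells in the wide result

cat("rows:", n_rows, " value columns:", n_cols, " cells:", n_cells, "\n")
```

On the real data, the column vectors would be the full left-hand and right-hand sides of the formula used in the dcast call. Comparing these counts for the full data and for the 50,000-row subset that works would show whether the failure tracks the size of the result rather than any particular row.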