I am using sparklyr and cannot change a column's class or aggregate data with dplyr. Here is my code so far:
.libPaths(c(.libPaths(), '/usr/lib/spark/R/lib'))
Sys.setenv(SPARK_HOME = "/usr/lib/spark")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
library(sparklyr)
library(dplyr)
library(magrittr)
sc <- sparkR.session(master = "xxxxx")
df <- read.df("path", "csv", header = "true", inferSchema = "true", na.strings = "NA")
df1<-select(df, df$DATE, df$Subject, df$Source, df$Cost, df$Test)
       DATE      Subject               Source Cost Test
1 11/8/2016 07gjAAAAAAAq    AAAA_MOAAAGRAAAAA    2    2
2 11/8/2016 07gjAAAAAAAq      BBBB_MOBBB4BBB2    7    7
3 11/8/2016 07gjAAAAAAAq BBBB_MOBICCCCCCCCC14    2    2
4 11/8/2016 07gjAAAAAAAq SCCT_MOBIDDDDDDDDD14    1    1
5 11/8/2016 07gjAAAAAAAq    REET_MOBBBBBBBB01    2    1
6 11/8/2016 07gjAAAAAAAq      SCCT_MRRRF4RR22   11   11
I have two questions based on this:
1) How do I convert the DATE column to the Date class? This is how I used to do it:
df1$DATE<-as.Date(df1$DATE,'%m/%d/%Y')
This is the error:
Error in as.Date.default(df1$DATE, "%m/%d/%Y") :
do not know how to convert 'df1$DATE' to class “Date”
Any help would be great, thanks!