
I found this SAS SQL code and I would like to translate it into RSQLite.


proc sql;
create table crspcomp as
select a.*, b.ret, b.date
from ccm1 as a left join crsp.msf as b
on a.permno=b.permno
and intck('month',a.datadate,b.date)
between 3 and 14;
quit;

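The first problem is that R has no intck function. With the 'month' interval, intck returns the number of month boundaries between two dates. I found a similar function (on Stack Overflow) that looks like this: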

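library(lubridate)  # provides ymd(), interval() and as.period()

# Months between two yyyymmdd dates, counted from the first of each month
mob <- function(begin, end) {
  begin <- paste0(substr(begin, 1, 6), "01")  # move to the 1st of the month
  end <- paste0(substr(end, 1, 6), "01")
  mob1 <- as.period(interval(ymd(begin), ymd(end)))
  mob1@year * 12 + mob1@month  # whole years and months as a month count
}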
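For comparison, the same month-boundary count can be computed without lubridate, directly on yyyymmdd integers like the ones in the sample files below. This is a sketch, not a drop-in SAS clone: `intck_month` is a hypothetical name, and it assumes well-formed 8-digit dates. Because it only looks at the year and month digits, it counts month boundaries the way `intck('month', ...)` does, and it is vectorized over whole columns.

```r
# Hypothetical helper: month boundaries between two yyyymmdd integer dates,
# in the spirit of SAS intck('month', begin, end)
intck_month <- function(begin, end) {
  b <- as.integer(begin) %/% 100L  # drop the day: yyyymm, e.g. 199901
  e <- as.integer(end) %/% 100L
  # 12 * (difference in years) + (difference in month numbers)
  (e %/% 100L - b %/% 100L) * 12L + (e %% 100L - b %% 100L)
}

intck_month(19990131, 20000103)  # 12
intck_month(20000131, 20000201)  # 1: a single month boundary is crossed
```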

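I have tested mob outside of RSQLite and so far it works well. Now I want to use mob inside the SQL statement above. In the SQL code I want to merge the data on permno, and in addition lag the data by 3 months (which is why I use mob).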


The annual file looks like this:

GVKEY,datadate,fyear,fyr,bkvlps,permno
14489,19980131,1997,1,4.0155,11081
14489,19990131,1998,1,1.8254,11081
14489,20000131,1999,1,2.0614,11081
14489,20010131,2000,1,2.1615,11081
14489,20020131,2001,1,1.804,11081

The CRSP file looks like this:

permno,date,ret
11081,20000103,0.1
11081,20000104,0.2
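The SAS step amounts to a left join on permno with an extra window condition. Without any SQL at all, that can be sketched in base R with merge(): join on permno, keep the CRSP rows 3 to 14 month boundaries after datadate, then left-join the survivors back so annual rows without a match stay with NA. A minimal sketch on the sample rows above; `intck_month` is the same hypothetical helper as before, and merging back on all annual columns assumes the annual rows are unique.

```r
# Sample data from the question (dates kept as yyyymmdd integers)
annual <- read.csv(text = "GVKEY,datadate,fyear,fyr,bkvlps,permno
14489,19980131,1997,1,4.0155,11081
14489,19990131,1998,1,1.8254,11081
14489,20000131,1999,1,2.0614,11081
14489,20010131,2000,1,2.1615,11081
14489,20020131,2001,1,1.804,11081")
crsp <- read.csv(text = "permno,date,ret
11081,20000103,0.1
11081,20000104,0.2")

# Hypothetical helper: month-boundary count, like SAS intck('month', ...)
intck_month <- function(begin, end) {
  b <- as.integer(begin) %/% 100L
  e <- as.integer(end) %/% 100L
  (e %/% 100L - b %/% 100L) * 12L + (e %% 100L - b %% 100L)
}

# 1. Inner join on permno, then keep pairs 3 to 14 month boundaries apart
pairs <- merge(annual, crsp, by = "permno")
gap <- intck_month(pairs$datadate, pairs$date)
pairs <- pairs[gap >= 3 & gap <= 14, c(names(annual), "ret", "date")]

# 2. Left join back onto the annual file so unmatched rows survive with NA
result <- merge(annual, pairs, by = names(annual), all.x = TRUE)
result
```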

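install.packages('DBI')
install.packages('RSQLite')
install.packages('lubridate')

library(DBI)        # dbConnect(), dbWriteTable(), dbGetQuery()
library(lubridate)  # used by mob()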


mob<-function (begin, end) {
  begin<-paste(substr(begin,1,6),"01",sep="")
  end<-paste(substr(end,1,6),"01",sep="")
  mob1<-as.period(interval(ymd(begin),ymd(end)))
  mob<-mob1@year*12+mob1@month
  mob
}

Annual_File <- "C:/Users/XYZ"
Annual_File <- paste0(Annual_File, ".csv")

inputFile <- "C:/Users/XYZ"
inputFile <- paste0(inputFile, ".csv")

con <- dbConnect(RSQLite::SQLite(), dbname='CCM')

dbWriteTable(con, name="CRSP", value=inputFile, row.names=FALSE, header=TRUE, overwrite=TRUE)
dbWriteTable(con, name="Annual_File", value=Annual_File, row.names=FALSE, header=TRUE, overwrite=TRUE)



 DSQL <- "select a.*, b.ret, b.date 
          from Annual_File as a left join
          CRSP as b
          on a.permno=b.PERMNO
          and mob(a.datadate,b.date)
                between 3 and 14"


yourData <- dbGetQuery(con, DSQL)

Even though I defined the function, I get the following error:

Error in sqliteSendQuery(con, statement, bind.data) : 
  error in statement: no such function: mob

Answer


In SQLite you can only use SQL functions (and functions written in C). You cannot use R functions.
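Because the sample files store dates as yyyymmdd integers, the month count that mob() computes can also be written with SQLite's own integer arithmetic, so no R function is needed inside the query. A minimal sketch, assuming datadate and date really are stored as 8-digit INTEGER columns (integer `/` is then integer division in SQLite and `%` is modulo); it uses an in-memory database instead of the question's file paths.

```r
library(DBI)

annual <- read.csv(text = "GVKEY,datadate,fyear,fyr,bkvlps,permno
14489,19980131,1997,1,4.0155,11081
14489,19990131,1998,1,1.8254,11081
14489,20000131,1999,1,2.0614,11081
14489,20010131,2000,1,2.1615,11081
14489,20020131,2001,1,1.804,11081")
crsp <- read.csv(text = "permno,date,ret
11081,20000103,0.1
11081,20000104,0.2")

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "Annual_File", annual, row.names = FALSE)
dbWriteTable(con, "CRSP", crsp, row.names = FALSE)

# Month-boundary count (like intck('month', ...)) in plain SQLite arithmetic:
# 12 * (year difference) + (month-number difference)
sql <- "
select a.*, b.ret, b.date
from Annual_File as a
left join CRSP as b
  on a.permno = b.permno
 and ((b.date / 10000 - a.datadate / 10000) * 12
      + (b.date / 100 % 100 - a.datadate / 100 % 100)) between 3 and 14"

res <- dbGetQuery(con, sql)
dbDisconnect(con)
res
```

Keeping the window condition in the ON clause rather than in a WHERE clause is what preserves the unmatched annual rows, as in the SAS left join.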

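Also, SQLite is not well suited to date handling because it has no date or time types. SQLite does provide functions that can work around this (see the note at the end), but I suggest using the H2 database instead, which has datediff built in. Note that, depending on your needs, you may have to reverse the order of the last two arguments to datediff.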

library(RH2)    # with RH2 loaded, sqldf uses the H2 database instead of SQLite
library(sqldf)

# create test data frames

Lines1 <- "GVKEY,datadate,fyear,fyr,bkvlps,permno
14489,19980131,1997,1,4.0155,11081
14489,19990131,1998,1,1.8254,11081
14489,20000131,1999,1,2.0614,11081
14489,20010131,2000,1,2.1615,11081
14489,20020131,2001,1,1.804,11081"

Lines2 <- "permno,date,ret
11081,20000103,0.1
11081,20000104,0.2"

fmt <- "%Y%m%d"

Annual_File <- read.csv(text = Lines1)
Annual_File$datadate <- as.Date(as.character(Annual_File$datadate), format = fmt)

CRSP <- read.csv(text = Lines2)
CRSP$date <- as.Date(as.character(CRSP$date), format = fmt)

# run SQL statement using sqldf

sqldf("select a.*, b.ret, b.date, datediff('month', a.datadate, b.date) diff
          from Annual_File as a 
          left join CRSP as b 
          on a.permno = b.permno and 
             datediff('month', a.datadate, b.date) between 3 and 14")

giving:

  GVKEY   datadate fyear fyr bkvlps permno ret       date diff
1 14489 1998-01-31  1997   1 4.0155  11081  NA       <NA>   NA
2 14489 1999-01-31  1998   1 1.8254  11081 0.1 2000-01-03   12
3 14489 1999-01-31  1998   1 1.8254  11081 0.2 2000-01-04   12
4 14489 2000-01-31  1999   1 2.0614  11081  NA       <NA>   NA
5 14489 2001-01-31  2000   1 2.1615  11081  NA       <NA>   NA
6 14489 2002-01-31  2001   1 1.8040  11081  NA       <NA>   NA

Note: to use SQLite instead, use the following, where 2440587.5 (the value of SQLite's julianday('1970-01-01')) converts between R's UNIX-epoch date origin and the Julian day numbers that SQLite's date functions assume.

library(sqldf)
try(detach("package:RH2"), silent = TRUE)  # detach RH2 if present so sqldf falls back to SQLite

sqldf("select a.*, b.ret, b.date
          from Annual_File as a 
          left join CRSP as b 
          on a.permno = b.permno and 
             b.date + 2440587.5 between julianday(a.datadate + 2440587.5, '+3 months') and 
                                        julianday(a.datadate + 2440587.5, '+12 months')")
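The offset between R's Date origin and SQLite's Julian day numbers can be checked directly from R. A small sketch, assuming RSQLite is installed: it asks SQLite for the Julian day of 1970-01-01, then converts an R Date with that offset and lets SQLite's date() turn it back into a calendar date.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Julian day number of R's Date origin, as SQLite sees it
origin_jd <- dbGetQuery(con, "select julianday('1970-01-01') as jd")$jd
print(origin_jd, digits = 10)  # 2440587.5

# An R Date is a count of days since 1970-01-01, so adding origin_jd gives a
# Julian day number that SQLite's date functions accept
d <- as.Date("2000-01-03")
back <- dbGetQuery(con, sprintf("select date(%.1f) as d", as.numeric(d) + origin_jd))$d
dbDisconnect(con)
back  # "2000-01-03"
```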
answered 2016-03-17T13:57:27.303