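I am trying to import a dataset of 217,000 records (the Jeopardy dataset) into MonetDB through the MonetDB.R interface.

The file is a CSV file. Its header and first two records look like this: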

show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3
4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's,,,

4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams,,,

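The problem I am facing is with importing the ques column (the data between the double quotes). This column contains multiple commas and other punctuation marks, and monet.read.csv fails to import it.

I tried importing some records without the ques column, and that worked fine.

Could you suggest how to import columns of free-flowing text like this into MonetDB? After importing, I intend to run some text analysis on this column.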

1 Answer

Use monet.read.csv.

I also prefer MonetDBLite for its simpler setup, but monet.read.csv from MonetDB.R does work on this file:

mylines <-
    c("show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3", 
    "4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,\"In 1963, live on \"\"The Art Linkletter Show\"\", this company served its billionth burger\",McDonald's,,,", 
    "4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,\"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States\",John Adams,,,")

tf <- tempfile()
dbfolder <- tempdir()

writeLines( mylines , tf )

library(MonetDBLite)
library(MonetDB.R)

db <- dbConnect( MonetDBLite() , dbfolder )

monet.read.csv( db , tf , 'mytable' )

# looks ok to me
dbReadTable( db , 'mytable' )
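
As a sanity check that does not involve MonetDB, the same quoting rules can be exercised with base R's read.csv. This is only a sketch: the sample row is copied from the question, and csv_lines and parsed are illustrative names. If the quoting in the file is well-formed, the ques field should come back as a single value with its commas and embedded double quotes intact.

```r
# header plus one record from the question; the ques field is wrapped in
# double quotes and uses doubled quotes ("") for the embedded quotation
csv_lines <- c(
  "show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3",
  "4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,\"In 1963, live on \"\"The Art Linkletter Show\"\", this company served its billionth burger\",McDonald's,,,"
)

# read.csv treats only the double quote as a quoting character by default,
# so the apostrophe in McDonald's is left alone
parsed <- read.csv(text = csv_lines, strip.white = TRUE, stringsAsFactors = FALSE)

ncol(parsed)   # number of columns detected
parsed$ques    # the free-text column as parsed
```

If a row in the full file parses into the wrong number of columns here, the quoting in that row is the likely culprit rather than the database import.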
Answered 2015-12-28T19:16:02.630