r - Handling field types in database interaction with R

Question

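I use RMySQL and a MySQL database to store my datasets. Sometimes data gets revised, or I store results back to the database as well. Long story short, there is quite a lot of interaction between R and the database in my use case.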

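Most of the time I use convenience functions like dbWriteTable and dbReadTable to write and read my data. Unfortunately, these completely ignore R data types and MySQL field types. I would expect MySQL date fields to end up in a Date or POSIX class. The other way around, I would expect these R classes to be stored as a corresponding MySQL field type. That means a date should not be character; I do not expect a distinction between float and double here...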

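I also tried dbGetQuery, with the same result. Is there something I have completely missed when reading the manual, or is it simply not possible (yet) in these packages? What would be a nice workaround?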

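EDIT: @mdsummer I tried to find something more in the documentation, but found only these disappointing lines: "MySQL tables are read into R as data.frames, but without coercing character or logical data into factors. Similarly, while exporting data.frames, factors are exported as character vectors.

Integer columns are usually imported as R integer vectors, except for cases such as BIGINT or UNSIGNED INTEGER, which are coerced to R's double precision vectors to avoid truncation (currently R's integers are signed 32-bit quantities).

Time variables are imported/exported as character data, so you need to convert these to your favorite date/time representation."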

score 5 · Accepted Answer

OK, I have a working solution now. Here is a function that maps MySQL field types to R classes. It helps in particular with handling the MySQL field type date...

dbReadMap <- function(con, table) {
    statement <- paste("DESCRIBE ", table, sep = "")
    desc <- dbGetQuery(con, statement)[, 1:2]

    # strip row_names if it exists because it is an attribute and not a real column,
    # otherwise it causes problems with the row count if the table has a row_names col
    desc <- desc[desc$Field != "row_names", ]

    # remove the length in brackets that DESCRIBE returns, e.g. "int(11)" -> "int"
    desc[, 2] <- gsub("[^a-z]", "", desc[, 2])

    # build a dictionary from MySQL field types to R conversion functions
    fieldtypes <- c("int", "tinyint", "bigint", "float", "double", "date", "character", "varchar", "text")
    rclasses <- c("as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.Date", "as.character", "as.character", "as.character")
    fieldtype_to_rclass <- cbind(fieldtypes, rclasses)

    map <- merge(fieldtype_to_rclass, desc, by.x = "fieldtypes", by.y = "Type")
    map$rclasses <- as.character(map$rclasses)

    # get data
    res <- dbReadTable(con, table)

    # convert every mapped column with its conversion function;
    # seq_len() keeps the loop from running when no field type matched
    for (i in seq_len(nrow(map))) {
        cvn <- call(map$rclasses[i], res[, map$Field[i]])
        res[map$Field[i]] <- eval(cvn)
    }

    return(res)
}

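Maybe this is not good programming practice; I just don't know any better. So use it at your own risk, or help me improve it... And of course it is only half of the job: reading. Hopefully I'll find some time to write a writing function soon.

If you have suggestions for the mapping dictionary, let me know :)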
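
For the writing half mentioned above, one possible starting point is to derive a MySQL field type for each column from its R class and hand the result to dbWriteTable. The sketch below is only an assumption of how that could look: `r_to_mysql_types` is a hypothetical helper, the type choices (for example `varchar(255)` for factors) are illustrative, and the commented call relies on the `field.types` argument of RMySQL's dbWriteTable, so check it against your installed version.

```r
# Hypothetical helper: pick a MySQL field type for every column of a data frame.
# The class-to-type choices are illustrative assumptions, not an official mapping.
r_to_mysql_types <- function(df) {
    pick_type <- function(column) {
        # the more specific classes are tested before the generic numeric checks
        if (inherits(column, "Date")) return("date")
        if (inherits(column, "POSIXct")) return("datetime")
        if (is.factor(column)) return("varchar(255)")
        if (is.logical(column)) return("tinyint")
        if (is.integer(column)) return("int")
        if (is.numeric(column)) return("double")
        "text"
    }
    vapply(df, pick_type, character(1))
}

example <- data.frame(
    id = 1:2,
    value = c(1.5, 2.5),
    day = as.Date(c("2012-01-01", "2012-01-02")),
    label = c("a", "b"),
    stringsAsFactors = FALSE
)
types <- r_to_mysql_types(example)
print(types)

# With an open connection `con` (not created here), the result could be passed on:
# dbWriteTable(con, "example", example, field.types = as.list(types), row.names = FALSE)
```

The helper only touches the data frame, so the mapping can be inspected and adjusted before anything is written to the database.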

score 1 · Accepted Answer

Here is a more generic version of @Matt Bannert's function that works on queries instead of tables:

# Extension to dbGetQuery that understands MySQL data types
dbGetQuery2 <- function(con, query) {
    # create a temporary table from the query so DESCRIBE can report its field types
    statement <- paste0("CREATE TEMPORARY TABLE `temp` ", query)
    dbClearResult(dbSendQuery(con, statement))
    desc <- dbGetQuery(con, "DESCRIBE `temp`")[, 1:2]
    dbClearResult(dbSendQuery(con, "DROP TABLE `temp`"))

    # strip row_names if it exists because it is an attribute and not a real column,
    # otherwise it causes problems with the row count if the table has a row_names col
    desc <- desc[desc$Field != "row_names", ]

    # remove the length in brackets that DESCRIBE returns, e.g. "int(11)" -> "int"
    desc[, 2] <- gsub("[^a-z]", "", desc[, 2])

    # build a dictionary from MySQL field types to R conversion functions
    fieldtypes <- c("int",        "tinyint",    "bigint",     "float",      "double",     "date",    "character",    "varchar",   "text")
    rclasses <-   c("as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.Date", "as.character", "as.factor", "as.character")
    fieldtype_to_rclass <- cbind(fieldtypes, rclasses)

    map <- merge(fieldtype_to_rclass, desc, by.x = "fieldtypes", by.y = "Type")
    map$rclasses <- as.character(map$rclasses)

    # get data
    res <- dbGetQuery(con, query)

    # convert every mapped column with its conversion function;
    # seq_len() keeps the loop from running when no field type matched
    for (i in seq_len(nrow(map))) {
        cvn <- call(map$rclasses[i], res[, map$Field[i]])
        res[map$Field[i]] <- eval(cvn)
    }

    return(res)
}
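
The conversion step can also be tried without a database by separating it from the queries. The following is a minimal sketch under that assumption: `apply_type_map` is a hypothetical helper, `desc` mimics the first two columns of DESCRIBE output, and the `datetime` and `timestamp` entries (converted with as.POSIXct in UTC) as well as `char` are additions that do not appear in the dictionaries above.

```r
# Hypothetical helper: convert the columns of `res` according to a DESCRIBE-like
# data frame `desc` with the character columns Field and Type.
apply_type_map <- function(res, desc) {
    # dictionary from MySQL field type to R conversion function;
    # datetime, timestamp and char are assumed additions, and UTC is an assumed time zone
    converters <- list(
        int = as.numeric, tinyint = as.numeric, bigint = as.numeric,
        float = as.numeric, double = as.numeric,
        date = as.Date,
        datetime = function(x) as.POSIXct(x, tz = "UTC"),
        timestamp = function(x) as.POSIXct(x, tz = "UTC"),
        char = as.character, varchar = as.character, text = as.character
    )
    # drop the length in brackets, e.g. "int(11)" -> "int"
    types <- gsub("[^a-z]", "", desc$Type)
    for (i in seq_along(types)) {
        convert <- converters[[types[i]]]
        field <- desc$Field[i]
        # skip unknown types and fields that are not columns of the result
        if (!is.null(convert) && field %in% names(res)) {
            res[[field]] <- convert(res[[field]])
        }
    }
    res
}

# made-up stand-ins for DESCRIBE output and a query result read as character data
desc <- data.frame(
    Field = c("id", "created", "seen"),
    Type = c("int(11)", "date", "datetime"),
    stringsAsFactors = FALSE
)
res <- data.frame(
    id = c("1", "2"),
    created = c("2012-01-01", "2012-01-02"),
    seen = c("2012-01-01 10:00:00", "2012-01-02 11:30:00"),
    stringsAsFactors = FALSE
)
converted <- apply_type_map(res, desc)
print(sapply(converted, function(column) class(column)[1]))
```

Keeping the dictionary as a named list of functions avoids building calls from strings, and unknown field types are simply left as they were read.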
