I am trying to convert a large .CSV file to an .Xdf file using the rxImport() function, with the following code:

rxImport(inData = "/poc/revor/data/ext_roll36_chrg_vol.csv",
         outFile = "/poc/revor/data/ext_roll36_chrg_vol.xdf", 
         overwrite = TRUE, rowsPerRead = 100000,
         colClasses = c(SE_NO = "character", 
                        HIER_ROLLUP_CD = "character", 
                        CUR_MO_CT ="numeric", 
                        CUR_MO_AM = "numeric", 
                        AD_LINE_1_TX = "character",
                        AD_LINE_2_TX = "character",
                        SUBMIT_DT = "character", 
                        UPDT_TS = "character"),
         transforms = list(SUBMIT_DT = as.Date(SUBMIT_DT, format="%d%b%Y")))
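The SUBMIT_DT transform relies on the format string "%d%b%Y" matching values such as 01MAY2014. A minimal base R sketch to check that on a single value, assuming English month abbreviations (LC_TIME is temporarily set to "C" for that reason; the variable names are only illustrative):

```r
# Sketch: check the "%d%b%Y" format against the SUBMIT_DT value from the sample record.
# Assumption: month abbreviations are English, so LC_TIME is switched to "C" for the check.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

submit_raw <- "01MAY2014"
submit_dt <- as.Date(submit_raw, format = "%d%b%Y")

# An NA here would mean the format string does not match the data.
print(submit_dt)
print(class(submit_dt))

# Restore the previous locale setting.
Sys.setlocale("LC_TIME", old_locale)
```

If this prints NA on your machine, the locale is the first thing to look at, since %b is locale dependent.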

However, the file contains many records like this one:

0200001097,SS,625,236899.000,"KRAV MAGA WORLDWIDE, INC.","KRAV MAGA WORLDWIDE, INC.",01MAY2014,07JUN2014:01:08:57.000000

As you can see, the columns AD_LINE_1_TX and AD_LINE_2_TX contain commas inside double quotes.
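To make the problem concrete, here is a small base R sketch (not RevoScaleR) that runs the sample record through a naive comma split and through a quote-aware parser. It only illustrates what "ignoring delimiters inside quotes" should produce; read.csv is used purely as a stand-in, and the variable names are illustrative.

```r
# The sample record from above, as a single string.
sample_line <- '0200001097,SS,625,236899.000,"KRAV MAGA WORLDWIDE, INC.","KRAV MAGA WORLDWIDE, INC.",01MAY2014,07JUN2014:01:08:57.000000'

# Naive split on every comma: each quoted address field is cut in two.
naive_fields <- strsplit(sample_line, ",", fixed = TRUE)[[1]]
print(length(naive_fields))

# Quote-aware parse: commas inside double quotes stay part of the field.
# colClasses = "character" keeps every column as text, including SE_NO.
parsed <- read.csv(text = sample_line, header = FALSE,
                   colClasses = "character", stringsAsFactors = FALSE)
print(ncol(parsed))
print(parsed$V5)
```

The naive split yields more pieces than the eight columns declared in colClasses, which is the kind of mismatch the import has to avoid.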

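I tried the type = "text" argument, but then the first column, SE_NO, is read as numeric even though its type is shown as character. The values look numeric, but I want them read as character.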

If I use the transforms argument to convert the column to character:

transforms = list(SE_NO = as.character(as.numeric(SE_NO)))

then, in the conversion between numeric (exponential notation) and character, the value of the SE_NO column changes from 0200001097 to 02000010002.000011e+08.
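The underlying issue is that once an identifier has been treated as a number, the leading zero is gone and no later cast brings it back. A base R sketch of the round trip follows; note that plain R prints the digits rather than the exponential form seen in the import, so this shows only the lost zero. The re-padding step assumes every SE_NO is exactly 10 digits wide, which is a guess based on the one sample value.

```r
se_no_raw <- "0200001097"

# Treating the identifier as a number drops the leading zero.
as_number <- as.numeric(se_no_raw)
round_trip <- as.character(as_number)
print(round_trip)

# Hypothetical workaround: re-pad to a fixed width.
# Assumption: all SE_NO values are exactly 10 digits long.
repadded <- sprintf("%010.0f", as_number)
print(repadded)
```

Re-padding only works if the width really is fixed, so reading the column as character in the first place is the safer route.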

So is there another way to ignore the commas inside double quotes without affecting the other columns?

Please let me know if any further information is needed.

1 Answer

This should give you what you need...

input_file <- "/poc/revor/data/ext_roll36_chrg_vol.csv"
output_file <- "/poc/revor/data/ext_roll36_chrg_vol.xdf"

my_colInfo <- list(list(index = 1, type = "character", newName = "SE_NO"),
                   list(index = 2, type = "character", newName = "HIER_ROLLUP_CD"),
                   list(index = 3, type = "numeric", newName = "CUR_MO_CT"),
                   list(index = 4, type = "numeric", newName = "CUR_MO_AM"),
                   list(index = 5, type = "character", newName = "AD_LINE_1_TX"),
                   list(index = 6, type = "character", newName = "AD_LINE_2_TX"),
                   list(index = 7, type = "character", newName = "SUBMIT_DT"),
                   list(index = 8, type = "character", newName = "UPDT_TS"))

input_source <- RxTextData(file = input_file, 
                           colInfo = my_colInfo,
                           delimiter = ",",
                           quotedDelimiters = TRUE,
                           useFastRead = TRUE)

rxImport(inData = input_source,
         outFile = output_file, 
         overwrite = TRUE, rowsPerRead = 100000,
         transforms = list(SUBMIT_DT = as.Date(SUBMIT_DT, format="%d%b%Y")))

answered 2015-07-09T20:06:59.527
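After an import like the one above, it may be worth checking the variable types and the first few rows of the resulting .xdf file. A hedged sketch using rxGetInfo, guarded so that it simply skips the check when RevoScaleR or the output file is not available (the checked flag is only illustrative):

```r
output_file <- "/poc/revor/data/ext_roll36_chrg_vol.xdf"

checked <- FALSE
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists(output_file)) {
  # Show variable types plus the first 5 rows, to confirm that SE_NO is
  # character and the quoted address fields were not split.
  info <- RevoScaleR::rxGetInfo(output_file, getVarInfo = TRUE, numRows = 5)
  print(info)
  checked <- TRUE
} else {
  message("RevoScaleR or the .xdf file is not available; skipping the check.")
}
```

In the output, SE_NO should be listed as character with its leading zero intact, and AD_LINE_1_TX should hold the full "KRAV MAGA WORLDWIDE, INC." string.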