r - read.csv.sql filter for fields containing commas

Question

LOC_NAME,BIRTH_DTTM,MOM_PAT_MRN_ID,EMPI,MOM_PAT_NAME,MOM_HOSP_ADMSN_TIME,MOM_HOSP_DISCH_TIME,DEL_PROV_NAME,ATTND_PROV_NAME,DELIVERY_TYPE,PRIM.REPT,COUNT_OF_BABIES,CHILD_PED_GEST_AGE_NUM,REASON_FOR_DEL,REASON_DEL_COM,INDUCT_METHOD,INDUCT_COM,AUGMENTATION
HOSPITAL,1/1/2000 10:00,abc,Eabc,"Surname1, Given1",1/1/2000 10:00,1/3/2000 10:00,"Doctor, First","Doctor, First","C-Section, Low Transverse",Repeat,1,38,,,1) None,,1) None
HOSPITAL,1/2/2000 11:00,def,Edef,"Surname2, Given2",1/2/2000 11:00,1/5/2000 11:00,"Doctor2, First2","Doctor2, First2","C-Section, Low Transverse",Primary,1,36,Ruptured Membranes;Labor;Other (see comment),"PPROM, Preterm labor",1) None,,1) None
HOSPITAL,1/3/2000 12:00,ghi,Eghi,"Surname3, Given3",1/3/2000 12:00,1/6/2000 12:00,"Doctor3, First3","Doctor3, First3","C-Section, Low Transverse",Repeat,1,31,Other (see comment),,1) None,,1) None
HOSPITAL,1/4/2000 13:00,jkl,Ejkl,"Surname4, Given4",1/4/2000 13:00,1/7/2000 13:00,,"Doctor4, First4","Vaginal, Spontaneous Delivery",,1,28,Other (see comment),Fetal anomaly,1) oxytocin (Pitocin),,

To read the data, I have tried:

read.csv.sql(file) 

read.csv.sql(file, filter = 'tr.exe -d ^" ')

read.csv.sql(file, filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))

read.csv.sql(file, 
             filter = "perl -e 's{(\"[^\",]+),([^\"]+\")}{$_= $&, s/,/_/g, $_}eg'")

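I am working in R 3.0.0 through RStudio Server on Ubuntu.

Unfortunately, changing the delimiter is not an option (and it would not be very effective for some of the files I need to query). Some of my files are pathology reports, so I would run into this problem no matter which delimiter I used.

Any hints on what I am missing to read this in?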

score 1 · Accepted Answer

Try csvfix as described in sqldf FAQ #13, but use the write_dsv default separator | rather than ; because your file contains semicolons:

read.csv.sql("myfile.csv", sep = "|", filter = "csvfix write_dsv")
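To see why a filter is needed at all, here is a minimal base-R sketch. It is not part of the original answer: the sample row is abbreviated from the question's data, and the `scan`/`paste` step only imitates the effect of rewriting a row with `|` as the separator; it is not how csvfix is implemented. As I understand sqldf FAQ #13, the SQLite CSV reader behind read.csv.sql does not treat quotes specially, so a comma inside a quoted field is taken as a field separator.

```r
# Abbreviated sample row modeled on the question's data (illustrative only)
line <- 'HOSPITAL,abc,"Surname1, Given1","C-Section, Low Transverse",Repeat'

# A reader that ignores quotes splits on every comma
naive <- scan(text = line, what = "", sep = ",", quote = "", quiet = TRUE)

# A quote-aware reader keeps quoted commas inside their field
aware <- scan(text = line, what = "", sep = ",", quote = "\"", quiet = TRUE)

# Re-join the correctly parsed fields with "|" to imitate a
# pipe-delimited rewrite of the row
piped <- paste(aware, collapse = "|")

print(c(naive = length(naive), quote_aware = length(aware)))
print(piped)
```

The quote-unaware split yields more fields than the header has columns, which is why the import fails. Once the row is pipe-delimited, the embedded commas and semicolons are ordinary characters, so sep = "|" reads it cleanly as long as | itself never appears in the data.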
