I have to read a set of CSV files (each over 120 MB). I used a for loop, but it is extremely slow. How can I read the CSV files faster?
My code:
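H = data.frame()
for (i in 201:225) {
  for (j in 1996:2007) {
    filename = paste("D:/Hannah/CD/CD.R", i, "_cd", j, ".csv", sep = "")
    x = read.csv(filename, stringsAsFactors = FALSE)
    # codes to look for in columns A_1, A_2 and A_3
    I = c("051", "041", "044", "54", "V0262")
    # keep the rows where any of the three columns matches one of the codes
    temp = x[(x$A_1 %in% I) | (x$A_2 %in% I) | (x$A_3 %in% I), ]
    # append the matching rows to the running result
    H = rbind(H, temp)
  }
}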
Every file has the following structure:
> str(x)
'data.frame': 417691 obs. of 37 variables:
$ YM: int 199604 199612 199612 199612 199606 199606 199609 199601 ...
$ A_TYPE: int 1 1 1 1 1 1 1 1 1 1 ...
$ HOSP: chr "dd0516ed3e" "c53d67027e" ...
$ A_DATE: int 19960505 19970116 19970108 ...
$ C_TYPE: int 19 9 1 1 2 9 9 1 1 1 ...
$ S_NO : int 142 37974 4580 4579 833 6846 2272 667 447 211 ...
$ C_ITEM_1 : chr "P2" "P3" "A2"...
$ C_ITEM_2 : chr "R6" "I3" ""...
$ C_ITEM_3 : chr "W2" "" "A2"...
$ C_ITEM_4 : chr "Y1" "O3" ""...
$ F_TYPE: chr "40" "02" "02" "02" ...
$ F_DATE : int 19960415 19961223 19961227 ...
$ T_END_DATE: int NA NA NA ...
$ ID_B : int 19630526 19630526 19630526 ...
$ ID : chr "fff" "fac" "eab"...
$ CAR_NO : chr "B4" "B5" "C1" "B6" ...
$ GE_KI: int 4 4 4 4 4 4 4 4 4 4 ...
$ PT_N : chr "H10" "A10" "D10" "D10" ...
$ A_1 : chr "0521" "7948" "A310" "A312" ...
$ A_2 : chr "05235" "5354" "" "" ...
$ A_3 : chr "" "" "" "" ...
$ I_O_CE: chr "5210" "" "" "" ...
$ DR_DAY : int 0 7 3 3 0 0 3 3 3 3 ...
$ M_TYPE: int 2 0 0 0 2 2 0 0 0 0 ...
...........
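
For reference, here is a minimal sketch of one common way to restructure the loop above, not a measured fix. It assumes that part of the cost comes from growing H with rbind on every iteration, and that the code columns should be read as character so that leading zeros such as "051" survive. The helper name read_filtered and the two tiny temporary files are hypothetical and only illustrate the idea. The real file names could be built as shown in the commented lines.

```r
# Hypothetical helper: read each file once, filter it, bind everything once at the end.
read_filtered <- function(files, codes, cols = c("A_1", "A_2", "A_3")) {
  # Force the code columns to character; the other columns are still guessed by read.csv
  classes <- setNames(rep("character", length(cols)), cols)
  pieces <- lapply(files, function(f) {
    x <- read.csv(f, stringsAsFactors = FALSE, colClasses = classes)
    # TRUE for rows where at least one of the code columns holds one of the codes
    hit <- Reduce(`|`, lapply(x[cols], `%in%`, codes))
    x[hit, , drop = FALSE]
  })
  # A single rbind over the list instead of one rbind per file
  do.call(rbind, pieces)
}

# With the real data the file names could be built like this (i varies slowest, as in the loop):
# grid  <- expand.grid(j = 1996:2007, i = 201:225)
# files <- paste0("D:/Hannah/CD/CD.R", grid$i, "_cd", grid$j, ".csv")

# Tiny demonstration on two temporary files (made-up rows, not the real CD files)
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(ID = c("a", "b"), A_1 = c("051", "7948"),
                     A_2 = c("", "041"), A_3 = c("", "")),
          f1, row.names = FALSE)
write.csv(data.frame(ID = c("c", "d"), A_1 = c("0521", "A310"),
                     A_2 = c("", ""), A_3 = c("V0262", "")),
          f2, row.names = FALSE)

codes <- c("051", "041", "044", "54", "V0262")
H <- read_filtered(c(f1, f2), codes)
print(H)
```

Passing a complete colClasses vector for all 37 columns would additionally skip type guessing, and if the data.table package is available its fread function is often faster than read.csv on files of this size. Neither is shown here, and how much each change helps would have to be timed on the actual files.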