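My dataset has 34,000 rows and 353 columns. One column is a location, and it has 11,000 unique values. I want to subset the dataset in a for loop. I can do this by creating a new data frame for each subset, but I would like the subsets to end up in a single data frame. I have included a sample dataset below.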

TESTL <- structure(list(X = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 
3L), .Label = c("Car", "DOG", "House"), class = "factor"), Y = c(20L, 
20L, 20L, 20L, 410L, 410L, 410L, 410L, 60L), Z = structure(c(1L, 
3L, 8L, 1L, 7L, 5L, 2L, 4L, 6L), .Label = c("ARGENTINA", "BERLIN GERMANY", 
"BUENOS AIRES ARGENTINA", "DUBLIN IRELAND", "FROM AUSTRIA", "GERMANY", 
"IN TRANSIT FROM GERMANY", "RIVER PLATE ARGENTINA"), class = "factor"), 
K = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor")),
.Names = c("X", "Y", "Z", "K"), class = "data.frame", row.names = c(NA, -9L))

I can create the new data frames with the following code:

l <- c("ARGENTINA", "IRELAND")
for (i in l) {
  assign(paste("newdata", i, sep = ""),
         subset(TESTL[which(grepl(i, TESTL$Z) &
                            !grepl("IN TRANSIT", TESTL$Z) & !grepl("FROM", TESTL$Z)), ],
                select = c("X", "Y", "Z")))
}

However, I want to create one new data frame that holds all of the subsets. I tried the code below:

d <- data.frame()
for (i in l) {
  d <- rbind(d, c(
    subset(TESTL[which(grepl(i, TESTL$Z) & !grepl("IN TRANSIT", TESTL$Z)
                       & !grepl("FROM", TESTL$Z)), ],
           select = c("X", "Y", "Z"))))
}

I get the following warnings:

Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = "DOG") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = "DUBLIN IRELAND") :
invalid factor level, NA generated

I tried converting the factors to characters, but without success. Any help is appreciated.

2 Answers

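I think you are making life quite hard for yourself by using assign here and trying to store the subsets in separate data frames. Try something more like this: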

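# dat is the example data frame from the question (called TESTL there)
l <- c("ARGENTINA", "IRELAND")
# Pre-allocate a named list with one slot per keyword
res <- setNames(vector("list", length(l)), l)

for (i in seq_along(l)) {
    res[[i]] <- dat[grepl(l[i], dat$Z) & !grepl("IN TRANSIT", dat$Z) & !grepl("FROM", dat$Z), c("X", "Y", "Z")]
}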

> res
$ARGENTINA
    X  Y                      Z
1 Car 20              ARGENTINA
2 Car 20 BUENOS AIRES ARGENTINA
3 Car 20  RIVER PLATE ARGENTINA
4 Car 20              ARGENTINA

$IRELAND
    X   Y              Z
8 DOG 410 DUBLIN IRELAND


> do.call("rbind",res)
              X   Y                      Z
ARGENTINA.1 Car  20              ARGENTINA
ARGENTINA.2 Car  20 BUENOS AIRES ARGENTINA
ARGENTINA.3 Car  20  RIVER PLATE ARGENTINA
ARGENTINA.4 Car  20              ARGENTINA
IRELAND     DOG 410         DUBLIN IRELAND
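
For what it is worth, the same idea can be written without an explicit loop. The sketch below is only one possible variant, not part of the original answer: it rebuilds the question's sample data by hand as `dat`, and the helper name `excluded` is made up for illustration. It builds the named list with `lapply`, binds it with `do.call(rbind, ...)`, and resets the row names.

```r
# Sample data from the question, rebuilt by hand (factor columns, as in the dput output)
dat <- data.frame(
  X = c("Car", "Car", "Car", "Car", "House", "House", "House", "DOG", "House"),
  Y = c(20L, 20L, 20L, 20L, 410L, 410L, 410L, 410L, 60L),
  Z = c("ARGENTINA", "BUENOS AIRES ARGENTINA", "RIVER PLATE ARGENTINA",
        "ARGENTINA", "IN TRANSIT FROM GERMANY", "FROM AUSTRIA",
        "BERLIN GERMANY", "DUBLIN IRELAND", "GERMANY"),
  K = "A",
  stringsAsFactors = TRUE
)

l <- c("ARGENTINA", "IRELAND")

# Rows to drop regardless of keyword (hypothetical helper name)
excluded <- grepl("IN TRANSIT|FROM", dat$Z)

# One subset per keyword, stored in a list named after the keywords
res <- lapply(setNames(l, l), function(key) {
  dat[grepl(key, dat$Z, fixed = TRUE) & !excluded, c("X", "Y", "Z")]
})

# Stack the subsets into a single data frame and drop the composite row names
combined <- do.call(rbind, res)
rownames(combined) <- NULL
print(combined)
```

Because every subset comes from the same data frame, the factor columns share the same levels, so `rbind` should not produce the "invalid factor level" warning here.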
answered 2014-05-29T16:14:13.137

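The warnings arise because the first iteration of the loop (ARGENTINA) brings in X and Z as factor variables, and the second iteration (IRELAND) then brings in values that are not among those factor levels, so they are turned into NA. The fix takes two steps.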
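
As a toy illustration of that warning (the vector `x` below is made up and is not the asker's data), assigning a value that is not an existing level into a factor yields NA, while the same assignment into a character vector works:

```r
# A factor only accepts values that are already among its levels
x <- factor(c("Car", "Car", "House"))
levels(x)

x_bad <- x
x_bad[3] <- "DOG"   # warning: invalid factor level, NA generated
print(x_bad)        # third element is now NA

# A character vector has no fixed set of levels, so this is fine
x_chr <- as.character(x)
x_chr[3] <- "DOG"
print(x_chr)
```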

First, you should change the class of the factor variables in TESTL:

# Convert every factor column of TESTL to character
for (i in names(TESTL)[sapply(TESTL, is.factor)]) {
  TESTL[[i]] <- as.character(TESTL[[i]])
}
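
If you prefer to avoid the loop, the same conversion can be sketched as below. The small data frame `df` is hypothetical and stands in for TESTL; `is_fac` is just an illustrative name.

```r
# Hypothetical two-row stand-in for TESTL
df <- data.frame(
  X = factor(c("Car", "DOG")),
  Y = c(20L, 410L),
  Z = factor(c("ARGENTINA", "DUBLIN IRELAND"))
)

# Logical vector marking which columns are factors
is_fac <- vapply(df, is.factor, logical(1))

# Replace only those columns with their character versions
df[is_fac] <- lapply(df[is_fac], as.character)

str(df)
```

Non-factor columns such as Y are left untouched, since only the columns flagged in `is_fac` are reassigned.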

Then it will work with the following code:

d <- data.frame(stringsAsFactors = FALSE)
for (i in l) {
  d <- rbind(d,
             TESTL[grepl(i, TESTL$Z) & !grepl("FROM|IN TRANSIT", TESTL$Z), c("X", "Y", "Z")])
}
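
As a possible alternative (a sketch, not something the code above requires), all keywords can be collapsed into one pattern so that a single logical mask replaces the loop. The data is rebuilt by hand here with character columns, and `keep` is an illustrative name. Note the difference in behaviour: rows come back in their original order, and a row matching several keywords appears only once.

```r
# Sample data from the question, rebuilt by hand with character columns
TESTL <- data.frame(
  X = c("Car", "Car", "Car", "Car", "House", "House", "House", "DOG", "House"),
  Y = c(20L, 20L, 20L, 20L, 410L, 410L, 410L, 410L, 60L),
  Z = c("ARGENTINA", "BUENOS AIRES ARGENTINA", "RIVER PLATE ARGENTINA",
        "ARGENTINA", "IN TRANSIT FROM GERMANY", "FROM AUSTRIA",
        "BERLIN GERMANY", "DUBLIN IRELAND", "GERMANY"),
  K = "A",
  stringsAsFactors = FALSE
)

l <- c("ARGENTINA", "IRELAND")

# TRUE where Z matches any keyword and none of the excluded phrases
keep <- grepl(paste(l, collapse = "|"), TESTL$Z) &
  !grepl("FROM|IN TRANSIT", TESTL$Z)

d <- TESTL[keep, c("X", "Y", "Z")]
print(d)
```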
answered 2014-05-29T16:29:03.720