Hello, I have a data frame in which some rows hold multiple values as a list.

var1
A8
A9
c("A1", "A1", "D3")
c("A1", "D1")
c("D1", "D1")
c("D2", "A2")
c("D5", "A1")

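I am trying to "unlist" the rows that have multiple values by keeping only the first observation. I have been playing with the unlist command, but without any luck. What is the simplest way to accomplish this?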

1 Answer

As noted in the comments, the column must first be coerced (converted) from its current factor class to the character class using as.character.

This conversion can be avoided at the file-reading stage by passing the argument stringsAsFactors=FALSE.
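As a minimal sketch of the file-reading option, the snippet below reads a small inline CSV with read.csv and stringsAsFactors = FALSE. The csv_text contents are hypothetical and only stand in for the real input file, which is not shown in the question. Since R 4.0.0 this is already the default, so the argument mainly matters on older versions.

```r
# Hypothetical inline CSV standing in for the real input file
csv_text <- "Var1,Freq\n\"B1,B2,B3\",2538\nB4,633"

# stringsAsFactors = FALSE keeps Var1 as character instead of factor
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Var1 can now be passed to strsplit() without as.character()
class(df$Var1)
```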

Splitting each row and keeping only the first value can be done as follows:

copyDF$Var1 = sapply(strsplit(copyDF$Var1,","),head,1)
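One caveat with the sapply call: strsplit returns a zero-length vector for an empty string, so sapply may hand back a list rather than a character vector when the column contains "". A possible type-stable variant is sketched below; first_value is a hypothetical helper name, not part of the original answer, and mapping empty strings to NA is an assumption about the desired behaviour.

```r
# Hypothetical helper: first comma-separated token of each element
first_value <- function(x, sep = ",") {
  # as.character() also covers factor input
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  # vapply always returns a character vector; empty entries become NA
  vapply(
    parts,
    function(p) if (length(p) == 0L) NA_character_ else p[1],
    character(1)
  )
}

res <- first_value(factor(c("B1,B2,B3", "B4", "")))
res
```

It could then be applied as copyDF$Var1 = first_value(copyDF$Var1).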

Let us know if this works:

#user input data with factor class
userDF = structure(list(Var1 = structure(1:6, .Label = c("", "B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "c(\"B1\", \"B1\")", "c(\"B3\", \"B4\")", "c(\"B4\", \"B2\")"), class = "factor"), Freq = c(2538L, 633L, 458L, 328L, 135L, 56L)), .Names = c("Var1", "Freq"), row.names = c(NA, 6L), class = "data.frame")
userDF
#  Var1 Freq
#1      2538
#2   B1  633
#3   B2  458
#4   B3  328
#5   B4  135
#6   B5   56

str(userDF)
#   'data.frame':   6 obs. of  2 variables:
#$ Var1: Factor w/ 12 levels "","B1","B2","B3",..: 1 2 3 4 5 6
#$ Freq: int  2538 633 458 328 135 56

#Since userDF had no multiple values, adding them here
newDF = structure(list(Var1 = structure(1:6, .Label = c("B1,B2,B3", "B4,B5", "B6,B7,B8", "B3", "B4", "B5", "B6", "B7", "B8", "c(\"B1\", \"B1\")", "c(\"B3\", \"B4\")", "c(\"B4\", \"B2\")"), class = "factor"), Freq = c(2538L, 633L, 458L, 328L, 135L, 56L)), .Names = c("Var1", "Freq"), row.names = c(NA, 6L), class = "data.frame")
newDF
#      Var1 Freq
#1 B1,B2,B3 2538
#2    B4,B5  633
#3 B6,B7,B8  458
#4       B3  328
#5       B4  135
#6       B5   56


#Make a copy of the dataset
copyDF = newDF

#Var1 is of class factor, which is not amenable to string operations, hence convert it to character class
copyDF$Var1 = as.character(copyDF$Var1)

#Split each row on commas and retain only the first value

copyDF$Var1 = sapply(strsplit(copyDF$Var1,","),head,1)

copyDF
#  Var1 Freq
#1   B1 2538
#2   B4  633
#3   B6  458
#4   B3  328
#5   B4  135
#6   B5   56
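
The question's values look like c("A1", "A1", "D3") rather than plain comma-separated text, so splitting on "," alone would leave c(" and quote characters behind. The sketch below covers two guesses about how that column is stored, since the question does not say: a true list column, or character strings containing the deparsed c(...) text. The names listDF, textVar and firstText are made up for illustration.

```r
# Case 1 (assumption): var1 is a true list column
listDF <- data.frame(id = 1:3)
listDF$var1 <- list("A8", c("A1", "A1", "D3"), c("D1", "D1"))
# Keep the first element of each list entry
listDF$var1 <- vapply(listDF$var1, function(v) as.character(v[1]), character(1))

# Case 2 (assumption): var1 holds the deparsed text of c(...)
textVar <- c("A8", "c(\"A1\", \"A1\", \"D3\")", "c(\"D5\", \"A1\")")
firstText <- sub("^c\\(", "", textVar)     # drop the leading c(
firstText <- sub(",.*$", "", firstText)    # drop everything after the first value
firstText <- gsub("[\")]", "", firstText)  # drop quotes and closing parenthesis

listDF$var1
firstText
```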
Answered 2016-08-22T13:08:38.627