Hello, I have a data frame in which some rows hold multiple values as a list.
var1
A8
A9
c("A1", "A1", "D3")
c("A1", "D1")
c("D1", "D1")
c("D2", "A2")
c("D5", "A1")
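I am trying to "unlist" the rows that have multiple values by keeping only the first observation. I have been playing with the unlist command, but without any luck. What is the simplest way to accomplish this?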
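As indicated in the comments, the column must first be coerced from its current factor class to character class using as.character.
This can be avoided at the file-reading stage by using the argument stringsAsFactors=FALSE.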
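As a rough illustration of the file-reading point, here is a minimal sketch. The inline CSV text and the name csv_text are hypothetical stand-ins for the real input file, which is not shown in the question. Note that since R 4.0.0 the default is already stringsAsFactors = FALSE, so the explicit argument mainly matters on older versions.

```r
# Hypothetical inline CSV standing in for the real file (an assumption)
csv_text <- 'Var1,Freq
"B1,B2,B3",2538
"B4,B5",633
B3,328'

# Read the data with strings kept as character instead of factor
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Var1 is now character, so string functions such as strsplit work directly
class(df$Var1)
```

With the column read as character, the as.character step is no longer needed.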
Splitting each row and retaining only the first value can be done as follows:
copyDF$Var1 = sapply(strsplit(copyDF$Var1,","),head,1)
Let us know if this works:
#user input data with factor class
userDF = structure(list(Var1 = structure(1:6, .Label = c("", "B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "c(\"B1\", \"B1\")", "c(\"B3\", \"B4\")", "c(\"B4\", \"B2\")"), class = "factor"), Freq = c(2538L, 633L, 458L, 328L, 135L, 56L)), .Names = c("Var1", "Freq"), row.names = c(NA, 6L), class = "data.frame")
userDF
#  Var1 Freq
#1      2538
#2   B1  633
#3   B2  458
#4   B3  328
#5   B4  135
#6   B5   56
str(userDF)
# 'data.frame': 6 obs. of 2 variables:
#$ Var1: Factor w/ 12 levels "","B1","B2","B3",..: 1 2 3 4 5 6
#$ Freq: int 2538 633 458 328 135 56
#Since userDF had no multiple values, adding them here
newDF = structure(list(Var1 = structure(1:6, .Label = c("B1,B2,B3", "B4,B5", "B6,B7,B8", "B3", "B4", "B5", "B6", "B7", "B8", "c(\"B1\", \"B1\")", "c(\"B3\", \"B4\")", "c(\"B4\", \"B2\")"), class = "factor"), Freq = c(2538L, 633L, 458L, 328L, 135L, 56L)), .Names = c("Var1", "Freq"), row.names = c(NA, 6L), class = "data.frame")
newDF
#      Var1 Freq
#1 B1,B2,B3 2538
#2    B4,B5  633
#3 B6,B7,B8  458
#4       B3  328
#5       B4  135
#6       B5   56
#Make a copy of the dataset
copyDF = newDF
#Var1 is of class factor, which is not amenable to string operations, hence convert to character class
copyDF$Var1 = as.character(copyDF$Var1)
#Split each row on commas and retain only the first value
copyDF$Var1 = sapply(strsplit(copyDF$Var1,","),head,1)
copyDF
#  Var1 Freq
#1   B1 2538
#2   B4  633
#3   B6  458
#4   B3  328
#5   B4  135
#6   B5   56
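The example above assumes the multiple values are stored as comma-separated strings. If var1 is instead a genuine list column, as the question's printout suggests, a different approach may be simpler. The following is only a sketch under that assumption: the data frame is a hypothetical reconstruction of the question's data, and the id column is added purely so the frame can be built.

```r
# Hypothetical reconstruction of the question's data as a true list column
df <- data.frame(id = 1:7)
df$var1 <- list("A8", "A9", c("A1", "A1", "D3"), c("A1", "D1"),
                c("D1", "D1"), c("D2", "A2"), c("D5", "A1"))

# Keep the first element of each list entry;
# vapply enforces that every result is a single character value
df$var1 <- vapply(df$var1, function(x) x[1], character(1))
df$var1
```

vapply is used here rather than sapply so that an unexpected entry (for example an empty or non-character element) raises an error instead of silently changing the result type.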