
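I have read many posts about splitting strings in R. However, I am getting an error that I think comes from the way the variable was read into R: in some cases the ID is shorter, so there are trailing spaces after the date. I am trying to split the character variable "VESSELID" into two new variables, "vesselID" and "DATE". Below is a subset of my data set.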

> dput(df)
structure(list(SETID = c(24153L, 24187L, 24215L, 31990L, 31990L, 
31995L, 31995L, 31995L, 31996L, 31996L, 31996L, 31997L, 31997L, 
32002L, 32002L, 32002L, 32002L, 32003L, 32003L, 32003L), VESSELID = c("6830 2002/08/13  ", 
"6830 2002/08/12  ", "6830 2002/08/15  ", "105372 2002/08/23", 
"105372 2002/08/23", "104234 2002/07/20", "104234 2002/07/20", 
"104234 2002/07/20", "104234 2002/07/21", "104234 2002/07/21", 
"104234 2002/07/21", "104234 2002/07/22", "104234 2002/07/22", 
"5744 2002/08/14  ", "5744 2002/08/14  ", "5744 2002/08/14  ", 
"5744 2002/08/14  ", "5744 2002/08/13  ", "5744 2002/08/13  ", 
"5744 2002/08/13  ")), .Names = c("SETID", "VESSELID"), row.names = c(1L, 
2L, 3L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
21L, 22L, 23L, 24L, 25L, 26L), class = "data.frame")

I tried the following:

library(reshape2)
test <- data.frame(df, colsplit(df$VESSELID, split= " ",names=c("vesselID","DATE")))

However, I get this error message:

Error in colsplit(log21$VESSELID, split = " ", names = c("vesselID", "DATE")) : 
      unused argument(s) (split = " ")

The split argument does not seem to work, and I don't know how to fix my strings.

3 Answers

I would actually just use read.table on that column, as shown below. Assuming your data set is called "mydata":

mydata.new <- cbind(mydata[-2], 
                    read.table(text = as.character(mydata$VESSELID), 
                               strip.white=TRUE, header = FALSE))
names(mydata.new)[2:3] <- c("VesselID", "Date")
mydata.new
#    SETID VesselID       Date
# 1  24153     6830 2002/08/13
# 2  24187     6830 2002/08/12
# 3  24215     6830 2002/08/15
# 10 31990   105372 2002/08/23
# 11 31990   105372 2002/08/23
# 12 31995   104234 2002/07/20
# 13 31995   104234 2002/07/20
# 14 31995   104234 2002/07/20
# 15 31996   104234 2002/07/21
# 16 31996   104234 2002/07/21
# 17 31996   104234 2002/07/21
# 18 31997   104234 2002/07/22
# 19 31997   104234 2002/07/22
# 20 32002     5744 2002/08/14
# 21 32002     5744 2002/08/14
# 22 32002     5744 2002/08/14
# 23 32002     5744 2002/08/14
# 24 32003     5744 2002/08/13
# 25 32003     5744 2002/08/13
# 26 32003     5744 2002/08/13
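
As a possible follow-up to the code above, the column names can be supplied directly through col.names, and the date text can be converted to a real Date. This is only a sketch: the three-row "mydata" below is a stand-in built from values in the question, and the "%Y/%m/%d" format is an assumption based on how the dates look in that sample.

```r
# Small stand-in for the question's data (trailing spaces kept on purpose).
mydata <- data.frame(
  SETID = c(24153L, 31990L, 32002L),
  VESSELID = c("6830 2002/08/13  ", "105372 2002/08/23", "5744 2002/08/14  "),
  stringsAsFactors = FALSE
)

# Parse the combined column; col.names labels the two fields up front.
parts <- read.table(text = mydata$VESSELID, header = FALSE,
                    strip.white = TRUE, stringsAsFactors = FALSE,
                    col.names = c("VesselID", "Date"))

# Convert the text dates to Date objects (format assumed from the sample).
parts$Date <- as.Date(parts$Date, format = "%Y/%m/%d")

# Drop the original combined column and attach the parsed fields.
mydata.new <- cbind(mydata["SETID"], parts)
str(mydata.new)
```

Because read.table splits on any run of whitespace by default, the trailing spaces after the shorter IDs do not produce an extra column.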
answered 2013-01-28T13:07:56.733

The argument name is not split; it is pattern.

test <- data.frame(df, colsplit(df$VESSELID, pattern = " ",names=c("vesselID","DATE")))

This gives:

   SETID          VESSELID vesselID         DATE
1  24153 6830 2002/08/13       6830 2002/08/13  
2  24187 6830 2002/08/12       6830 2002/08/12  
3  24215 6830 2002/08/15       6830 2002/08/15  
10 31990 105372 2002/08/23   105372   2002/08/23
11 31990 105372 2002/08/23   105372   2002/08/23
12 31995 104234 2002/07/20   104234   2002/07/20
13 31995 104234 2002/07/20   104234   2002/07/20
14 31995 104234 2002/07/20   104234   2002/07/20
15 31996 104234 2002/07/21   104234   2002/07/21
16 31996 104234 2002/07/21   104234   2002/07/21
17 31996 104234 2002/07/21   104234   2002/07/21
18 31997 104234 2002/07/22   104234   2002/07/22
19 31997 104234 2002/07/22   104234   2002/07/22
20 32002 5744 2002/08/14       5744 2002/08/14  
21 32002 5744 2002/08/14       5744 2002/08/14  
22 32002 5744 2002/08/14       5744 2002/08/14  
23 32002 5744 2002/08/14       5744 2002/08/14  
24 32003 5744 2002/08/13       5744 2002/08/13  
25 32003 5744 2002/08/13       5744 2002/08/13  
26 32003 5744 2002/08/13       5744 2002/08/13  
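
Note that in the output above, the DATE values for the shorter IDs still carry their trailing spaces. One way to avoid that, sketched here under the assumption that reshape2 is installed and that the padding is always at the ends of the string, is to trim the column with base R's trimws before splitting. The three-row "df" is a stand-in built from values in the question.

```r
library(reshape2)

# Small stand-in for the question's data (trailing spaces kept on purpose).
df <- data.frame(
  SETID = c(24153L, 31990L, 32002L),
  VESSELID = c("6830 2002/08/13  ", "105372 2002/08/23", "5744 2002/08/14  "),
  stringsAsFactors = FALSE
)

# Strip leading/trailing whitespace first so DATE comes out clean.
clean <- trimws(df$VESSELID)

test <- data.frame(df,
                   colsplit(clean, pattern = " ", names = c("vesselID", "DATE")),
                   stringsAsFactors = FALSE)

# Every DATE should now be exactly 10 characters long.
nchar(as.character(test$DATE))
```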
answered 2013-01-28T13:06:32.317

Try:

do.call("rbind", strsplit(df$VESSELID, " "))

This should return something like:

     [,1]     [,2]         [,3]    
[1,] "6830"   "2002/08/13" ""      
[2,] "6830"   "2002/08/12" ""      
[3,] "6830"   "2002/08/15" ""      
[4,] "105372" "2002/08/23" "105372"
[5,] "105372" "2002/08/23" "105372"
[6,] "104234" "2002/07/20" "104234"
[7,] "104234" "2002/07/20" "104234"
[8,] "104234" "2002/07/20" "104234"
[9,] "104234" "2002/07/21" "104234"
[10,] "104234" "2002/07/21" "104234"
[11,] "104234" "2002/07/21" "104234"
[12,] "104234" "2002/07/22" "104234"
[13,] "104234" "2002/07/22" "104234"
[14,] "5744"   "2002/08/14" ""      
[15,] "5744"   "2002/08/14" ""      
[16,] "5744"   "2002/08/14" ""      
[17,] "5744"   "2002/08/14" ""      
[18,] "5744"   "2002/08/13" ""      
[19,] "5744"   "2002/08/13" ""      
[20,] "5744"   "2002/08/13" "" 

Take what you need from there.
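
The third column in that matrix is a side effect of uneven splits: padded strings split into three pieces, unpadded ones into two, and rbind recycles the short rows (which is why the ID reappears in column 3). A minimal base R sketch that avoids this, assuming the only stray whitespace is padding around the two fields, trims each string and splits on runs of whitespace. The "VESSELID" vector is a stand-in built from values in the question.

```r
# Stand-in for df$VESSELID from the question (trailing spaces kept on purpose).
VESSELID <- c("6830 2002/08/13  ", "105372 2002/08/23", "5744 2002/08/14  ")

# Trim the padding, then split on whitespace so every row has two pieces.
pieces <- strsplit(trimws(VESSELID), "[[:space:]]+")

# Stack the pieces into a two-column character matrix.
mat <- do.call(rbind, pieces)

# Build the two new variables from the matrix columns.
out <- data.frame(vesselID = as.integer(mat[, 1]),
                  DATE = mat[, 2],
                  stringsAsFactors = FALSE)
out
```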

answered 2013-01-28T13:26:20.460