I have a data frame with the following sample values.

[1] "entry.cei"                                                                               
[2] "entry.lifecycle->hist.open.personal demand chequing account->exit.lifecycle->entry.cei"  
[3] "entry.lifecycle->hist.open.personal demand savings account->exit.lifecycle->entry.cei"   
[4] "entry.transaction->txn.no source available->exit.transaction->entry.cei"                 
[5] "entry.branch->exit.branch->entry.transaction->txn.in-branch->exit.transaction->entry.cei"

I need to split them on "->" and put the pieces in separate columns, such as V1, V2, and so on. For example:

           V1                             V2               V3             V4           V5     V6    V7
1   entry.cei   
2   entry.lifecycle hist.open.personal demand chequing account  exit.lifecycle  entry.cei   
3   entry.lifecycle hist.open.personal demand savings account   exit.lifecycle  entry.cei   

How can I do this in R? I tried using rbind with strsplit(), but I think rbind needs every row to have the same number of columns.

1 Answer

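The simplest approach is to replace -> with a comma using gsub, then read the result with read.csv. If the data itself contains commas, replace -> with > instead of a comma and pass sep = ">" to the reader, and it should work fine.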

read.csv(text = gsub("->", ",", x, fixed = TRUE), header = FALSE)
#                  V1                                         V2                V3            V4               V5        V6
# 1         entry.cei                                                                                                      
# 2   entry.lifecycle hist.open.personal demand chequing account    exit.lifecycle     entry.cei                           
# 3   entry.lifecycle  hist.open.personal demand savings account    exit.lifecycle     entry.cei                           
# 4 entry.transaction                    txn.no source available  exit.transaction     entry.cei                           
# 5      entry.branch                                exit.branch entry.transaction txn.in-branch exit.transaction entry.cei

Alternatively,

read.table(text = gsub("->", ",", x, fixed = TRUE), sep = ",", fill = TRUE)
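
For data that contains commas, the > variant mentioned above could look like the following sketch. It is an illustration, not part of the original answer: the shortened x, the names y, con, n and res, and the choice to disable quoting and comment handling are all assumptions. It also passes col.names explicitly, because read.table works out the number of columns from the first five lines of input only, so a longer row further down could otherwise be wrapped.

```r
# Shortened sample input (assumed for this sketch).
x <- c("entry.cei",
       "entry.lifecycle->hist.open.personal demand chequing account->exit.lifecycle->entry.cei",
       "entry.branch->exit.branch->entry.transaction->txn.in-branch->exit.transaction->entry.cei")

# read.table() accepts only a one-character sep, so collapse "->" to ">".
y <- gsub("->", ">", x, fixed = TRUE)

# Count the fields on every line so the column count does not depend
# on the first five lines alone.
con <- textConnection(y)
n <- max(count.fields(con, sep = ">", quote = "", comment.char = ""))
close(con)

# quote = "" and comment.char = "" keep apostrophes and "#" in the data
# from being interpreted as quoting or comments.
res <- read.table(text = y, sep = ">", fill = TRUE, quote = "",
                  comment.char = "", stringsAsFactors = FALSE,
                  col.names = paste0("V", seq_len(n)))
res
```

With this input, res should have three rows and six columns named V1 to V6.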

You can still use rbind and strsplit, as long as you first make all the list elements the same length. The length<- replacement function helps with that.
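
As a small illustration of what length<- does on its own (the vectors v and w are made up for this sketch), growing a vector pads it with NA, which is what makes the pieces line up for rbind:

```r
v <- c("a", "b")

# Assigning a larger length pads the vector with NA.
length(v) <- 4
v

# The same operation written as a function call, which is the form
# that can be handed to lapply().
w <- `length<-`(c("a", "b"), 4)
identical(v, w)
```

Here v ends up as "a", "b", NA, NA, and the call form gives the same result.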

s <- strsplit(x, "->", fixed = TRUE)
data.frame(do.call(rbind, lapply(s, `length<-`, max(sapply(s, length)))))
#                  X1                                         X2                X3            X4               X5        X6
# 1         entry.cei                                       <NA>              <NA>          <NA>             <NA>      <NA>
# 2   entry.lifecycle hist.open.personal demand chequing account    exit.lifecycle     entry.cei             <NA>      <NA>
# 3   entry.lifecycle  hist.open.personal demand savings account    exit.lifecycle     entry.cei             <NA>      <NA>
# 4 entry.transaction                    txn.no source available  exit.transaction     entry.cei             <NA>      <NA>
# 5      entry.branch                                exit.branch entry.transaction txn.in-branch exit.transaction entry.cei

where x is the original vector:

x <- c("entry.cei", 
 "entry.lifecycle->hist.open.personal demand chequing account->exit.lifecycle->entry.cei", 
 "entry.lifecycle->hist.open.personal demand savings account->exit.lifecycle->entry.cei", 
 "entry.transaction->txn.no source available->exit.transaction->entry.cei", 
 "entry.branch->exit.branch->entry.transaction->txn.in-branch->exit.transaction->entry.cei")
answered 2014-10-15T06:26:55.820