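I have a large dataset; a small sample of it looks like the 4 x 5 tibble below. I am trying to split several delimited columns into separate rows, but only for the rows where c == "Split":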
library(splitstackshape)
library(tibble)  # provides tibble()
dt <- tibble(
a = c("Quartz | White Spirit | Wildfire", "Quiet Riot", "Race Against Time", "Down | Heart Lane | X | Breaking H"),
b = c("Muthas Pride", "Killer Girls / Slick Black Cadillac", "Demo 1980", "Life 55"),
c = c("Split", "Single", "Demo", "Split"),
d = c("Birmingham, England | Hartlepool, England | Sheffield, South Yorkshire, England", "Los Angeles, California", "Nottingham, England", "Liverpool | Beijing | | NYC"),
e = c("wf | ef | ff", "g", "f", "cf | af | df | rf")
)
dt.s <- subset(dt, c == "Split")
dt.split <- cSplit(dt.s, c("a", "d", "e"), c("|", "|", "|"), "long")
dt.split
However, this forces an extra row of NAs into the result, as shown in row 4:
              a            b     c                                   d  e
1:       Quartz Muthas Pride Split                 Birmingham, England wf
2: White Spirit Muthas Pride Split                 Hartlepool, England ef
3:     Wildfire Muthas Pride Split Sheffield, South Yorkshire, England ff
4:           NA Muthas Pride Split                                  NA NA
5:         Down      Life 55 Split                           Liverpool cf
6:   Heart Lane      Life 55 Split                             Beijing af
7:            X      Life 55 Split                                     df
8:   Breaking H      Life 55 Split                                 NYC rf
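This is not a problem if I split only two columns. How can I stop it from producing the NA row? Also, is there a way to make cSplit work without first subsetting on c?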