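If I had realized how hard this was, I would have skipped the question. But I was already into pivot_longer. read_csv will fix those duplicated names for you, and so will most of the other read functions if you give them the right options.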
library(tidyverse)
Raw <- readr::read_csv("Name,Time Stamp,Text,Name,Time Stamp,Text,Name,Time Stamp,Text
Arthur,1:22,No,Betty,1:23,Yes,Arthur,1:24,Dang")
# A tibble: 1 x 9
  Name   `Time Stamp` Text  Name_1 `Time Stamp_1` Text_1 Name_2 `Time Stamp_2` Text_2
  <chr>  <time>       <chr> <chr>  <time>         <chr>  <chr>  <time>         <chr>
1 Arthur 01:22        No    Betty  01:23          Yes    Arthur 01:24          Dang
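To illustrate the point about other read functions, here is a small base R sketch. It assumes the same two-line CSV text; csv_text, base_raw, header and unique_names are names I made up for the example. read.csv with check.names = TRUE repairs the duplicates with a dot, and make.unique with sep = "_" gives the underscore style shown above.

```r
csv_text <- "Name,Time Stamp,Text,Name,Time Stamp,Text,Name,Time Stamp,Text
Arthur,1:22,No,Betty,1:23,Yes,Arthur,1:24,Dang"

# Base R: check.names = TRUE (the default) makes header names syntactic and unique,
# so "Time Stamp" becomes "Time.Stamp" and repeats get ".1", ".2".
base_raw <- read.csv(text = csv_text, check.names = TRUE, stringsAsFactors = FALSE)
names(base_raw)

# make.unique() with sep = "_" keeps the spaces and appends "_1", "_2" to repeats.
header <- strsplit(strsplit(csv_text, "\n")[[1]][1], ",")[[1]]
unique_names <- make.unique(header, sep = "_")
unique_names
```

As far as I know, newer readr releases repair duplicates in a different style (Name...1 rather than Name_1), so it is worth checking names(Raw) before relying on the suffixes.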
Now, let's pivot_longer:
transcript1 <- Raw %>%
  rowid_to_column("ROW") %>%
  pivot_longer(cols = starts_with("Name"), names_to = "Nm", values_to = "NAME") %>%
  pivot_longer(cols = starts_with("Time"), names_to = "Ts", values_to = "TIME.STAMP") %>%
  pivot_longer(cols = starts_with("Text"), names_to = "Tx", values_to = "TEXT")
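Each pivot_longer call here multiplies the row count by three, so transcript1 holds every combination of a Name column, a Time Stamp column and a Text column for the single input row. A minimal base R sketch of that bookkeeping, where the labels "0", "1", "2" are just stand-ins I picked for the three column groups:

```r
# Stand-in labels for the three groups of columns (no suffix, _1, _2)
suffixes <- c("0", "1", "2")

# Three independent pivots behave like a cross product of the three groups
combos <- expand.grid(Nm = suffixes, Ts = suffixes, Tx = suffixes,
                      stringsAsFactors = FALSE)
nrow(combos)   # 27

# Only the combinations where all three labels agree are real records
matched <- combos[combos$Nm == combos$Ts & combos$Ts == combos$Tx, ]
nrow(matched)  # 3
```

That is why a filtering step is needed afterwards.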
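Within ROW = 1 we want to associate Nm_2 with Ts_2 and Tx_2, and so on. Let's build a function that turns a column name into a key made of the row number and the name's numeric suffix (0 when there is none):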
nmOut <- function(ROW, string){
  ext <- str_extract(string, "(?<=_)\\d+")
  paste(ROW,
        ifelse(is.na(ext), "0", ext))
}
transcript2 <- transcript1 %>%
  mutate(rNm = nmOut(ROW, Nm),
         rTs = nmOut(ROW, Ts),
         rTx = nmOut(ROW, Tx)) %>%
  filter(rNm == rTs & rTs == rTx)
# A tibble: 3 x 10
    ROW Nm     NAME   Ts           TIME.STAMP Tx     TEXT  rNm   rTs   rTx
  <int> <chr>  <chr>  <chr>        <time>     <chr>  <chr> <chr> <chr> <chr>
1     1 Name   Arthur Time Stamp   01:22      Text   No    1 0   1 0   1 0
2     1 Name_1 Betty  Time Stamp_1 01:23      Text_1 Yes   1 1   1 1   1 1
3     1 Name_2 Arthur Time Stamp_2 01:24      Text_2 Dang  1 2   1 2   1 2
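To see what the key looks like on its own, here is a quick check of nmOut on a few column names. It repeats the function so it runs by itself and assumes stringr is installed. The keys variable and the sample names are only for illustration.

```r
library(stringr)

# Same helper as above: row number plus the digits after "_" (or "0" if none)
nmOut <- function(ROW, string){
  ext <- str_extract(string, "(?<=_)\\d+")
  paste(ROW,
        ifelse(is.na(ext), "0", ext))
}

keys <- nmOut(1L, c("Name", "Name_1", "Time Stamp_2"))
keys  # "1 0" "1 1" "1 2"
```

Rows survive the filter only when all three keys agree, which is what ties Name_2 to Time Stamp_2 and Text_2.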
transcriptFinal <- transcript2 %>%
  select(ROW, NAME, TIME.STAMP, TEXT)
transcriptFinal
# A tibble: 3 x 4
    ROW NAME   TIME.STAMP TEXT
  <int> <chr>  <time>     <chr>
1     1 Arthur 01:22      No
2     1 Betty  01:23      Yes
3     1 Arthur 01:24      Dang
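As a possible alternative (not what I did above), pivot_longer can do the pairing in a single step through the special ".value" entry in names_to, provided every column name carries a suffix. A sketch under that assumption: wide is a hand-built stand-in for Raw with the times kept as plain strings, and the "_0" renaming of the first group is my own addition.

```r
library(tidyr)

# Hand-built stand-in for Raw plus a ROW id (times kept as character for simplicity)
wide <- data.frame(
  ROW = 1L,
  Name = "Arthur", `Time Stamp` = "1:22", Text = "No",
  Name_1 = "Betty", `Time Stamp_1` = "1:23", Text_1 = "Yes",
  Name_2 = "Arthur", `Time Stamp_2` = "1:24", Text_2 = "Dang",
  check.names = FALSE, stringsAsFactors = FALSE
)

# Give the unsuffixed first group an explicit "_0" so every name splits the same way
needs_suffix <- !grepl("_\\d+$", names(wide)) & names(wide) != "ROW"
names(wide)[needs_suffix] <- paste0(names(wide)[needs_suffix], "_0")

# ".value" keeps the part before "_" as the output column name;
# the part after "_" goes into a new column called set
long <- pivot_longer(
  wide,
  cols = -ROW,
  names_to = c(".value", "set"),
  names_sep = "_"
)
long
```

This avoids the 27-row intermediate table, at the cost of having to normalize the column names first.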
Finally, see Hadley's reference on pivot_longer: https://r4ds.had.co.nz/tidy-data.html#pivoting