regex - Splitting camelCase column names

Question

I have been trying to figure this out for a while and thought I would ask here.

Suppose I have a data frame like the following:

df <- data.frame(
  participant  = 1:6,
  group        = c("adult", "adult", "child", "child", "NSS", "NSS"),
  RegProto     = c(2, 3, 4, 2, 4, 3),
  RegInt       = c(2, 3, 4, 6, 6, 5),
  RegDistant   = c(3, 3, 4, 5, 4, 5),
  IrregProto   = c(4, 5, 3, 4, 3, 1),
  IrregInt     = c(4, 4, 4, 4, 4, 4),
  IrregDistant = c(4, 5, 6, 8, 9, 1)
)

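The problem with this data frame is that each of the last six column names encodes two variables: one with the value Reg or Irreg, and another with the value Proto, Int, or Distant. What I want to do is split these columns and make the table long, preferably using tidyr. I thought I could do it like this: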

library("tidyr")
library("dplyr")  # needed for select()
df_long <- df %>%
  gather(index, n, -group, -participant) %>%
  select(participant, group, index, n) %>%
  # this is the step that fails
  separate(index, into = c("verb", "similarity"), sep = "\\.?=\\p{Upper}")

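This does what I want up to separate(). There I get an error message saying that the values were not split, but no further hint as to why. I am new to regular expressions, so I suspect the problem lies there, but I cannot figure out what the correct syntax might be.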

score 9 · Accepted Answer

You can use this regex:

(?<=.)(?=[A-Z])

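This matches the zero-length position that is preceded by any character, (?<=.), and followed by an uppercase letter, (?=[A-Z]). The lookbehind keeps the pattern from matching at the very start of a name, so RegProto is split only between Reg and Proto. The pattern in the question does not work because \\.?= looks for an optional literal dot followed by a literal = sign; a lookahead has to be written as (?=...).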
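
To see where this pattern matches without involving tidyr, here is a minimal base R sketch. It assumes only the column names from the question; the variable names cols, marked and parts, and the "_" marker, are made up for illustration. It inserts the marker at every matched position and then splits on it:

```r
# Column names taken from the data frame in the question
cols <- c("RegProto", "RegInt", "RegDistant",
          "IrregProto", "IrregInt", "IrregDistant")

# Put "_" at each zero-length match; perl = TRUE enables lookarounds
marked <- gsub("(?<=.)(?=[A-Z])", "_", cols, perl = TRUE)

# Split on the marker and bind the pieces into a two-column matrix
parts <- do.call(rbind, strsplit(marked, "_", fixed = TRUE))
colnames(parts) <- c("verb", "similarity")

print(marked)
print(parts)
```

Each name gets exactly one marker, for example "Reg_Proto", because the only uppercase letter with a character in front of it is the one that starts the second part.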

The command:

library(tidyr)  # gather(), separate()
library(dplyr)  # select(), %>%
df %>%
  gather(index, n, -group, -participant) %>%
  select(participant, group, index, n) %>%
  separate(index, into = c("verb", "similarity"), sep = "(?<=.)(?=[A-Z])")
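
As an aside that is not part of the original answer: the tidyr documentation marks gather() as superseded by pivot_longer(), which can reshape and split the names in one step. The following is a sketch under two assumptions: that tidyr 1.0.0 or later is installed, and that the names_sep argument accepts the same lookaround pattern as separate().

```r
library(tidyr)

# Data frame from the question
df <- data.frame(
  participant  = 1:6,
  group        = c("adult", "adult", "child", "child", "NSS", "NSS"),
  RegProto     = c(2, 3, 4, 2, 4, 3),
  RegInt       = c(2, 3, 4, 6, 6, 5),
  RegDistant   = c(3, 3, 4, 5, 4, 5),
  IrregProto   = c(4, 5, 3, 4, 3, 1),
  IrregInt     = c(4, 4, 4, 4, 4, 4),
  IrregDistant = c(4, 5, 6, 8, 9, 1)
)

# Reshape to long form and split each column name at the
# lowercase-to-uppercase boundary in a single call
df_long <- pivot_longer(
  df,
  cols      = -c(participant, group),
  names_to  = c("verb", "similarity"),
  names_sep = "(?<=.)(?=[A-Z])",
  values_to = "n"
)

print(head(df_long))
```

The six measurement columns for six participants should give one row per participant and column, with verb and similarity as separate columns.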
