r - An efficient way to convert a character vector to integers based on frequency in R

Question

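I have a character vector consisting only of "a" or "g", and I want to convert it to integers based on frequency: the more frequent value should be coded as 0 and the other as 1. For example: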

set.seed(17)
x = sample(c('g', 'a'), 10, replace=TRUE)
x
# [1] "g" "a" "g" "a" "g" "a" "g" "g" "a" "g"
x[x == names(which.max(table(x)))] = 0
x[x != 0] = 1
x
# [1] "0" "1" "0" "1" "0" "1" "0" "0" "1" "0"

This works, but I wonder whether there is a more efficient way to do it.

(We don't need to consider the 50%-50% case here, since it should not occur in our study.)

score 3 · Accepted Answer

Use this:

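ag.encode <- function(x)
{
  # TRUE where the element is "a"
  result <- x == "a"
  # If "a" is the majority, flip so that "a" -> 0 and "g" -> 1;
  # otherwise "g" -> 0 and "a" -> 1. Returns a numeric (double) vector.
  if( sum(result) > length(result) %/% 2 ) 1-result else as.numeric(result)
}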
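
As a quick check of the function above, here is a minimal sketch on a fixed input (a hypothetical vector with 6 "g" and 4 "a", chosen so the result does not depend on the random number generator). The function is repeated so the block runs on its own.

```r
# Hypothetical input with a known composition: 6 "g" and 4 "a".
x <- c("g", "a", "g", "a", "g", "a", "g", "g", "a", "g")

# Same logic as ag.encode above, repeated so this block is self-contained.
ag.encode <- function(x)
{
  result <- x == "a"
  if( sum(result) > length(result) %/% 2 ) 1-result else as.numeric(result)
}

codes <- ag.encode(x)
codes
# [1] 0 1 0 1 0 1 0 0 1 0

# The result is stored as double; convert it if integer storage is needed.
int_codes <- as.integer(codes)
```

Unlike the code in the question, which leaves the character strings "0" and "1" in x, this returns numbers.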

If you want to keep the labels in a factor structure, use this instead:

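ag.encode2factor <- function(x)
{
  result <- x == "a"
  # Passing levels=1:2 keeps both labels even if x contains only one value;
  # without it, factor() fails because there are two labels for one level.
  if( sum(result) > length(result) %/% 2 )
  {
     # "a" is the majority: "a" becomes level 1, "g" level 2
     factor(2-result, levels=1:2, labels=c("a","g"))
  }
  else
  {
     # "g" is the majority: "g" becomes level 1, "a" level 2
     factor(result+1, levels=1:2, labels=c("g","a"))
  }
}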
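
A different way to get a frequency-ordered factor, sketched here as an alternative rather than as the approach above, is to pass the levels to factor() in order of decreasing frequency. The input vector is a hypothetical example, and the sketch assumes there is no tie between the counts.

```r
# Hypothetical input with a known composition: 6 "g" and 4 "a".
x <- c("g", "a", "g", "a", "g", "a", "g", "g", "a", "g")

# Order the levels by decreasing frequency, so the most frequent
# label becomes level 1.
f <- factor(x, levels = names(sort(table(x), decreasing = TRUE)))
levels(f)
# [1] "g" "a"

# The underlying integer codes start at 1; subtract 1 to get 0/1.
codes <- as.integer(f) - 1L
codes
# [1] 0 1 0 1 0 1 0 0 1 0
```

The labels stay available through levels(f), while the integer codes carry the frequency-based encoding.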

score 3 · Accepted Answer

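You can convert the character vector to a factor. This solution is more general, because you don't need to know the names of the two characters used to create x.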

y <- as.integer(factor(x))-1
if(sum(y)>length(y)/2) y <- as.integer(!y)
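
The same idea can also be sketched without the intermediate factor, by comparing each element with the most frequent value and coercing the logical result to integer. `encode_by_freq` is a hypothetical helper name, and the sketch assumes x has no NA values and no tie for the most frequent value.

```r
# Hypothetical helper: 0 for the most frequent value, 1 for everything else.
# Assumes no NA values and no tie for the most frequent value.
encode_by_freq <- function(x)
{
  most_common <- names(which.max(table(x)))
  as.integer(x != most_common)
}

x <- c("g", "a", "g", "a", "g", "a", "g", "g", "a", "g")
encoded <- encode_by_freq(x)
encoded
# [1] 0 1 0 1 0 1 0 0 1 0

# On this input the result matches the factor-based approach.
y <- as.integer(factor(x)) - 1
if (sum(y) > length(y) / 2) y <- as.integer(!y)
same <- identical(encoded, y)
same
# [1] TRUE
```

This is essentially the code from the question with the two assignment steps folded into one comparison, so x itself is left unchanged and the result is a true integer vector.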
