r - Vectorized column selection

Question

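How can I use one column's value (e.g., x below) to select among the values in several candidate columns, when the selection is specific to each row?

The x variable determines whether variable a, b, or c should be selected for a given row. This is a simplified example; the real cells are not a concatenation of the column name and row number.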

library(magrittr); requireNamespace("tibble"); requireNamespace("dplyr")

ds <- tibble::tibble(
  x   = c(  1 ,   1 ,   2 ,   3 ,   1 ),
  a   = c("a1", "a2", "a3", "a4", "a5"),
  b   = c("b1", "b2", "b3", "b4", "b5"),
  c   = c("c1", "c2", "c3", "c4", "c5")
)

The desired column and value for each row are:

# ds$y_desired      <- c("a1", "a2", "b3", "c4", "a5")
# ds$column_desired <- c("a" , "a" , "b" , "c" , "a" )

Of course, the following does not produce a single column, but five columns.

ds[, ds$column_desired]

The following produces an error: Error in mutate_impl(.data, dots) : basic_string::_M_replace_aux.

ds %>% 
  dplyr::rowwise() %>% 
  dplyr::mutate(
    y = .[[column_desired]]
  ) %>% 
  dplyr::ungroup()

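If my real scenario had only two or three choices, I would probably use nested ifelse calls, but I would like a general mapping approach that accommodates a larger number of conditions.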

ds %>% 
  dplyr::mutate(
    y_if_chain = ifelse(x==1, a, ifelse(x==2, b, c))
  )

Ideally, the approach could be driven by a lookup table or some other metadata object, such as:

ds_lookup <- tibble::tribble(
  ~x,    ~desired_column,
  1L,                "a",
  2L,                "b",
  3L,                "c"
)

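I'm sure this column-switching question has been asked before, but I didn't find one that applied.

I'd prefer a tidyverse solution (b/c that's what my team is most comfortable with), but I'm open to any tool. I couldn't figure out how to combine apply and kimisc::vswitch.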

score 1 · Accepted Answer

Try this:

ds$y_desired = apply(ds, 1, function(r) r[as.integer(r[1])+1])

Answered 2016-12-11T07:33:44.827
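For readers wondering why the one-liner works: apply() first converts the data frame to a character matrix, so each row r arrives as a character vector; r[1] is the x value as text, and adding 1 skips past the x column. The sketch below spells out those steps. It assumes x is the first column and that the candidate columns follow it in the order x = 1, 2, 3; pick_by_position is a hypothetical name, and a plain data.frame stands in for the tibble to keep the example self-contained.

```r
# Base-R copy of the example data (a plain data.frame stands in for the tibble).
ds <- data.frame(
  x = c(1, 1, 2, 3, 1),
  a = c("a1", "a2", "a3", "a4", "a5"),
  b = c("b1", "b2", "b3", "b4", "b5"),
  c = c("c1", "c2", "c3", "c4", "c5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the same logic as the one-liner, step by step.
pick_by_position <- function(r) {
  # r is one row, already coerced to a character vector by apply().
  offset <- as.integer(r[1])  # value of x as an integer, e.g. "2" -> 2
  unname(r[offset + 1])       # +1 skips the x column itself
}

y <- apply(ds, 1, pick_by_position)
print(y)
```

Because the selection is purely positional, reordering the columns or using x values other than 1, 2, 3 would break it.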

score 1 · Accepted Answer

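I think the problem is that your data isn't in the right format for what you need. First, I would convert from wide to long format using tidyr::gather():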

library("tidyr")
ds %>% 
  gather(y, col, a:c)

# A tibble: 15 × 3
#        x     y   col
#    <dbl> <chr> <chr>
# 1      1     a    a1
# 2      1     a    a2
# 3      2     a    a3
# 4      3     a    a4
# 5      1     a    a5
# 6      1     b    b1
# 7      1     b    b2
# 8      2     b    b3
# 9      3     b    b4
# 10     1     b    b5
# 11     1     c    c1
# 12     1     c    c2
# 13     2     c    c3
# 14     3     c    c4
# 15     1     c    c5

Then the task becomes as trivial as a filter on the conditions you need (e.g., x == 1, y == "a", etc.).
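One way the filter step might look, as a sketch rather than the answerer's own code: because gather() stacks the rows, a row identifier is needed to get back to one value per original row. The row_id column, the column/y names, and the join against a lookup table (modeled on ds_lookup from the question, with x stored as double to match ds) are assumptions added for illustration.

```r
library(magrittr)

ds <- tibble::tibble(
  x = c(1, 1, 2, 3, 1),
  a = c("a1", "a2", "a3", "a4", "a5"),
  b = c("b1", "b2", "b3", "b4", "b5"),
  c = c("c1", "c2", "c3", "c4", "c5")
)

# Lookup table modeled on the question's ds_lookup (x kept as double here).
ds_lookup <- tibble::tribble(
  ~x, ~desired_column,
   1,             "a",
   2,             "b",
   3,             "c"
)

ds_long <- ds %>%
  dplyr::mutate(row_id = dplyr::row_number()) %>%          # hypothetical id preserving row identity
  tidyr::gather(key = "column", value = "y", a:c) %>%      # wide to long
  dplyr::left_join(ds_lookup, by = "x") %>%                # attach the wanted column name
  dplyr::filter(column == desired_column) %>%              # keep only the wanted cell per row
  dplyr::arrange(row_id)                                   # restore the original row order

ds_long$y
```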

score 1 · Accepted Answer

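Thanks to @sirallen and @Phil for showing me a better way. Here's what I ended up using, in case it helps anyone in the future. It's generalized to accommodate

arbitrary locations of the columns,
arbitrary values of x, and
a metadata table that maps the x values to the desired columns (i.e., a, b, & c).

Given the observed dataset and the lookup dataset: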

ds <- tibble::tibble(
  x   = c( 10 ,  10 ,  20 ,  30 ,  10 ),
  a   = c("a1", "a2", "a3", "a4", "a5"),
  b   = c("b1", "b2", "b3", "b4", "b5"),
  c   = c("c1", "c2", "c3", "c4", "c5")
)

ds_lookup <- tibble::tribble(
  ~x ,    ~desired_column,
  10L,                "a",
  20L,                "b",
  30L,                "c"
)

A function encapsulates the mapping between the character vector r and the lookup table.

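determine_y <- function( r ) {
  # r is one row of ds, passed by apply() as a named character vector.
  lookup_row_index <- match(r['x'], ds_lookup$x)                    # Row of the lookup table for this x.
  column_name      <- ds_lookup$desired_column[lookup_row_index]    # Name of the column to pull from.
  r[column_name]
}

ds$y <- apply(ds, 1, determine_y)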
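Since apply() calls the function once per row, a fully vectorized variant may be worth considering for larger datasets. The sketch below is an alternative, not the approach used above: it calls match() twice and relies on base R's matrix indexing with a two-column (row, column) index. The names column_name, candidates, and index are hypothetical, and plain data.frames stand in for the tibbles to keep the example self-contained.

```r
ds <- data.frame(
  x = c(10, 10, 20, 30, 10),
  a = c("a1", "a2", "a3", "a4", "a5"),
  b = c("b1", "b2", "b3", "b4", "b5"),
  c = c("c1", "c2", "c3", "c4", "c5"),
  stringsAsFactors = FALSE
)

ds_lookup <- data.frame(
  x              = c(10L, 20L, 30L),
  desired_column = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Name of the wanted column for each row, via the lookup table.
column_name <- ds_lookup$desired_column[match(ds$x, ds_lookup$x)]

# The candidate columns as a character matrix (one row per observation).
candidates <- as.matrix(ds[, unique(ds_lookup$desired_column)])

# Pair each row number with the position of its wanted column.
index <- cbind(seq_len(nrow(ds)), match(column_name, colnames(candidates)))

# Indexing a matrix by a two-column matrix extracts one cell per index row.
ds$y <- candidates[index]
print(ds$y)
```

An x value missing from the lookup table produces NA in the index, and therefore NA in y, rather than an error.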

score 0 · Accepted Answer

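After learning from @sirallen's answer, I reread Hadley's chapter on functionals. Here are solutions that use switch with other members of the apply family, including Tidyverse-style chaining.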

library(magrittr); requireNamespace("purrr"); requireNamespace("tibble"); requireNamespace("dplyr")

ds <- tibble::tibble(
  x   = c( 10 ,  10 ,  20 ,  30 ,  10 ),
  a   = c("a1", "a2", "a3", "a4", "a5"),
  b   = c("b1", "b2", "b3", "b4", "b5"),
  c   = c("c1", "c2", "c3", "c4", "c5")
)
determine_2 <- function( ss, a, b, c) {
  switch(
    as.character(ss),
    "10"    =   a,
    "20"    =   b,
    "30"    =   c
  )
}

# Each of these calls returns a vector.
unlist(Map(        determine_2, ds$x, ds$a, ds$b, ds$c))
mapply(            determine_2, ds$x, ds$a, ds$b, ds$c)
parallel::mcmapply(determine_2, ds$x, ds$a, ds$b, ds$c)                 # For Linux
unlist(purrr::pmap(list(        ds$x, ds$a, ds$b, ds$c), determine_2))

# Returns a dataset with the new variable.
ds %>%
  dplyr::mutate(
    y = unlist(purrr::pmap(list(x, a, b, c), determine_2))
  )
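One caveat worth sketching: switch() returns NULL when no alternative matches, and unlist() silently drops NULL elements, so an unexpected x value would yield a vector shorter than the data. A hedged variant (determine_3 is a hypothetical name, not part of the answer above) adds a default branch that returns NA instead:

```r
# Hypothetical variant of determine_2 with a fall-through default.
determine_3 <- function(ss, a, b, c) {
  switch(
    as.character(ss),
    "10" = a,
    "20" = b,
    "30" = c,
    NA_character_   # An unnamed final argument is the default branch.
  )
}

# Small illustrative input; 99 has no matching branch.
x <- c(10, 20, 99)
a <- c("a1", "a2", "a3")
b <- c("b1", "b2", "b3")
c <- c("c1", "c2", "c3")

y <- unlist(Map(determine_3, x, a, b, c))
print(y)
```

With the default in place, the result keeps one element per input row, which is what dplyr::mutate() expects of a new column.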
