r - `vec_arith` is not called as expected

Question

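Below is a simple example in which I define a class "foo" on a double object. I would like any arithmetic operation involving an object of this class to strip the "foo" class and then proceed normally.

I can get this partly working, but not robustly. See below: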

library(vctrs)

x <- new_vctr(42, class = "foo")

# then this won't work (expected)
x * 2
#> Error: <foo> * <double> is not permitted

# define vec_arith method
vec_arith.foo <- function(op, x, y, ...) {
  print("we went there")
  # wrap x in vec_data to strip off the class, and forward to `vec_arith_base`
  vec_arith_base(op, vec_data(x), y)
}

# now this works  
x * 2
#> [1] "we went there"
#> [1] 84

# but this doesn't, and doesn't go through vec_arith.foo
x * data.frame(a=1)
#> Warning: Incompatible methods ("*.vctrs_vctr", "Ops.data.frame") for "*"
#> Error in x * data.frame(a = 1): non-numeric argument to binary operator

# while this works
42 * data.frame(a=1)
#>    a
#> 1 42

How can I make x * data.frame(a=1) return the same result as 42 * data.frame(a=1)?

traceback() returns nothing, so I am not sure how to debug this.

score 1 · Accepted Answer

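This is an interesting problem that caught my attention. I am not an expert on the subject, but I found a way to make it work. It is a rather dirty workaround, not a real solution. There should be a better way to solve this with the {vctrs} package.

The problem is complicated because * is an internal generic that uses double dispatch (see here). The important part is:

"Generics in the Ops group, which includes the two-argument arithmetic and Boolean operators like - and &, implement a special type of method dispatch. They dispatch on the type of both of the arguments, which is called double dispatch."

It turns out that for a call like x * y, R looks up the method for both this call and y * x. There are then three possible outcomes: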

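The methods are the same, so it does not matter which method is used.

The methods are different, and R falls back to the internal method with a warning.

One method is internal, in which case R calls the other method.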
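The second and third outcomes are reproduced step by step further down. The first one is not, so here is a minimal base R sketch of it. The class name "money" and the Ops.money method are hypothetical and exist only for illustration. Because both arguments resolve to the same group method, R calls it without any warning.

```r
# Hypothetical class "money": one group method covers every Ops operator.
Ops.money <- function(e1, e2) {
  # .Generic holds the name of the operator being dispatched, e.g. "*"
  op <- get(.Generic, mode = "function")
  # Strip the class from both sides and apply the base operator
  op(unclass(e1), unclass(e2))
}

m1 <- structure(10, class = "money")
m2 <- structure(20, class = "money")

# Both lookups find Ops.money, so the methods are "the same"
product    <- m1 * m2
difference <- m1 - m2
print(product)
print(difference)
```

A group method like this also avoids writing one method per operator, at the cost of handling every operator in a single function.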

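Let's keep this in mind while looking at the problem. I first avoided the {vctrs} package and tried to rebuild the problem in two ways. First, I tried multiplying an object of a new class by a list. This reproduces the error from the original example: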

# let's create a new object
x1 <- 10
class(x1) <- "myclass"

# and multiply it with a list
l <- list(1)    
x1 * l 

# same error as in the original example, but without the warning
#> Error in x1 * l: non-numeric argument to binary operator

sloop::s3_dispatch(x1 * l)
#>    *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#> => * (internal)

sloop::s3_dispatch(l * x1)
#>    *.list
#>    *.default
#>    Ops.list
#>    Ops.default
#> => * (internal)

The {sloop} package shows that the internal generic is called. This generic offers no way to use * on a list. So let's see whether we can override the method:

`*.myclass` <- function(x, y) {
  print("myclass")
  if (is.list(y)) {
    print("if clause")
    y <- unlist(y)
  } else {
    print("didn't use if clause")
  }
  
  x + y # to see if it's working the operation is changed
}

x1 * l # now working
#> [1] "myclass"
#> [1] "if clause"
#> [1] 11
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * l)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(l * x1)
#>    *.list
#>    *.default
#>    Ops.list
#>    Ops.default
#> => * (internal)

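This works (although we really should not modify the objects inside a method call). We now have the third case described above: the methods differ, but one of them is internal, so the non-internal method is called. Unlike data.frames, lists have no existing method for arithmetic operations. So we need an example in which two objects of different classes, each with its own method, are multiplied.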

# another object
y1 <- 20
class(y1) <- "another_class"

# here we still only have one method `*.myclass`:
x1 * y1 # working
#> [1] "myclass"
#> [1] "didn't use if clause"
#> [1] 30
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * y1)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(y1 * x1)
#>    *.another_class
#>    *.default
#>    Ops.another_class
#>    Ops.default
#> => * (internal)

# let's introduce another method:
`*.another_class` <- function(x, y) {
  x - y # again, to see if it is working we change the operation
}

# now we get (only) a warning, but with a different result!
x1 * y1 
#> Warning: Incompatible methods ("*.myclass", "*.another_class") for "*"
#> [1] 200
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * y1)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(y1 * x1)
#> => *.another_class
#>    *.default
#>    Ops.another_class
#>    Ops.default
#>  * * (internal)

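Here we have the second case described above: the two methods differ, so R falls back to the internal method and issues a warning. This yields the "unaltered" result 10 * 20 = 200.

So regarding the original problem, my understanding is that we have two conflicting methods, "*.vctrs_vctr" and "Ops.data.frame". Because of this conflict, the internal method * (internal) is called, and this internal method does not accept lists or data.frames (that is normally handled inside Ops.data.frame, which is not used here because of the method conflict).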

library(vctrs)

z <- new_vctr(42, class = "foo")
a <- data.frame(a = 1)

z * a
#> Warning: Incompatible methods ("*.vctrs_vctr", "Ops.data.frame") for "*"
#> Error in z * a: non-numeric argument to binary operator

sloop::s3_dispatch(z * a)
#>    *.foo
#> => *.vctrs_vctr
#>    *.default
#>    Ops.foo
#>    Ops.vctrs_vctr
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(a * z)
#>    *.data.frame
#>    *.default 
#> => Ops.data.frame
#>    Ops.default
#>  * * (internal)

Here we can see again that two different methods exist, so the internal method is used.
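The claim that the internal * cannot handle a data.frame can be checked in base R without {vctrs}. The sketch below is only an illustration: it uses unclass() to simulate what happens when no method is dispatched, so the internal operator receives the underlying list directly.

```r
df <- data.frame(a = 1)

# With the class attribute intact, `*` dispatches to Ops.data.frame
with_method <- 42 * df
print(with_method)

# unclass() leaves a plain list, so nothing is dispatched and the
# internal `*` has to handle a list directly, which raises an error
internal_only <- tryCatch(
  42 * unclass(df),
  error = function(e) conditionMessage(e)
)
print(internal_only)
```

The error message captured here should match the one in the original question, which supports the reading that the method conflict leaves the data.frame to the internal operator.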

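The dirty workaround I came up with is to:

create a non-internal generic *,
explicitly define *.foo, and
explicitly define *.numeric, which is called once the object has been "unclassed" with vec_data().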

`*` <- function(x, y) {
  UseMethod("*")
}

`*.foo` <- function(x, y) {
  op_fn <- getExportedValue("base", "*")
  op_fn(vec_data(x), vec_data(y))
}

`*.numeric` <- function(x, y) {
  print("numeric")
  fn <- getExportedValue("base", "*")
  fn(x, y)
}

z * a
#> [1] "numeric"
#>    a
#> 1 42

sloop::s3_dispatch(z * a)
#> => *.foo
#>  * *.vctrs_vctr
#>    *.default
#>    Ops.foo
#>    Ops.vctrs_vctr
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(a * z)
#>    *.data.frame
#>    *.default
#> => Ops.data.frame
#>    Ops.default
#>  * * (internal)

Created on 2021-01-13 by the reprex package (v0.3.0)

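Unfortunately, I am not 100% sure what is going on. It seems that overriding the * generic also overrides the way R handles double dispatch for this generic. Let's revisit x1 * y1, the multiplication of two objects of different classes from above. Earlier, a method was found for each object and, since the two differ, a warning was issued and the internal method was chosen. Now we observe the following: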

x1 * y1 # working without warning
#> [1] "myclass"
#> [1] "didn't use if clause"
#> [1] 30
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * y1)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(y1 * x1)
#> => *.another_class
#>    *.default
#>    Ops.another_class
#>    Ops.default
#>  * * (internal)

We have two conflicting methods, but R still chooses the method of the first object, without issuing a warning.
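One plausible explanation, offered here as an assumption rather than something the answer confirms: once * is an ordinary closure that calls UseMethod(), dispatch looks only at the first argument, so no conflict is ever detected. The sketch below illustrates this with two hypothetical classes, "left" and "right", and keeps the override inside local() so the global * is left untouched.

```r
result <- local({
  # Shadow the internal `*` with an ordinary S3 generic (inside local() only)
  `*` <- function(x, y) UseMethod("*")

  # Hypothetical classes "left" and "right", each with its own method
  `*.left`  <- function(x, y) "method for left"
  `*.right` <- function(x, y) "method for right"

  a <- structure(1, class = "left")
  b <- structure(2, class = "right")

  # UseMethod() inspects only the first argument, so swapping the
  # operands switches the method and no warning is raised
  c(a * b, b * a)
})
print(result)
```

Under this reading, the workaround trades double dispatch for single dispatch on the left operand, which also explains why data.frame(a = 1) * z is unaffected by *.foo.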

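This is of course not a real solution to the problem, for several reasons:

Overriding the generic of an arithmetic operator does not seem like a good idea, since it may break other code.
We would also need to handle data.frame(a = 1) * z, which still does not work (for that we would need to override Ops.data.frame).
We would rather not have to write a method for every arithmetic operation.

The {vctrs} package should help us find a simpler and safer solution, and one may already exist. It might be worth opening an issue on GitHub.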
