r - Computing residuals with a small number of points (at most 20)

Question

I use loess to compute residuals. For the following (small) series, I expected to find a large residual at the third point:

    y <- c(5814, 6083, 17764, 6110, 6556)
    x <- c(14564, 14719, 14753, 14754, 15086)
    > residuals(loess(y ~ x))
            1             2             3             4             5 
 2.728484e-12 -9.094947e-13  3.637979e-12  3.637979e-12  0.000000e+00

In particular, loess gives the following output:

    > loess(y ~ x)
    Call:
    loess(formula = y ~ x)

    Number of Observations: 5 
    Equivalent Number of Parameters: 5 
    Residual Standard Error: Inf 
    Warning messages:
    1: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
      span too small.   fewer data values than degrees of freedom.
    2: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
      pseudoinverse used at 14561
    3: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
      neighborhood radius 191.61
    4: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
      reciprocal condition number  0
    5: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
      There are other near singularities as well. 1.1263e+005

There is probably a (very simple) reason that I am missing, but the above seems strange to me... Why does it "not work" in my case?

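Edit:

Thanks to @Gavin Simpson, who pointed me to this link, I found the function rlm in the MASS package, which gives what I was hoping for. I also tried lowess with several iterations, and its fitted values actually follow my data "better" (in this case):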

    library(MASS)
    method_rlm <- rlm(x=x,y=y)
    method_lowess <- lowess(x,y, iter=7, f=1)

    df<-data.frame(x=x, y=y, rlm=method_rlm$fitted.values, lowess=method_lowess$y)

    library(ggplot2)
    ggplot(df) +
      geom_line(aes(x, y), color="red") +
      geom_line(aes(x, rlm), color="blue") +
      geom_line(aes(x, lowess), color="green") +
      geom_point(aes(x, y), color="red")

I also looked at the timings, and the difference is large.

    > microbenchmark(rlm(x=x,y=y), lowess(x,y, iter=7, f=1), times=1000)
    Unit: microseconds
                              expr      min       lq    median        uq        max neval
                 rlm(x = x, y = y) 6445.269 6663.972 6906.1350 9417.1895 271494.006  1000
     lowess(x, y, iter = 7, f = 1)  169.099  193.046  238.0085  273.9295   3900.493  1000

Do you think this difference matters? I have a million of these small series (5 to 20 points at most, with similar kinds of outliers).
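
For many short series, the lowess call above could be wrapped in a small helper and applied over a list. This is only a sketch under assumptions: `lowess_residuals` and `series_list` are hypothetical names that do not come from the question, and the one-data-frame-per-series layout is a guess at how the data might be stored.

```r
# Hypothetical helper: residuals from a robust lowess fit for one short series.
lowess_residuals <- function(x, y, iter = 7, f = 1) {
  o <- order(x)                        # lowess() returns its fit sorted by x
  fit <- lowess(x[o], y[o], f = f, iter = iter)
  res <- numeric(length(y))
  res[o] <- y[o] - fit$y               # map residuals back to the input order
  res
}

# Hypothetical container: a list of small data frames, one per series.
series_list <- list(
  a = data.frame(x = c(14564, 14719, 14753, 14754, 15086),
                 y = c(5814, 6083, 17764, 6110, 6556))
)

# One residual vector per series, in the same order as the input points.
res_list <- lapply(series_list, function(s) lowess_residuals(s$x, s$y))
```

The reordering step only matters when x is not already sorted, as it is in the example series.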

score 3 · Accepted Answer

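There are 5 observations in the data, and loess() is fitting a model with 5 degrees of freedom, so it can fit the observed data perfectly. That is why you get small (effectively 0) residuals. loess() has enough degrees of freedom to interpolate your data exactly, but the result is not a useful summary of the data. Fit a simpler model.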
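
As a rough illustration of what "a simpler model" could mean here, the sketch below fits a straight line (2 parameters) and, simpler still, a constant level estimated by the median. The 3.5 cutoff on the MAD-scaled residuals is an arbitrary value chosen for illustration, not a recommendation, and neither model is necessarily the right one for your data.

```r
y <- c(5814, 6083, 17764, 6110, 6556)
x <- c(14564, 14719, 14753, 14754, 15086)

# Straight line: 2 parameters for 5 observations, so the fit cannot
# pass through every point and the residuals are not forced to 0.
fit_lm <- lm(y ~ x)
res_lm <- residuals(fit_lm)

# Even simpler and robust: a constant level estimated by the median,
# with residuals scaled by the median absolute deviation (MAD).
res_med <- y - median(y)
score   <- abs(res_med) / mad(y)

which.max(abs(res_lm))  # index of the largest straight-line residual
which(score > 3.5)      # 3.5 is an arbitrary illustrative cutoff
```

Both variants use base R only, which may matter when the computation is repeated over a very large number of series.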
