I am using loess to compute residuals. For the following (small) series, I expected to find a large residual at the third point:
y <- c(5814, 6083, 17764, 6110, 6556)
x <- c(14564, 14719, 14753, 14754, 15086)
> residuals(loess(y ~ x))
            1             2             3             4             5
 2.728484e-12 -9.094947e-13  3.637979e-12  3.637979e-12  0.000000e+00
In particular, loess gives the following output:
> loess(y ~ x)
Call:
loess(formula = y ~ x)
Number of Observations: 5
Equivalent Number of Parameters: 5
Residual Standard Error: Inf
Warning messages:
1: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
span too small. fewer data values than degrees of freedom.
2: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
pseudoinverse used at 14561
3: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
neighborhood radius 191.61
4: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
reciprocal condition number 0
5: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
There are other near singularities as well. 1.1263e+005
I am probably missing a (very simple) reason, but the above looks strange to me... why does it "not work" in my case?
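Edit:
Thanks to @Gavin Simpson, who suggested this link, I found the function rlm in the MASS package, which gives what I was hoping for. I also tried lowess with several iterations, and its fitted values actually converge "better" (in this case) to my data: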
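library(MASS)
library(ggplot2)

# Robust linear fit. Note: the x/y interface of rlm() adds no intercept column;
# use rlm(y ~ x) if a line with an intercept is wanted.
method_rlm <- rlm(x = x, y = y)
# Robust lowess: 7 robustness iterations, smoother span f = 1 (all points).
method_lowess <- lowess(x, y, iter = 7, f = 1)

df <- data.frame(x = x, y = y,
                 rlm = method_rlm$fitted.values,
                 lowess = method_lowess$y)

# Red: data; blue: rlm fit; green: lowess fit.
ggplot(df) +
  geom_line(aes(x, y), color = "red") +
  geom_line(aes(x, rlm), color = "blue") +
  geom_line(aes(x, lowess), color = "green") +
  geom_point(aes(x, y), color = "red")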
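Since the original goal was residuals rather than fitted values, here is a minimal sketch of how they could be read off the lowess fit. It assumes a residual is simply observed minus fitted, and it relies on x already being sorted, so the fitted values line up with y. The names `fit`, `res` and `worst` are only illustrative.

```r
y <- c(5814, 6083, 17764, 6110, 6556)
x <- c(14564, 14719, 14753, 14754, 15086)

# lowess() returns its fit ordered by x; x is already sorted here,
# so fit$y lines up with y element by element.
fit <- lowess(x, y, iter = 7, f = 1)

# Assumed definition: residual = observed - fitted.
res <- y - fit$y

# Index of the observation with the largest absolute residual.
worst <- which.max(abs(res))

print(round(res, 1))
print(worst)
```

If x were not sorted, the data would have to be ordered by x first (or the residuals mapped back afterwards) before subtracting.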
I also looked at timings, and the difference is large:
> microbenchmark(rlm(x=x,y=y), lowess(x,y, iter=7, f=1), times=1000)
Unit: microseconds
                          expr      min       lq    median        uq        max neval
             rlm(x = x, y = y) 6445.269 6663.972 6906.1350 9417.1895 271494.006  1000
 lowess(x, y, iter = 7, f = 1)  169.099  193.046  238.0085  273.9295   3900.493  1000
Do you think this difference is worth it? I have a million of these small series (5 to 20 points at most, with similar kinds of outliers).
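To make the workload concrete, here is a rough sketch of what such a batch run could look like. It assumes the series are stored as a list of x/y pairs. `lowess_residuals` and `series_list` are hypothetical names, and the second series is made-up toy data used only to show the handling of unsorted x.

```r
# Hypothetical helper: residuals (observed - fitted) from a robust lowess fit.
lowess_residuals <- function(x, y, iter = 7, f = 1) {
  o <- order(x)                        # lowess() works on points sorted by x
  fit <- lowess(x[o], y[o], iter = iter, f = f)
  res <- numeric(length(y))
  res[o] <- y[o] - fit$y               # put residuals back in input order
  res
}

# Hypothetical container: one list element per series.
series_list <- list(
  list(x = c(14564, 14719, 14753, 14754, 15086),
       y = c(5814, 6083, 17764, 6110, 6556)),
  list(x = c(3, 1, 2, 5, 4, 6),        # toy data, deliberately unsorted
       y = c(30, 10, 20, 50, 40, 60))
)

# One residual vector per series.
res_list <- lapply(series_list, function(s) lowess_residuals(s$x, s$y))
```

With a million series, the per-call cost shown in the benchmark is multiplied accordingly, so it would be worth timing this loop on a representative sample before committing to either method.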