r - How to fit a smooth curve to my data in R?

Question

I'm trying to draw a smooth curve in R. I have the following simple toy data:

> x
 [1]  1  2  3  4  5  6  7  8  9 10
> y
 [1]  2  4  6  8  7 12 14 16 18 20

Now, when I plot it with a standard command, it looks bumpy and edgy, of course:

> plot(x,y, type='l', lwd=2, col='red')

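How can I make the curve smooth, so that the 3 edges are rounded using estimated values? I know there are many methods for fitting a smooth curve, but I'm not sure which one is most appropriate for this type of curve or how to write it in R.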

score 111 · Accepted Answer

I like loess() a lot for smoothing:

x <- 1:10
y <- c(2,4,6,8,7,12,14,16,18,20)
lo <- loess(y~x)
plot(x,y)
lines(predict(lo), col='red', lwd=2)

Venables and Ripley's MASS book has an entire section on smoothing that also covers splines and polynomials, but loess() is just about everybody's favourite.
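How much loess() smooths is governed mainly by its span argument (default 0.75), which sets the fraction of points used in each local fit. The following is a minimal sketch on the toy data from the question; the value span = 1 is an arbitrary choice for illustration, not a recommendation.

```r
# Toy data from the question
x <- 1:10
y <- c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20)

# Two fits: the default span (0.75) and a wider one (assumed value, for comparison)
lo_default <- loess(y ~ x)
lo_wide    <- loess(y ~ x, span = 1)

# Plot the points and overlay both fitted curves
plot(x, y)
lines(x, predict(lo_default), col = "red",  lwd = 2)
lines(x, predict(lo_wide),    col = "blue", lwd = 2, lty = 2)
legend("topleft",
       legend = c("span = 0.75 (default)", "span = 1"),
       col = c("red", "blue"), lty = c(1, 2), lwd = 2)
```

A larger span uses more neighbours per local fit and generally gives a flatter curve; a smaller span follows the data more closely.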

score 63 · Accepted Answer

Maybe smooth.spline is an option. You can set a smoothing parameter (typically between 0 and 1) here:

smoothingSpline = smooth.spline(x, y, spar=0.35)
plot(x,y)
lines(smoothingSpline)

You can also use predict on smooth.spline objects. The function comes with base R; see ?smooth.spline for details.
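As a small sketch of the predict route mentioned above, the fitted spline can be evaluated on a finer grid than the original x values. The grid size of 200 and the variable names are assumptions made for illustration.

```r
# Toy data from the question
x <- 1:10
y <- c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20)

# Fit the smoothing spline with the same spar as above
fit <- smooth.spline(x, y, spar = 0.35)

# Evaluate the spline on a dense grid (200 points is an arbitrary choice)
grid <- seq(min(x), max(x), length.out = 200)
pred <- predict(fit, grid)   # a list with components x and y

plot(x, y)
lines(pred, col = "blue", lwd = 2)
```

predict also accepts a deriv argument if the slope of the smoothed curve is of interest.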

score 28 · Accepted Answer

To get it REALLY smooth...

x <- 1:10
y <- c(2,4,6,8,7,8,14,16,18,20)
lo <- loess(y~x)
plot(x,y)
xl <- seq(min(x),max(x), (max(x) - min(x))/1000)
lines(xl, predict(lo,xl), col='red', lwd=2)

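This style interpolates a lot of extra points and gives you a very smooth curve. It also appears to be the approach that ggplot takes. If the standard level of smoothness is fine, you can just use: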

scatter.smooth(x, y)
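If the curve coordinates are needed rather than just the plot, the companion function loess.smooth returns them. A minimal sketch follows; the arguments shown are that function's documented defaults spelled out, and they differ from the loess() defaults (span 0.75, degree 2), so the curve will not match the loess() fit above exactly.

```r
# Data from this answer
x <- 1:10
y <- c(2, 4, 6, 8, 7, 8, 14, 16, 18, 20)

# loess.smooth evaluates the local fit at `evaluation` equally spaced x values
curve_pts <- loess.smooth(x, y, span = 2/3, degree = 1, evaluation = 50)

plot(x, y)
lines(curve_pts, col = "red", lwd = 2)
```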

score 27 · Accepted Answer

The qplot() function in the ggplot2 package is very simple to use and provides an elegant solution that includes confidence bands. For instance,

qplot(x,y, geom='smooth', span =0.5)

produces a smoothed curve with a confidence band (image omitted).
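As far as I am aware, qplot() has been deprecated in recent ggplot2 releases (3.4.0 onward), so the same idea can be sketched with ggplot() and geom_smooth(). This assumes ggplot2 is installed; the data frame name df and the span of 0.75 are choices made for this illustration (a very small span on only ten points tends to trigger loess warnings).

```r
library(ggplot2)

# Toy data from the question, wrapped in a data frame (name `df` is an assumption)
df <- data.frame(x = 1:10,
                 y = c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20))

# Points plus a loess smooth; the confidence band is drawn by default (se = TRUE)
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.75)

print(p)
```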

score 15 · Accepted Answer

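As Dirk said, LOESS is a very good approach.

Another option is to use Bezier splines, which in some cases may work better than LOESS if you don't have many data points.

Here you'll find an example: http://rosettacode.org/wiki/Cubic_bezier_curves#R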

# x, y: the x and y coordinates of the hull points
# n: the number of points in the curve.
bezierCurve <- function(x, y, n=10)
    {
    outx <- NULL
    outy <- NULL

    i <- 1
    for (t in seq(0, 1, length.out=n))
        {
        b <- bez(x, y, t)
        outx[i] <- b$x
        outy[i] <- b$y

        i <- i+1
        }

    return (list(x=outx, y=outy))
    }

bez <- function(x, y, t)
    {
    outx <- 0
    outy <- 0
    n <- length(x)-1
    for (i in 0:n)
        {
        outx <- outx + choose(n, i)*((1-t)^(n-i))*t^i*x[i+1]
        outy <- outy + choose(n, i)*((1-t)^(n-i))*t^i*y[i+1]
        }

    return (list(x=outx, y=outy))
    }

# Example usage
x <- c(4,6,4,5,6,7)
y <- 1:6
plot(x, y, "o", pch=20)
points(bezierCurve(x,y,20), type="l", col="red")
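The loops above evaluate the Bernstein-polynomial form of the curve one point at a time. The same computation can be sketched in vectorized form by building the basis matrix once; the function name bezier_points and the variable names are hypothetical and not part of the Rosetta Code example.

```r
# Vectorized Bezier evaluation (sketch; `bezier_points` is a hypothetical name).
# x, y: coordinates of the control points; n: number of points on the curve.
bezier_points <- function(x, y, n = 10) {
  t   <- seq(0, 1, length.out = n)
  deg <- length(x) - 1
  i   <- 0:deg
  # basis[k, j] = choose(deg, i[j]) * (1 - t[k])^(deg - i[j]) * t[k]^i[j]
  basis <- outer(t, i, function(t, i) choose(deg, i) * (1 - t)^(deg - i) * t^i)
  list(x = as.vector(basis %*% x), y = as.vector(basis %*% y))
}

# Same control points as the example usage above
x_ctrl <- c(4, 6, 4, 5, 6, 7)
y_ctrl <- 1:6
bz <- bezier_points(x_ctrl, y_ctrl, 20)

plot(x_ctrl, y_ctrl, type = "o", pch = 20)
lines(bz, col = "red")
```

Note that a Bezier curve passes through the first and last control points only; the interior points pull the curve toward them without being interpolated.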

score 13 · Accepted Answer

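The other answers are all good approaches. However, there are a few other options in R that haven't been mentioned, including lowess and approx, which may give better fits or faster performance.

The advantages are easier to demonstrate with an alternate dataset: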

sigmoid <- function(x)
{
  y<-1/(1+exp(-.15*(x-100)))
  return(y)
}

dat<-data.frame(x=rnorm(5000)*30+100)
dat$y<-as.numeric(as.logical(round(sigmoid(dat$x)+rnorm(5000)*.3,0)))

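Here is the data overlaid with the sigmoid curve that generated it (image omitted):

This sort of data is common when looking at a binary behavior among a population. For example, this might be a plot of whether or not a customer purchased something (a binary 1/0 on the y-axis) versus the amount of time they spent on the site (x-axis).

A large number of points is used to better demonstrate the performance differences between these functions.

smooth, spline, and smooth.spline all produce gibberish on a dataset like this with any set of parameters I have tried, perhaps because of their tendency to map to every point, which does not work for noisy data.

The loess, lowess, and approx functions all produce usable results, although only just barely for approx. This is the code for each, using lightly optimized parameters: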

loessFit <- loess(y~x, dat, span = 0.6)
loessFit <- data.frame(x=loessFit$x,y=loessFit$fitted)
loessFit <- loessFit[order(loessFit$x),]

approxFit <- approx(dat,n = 15)

lowessFit <-data.frame(lowess(dat,f = .6,iter=1))

And the results:

plot(dat,col='gray')
curve(sigmoid,0,200,add=TRUE,col='blue')
lines(lowessFit,col='red')
lines(loessFit,col='green')
lines(approxFit,col='purple')
legend(150,.6,
       legend=c("Sigmoid","Loess","Lowess",'Approx'),
       lty=c(1,1),
       lwd=c(2.5,2.5),col=c("blue","green","red","purple"))

As you can see, lowess produces a near-perfect fit to the original generating curve. Loess is close, but shows a strange deviation at both tails.
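The visual comparison can be complemented with a numeric one. The sketch below regenerates the data with a fixed seed and measures how far each fit is from the generating sigmoid; the seed value and the helper name rmse_vs_truth are assumptions added for illustration, and the numbers will vary with the seed.

```r
set.seed(42)  # assumption: fixed seed for reproducibility, not in the original answer

sigmoid <- function(x) 1 / (1 + exp(-0.15 * (x - 100)))

# Same data-generating steps as above
dat <- data.frame(x = rnorm(5000) * 30 + 100)
dat$y <- as.numeric(as.logical(round(sigmoid(dat$x) + rnorm(5000) * 0.3, 0)))

# Fit both smoothers with the parameters used in this answer
lowessFit <- lowess(dat, f = 0.6, iter = 1)    # list with sorted x and fitted y
loessFit  <- loess(y ~ x, dat, span = 0.6)     # fitted values in original row order

# Root-mean-square distance between fitted values and the true curve
rmse_vs_truth <- function(fitted_y, x) sqrt(mean((fitted_y - sigmoid(x))^2))

rmse_lowess <- rmse_vs_truth(lowessFit$y, lowessFit$x)
rmse_loess  <- rmse_vs_truth(fitted(loessFit), dat$x)

print(c(lowess = rmse_lowess, loess = rmse_loess))
```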

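Although your dataset will be very different, I have found that other datasets behave similarly, with both loess and lowess capable of producing good results. The differences become more significant when you look at benchmarks: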

> microbenchmark::microbenchmark(loess(y~x, dat, span = 0.6),approx(dat,n = 20),lowess(dat,f = .6,iter=1),times=20)
Unit: milliseconds
                           expr        min         lq       mean     median        uq        max neval cld
  loess(y ~ x, dat, span = 0.6) 153.034810 154.450750 156.794257 156.004357 159.23183 163.117746    20   c
            approx(dat, n = 20)   1.297685   1.346773   1.689133   1.441823   1.86018   4.281735    20 a  
 lowess(dat, f = 0.6, iter = 1)   9.637583  10.085613  11.270911  11.350722  12.33046  12.495343    20  b

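Loess is extremely slow, taking roughly 100 times as long as approx. Lowess gives better results than approx while still running fairly quickly (roughly 14 times faster than loess, going by the medians above).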

Loess also becomes increasingly bogged down as the number of points increases, becoming unusable at around 50,000.
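To see how the two functions scale on your own machine without installing microbenchmark, a rough sketch with base R's system.time could look like this. The helper names make_data and time_fits, the seed, and the sample sizes are assumptions; the sizes are kept small so the script finishes quickly, and single-run timings are noisy.

```r
sigmoid <- function(x) 1 / (1 + exp(-0.15 * (x - 100)))

# Generate n noisy binary observations around the sigmoid (same recipe as above)
make_data <- function(n) {
  x <- rnorm(n) * 30 + 100
  y <- as.numeric(as.logical(round(sigmoid(x) + rnorm(n) * 0.3, 0)))
  data.frame(x = x, y = y)
}

# Elapsed seconds for one loess fit and one lowess fit on n points
time_fits <- function(n) {
  dat <- make_data(n)
  c(n      = n,
    loess  = system.time(loess(y ~ x, dat, span = 0.6))[["elapsed"]],
    lowess = system.time(lowess(dat, f = 0.6, iter = 1))[["elapsed"]])
}

set.seed(1)  # assumption: arbitrary seed
timings <- t(sapply(c(1000, 2000, 4000), time_fits))
print(timings)
```

Increasing the sizes in the vector passed to sapply shows where loess starts to become impractical on a given machine.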

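EDIT: Additional research shows that loess gives better fits for certain datasets. If you are dealing with a small dataset, or performance is not a consideration, try both functions and compare the results.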

score 8 · Accepted Answer

In ggplot2 you can do smoothing in a number of ways, for example:

library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point() +
  geom_smooth(method = "gam", formula = y ~ poly(x, 2)) 
ggplot(mtcars, aes(wt, mpg)) + geom_point() +
  geom_smooth(method = "loess", span = 0.3, se = FALSE)

score 3 · Accepted Answer

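I didn't see this method shown, so in case someone else is looking to do this: I found that the ggplot documentation suggests a technique using the gam method, which produces results similar to loess when working with small datasets.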

library(ggplot2)
x <- 1:10
y <- c(2,4,6,8,7,8,14,16,18,20)

df <- data.frame(x,y)
r <- ggplot(df, aes(x = x, y = y)) + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))+geom_point()
r

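(Images omitted: the first plot used the loess method with the automatic formula; the second used the gam method with the suggested formula.)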
