1

I am having some issues trying to compete a LOESS regression with a data set. I have been able to properly create the line, but I am unable to get it to plot correctly.

I ran through the data like this.

animals.lo <- loess(X15p5 ~ Period, animals, weights = n.15p5)    
animals.lo    
summary(animals.lo)    
plot(X15p5~ Period, animals)    
lines(animals$X15p5, animals.lo, col="red")  

At this point I received an error

"Error in xy.coords(x, y) : 'x' and 'y' lengths differ"

I searched around and read that this issue could be due to the points needing to be ordered, so I proceeded.

a <- order(animals$Period)    
lines(animals$X15p5[a], animals.lo$Period[a], col="red", lwd=3)  

There were no errors at this point, but the LOESS line was still not showing up in the plot. The points were displayed correctly, but not the line.

This is similar to the data set I am using...

structure(list(Site = c("Cat", "Dog", "Bear", "Chicken", "Cow",
"Bird", "Tiger", "Lion", "Leopard", "Wolf", "Puppy", "Kitten", 
"Emu", "Ostrich", "Elephant", "Sheep", "Goat", "Fish", "Iguana", 
"Monkey", "Gorilla", "Baboon", "Lemming", "Mouse", "Rat", "Hamster", 
"Eagle", "Parrot", "Crow", "Dove", "Falcon", "Hawk", "Sparrow", 
"Kite", "Chimpanzee", "Giraffe", "Bear", "Donkey", "Mule", "Horse", 
"Zebra", "Ox", "Snake", "Cobra", "Iguana", "Lizard", "Fly", "Mosquito", 
"Llama", "Butterfly", "Moth", "Worm", "Centipede", "Unicorn", 
"Pegasus", "Griffin", "Ogre", "Monster", "Demon", "Witch", "Vampire", 
"Mummy", "Ghoul", "Zombie"), Region = c(6L, 4L, 4L, 5L, 7L, 6L, 
2L, 4L, 6L, 7L, 7L, 4L, 6L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 4L, 6L, 6L, 
4L, 2L, 7L, 4L, 2L, 2L, 7L, 3L, 4L, 7L, 4L, 4L, 4L, 7L, 7L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 8L), Period = c(-2715, -3500, 
-3500, -4933.333333, -2715, -2715, -2715, -3500, -2715, -4350, 
-3500, -3500, -2950, -4350, -3650, -3500, -3500, -2715, -3650, 
-4350, -3500, -3500, -3400, -4350, -3500, -3500, -4350, -3900, 
-3808.333333, -4233.333333, -3500, -3900, -3958.333333, -3900, 
-3500, -3500, -3500, -2715, -3650, -2715, -2715, -2715, -2715, 
-3500, -2715, -2715, -3500, -4350, -3650, -3650, -4350, -5400, 
-3500, -3958.333333, -3400, -3400, -4350, -3600, -4350, -3650, 
-3500, -2715, -5400, -3500), Value = c(0.132625995, 0.163120567, 
0.228840125, 0.154931973, 0.110047847, 0.054347826, 0.188679245, 
0.245014245, 0.128378378, 0.021428571, 0.226277372, 0.176923077, 
0.104938272, 0.17659805, 0.143798024, 0.086956522, 0.0625, 0.160714286, 
0, 0.235588972, 0, 0, 0.208333333, 0.202247191, 0.364705882, 
0.174757282, 0, 0.4, 0.1, 0.184027778, 0.232876712, 0.160493827, 
0.74702381, 0.126984127, 0.080645161, 0.06557377, 0, 0.057692308, 
0.285714286, 0.489361702, 0.108695652, 0.377777778, 0, 0.522727273, 
0.024390244, 0.097560976, 0.275, 0, 0.0625, 0.255319149, 0.135135135, 
0.216216216, 0.222222222, 0.296296296, 0.222222222, 0.146341463, 
0.09375, 0.125, 0.041666667, 0.078947368, 0.2, 0.137931034, 0.571428571, 
0.142857143), Sample_size = c(188.5, 105.75, 79.75, 70, 52.25, 
46, 39.75, 39, 37, 35, 34.25, 32.5, 32.4, 30.76666667, 30.36666667, 
28.75, 28, 28, 28, 26.6, 25, 25, 24, 22.25, 21.25, 20.6, 20, 
20, 20, 19.2, 18.25, 18, 18, 16.8, 15.5, 15.25, 15, 13, 12.6, 
11.75, 11.5, 11.25, 11, 11, 10.25, 10.25, 10, 10, 9.6, 9.4, 9.25, 
9.25, 9, 9, 9, 8.2, 8, 8, 8, 7.6, 7.5, 7.25, 7, 7), Sample_sub = c(25, 
17.25, 18.25, 10.8452381, 5.75, 2.5, 7.5, 9.555555556, 4.75, 
0.75, 7.75, 5.75, 3.4, 5.433333333, 4.366666667, 2.5, 1.75, 4.5, 
0, 6.266666667, 0, 0, 5, 4.5, 7.75, 3.6, 0, 8, 2, 3.533333333, 
4.25, 2.888888889, 13.44642857, 2.133333333, 1.25, 1, 0, 0.75, 
3.6, 5.75, 1.25, 4.25, 0, 5.75, 0.25, 1, 2.75, 0, 0.6, 2.4, 1.25, 
2, 2, 2.666666667, 2, 1.2, 0.75, 1, 0.333333333, 0.6, 1.5, 1, 
4, 1)), .Names = c("Site", "Region", "Period", "Value", "Sample_size", 
"Sample_sub"), class = "data.frame", row.names = c(NA, -64L))

I have been working for this a while and trying to read up as much as I can, but I haven't been able to make any additional headway. Any advice or guidance would be greatly appreciated.


Follow-up on adding confidence interval to plot

I have been trying to add in confidence intervals following another example found on the site on this page How to get the confidence intervals for LOWESS fit using R? .

The example given on that page is:

plot(cars)
plx<-predict(loess(cars$dist ~ cars$speed), se=T)

lines(cars$speed,plx$fit)
lines(cars$speed,plx$fit - qt(0.975,plx$df)*plx$se, lty=2)
lines(cars$speed,plx$fit + qt(0.975,plx$df)*plx$se, lty=2)  

I adapted that as this:

plot(X15p5 ~ Period, animals)
animals.lo2<-predict(loess(animals$X15p5 ~ animals$Period), se=T)
a <- order(animals$Period)
lines(animals$Period[a],animals.lo2$fit, col="red", lwd=3)
lines(animals$Period[a],animals.lo2$fit - qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)
lines(animals$Period[a],animals.lo2$fit + qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)

Although this does provide confidence intervals, the regression line is all wrong. I'm not sure if it is an issue with the predict function, or another issue. Thanks again!

4

1 回答 1

3

正确的代码

我四处搜索并发现这个问题可能是由于需要订购的点,所以我继续。

不,不。订购问题与您看到的错误无关。要克服错误,您需要更换

lines(animals$X15p5, animals.lo, col="red") 

lines(animals$Period, animals.lo$fitted, col="red") 

以下是原因:

  1. loess返回对象列表,而不是单个向量。见str(animals.lo)names(animals.lo)
  2. 你为什么用animals$X15p5x轴?你适合你的模型:X15p5 ~ Period,所以 x 轴应该是Period

关于重新排序

您需要进行排序,因为默认情况下,R 按顺序排列点。以此为例:

set.seed(0); x <- runif(100, 0, 10)  ## x is not in order
set.seed(1); y <- sqrt(x)  ## plot curve y = sqrt(x)
par(mfrow = c(1,2))
plot(x, y, type = "l")  ## this is a mess!!
reorder <- order(x)
plot(x[reorder], y[reorder], type = "l")  ## this is nice

富

同样,执行:

a <- order(animals$Period)    
lines(animals$Period[a], animals.lo$fitted[a], col="red", lwd=3)

置信区间跟踪

尝试这个:

plot(X15p5 ~ Period, animals)
animals.lo <- loess(X15p5 ~ Period, animals)
pred <- predict(animals.lo, se = TRUE)
a <- order(animals$Period)
lines(animals$Period[a], pred$fit[a], col="red", lwd=3)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se[a],lty=2)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se[a],lty=2)

你又忘了重新订购。您需要重新排序拟合值和标准误差。

现在,数据dist ~ speed模型cars不需要重新排序。因为:

is.unsorted(cars$speed)  ## FALSE

是的,数据已经在那里排序。

请注意,我对您的代码进行了另外两项更改:

  1. 我已经分开loess通话和predict通话;也许您不需要这样做,但通常将模型拟合和模型预测分开是一个好习惯,并保留两个对象的副本。
  2. 我已更改loess(animals$X15p5 ~ animals$Period)loess(X15p5 ~ Period, animals). $在指定模型公式时使用符号是一个坏习惯。我在https://stackoverflow.com/a/37307270/4891738有另一个答案,显示了这种风格的缺点。您可以阅读那里的“更新”部分。我以 theglm为例,但是对于lm, glm, loess, 事情是一样的。
于 2016-05-28T14:22:22.797 回答