All,

I'm looking for a reliable, unsupervised way to detect change points in a relatively short vector. Consider the following two examples:

v1 = c(0.299584,0.314446,0.357783,0.388896,0.410417,0.427182,0.450383,0.466671,0.474884,0.474749,0.493566,0.500374,0.522482,0.529851,0.538387,0.577901,0.610939,0.639383,0.662433,0.692656,0.720543,0.738255,0.748055,0.7591,0.770595,0.781811,0.794479,0.794588,0.789448,0.77667,0.765406,0.75152,0.740408,0.726898,0.720766,0.709445,0.69896,0.687508,0.673382,0.65795,0.639214,0.620445,0.590047,0.561773,0.526807,0.486848,0.439681,0.387545,0.313369,0.282872,0.279908,0.271836,0.269088,0.262727,0.259782)

v2 = c(0.081309,0.206263,0.429069,0.511859,0.565194,0.578792,0.56919,0.51985,0.432563,0.193907,0.0771,0.086603,0.18303,0.177608,0.169706,0.260917,0.292062,0.2979,0.263249,0.270576,0.250422,0.25219,0.182878,0.080623,0.079443,0.088944,0.087623,0.126403,0.155563,0.273942,0.312054,0.370195,0.357087,0.336452,0.300574,0.243105,0.243105,0.25593,0.227401,0.218047,0.15857,0.157727,0.139801,0.125742,0.129142,0.142166,0.142166,0.136748,0.107755,0.064377,0.072801,0.060093,0.103441,0.111704,0.124544)

If you look at

plot(v1,type='l') 

and

plot(v2,type='l')

you can see that for v1 I'd like to detect a change around index = 28, and for v2 I'd like to detect changes at the index values of 8, 11, 18, 25, 32, and 51. So far I've experimented with the Bayesian Change Point algorithm, which works OK in terms of identifying where inflection points are likely (low posterior probability regions), but still forces me to rely on visual inspection for the final determination:

install.packages('bcp')
library(bcp)

test = bcp(v1,w0=0.2,p0=0.01)
plot(v1,type='l')
par(new=TRUE)
plot(test$posterior.prob,type='l',col=2)

test = bcp(v2,w0=0.2,p0=0.01)
plot(v2,type='l')
par(new=TRUE)
plot(test$posterior.prob,type='l',col=2)

Is there a way to automate an unsupervised selection of estimates of multiple change points in this kind of data? Maybe I'm just futilely searching for a replacement for human intuition :P I also looked at the changepoint package, but it doesn't seem to be designed for this kind of data.

Thanks, Aaron

1 Answer

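So, here is a simple solution. You can modify the parameters (`k` and `cutoff`) to return different (more/fewer, more/less sensitive) inflection points (or regions, in the case of your data).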

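plot(v2, type="l", col="darkblue", lwd=2)
# v2 <- smooth(v2, kind="3")  # optional: running-median smoothing
lines(v2, lwd=1, col="red")

k <- 5        # number of neighbours compared on each side
cutoff <- 10  # minimum score to report a point (used with the second scoring block)

# First attempt, based on the sign of the first differences.
# Note: these scores are overwritten by the second scoring block below.
d2 <- diff(v2)
d2 <- d2 > 0
d2 <- d2*2 - 1   # +1 where v2 rises, -1 where it falls or stays flat
scores <- sapply(k:(length(d2)-k), FUN=function(i){
  abs(mean(-d2[ i-1:k ], na.rm=TRUE) + mean(d2[ i+0:k ], na.rm=TRUE))
})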


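# Score each point against its k neighbours on each side:
# +1 for every neighbour strictly below v2[i], -1 otherwise.
# A peak or valley that dominates its whole window scores 2*k.
scores <- sapply(k:(length(v2)-k), FUN=function(i){
  left <- (v2[pmax(i-1:k, 1)] < v2[i])*2-1            # indices clamped to >= 1
  right <- (v2[pmin(i+1:k, length(v2))] < v2[i])*2-1  # indices clamped to <= length(v2)

  abs(sum(left) + sum(right))
})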

inflections <- (k:(length(v2)-k))[scores>=cutoff]

plot(v2, type="l")
abline(v=inflections, col="red", lwd=3)
print(inflections) #  6 11 18 25 32 (missed 51, if you make cutoff=8 it'll catch it...)
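
To run the same scoring on several vectors (e.g. both v1 and v2), it can be wrapped in a function. This is only a sketch: the name `find_turning_points`, the default `cutoff = 2 * k`, and the input check are assumptions added for illustration, not part of any package.

```r
# Hypothetical helper: neighbour-comparison score wrapped in a function.
# x      : numeric vector
# k      : number of neighbours compared on each side
# cutoff : minimum score to report an index (the maximum possible score is 2 * k)
find_turning_points <- function(x, k = 5, cutoff = 2 * k) {
  n <- length(x)
  stopifnot(k >= 1, n > 2 * k)   # assumed guard so that k:(n - k) is increasing
  idx <- k:(n - k)
  scores <- sapply(idx, function(i) {
    # +1 for each neighbour strictly below x[i], -1 otherwise
    left  <- (x[pmax(i - 1:k, 1)] < x[i]) * 2 - 1
    right <- (x[pmin(i + 1:k, n)] < x[i]) * 2 - 1
    abs(sum(left) + sum(right))
  })
  idx[scores >= cutoff]
}

# Toy example: a single peak at position 6
find_turning_points(c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1), k = 2)
```

Calling `find_turning_points(v2)` applies the same rule as the code above with `k = 5` and `cutoff = 10`; passing `cutoff = 8` relaxes it so that points that dominate most, but not all, of their window are also reported.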
answered 2013-09-06T01:23:31.917