python - Change point detection with linear regression in Python

Question

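I am trying to detect drift in a given list of one-dimensional streaming data. If there is no trend in the data, I expect 0 <= confidence score <= 0.20, but if drift is detected, I expect 0.90 <= confidence score <= 1.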

I have attached the Python 3.x code snippet I am using, along with my hand calculations (the image at the end).

import numpy as np
from scipy import stats


class UnivariateDriftAnalysis:
    ''' This technique looks for a trend in recent data, using linear
    regression as a statistical test that the trend is non-zero.
    Currently, this uses a fixed window length, but future versions might
    incorporate a search over a range of window lengths.
    '''

    def __init__(self, n_window, p=0.01):
        '''
        n_window - (int) length of data history to look for a trend
        p - (float) desired false positive rate (significance level).
            p=.05 means that alarms will be raised when there is <5% chance
            that there is no trend
        '''
        self.n_window = n_window
        self.p = p

    def drift_detected(self, data) -> list:
        ''' Returns a list, x, of probabilities that the slope of the data is
        not zero, i.e., the confidence that there is a slope.
        x[i] corresponds to the slope of data[i-n_window:i]
        The first n_window values of x are np.nan
        '''
        n = len(data)
        y = []
        x0 = np.arange(n)
        result: list = [np.nan] * self.n_window  # np.NaN was removed in NumPy 2.0
        i = 0
        for d in data:
            y.append(d)
            if len(y) < self.n_window:
                # not enough samples yet to fill one window
                continue
            y = y[-self.n_window:]
            x = x0[i:i + self.n_window]
            p_value = stats.linregress(x, y).pvalue
            # slope, intercept, r_value, p_value, std_err = rez
            result.append(1-p_value)
            i += 1
        return result

    def update(self, data) -> None:
        ''' this function is designed to handle a live stream of data'''
        scores = self.drift_detected(data)
        # scores hold 1 - p_value, so an alarm means p_value < self.p
        alarms = [s > 1 - self.p for s in scores]
        # some other stuff

# Test
np.random.seed(100)
n_window = 10
lr = UnivariateDriftAnalysis(n_window=n_window, p=.01)
data = np.concatenate([np.random.randint(24, 47, 1500), np.random.randint(1000, 4000, 2000), np.random.randint(1, 5, 500)])
score = lr.drift_detected(data)
print(score[n_window:])  # lowest: 0 highest: 0.9953301824956942

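Questions:

What am I missing? Why is the confidence score as high as 0.9953?!
My end goal is to define a p-value for the given data array in order to compute the confidence that drift exists.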
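
One possible contributing factor, offered as a sketch rather than a diagnosis: when the true slope is zero, the p-value of the slope test is uniformly distributed on [0, 1], so `1 - p_value` is as well. About 10% of trend-free windows would then score above 0.90, and over thousands of windows the maximum is expected to land very close to 1. The snippet below illustrates this on synthetic data. The helper name `no_trend_scores`, the Gaussian noise, the seed, and the use of independent (non-overlapping) windows are assumptions made for illustration and are not part of the code above.

```python
import numpy as np
from scipy import stats


def no_trend_scores(n_windows=2000, n_window=10, seed=0):
    """Return 1 - p_value for independent windows of pure noise.

    Hypothetical helper: every window is drawn with a true slope of zero,
    so any high score here is a false positive.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(n_window)
    scores = np.empty(n_windows)
    for k in range(n_windows):
        y = rng.normal(size=n_window)  # noise only, no trend
        scores[k] = 1 - stats.linregress(x, y).pvalue
    return scores


scores = no_trend_scores()
print("fraction of scores above 0.90:", np.mean(scores > 0.90))
print("highest score:", scores.max())
```

If this reading applies, `1 - p_value` behaves like a test statistic to compare against `1 - p`, not like a calibrated probability of drift, and some high scores on trend-free stretches are expected. Separately, the test data contains two level shifts (at indices 1500 and 3500), so windows straddling them can legitimately show a steep slope.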
