
I have a simple dataframe with a single column of 10,320 numerical observations. I'm simulating time-series data by plotting it in windows of 200 observations each. Here is the plotting code.

import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from numpy import arange
from IPython import display

fig, axes = plt.subplots(1, 1, figsize=(19, 5))
std = dframe[0].std() * 6              # +/- 6 sigma band for the threshold lines
window = 200
iterations = int(len(dframe) / window)
dframe = dframe.set_index(arange(0, len(dframe)))
i = 0
while i < iterations:
    frm = window * i
    if i == iterations - 1:            # last chunk: take the remainder as well
        to = len(dframe)
    else:
        to = frm + window
    df = dframe[frm:to]
    if len(df) > 100:
        df = df.set_index(arange(0, len(df)))
        plt.gca().cla()                # clear the axes before redrawing
        plt.plot(df.index, df[0])
        # axhline's xmin/xmax are axes fractions (0-1); the defaults span the axes
        plt.axhline(y=std, c='gray', linestyle='--', lw=2)
        plt.axhline(y=-std, c='gray', linestyle='--', lw=2)
        plt.ylim(min(dframe[0]) - 0.5, max(dframe[0]))
        plt.xlim(-50, window + 50)
        display.clear_output(wait=True)  # animate in place in the notebook
        display.display(plt.gcf())
        canvas = FigureCanvas(fig)
        canvas.print_figure('fig.png', dpi=72, bbox_inches='tight')
    i += 1
plt.close()

This simulates a flow of real-time data and visualizes it. What I want is to apply a theanets RNN LSTM to the data to detect anomalies unsupervised. Since I'm doing this unsupervised, I don't think I need to split my data into training and test sets. I've been googling for about two hours and haven't found anything that makes sense to me, so I'm hoping you can help. I also want to plot the RNN's prediction output on the graph and define a threshold so that, if the error is too large, the values are flagged as anomalous. If you need more information, please comment and let me know. Thank you!


1 Answer


  1. Like neurons, an LSTM network is built from interconnected LSTM blocks, and it is trained with Backpropagation Through Time (BPTT).
  2. Classic time-series anomaly detection predicts the future output of the series (at one or more points) and measures the error between those predictions and the true values at those points. Prediction errors that exceed a threshold flag those points as anomalous.
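The thresholding idea in point 2 can be sketched like this. The arrays and the threshold value are illustrative, not taken from the question's data:

```python
import numpy as np

# Toy observed values and the model's one-step-ahead predictions for them.
y_true = np.array([0.10, 0.20, 0.15, 5.00, 0.18])
y_pred = np.array([0.12, 0.19, 0.16, 0.20, 0.17])

errors = np.abs(y_true - y_pred)   # absolute prediction error per point
threshold = 1.0                    # illustrative; tune on the error distribution
anomalies = np.where(errors > threshold)[0]  # indices flagged as anomalous
```

Here only index 3 exceeds the threshold, because its prediction error is far larger than the others.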

Solution

With all that said:

  1. You must train the network, so you do need training and test sets.
  2. Use N inputs to predict M outputs (decide N and M experimentally; prefer values that give a lower training error).
  3. Roll a window of (N+M) elements across the input data, and use each such (N+M)-item array, also called a frame, to train or test the network.
  4. Typically the first 90% of the sequence is used for training and the remaining 10% for testing.
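Steps 2-4 can be sketched as follows. `series` stands in for the question's single dataframe column, and the values of N and M are assumptions to be tuned experimentally:

```python
import numpy as np

N, M = 8, 2                           # inputs / outputs per frame (assumed)
series = np.arange(100, dtype=float)  # stand-in for the dataframe column

# Roll a window of N+M elements over the series; each row is one frame.
frames = np.array([series[i:i + N + M]
                   for i in range(len(series) - (N + M) + 1)])
X, y = frames[:, :N], frames[:, N:]   # first N values in, last M values out

split = int(0.9 * len(frames))        # first 90% for training, last 10% for testing
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
```

The network is then fit on `(X_train, y_train)` and its prediction error measured on `(X_test, y_test)`.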

This scheme will fail if the training is done poorly: non-anomalous points will then show large prediction errors too. So make sure you provide enough training and, most importantly, shuffle the training frames so that all variations are covered.
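Shuffling the frames (never the raw series itself) can be done with one shared permutation so each input frame stays aligned with its target; a minimal sketch with toy arrays:

```python
import numpy as np

X_train = np.arange(20).reshape(10, 2)  # 10 toy frames of N=2 inputs each
y_train = np.arange(10)                 # matching targets, one per frame

rng = np.random.default_rng(0)
perm = rng.permutation(len(X_train))    # one permutation reused for both arrays
X_train, y_train = X_train[perm], y_train[perm]
# Row i of X_train still belongs to y_train[i] after the shuffle.
```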

answered 2017-03-04T03:57:25.243