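My answer addresses the pattern-matching part of the problem. After a successful match, the algorithm below outputs the start position and the end position of the matched sequence. You can then use this information for labeling, as described in your question.
I recommend the SPRING algorithm, introduced in the paper:
Stream Monitoring under the Time Warping Distance (Sakurai, Faloutsos, Yamamuro)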
http://www.cs.cmu.edu/~christos/PUBLICATIONS/ICDE07-spring.pdf
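The core of the algorithm is the DTW (dynamic time warping) distance, and the distance it reports is a DTW distance. The only difference from plain DTW is an optimization: every position of the "stream" (the sequence in which you are looking for matches) is treated as a possible starting point of a match, whereas plain DTW would have to compute a full distance matrix for each starting point. You supply a template, a threshold, and a stream. (The concept was developed for matching against a live data stream, but you can apply the algorithm to your data frame by simply looping over it.)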
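To make the difference concrete, here is a minimal sketch (not the paper's code) of the accumulated-distance recurrence with an absolute-difference cost. The function name `accumulated_distances` and the `free_start` flag are my own illustrative choices. With `free_start=False` it behaves like classic DTW between the template and the whole stream; with `free_start=True` a match may begin at any stream position, which is the idea SPRING builds on.

```python
def accumulated_distances(template, stream, free_start):
    """Return, for each stream position, the accumulated distance in the last template row.

    free_start=False: classic DTW, the alignment must begin at stream[0].
    free_start=True:  a match may begin anywhere in the stream (subsequence matching).
    """
    inf = float("inf")
    m = len(template)
    # prev[i] is the accumulated distance for template[:i] at the previous stream position
    prev = [0.0] + [inf] * m
    ends = []
    for x in stream:
        # Row 0 is a virtual "nothing matched yet" row: cost 0 means a match may start here
        cur = [0.0 if free_start else inf] + [inf] * m
        for i in range(1, m + 1):
            cost = abs(x - template[i - 1])
            cur[i] = cost + min(cur[i - 1], prev[i], prev[i - 1])
        ends.append(cur[m])
        prev = cur
    return ends


template = [1, 2, 3]
stream = [9, 1, 2, 3, 9]
whole = accumulated_distances(template, stream, free_start=False)[-1]  # DTW against the whole stream
per_end = accumulated_distances(template, stream, free_start=True)     # best subsequence ending at each position
```

In this toy input, `per_end[3]` is 0 because the template occurs exactly at stream positions 2 to 4 (1-based), while `whole` is large because classic DTW is forced to align the leading and trailing 9s as well.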
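Notes:
- Choosing the threshold is not trivial and can be a real challenge, but that would be a separate question. (Use threshold = 0 if you only want to match exactly identical sequences, and threshold > 0 if you also want to match similar sequences.)
- If you are looking for several templates/target patterns, you have to create several instances of the SPRING class, one per target pattern.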
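As a rough illustration of the "one instance per template" note, here is a sketch of what such a class could look like. The class name `SpringMatcher` and its `update` method are hypothetical, not taken from the paper or from any library. The sketch follows the same simplified update and reporting rule as the implementation further down in this answer, so it is not a complete rendering of the paper's algorithm.

```python
class SpringMatcher:
    """Hypothetical wrapper: one instance tracks exactly one template."""

    def __init__(self, template, epsilon):
        self.template = list(template)
        self.epsilon = epsilon
        n = len(self.template)
        self.d_prev = [float("inf")] * n  # accumulated distances at the previous position
        self.s_prev = [0] * n             # starting positions at the previous position
        self.best = None                  # current candidate: (distance, start, end)
        self.t = 0                        # 1-based position in the stream

    def update(self, x):
        """Feed one value; return (distance, start, end) once a match is final, else None."""
        self.t += 1
        n = len(self.template)
        d_now, s_now = [0.0] * n, [0] * n
        for i, y in enumerate(self.template):
            cost = abs(x - y)
            if i == 0:
                d_now[i], s_now[i] = cost, self.t
            else:
                # Take the cheapest predecessor cell and inherit its starting position
                d_best, s_best = min(
                    (d_now[i - 1], s_now[i - 1]),
                    (self.d_prev[i], self.s_prev[i]),
                    (self.d_prev[i - 1], self.s_prev[i - 1]),
                    key=lambda cell: cell[0],
                )
                d_now[i], s_now[i] = cost + d_best, s_best
        # Remember the best candidate seen so far that is within the threshold
        if d_now[-1] <= self.epsilon and (self.best is None or d_now[-1] <= self.best[0]):
            self.best = (d_now[-1], s_now[-1], self.t)
        result = None
        if self.best is not None:
            d_rep, _, j_e = self.best
            # The candidate is final when no open path can still beat or overlap it
            if all(d >= d_rep or s > j_e for d, s in zip(d_now, s_now)):
                result, self.best = self.best, None
        self.d_prev, self.s_prev = d_now, s_now
        return result


# One matcher per target pattern, all fed from the same stream
matchers = {"rise": SpringMatcher([1, 2, 3], epsilon=0),
            "fall": SpringMatcher([3, 2, 1], epsilon=0)}
found = []
for x in [9, 1, 2, 3, 9, 3, 2, 1]:
    for name, matcher in matchers.items():
        hit = matcher.update(x)
        if hit is not None:
            found.append((name, hit))
# found == [("rise", (0, 2, 4)), ("fall", (0, 6, 8))]
```

Positions are 1-based and inclusive, matching the `j+1` convention used in the code of this answer. Each instance keeps its own state, so templates of different lengths and with different thresholds can be monitored side by side.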
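The code below is my (messy) implementation. It lacks a lot of things (such as class definitions), but it works and should help point you toward an answer.
I implemented it as follows: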
#HERE DEFINE
#1) template consisting of numerical data points
#2) stream consisting of numerical data points
template = [1, 2, 0, 1, 2]
stream = [1, 1, 0, 1, 2, 3, 1, 0, 1, 2, 1, 1, 1, 2, 7, 4, 5]
#the threshold for the matching process has to be chosen by the user - yet in reality the choice of threshold is a non-trivial problem regarding the quality of the matching process
#Getting Epsilon from the user
epsilon = input("Please define epsilon: ")
epsilon = float(epsilon)
#SPRING
#1.Requirements
n = len(template)
D_recent = [float("inf")]*(n)
D_now=[0]*(n)
S_recent=[0]*(n)
S_now=[0]*(n)
d_rep=float("inf")
J_s=float("inf")
J_e=float("inf")
check=0
#check/output
matches=[]
#calculation of accumulated distance for each incoming value
def accdist_calc(incoming_value, template, Distance_new, Distance_recent):
    for i in range(len(template)):
        if i == 0:
            Distance_new[i] = abs(incoming_value-template[i])
        else:
            Distance_new[i] = abs(incoming_value-template[i])+min(Distance_new[i-1], Distance_recent[i], Distance_recent[i-1])
    return Distance_new
#deduce starting point for each incoming value
#note: j is the module-level index of the main loop below, not a parameter
def startingpoint_calc(template_length, starting_point_recent, starting_point_new, Distance_new, Distance_recent):
    for i in range(template_length):
        if i == 0:
            #here j+1 instead of j, because the program counts from 0 instead of from 1
            starting_point_new[i] = j+1
        else:
            if Distance_new[i-1] == min(Distance_new[i-1], Distance_recent[i], Distance_recent[i-1]):
                starting_point_new[i] = starting_point_new[i-1]
            elif Distance_recent[i] == min(Distance_new[i-1], Distance_recent[i], Distance_recent[i-1]):
                starting_point_new[i] = starting_point_recent[i]
            elif Distance_recent[i-1] == min(Distance_new[i-1], Distance_recent[i], Distance_recent[i-1]):
                starting_point_new[i] = starting_point_recent[i-1]
    return starting_point_new
#2.Calculation for each incoming point x.t - simulated here by simply calculating along the given static list
for j in range(len(stream)):
    x = stream[j]
    accdist_calc(x, template, D_now, D_recent)
    startingpoint_calc(n, S_recent, S_now, D_now, D_recent)

    #Report any matching subsequence
    if D_now[n-1] <= epsilon:
        if D_now[n-1] <= d_rep:
            d_rep = D_now[n-1]
            J_s = S_now[n-1]
            J_e = j+1
            print("REPORT: Distance "+str(d_rep)+" with a starting point of "+str(J_s)+" and ending at "+str(J_e))

    #Identify optimal subsequence
    for i in range(n):
        if D_now[i] >= d_rep or S_now[i] > J_e:
            check = check+1
    if check == n:
        print("MATCH: Distance "+str(d_rep)+" with a starting point of "+str(J_s)+" and ending at "+str(J_e))
        matches.append(str(d_rep)+","+str(J_s)+","+str(J_e))
        d_rep = float("inf")
        J_s = float("inf")
        J_e = float("inf")
        check = 0
    else:
        check = 0

    #define the recently calculated distance vector as "old" distance
    for i in range(n):
        D_recent[i] = D_now[i]
        S_recent[i] = S_now[i]
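
For the labeling step mentioned at the top, a small helper along these lines could turn the collected `matches` entries into one label per stream position. This is only a sketch: the function name `label_positions` and the `label` parameter are my own, and it assumes each entry is a "distance,start,end" string with 1-based, inclusive positions, as produced by the code above.

```python
def label_positions(stream_length, matches, label="match"):
    """Return one label per stream position; None where no match covers the position."""
    labels = [None] * stream_length
    for entry in matches:
        _distance, start, end = entry.split(",")
        # start and end are 1-based and inclusive, so shift to 0-based indices
        for pos in range(int(start) - 1, int(end)):
            labels[pos] = label
    return labels


# Example with a single match covering positions 2 to 4 of a 6-element stream
labels = label_positions(6, ["0,2,4"])
```

If your data lives in a data frame, the resulting list has one entry per row and could be assigned as a new column.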