
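I wrote code that, for each time step (frame), calculates the distance between every pair of objects (tagID) from their x, y, z coordinates (TX, TY, TZ). The code works, but it is too slow for what I need. My current test data has about 538,792 rows, and my real data has about 6.88 million rows. Building these distance matrices currently takes several minutes (maybe 10-15 minutes), and since I will have 40 datasets, I would like to speed it up.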

The current code is as follows:

# Sample data frame with correct columns:

import numpy as np
import pandas as pd

data2 = {'Frame' :[1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7], 
      'tagID' : ['nb1','nb2','nb3','nb1','nb2','nb3','nb1','nb2','nb3','nb1','nb2','nb3','nb1','nb2','nb3','nb1','nb2','nb3','nb1','nb2','nb3'],
      'TX':[5,2,3,4,5,6,7,5,np.nan,5,2,3,4,5,6,7,5,4,8,3,2],
      'TY':[4,2,3,4,5,9,3,2,np.nan,5,2,3,4,5,6,7,5,4,8,3,2],
      'TZ':[2,3,4,6,7,8,4,3,np.nan,5,2,3,4,5,6,7,5,4,8,3,2]}

df = pd.DataFrame(data2)

   Frame tagID   TX   TY   TZ
0       1   nb1  5.0  4.0  2.0
1       1   nb2  2.0  2.0  3.0
2       1   nb3  3.0  3.0  4.0
3       2   nb1  4.0  4.0  6.0
4       2   nb2  5.0  5.0  7.0
5       2   nb3  6.0  9.0  8.0
6       3   nb1  7.0  3.0  4.0
7       3   nb2  5.0  2.0  3.0
8       3   nb3  NaN  NaN  NaN
9       4   nb1  5.0  5.0  5.0
10      4   nb2  2.0  2.0  2.0
11      4   nb3  3.0  3.0  3.0
12      5   nb1  4.0  4.0  4.0
13      5   nb2  5.0  5.0  5.0
14      5   nb3  6.0  6.0  6.0
15      6   nb1  7.0  7.0  7.0
16      6   nb2  5.0  5.0  5.0
17      6   nb3  4.0  4.0  4.0
18      7   nb1  8.0  8.0  8.0
19      7   nb2  3.0  3.0  3.0
20      7   nb3  2.0  2.0  2.0


# Calculate the squared distance between all x points:

TXdf = [] 
for i in range(1,df['Frame'].max()+1):
    boox = df['Frame'] == i 
    tempx = df[boox] 
    tx=tempx['TX'].apply(lambda x : (tempx['TX']-x)**2) 
    tx.columns=tempx.tagID   
    tx['ID']=tempx.tagID 
    tx['Frame'] = tempx.Frame 
    TXdf.append(tx) 
TXdfFinal = pd.concat(TXdf)  # combine the per-frame DataFrames into one
print(TXdfFinal)
TXdfFinal.info()

# Calculate the squared distance between all y points:

print('y-diff sum')
TYdf = [] 
for i in range(1,df['Frame'].max()+1):
    booy = df['Frame'] == i 
    tempy = df[booy] 
    ty=tempy['TY'].apply(lambda x : (tempy['TY']-x)**2) 
    ty.columns=tempy.tagID   
    ty['ID']=tempy.tagID 
    ty['Frame'] = tempy.Frame 
    TYdf.append(ty) 
TYdfFinal = pd.concat(TYdf) 
print(TYdfFinal)
TYdfFinal.info()

# Calculate the squared distance between all z points:

print('z-diff sum')
TZdf = [] 
for i in range(1,df['Frame'].max()+1):
    booz = df['Frame'] == i 
    tempz = df[booz] 
    tz=tempz['TZ'].apply(lambda x : (tempz['TZ']-x)**2) 
    tz.columns=tempz.tagID  
    tz['ID']=tempz.tagID 
    tz['Frame'] = tempz.Frame 
    TZdf.append(tz) 
TZdfFinal = pd.concat(TZdf)


# Add all squared differences together:

euSum = TXdfFinal + TYdfFinal + TZdfFinal

# Square root the sum of the differences of each coordinate for Euclidean distance and add Frame and ID columns back on:

euDist = euSum.loc[:, euSum.columns !='ID'].apply(lambda x: x**0.5)
euDist['tagID'] = list(TXdfFinal['ID'])
euDist['Frame'] = list(TXdfFinal['Frame'])


# Add the distance matrix to the original dataframe based on Frame and ID columns:

new_df = pd.merge(df, euDist, how='left', on=['Frame', 'tagID'])

   Frame tagID   TX   TY   TZ      nb1     nb2      nb3
0       1   nb1  5.0  4.0  2.0   0.0000  3.7417   3.0000
1       1   nb2  2.0  2.0  3.0   3.7417  0.0000   1.7321
2       1   nb3  3.0  3.0  4.0   3.0000  1.7321   0.0000
3       2   nb1  4.0  4.0  6.0   0.0000  1.7321   5.7446
4       2   nb2  5.0  5.0  7.0   1.7321  0.0000   4.2426
5       2   nb3  6.0  9.0  8.0   5.7446  4.2426   0.0000
6       3   nb1  7.0  3.0  4.0   0.0000  2.4495      NaN
7       3   nb2  5.0  2.0  3.0   2.4495  0.0000      NaN
8       3   nb3  NaN  NaN  NaN      NaN     NaN      NaN
9       4   nb1  5.0  5.0  5.0   0.0000  5.1962   3.4641
10      4   nb2  2.0  2.0  2.0   5.1962  0.0000   1.7321
11      4   nb3  3.0  3.0  3.0   3.4641  1.7321   0.0000
12      5   nb1  4.0  4.0  4.0   0.0000  1.7321   3.4641
13      5   nb2  5.0  5.0  5.0   1.7321  0.0000   1.7321
14      5   nb3  6.0  6.0  6.0   3.4641  1.7321   0.0000
15      6   nb1  7.0  7.0  7.0   0.0000  3.4641   5.1962
16      6   nb2  5.0  5.0  5.0   3.4641  0.0000   1.7321
17      6   nb3  4.0  4.0  4.0   5.1962  1.7321   0.0000
18      7   nb1  8.0  8.0  8.0   0.0000  8.6603  10.3923
19      7   nb2  3.0  3.0  3.0   8.6603  0.0000   1.7321
20      7   nb3  2.0  2.0  2.0  10.3923  1.7321   0.0000

I have tried using both euclidean() and pdist() with metric='euclidean', but could not get the iteration right.

Any suggestions on how to get the same result faster would be greatly appreciated.
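Regarding the pdist() attempt: pdist works on one frame at a time and returns a condensed vector of the upper-triangle distances, which scipy's squareform expands back into a full matrix. Below is a minimal sketch of that per-frame iteration, assuming the column names from the sample data above; frame_distances is a hypothetical helper name, not something from the question. squareform always writes 0 on the diagonal, so the sketch resets the diagonal to NaN for tags with missing coordinates to reproduce the question's output.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Sample data from the question
df = pd.DataFrame({
    'Frame': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7],
    'tagID': ['nb1', 'nb2', 'nb3'] * 7,
    'TX': [5, 2, 3, 4, 5, 6, 7, 5, np.nan, 5, 2, 3, 4, 5, 6, 7, 5, 4, 8, 3, 2],
    'TY': [4, 2, 3, 4, 5, 9, 3, 2, np.nan, 5, 2, 3, 4, 5, 6, 7, 5, 4, 8, 3, 2],
    'TZ': [2, 3, 4, 6, 7, 8, 4, 3, np.nan, 5, 2, 3, 4, 5, 6, 7, 5, 4, 8, 3, 2],
})


def frame_distances(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: square distance matrix for one frame, labelled by tagID."""
    coords = group[['TX', 'TY', 'TZ']].to_numpy(dtype=float)
    # pdist returns the condensed (upper-triangle) vector of pairwise distances;
    # squareform expands it to a full symmetric matrix with zeros on the diagonal.
    dist = squareform(pdist(coords, metric='euclidean'))
    # Restore NaN on the diagonal for tags whose coordinates are missing.
    missing = np.isnan(coords).any(axis=1)
    dist[missing, missing] = np.nan
    return pd.DataFrame(dist, index=group.index, columns=group['tagID'].to_numpy())


# One pdist call per frame; concat aligns the tag columns by name.
parts = [frame_distances(g) for _, g in df.groupby('Frame', sort=False)]
new_df = df.join(pd.concat(parts))
print(new_df.round(4))
```

This still loops over frames in Python, so the per-group overhead remains with many frames; what it removes is the three apply/lambda passes, replacing them with one compiled distance call per frame.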


2 Answers


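Using cdist from scipy:

import numpy as np
from scipy.spatial import distance
# One cdist call per frame; assumes df is sorted by Frame and every frame has exactly the tags nb1, nb2, nb3 in that row order
df['nb1'], df['nb2'], df['nb3'] = np.concatenate([distance.cdist(y, y, metric='euclidean') for x, y in df[['TX', 'TY', 'TZ']].groupby(df['Frame'])]).T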
answered 2019-05-20T21:54:11.247
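The cdist answer relies on every frame containing exactly the same three tags in the same row order. A loop-free alternative, sketched here under the assumption that each (Frame, tagID) pair occurs at most once, is to pivot the data into a frames × tags × coordinates array and let NumPy broadcasting compute every pairwise distance in one step. The variable names (wide, coords, long_dist) are illustrative; a frame that lacks a tag simply gets NaN for that tag.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Frame': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7],
    'tagID': ['nb1', 'nb2', 'nb3'] * 7,
    'TX': [5, 2, 3, 4, 5, 6, 7, 5, np.nan, 5, 2, 3, 4, 5, 6, 7, 5, 4, 8, 3, 2],
    'TY': [4, 2, 3, 4, 5, 9, 3, 2, np.nan, 5, 2, 3, 4, 5, 6, 7, 5, 4, 8, 3, 2],
    'TZ': [2, 3, 4, 6, 7, 8, 4, 3, np.nan, 5, 2, 3, 4, 5, 6, 7, 5, 4, 8, 3, 2],
})

# Wide layout: one row per Frame, one (coordinate, tagID) column per tag.
# Assumes each (Frame, tagID) pair occurs at most once; absent tags become NaN.
wide = df.pivot(index='Frame', columns='tagID', values=['TX', 'TY', 'TZ'])
tags = wide['TX'].columns  # same sorted tag order for TX, TY and TZ

# coords has shape (n_frames, n_tags, 3)
coords = np.stack([wide[c].to_numpy(dtype=float) for c in ('TX', 'TY', 'TZ')], axis=-1)

# All pairwise differences at once: (F, T, 1, 3) - (F, 1, T, 3) -> (F, T, T, 3)
diff = coords[:, :, None, :] - coords[:, None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))  # (F, T, T); NaN coordinates give NaN

# Back to long format: one row per (Frame, tagID), one column per other tag
long_dist = pd.DataFrame(
    dist.reshape(-1, len(tags)),
    index=pd.MultiIndex.from_product([wide.index, tags], names=['Frame', 'tagID']),
    columns=list(tags),
)
new_df = df.merge(long_dist.reset_index(), how='left', on=['Frame', 'tagID'])
print(new_df.round(4))
```

The intermediate diff array holds n_frames × n_tags² × 3 floats, so with many tags per frame the memory cost can be large; in that case the same steps can be run on chunks of frames and the results concatenated.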

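You could try reducing the number of for loops from three to one. It looks like you are iterating over the same items three times; try doing all the calculations in a single loop.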

That should cut your run time by about two-thirds.

answered 2019-05-20T21:31:09.470
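A sketch of the single-loop suggestion, assuming the same sample data: x, y and z are handled in one pass per frame, and groupby splits the rows once instead of rebuilding a boolean mask over the whole DataFrame (df['Frame'] == i) for every frame. np.subtract.outer produces the same pairwise squared differences that the question's apply/lambda calls compute, one axis at a time. The actual speedup depends on the data, so it is worth timing on a subset first.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Frame': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7],
    'tagID': ['nb1', 'nb2', 'nb3'] * 7,
    'TX': [5, 2, 3, 4, 5, 6, 7, 5, np.nan, 5, 2, 3, 4, 5, 6, 7, 5, 4, 8, 3, 2],
    'TY': [4, 2, 3, 4, 5, 9, 3, 2, np.nan, 5, 2, 3, 4, 5, 6, 7, 5, 4, 8, 3, 2],
    'TZ': [2, 3, 4, 6, 7, 8, 4, 3, np.nan, 5, 2, 3, 4, 5, 6, 7, 5, 4, 8, 3, 2],
})

parts = []
# groupby splits the rows by frame once, instead of re-scanning the whole
# DataFrame with a boolean mask on every iteration.
for _, temp in df.groupby('Frame', sort=False):
    sq = np.zeros((len(temp), len(temp)))
    # x, y and z are handled inside the same loop instead of three separate loops
    for col in ('TX', 'TY', 'TZ'):
        v = temp[col].to_numpy(dtype=float)
        sq += np.subtract.outer(v, v) ** 2  # squared differences along one axis
    parts.append(pd.DataFrame(np.sqrt(sq), index=temp.index,
                              columns=temp['tagID'].to_numpy()))

# Attach the distance columns to the original rows by index
new_df = df.join(pd.concat(parts))
print(new_df.round(4))
```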