python - Clustering rows within subgroups of data

Question

I have a dataset, df, of object components in 3-d space - each ID represents an object with various components:

ID   Comp   x        y        z
A    1      2        2        1     
A    2      2        1        -1
A    3      -10      1        -10
A    4      -1       3        -5
B    1      3        0        0
B    2      3        0        -5
...

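I would like to iterate over each ID and use a clustering technique from sklearn to group its components (Comp) into clusters based on each component's (x, y, z) coordinates - to get something like this: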

ID   Comp   x        y        z        cluster
A    1      2        2        1        1
A    2      2        1        -1       1
A    3      -10      1        -10      2
A    4      -1       3        -5       3
B    1      3        0        0        1
B    2      3        0        -5       1
...

For example - ID: A, Comp: 1 is in cluster 1, whereas ID: A, Comp: 4 is in cluster 3. (I plan to concatenate ID and cluster later.)

I have had no luck with groupby + apply:

from sklearn.cluster import AffinityPropagation
ap = AffinityPropagation()

df['cluster']=df.groupby(['ID','Comp']).apply(lambda x: ap.fit_predict(np.array([x.x,x.y,x.z]).T))

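I could brute-force it with a for loop over ID, but my dataset is large (~150k IDs) and I am worried about resource and time constraints. Any help would be great!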
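One thing worth checking in the attempt above (an observation on the sample rows, not something stated in the post): grouping by both ID and Comp puts every row in its own group, so each fit would see a single point. The sketch below rebuilds the six sample rows shown in the question and compares group sizes; the variable names are illustrative only.

```python
import pandas as pd

# The six sample rows shown in the question
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B"],
    "Comp": [1, 2, 3, 4, 1, 2],
    "x": [2, 2, -10, -1, 3, 3],
    "y": [2, 1, 1, 3, 0, 0],
    "z": [1, -1, -10, -5, 0, -5],
})

# Grouping by ID and Comp: one row per group, nothing to cluster
sizes_by_id_comp = df.groupby(["ID", "Comp"]).size()

# Grouping by ID alone: each group holds all components of one object
sizes_by_id = df.groupby("ID").size()

print(sizes_by_id_comp)
print(sizes_by_id)
```

Grouping by ID alone is what gives the clusterer all components of one object at once.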

score 2 · Accepted Answer

IIUC, I think you can try something like this:

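import pandas as pd
from sklearn.cluster import AffinityPropagation

def ap_fit_pred(x):
    # Fit a fresh model per ID and keep the original row index for alignment
    ap = AffinityPropagation()
    return pd.Series(ap.fit_predict(x.loc[:, ['x', 'y', 'z']]), index=x.index)

# group_keys=False keeps the result indexed like df, so the assignment aligns row by row
df['cluster'] = df.groupby('ID', group_keys=False).apply(ap_fit_pred)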
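For reference, here is a self-contained variant of the per-ID idea above, as a minimal sketch rather than a definitive implementation. The helper name cluster_within_ids, its parameters, the random_state value, and the cluster_key column are assumptions made for illustration; they do not appear in the original posts. It writes labels by row position so it does not depend on how df is indexed or sorted, and it shifts labels to start at 1 as in the desired output.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AffinityPropagation


def cluster_within_ids(df, id_col="ID", coord_cols=("x", "y", "z")):
    """Hypothetical helper: cluster rows separately for each ID."""
    coords = df[list(coord_cols)].to_numpy(dtype=float)
    labels = np.empty(len(df), dtype=int)
    # .indices maps each ID to the integer positions of its rows,
    # so labels land on the right rows whatever the index looks like.
    for pos in df.groupby(id_col).indices.values():
        # random_state is an assumption, added only for repeatable runs
        model = AffinityPropagation(random_state=0)
        # scikit-learn labels start at 0; shift to 1-based as in the question
        labels[pos] = model.fit_predict(coords[pos]) + 1
    return df.assign(cluster=labels)


# The six sample rows shown in the question
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B"],
    "Comp": [1, 2, 3, 4, 1, 2],
    "x": [2, 2, -10, -1, 3, 3],
    "y": [2, 1, 1, 3, 0, 0],
    "z": [1, -1, -10, -5, 0, -5],
})

result = cluster_within_ids(df)
# Combined key such as "A_1", for the ID + cluster concatenation planned later
result["cluster_key"] = result["ID"] + "_" + result["cluster"].astype(str)
print(result)
```

The labels come from AffinityPropagation's default settings, so they need not match the table in the question, and scikit-learn may emit warnings for very small groups such as the two-row ID B. This is still a Python-level loop over IDs, as groupby + apply is, so the run time for ~150k IDs is mostly the cost of the individual fits.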
