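Given a (potentially large, ~2+ GB) JSON file of transactions between nodes, with ~1 million nodes and ~10 million transactions, each transaction containing 10-1000 nodes, e.g.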
{"transactions":
[
{"transaction 1": ["node1","node2","node7"], "weight":0.41},
{"transaction 2": ["node4","node2","node1","node3","node10","node7","node9"], "weight":0.67},
{"transaction 3": ["node3","node10","node11","node2","node1"], "weight":0.33},...
]
}
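What is the most elegant and efficient Pythonic way to convert this into a node affinity matrix, where affinity is the sum of the weighted transactions between nodes?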
affinity [i,j] = weighted transaction count between nodes[i] and nodes[j] = affinity [j,i]
e.g.
affinity[node1, node7] = [0.41 (transaction1) + 0.67 (transaction2)] / 2 = affinity[node7, node1]
Note: the affinity matrix is symmetric, so computing only the lower triangle is sufficient.
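To make the definition concrete, here is a naive reference sketch run on the three sample transactions above. It is not meant as the efficient solution being asked for: it enumerates every node pair per transaction, which would be far too slow and memory-hungry at 10 million transactions of up to 1000 nodes each. The function name `pair_affinity` is made up for illustration. The text describes affinity as a sum, while the worked example divides by the number of shared transactions, so the sketch keeps both the weighted sum and the co-occurrence count and leaves the choice open. It also assumes each transaction object has exactly one key besides `"weight"`, holding the node list.

```python
from collections import defaultdict
from itertools import combinations


def pair_affinity(transactions):
    """Return {(a, b): (weight_sum, count)} for every co-occurring pair.

    Keys are stored with a < b only, i.e. one triangle of the
    symmetric matrix. (Hypothetical helper, for illustration.)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tx in transactions:
        weight = tx["weight"]
        # Assumption: the node list sits under the one key that is not "weight".
        nodes = next(v for k, v in tx.items() if k != "weight")
        # sorted(set(...)) drops duplicate nodes and fixes the pair order.
        for a, b in combinations(sorted(set(nodes)), 2):
            sums[(a, b)] += weight
            counts[(a, b)] += 1
    return {pair: (sums[pair], counts[pair]) for pair in sums}


sample = [
    {"transaction 1": ["node1", "node2", "node7"], "weight": 0.41},
    {"transaction 2": ["node4", "node2", "node1", "node3",
                       "node10", "node7", "node9"], "weight": 0.67},
    {"transaction 3": ["node3", "node10", "node11", "node2", "node1"],
     "weight": 0.33},
]

result = pair_affinity(sample)
total, n = result[("node1", "node7")]
print(total, n, total / n)  # weighted sum, shared transactions, their ratio
```

For `node1` and `node7` this picks up transactions 1 and 2 only, matching the worked example: the sum is 0.41 + 0.67 and the ratio divides it by 2.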
Values are not representative *** structure example only!
        node1 | node2 | node3 | node4 | ....
node1     1      .4      .1      .9     ...
node2    .4      1       .6      .3     ...
node3    .1      .6      1       .7     ...
node4    .9      .3      .7      1      ...
.....