You can use crosstab for this:
import numpy as np
import pandas as pd

# distance and adjacency are assumed to be already defined, equal-length arrays
# bin the distances into equal-width buckets
factor = pd.cut(distance, 100)

# the crosstab DataFrame with the value counts in each bucket
ct = pd.crosstab(factor, adjacency, margins=True,
                 rownames=['distance'], colnames=['adjacency'])

# from here, computing the probability of nodes being adjacent is straightforward
ct['prob'] = np.true_divide(ct[1], ct['All'])
This gives a DataFrame of the following form (the sample shown here has 10 buckets rather than 100):
>>> ct
adjacency          0   1  All      prob
distance
(0.00685, 0.107]   7   4   11  0.363636
(0.107, 0.205]     6   9   15  0.600000
(0.205, 0.304]     6   6   12  0.500000
(0.304, 0.403]     5   2    7  0.285714
(0.403, 0.502]     4   6   10  0.600000
(0.502, 0.6]       8   3   11  0.272727
(0.6, 0.699]       6   2    8  0.250000
(0.699, 0.798]     4   6   10  0.600000
(0.798, 0.896]     4   5    9  0.555556
(0.896, 0.995]     5   2    7  0.285714
All               55  45  100  0.450000
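
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The inputs are synthetic stand-ins (random distances in [0, 1) and random 0/1 adjacency flags), which is an assumption on my part since the real arrays are not shown, so the counts will differ from the table above. It uses 10 buckets to keep the printout short.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption: the real distance/adjacency arrays
# are not shown here, so random values are used for illustration only).
rng = np.random.default_rng(0)
distance = pd.Series(rng.random(100), name="distance")
adjacency = pd.Series(rng.integers(0, 2, size=100), name="adjacency")

# Bin the distances into 10 equal-width buckets.
factor = pd.cut(distance, 10)

# Count 0/1 adjacency values per bucket; margins=True adds the "All" totals.
ct = pd.crosstab(factor, adjacency, margins=True,
                 rownames=["distance"], colnames=["adjacency"])

# Fraction of adjacent pairs (adjacency == 1) in each bucket.
ct["prob"] = ct[1] / ct["All"]

print(ct)
```

The last row's prob is just the overall share of adjacent pairs, which is a quick sanity check that the counts add up.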