Context
I am running a simulation that generates multiple (networkx) graphs. (It is actually a mesa agent-based simulation, inspired by the virus-on-a-network example.) Because of the randomness involved, I run each set of parameters several times.
Minimal example
The following example should give you an idea of what I am trying to achieve:
import numpy as np
import pandas as pd
import networkx as nx


def avg_degree(G):
    return 2 * G.number_of_edges() / G.number_of_nodes()


def degree_distribution(G):
    return np.array(nx.degree_histogram(G))


def network_metrics(G):
    return {
        "avg degree": avg_degree(G),
        "degree distribution": degree_distribution(G),
    }


def generate_data(step, run, n, p):
    G = nx.erdos_renyi_graph(n, p)
    dct = {
        'network_type': 'random',
        'run': run,
        'nb_agents': n,
        'probability': p,
        'infected': np.random.randint(0, 100, step),
    }
    dct.update(network_metrics(G))
    return dct


def main(nb_steps):
    lst = []
    for run in range(4):
        for nb_nodes in [10, 20, 50]:
            for probability in [0.1, 0.5, 0.8]:
                dct = generate_data(nb_steps, run, nb_nodes, probability)
                lst.append(dct)
    result = pd.DataFrame(lst)
    indexes = ['network_type', 'run', 'nb_agents', 'probability']
    result.set_index(indexes, inplace=True, drop=True)
    return result
This gives:
result = main(10)
result.head()
network_type | run | nb_agents | probability | infected | avg degree | degree distribution |
---|---|---|---|---|---|---|
random | 0 | 10 | 0.1 | [73 86 96 94 33 57 36 15 30 74] | 0.8 | [5 3 1 1] |
random | 0 | 10 | 0.5 | [ 4 0 64 37 40 16 30 67 51 36] | 4.2 | [0 0 0 4 2 2 2] |
random | 0 | 10 | 0.8 | [59 96 51 68 81 11 40 31 26 95] | 7.2 | [0 0 0 0 0 0 1 6 3] |
random | 0 | 20 | 0.1 | [17 91 26 32 63 65 79 28 80 32] | 1.8 | [3 4 9 2 2] |
random | 0 | 20 | 0.5 | [17 2 5 17 85 13 42 77 70 72] | 9 | [0 0 0 0 0 0 2 2 4 3 6 1 2] |
Goal
- Explode `infected` and `degree distribution` (!! note: `infected` has a fixed length (= `nb_steps`), but `degree distribution` does not; its length varies).
- Merge everything into a single dataset.
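The explode step can be sketched on a toy frame (toy data, with column names borrowed from the example above): each list cell becomes its own row, and a cumcount over the repeated original index recovers the position within each list.

```python
import pandas as pd

# Toy illustration of the "explode" goal: each list cell becomes one
# row, and a cumcount over the (repeated) original index recovers the
# position inside the list.
df = pd.DataFrame({
    "run": [0, 1],
    "degree distribution": [[5, 3, 1], [0, 4]],
})
exploded = df.explode("degree distribution")
# explode keeps the original row index, so grouping on it numbers the
# entries of each list 0, 1, 2, ...
exploded["nb_nodes_with_degree"] = exploded.groupby(level=0).cumcount()
```

This is the same groupby-on-index trick the helper below uses.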
Current solution
I use the following helper functions to explode the different columns:
def explode(df, mapping):
    new_df = df.reset_index()
    for col, idx in mapping.items():
        new_df.index.rename('_id', inplace=True)
        new_df = new_df.explode(col)
        new_df.insert(1, idx, new_df.groupby('_id').cumcount())
        new_df.reset_index(drop=True, inplace=True)
    idx = list(df.index.names)
    nested_idx = list(mapping.values())
    return new_df.set_index(idx + nested_idx)


def helper(df, mapping):
    sol = []
    for k, v in mapping.items():
        sol.append(explode(df[k], {k: v}).to_xarray())
    leftover_columns = list(df.columns.difference(mapping.keys()))
    sol.append(df[leftover_columns].to_xarray())
    return sol
which I use as follows:
import xarray as xr

mapping = {'infected': 'step', 'degree distribution': "nb_nodes_with_degree"}
lst = helper(result, mapping)
xr.combine_by_coords(lst)
Result
>>> xr.combine_by_coords(lst) # final result
<xarray.Dataset>
Dimensions: (network_type: 1, run: 4, nb_agents: 3, probability: 3, nb_nodes_with_degree: 47, step: 10)
Coordinates:
* network_type (network_type) object 'random'
* run (run) int64 0 1 2 3
* nb_agents (nb_agents) int64 10 20 50
* probability (probability) float64 0.1 0.5 0.8
* nb_nodes_with_degree (nb_nodes_with_degree) int64 0 1 2 3 4 ... 43 44 45 46
* step (step) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
avg degree (network_type, run, nb_agents, probability) float64 ...
degree distribution (network_type, run, nb_agents, probability, nb_nodes_with_degree) object ...
infected (network_type, run, nb_agents, probability, step) object ...
Limitations
It works, but it is slow and quite inelegant. --> Is there a better way to handle this?
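One possible direction (a rough sketch on toy data, not benchmarked, so treat it as a hypothesis rather than a verified speed-up): since `infected` already has a fixed length, only `degree distribution` is ragged. Padding each histogram with NaN up to the longest one makes the data rectangular, so it can be handed to xarray in one step instead of being exploded row by row.

```python
import numpy as np
import xarray as xr

def pad_to(arr, length):
    # Right-pad a 1-D array with NaN so every histogram has equal length.
    out = np.full(length, np.nan)
    out[:len(arr)] = arr
    return out

# Toy stand-ins for two runs' degree histograms of different lengths.
dists = [np.array([5, 3, 1]), np.array([0, 4])]
max_len = max(len(d) for d in dists)
stacked = np.stack([pad_to(d, max_len) for d in dists])

# Build the DataArray directly; in the real code the leading dimension
# would be the full (network_type, run, nb_agents, probability) index.
da = xr.DataArray(
    stacked,
    dims=("run", "nb_nodes_with_degree"),
    coords={"run": [0, 1], "nb_nodes_with_degree": list(range(max_len))},
)
```

The NaN padding makes the missing-degree cells explicit, which matches what `combine_by_coords` produces implicitly when it aligns histograms of different lengths.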