我无法弄清楚如何更新 networkx dag_find_longest_path() 算法以返回“N”表示关系,而不是返回找到的第一个最大边,或者返回与最大权重相关的所有边的列表。
我首先从 pandas 数据帧创建了一个 DAG,其中包含如下子集的边缘列表:
edge1 edge2 weight
115252161:T 115252162:A 1.0
115252162:A 115252163:G 1.0
115252163:G 115252164:C 3.0
115252164:C 115252165:A 5.5
115252165:A 115252166:C 5.5
115252162:T 115252163:G 1.0
115252166:C 115252167:A 7.5
115252167:A 115252168:A 7.5
115252168:A 115252169:A 6.5
115252165:A 115252166:G 0.5
然后我使用下面的代码对图进行拓扑排序,然后根据边的权重找到最长的路径:
G = nx.from_pandas_edgelist(edge_df, source="edge1",
target="edge2",
edge_attr=['weight'],
create_using=nx.OrderedDiGraph())
longest_path = pd.DataFrame(nx.dag_longest_path(G))
这很好用,除非当最大加权边缘存在联系时,它会返回找到的第一个最大边缘,而我希望它只返回一个表示“Null”的“N”。所以目前,输出是:
115252161 T
115252162 A
115252163 G
115252164 C
115252165 A
115252166 C
但我真正需要的是:
115252161 T
115252162 N (or [A,T] )
115252163 G
115252164 C
115252165 A
115252166 C
寻找最长路径的算法是:
def dag_longest_path(G):
dist = {} # stores [node, distance] pair
for node in nx.topological_sort(G):
# pairs of dist,node for all incoming edges
pairs = [(dist[v][0] + 1, v) for v in G.pred[node]]
if pairs:
dist[node] = max(pairs)
else:
dist[node] = (0, node)
node, (length, _) = max(dist.items(), key=lambda x: x[1])
path = []
while length > 0:
path.append(node)
length, node = dist[node]
return list(reversed(path))
的可复制粘贴定义G
。
import pandas as pd
import networkx as nx
import numpy as np
edge_df = pd.read_csv(
pd.compat.StringIO(
"""edge1 edge2 weight
115252161:T 115252162:A 1.0
115252162:A 115252163:G 1.0
115252163:G 115252164:C 3.0
115252164:C 115252165:A 5.5
115252165:A 115252166:C 5.5
115252162:T 115252163:G 1.0
115252166:C 115252167:A 7.5
115252167:A 115252168:A 7.5
115252168:A 115252169:A 6.5
115252165:A 115252166:G 0.5"""
),
sep=r" +",
)
G = nx.from_pandas_edgelist(
edge_df,
source="edge1",
target="edge2",
edge_attr=["weight"],
create_using=nx.OrderedDiGraph(),
)
longest_path = pd.DataFrame(nx.dag_longest_path(G))