
I want to parse a large amount of directed-graph data and run some logical tests on it. The data looks like this:

Source,Target,DateTime

a,b,201212100401
a,d,201212100403
b,e,201212100511
b,c,201212100518
e,f,201212100610
c,a,201212100720

The timestamps are in YYYYMMDDhhmm format.
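Because every timestamp is a fixed-width 12-digit string, comparing the raw strings already gives chronological order. If you need real date arithmetic (for example, the time between two contacts), `datetime.strptime` with the format `%Y%m%d%H%M` can parse them. A minimal sketch; `parse_timestamp` is a hypothetical helper name:

```python
from datetime import datetime

def parse_timestamp(ts):
    """Convert a 12-digit YYYYMMDDhhmm string (hypothetical helper) into a datetime."""
    return datetime.strptime(ts.strip(), "%Y%m%d%H%M")

t1 = parse_timestamp("201212100401")
t2 = parse_timestamp("201212100518")
print(t1)        # 2012-12-10 04:01:00
print(t2 - t1)   # 1:17:00

# Fixed-width strings sort the same way as the parsed datetimes
print(("201212100401" < "201212100518") == (t1 < t2))  # True
```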

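I have some specific patterns I'm looking for. For example: find instances where A and C talk, but only after A has talked to B and then B has talked to C (that is, A→B, then B→C, then C→A, in that time order). The output would look like this: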

Time 1| Time 2| Time 3
a,b,201212100401| b,c,201212100518| c,a,201212100720

I assume I can treat these as networkx objects:

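import networkx as nx
import sys

G = nx.DiGraph()

with open(sys.argv[1]) as f:
    for line in f:
        line = line.strip()  # drop the trailing newline so it doesn't end up in the time
        # Skip blank lines and the "Source,Target,DateTime" header row
        if not line or line.startswith('Source,'):
            continue
        n1, n2, t1 = line.split(',')
        G.add_edge(n1, n2, time=t1)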
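One thing to watch with `nx.DiGraph`: it holds at most one edge per ordered (source, target) pair, so if the same pair appears twice in the log, the second `add_edge` call overwrites the earlier `time` attribute. `nx.MultiDiGraph` keeps parallel edges if you want to stay with networkx. The stdlib-only sketch below shows the same idea with the `csv` module, keeping every timestamp per directed pair in a dict of lists. `load_edge_times` is a hypothetical name, and the repeated `a,b` row in the sample is made-up data for illustration:

```python
import csv
import io
from collections import defaultdict

def load_edge_times(f):
    """Read Source,Target,DateTime rows; return {(src, dst): sorted list of timestamps}."""
    edge_times = defaultdict(list)
    reader = csv.reader(f)
    next(reader, None)  # skip the "Source,Target,DateTime" header row
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines
        src, dst, ts = (field.strip() for field in row)
        edge_times[(src, dst)].append(ts)
    for ts_list in edge_times.values():
        ts_list.sort()  # fixed-width timestamps sort chronologically as strings
    return dict(edge_times)

# Made-up sample with a repeated a,b contact; both times are kept
sample = io.StringIO(
    "Source,Target,DateTime\n"
    "a,b,201212100401\n"
    "b,c,201212100518\n"
    "a,b,201212100800\n"
)
edges = load_edge_times(sample)
print(edges[("a", "b")])  # ['201212100401', '201212100800']
```

For a real file, `with open(path, newline="") as f: edges = load_edge_times(f)` is the usual way to hand a file to `csv.reader`.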

Now that the data is stored in G, I'm not sure how to query for an A,B then B,C then C,A relationship.

Does anyone have any suggestions?


1 Answer


Here is one approach:

import networkx as nx

data = '''a,b,201212100401
a,d,201212100403
b,e,201212100511
b,c,201212100518
e,f,201212100610
c,a,201212100720'''.split('\n')

G = nx.DiGraph()
for line in data:
    n1, n2, t1 = line.split(',')
    G.add_edge(n1, n2, time=t1)

def check_sequence(list_of_edges):
    times = []
    # First check that every edge is in the graph
    # and collect its timestamp
    for u, v in list_of_edges:
        if G.has_edge(u, v):
            times.append(G[u][v]['time'])
        else:
            return "Edge {} not in the graph.".format((u, v))
    # Next check that each timestamp is strictly later than the previous one.
    # Plain string comparison works because every timestamp is a
    # fixed-width YYYYMMDDhhmm string.
    for earlier, later in zip(times, times[1:]):
        if later <= earlier:
            return 'Edges not in sequence: {}'.format(times)
    # If we have not returned by now, the edges are in sequence
    return 'Edges are in sequence: {}'.format(times)

print(check_sequence([('a', 'e'), ('e', 'f'), ('a', 'f')]))
# Edge ('a', 'e') not in the graph.
print(check_sequence([('a', 'b'), ('b', 'c'), ('c', 'a')]))
# Edges are in sequence: ['201212100401', '201212100518', '201212100720']
print(check_sequence([('c', 'a'), ('a', 'b'), ('b', 'c')]))
# Edges not in sequence: ['201212100720', '201212100401', '201212100518']
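`check_sequence` tests a chain of edges you already name. To search the whole log for every A→B, then B→C, then C→A pattern, one option is to walk the successors of each node and keep only chains with strictly increasing timestamps. The sketch below is plain Python (no networkx) on the question's sample rows; `times`, `succ` and `find_chains` are hypothetical names, and it assumes A, B and C are three distinct nodes. It is not the answer's method, just one way to enumerate the pattern. Nested loops like these are fine for sparse logs but can get slow around very busy nodes.

```python
from collections import defaultdict

# Sample rows from the question: (source, target, YYYYMMDDhhmm timestamp)
rows = [
    ("a", "b", "201212100401"),
    ("a", "d", "201212100403"),
    ("b", "e", "201212100511"),
    ("b", "c", "201212100518"),
    ("e", "f", "201212100610"),
    ("c", "a", "201212100720"),
]

times = defaultdict(list)  # (src, dst) -> every timestamp for that directed pair
succ = defaultdict(set)    # src -> set of targets it has contacted
for src, dst, ts in rows:
    times[(src, dst)].append(ts)
    succ[src].add(dst)

def find_chains(times, succ):
    """Yield ((a, b, t1), (b, c, t2), (c, a, t3)) with t1 < t2 < t3.

    Timestamps are compared as strings, which is safe because they are
    all fixed-width YYYYMMDDhhmm values.
    """
    for a in sorted(succ):
        for b in sorted(succ[a]):
            if b == a:
                continue  # ignore self-loops
            for c in sorted(succ.get(b, ())):
                if c in (a, b) or (c, a) not in times:
                    continue  # need three distinct nodes and a closing c -> a edge
                for t1 in times[(a, b)]:
                    for t2 in times[(b, c)]:
                        if t2 <= t1:
                            continue
                        for t3 in times[(c, a)]:
                            if t3 > t2:
                                yield (a, b, t1), (b, c, t2), (c, a, t3)

print("Time 1| Time 2| Time 3")
for chain in find_chains(times, succ):
    print("| ".join(",".join(edge) for edge in chain))
# Time 1| Time 2| Time 3
# a,b,201212100401| b,c,201212100518| c,a,201212100720
```

The rotations b→c→a→b and c→a→b→c are also checked, but they fail the increasing-time test, so the chain is reported once.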
answered 2013-03-27T07:36:47.053