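I'm not completely sure about your data, but assuming that an event can have multiple locations, a location can have multiple events, and there are no edges between two events or between two locations, here is a possible solution.
It ignores the other event information, but you can easily modify it to add metadata if needed. It stores each node with a list of its edges, which suits a sparse data representation best.
Here is mock data in a csv file (format: loc_x loc_y, event name)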
1x 2y, mission_go
3x 4y, hvi_hiding
5x 6y, maxim11_flying
5x 6y, maxim12_flying
3x 5y, taskforce_observing
3x 5y, king_arthur_call
5x 6y, cleared_hot
3x 4y, target_strike
1x 2y, chow_food
1x 2y, drink_illegal_alchohol
3x 3y, chow_food
3x 3y, drink_illegal_alchohol
Here is the code to import the data
import csv
import pprint
import collections

# Store the nodes in a dict with edges in a list, assuming sparse data
location_nodes = collections.defaultdict(list)
event_nodes = collections.defaultdict(list)

# Open the csv file and read
with open('/path/to/your_csv_file.csv') as csv_file:
    for event in csv.reader(csv_file):
        # Use each location as a key (node) and list of events (edges)
        # (event names keep the space that follows the comma; pass
        # skipinitialspace=True to csv.reader to drop it)
        location_nodes[event[0]].append(event[1])
        # Same for events
        event_nodes[event[1]].append(event[0])

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(dict(location_nodes))
print("")
pp.pprint(dict(event_nodes))
Here is the printed output
{   '1x 2y': [' mission_go', ' chow_food', ' drink_illegal_alchohol'],
    '3x 3y': [' chow_food', ' drink_illegal_alchohol'],
    '3x 4y': [' hvi_hiding', ' target_strike'],
    '3x 5y': [' taskforce_observing', ' king_arthur_call'],
    '5x 6y': [' maxim11_flying', ' maxim12_flying', ' cleared_hot']}

{   ' chow_food': ['1x 2y', '3x 3y'],
    ' cleared_hot': ['5x 6y'],
    ' drink_illegal_alchohol': ['1x 2y', '3x 3y'],
    ' hvi_hiding': ['3x 4y'],
    ' king_arthur_call': ['3x 5y'],
    ' maxim11_flying': ['5x 6y'],
    ' maxim12_flying': ['5x 6y'],
    ' mission_go': ['1x 2y'],
    ' target_strike': ['3x 4y'],
    ' taskforce_observing': ['3x 5y']}
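The leading space in each event name above comes from the space after the comma in the sample file. As a minimal sketch (Python 3, not part of the code above), passing skipinitialspace=True to csv.reader gives clean names, and the two dicts can then be walked together to find locations linked through a shared event. The helper name linked_locations and the inline SAMPLE rows are made up for illustration.

```python
import collections
import csv
import io

# Inline copy of a few sample rows so the sketch runs without a file
SAMPLE = """1x 2y, mission_go
1x 2y, chow_food
3x 3y, chow_food
3x 4y, hvi_hiding
"""

location_nodes = collections.defaultdict(list)
event_nodes = collections.defaultdict(list)

# skipinitialspace=True drops the space that follows each comma
for loc, event in csv.reader(io.StringIO(SAMPLE), skipinitialspace=True):
    location_nodes[loc].append(event)
    event_nodes[event].append(loc)


def linked_locations(loc):
    """Hypothetical helper: other locations that share an event with loc."""
    linked = set()
    # .get() avoids creating an empty entry for an unknown location
    for event in location_nodes.get(loc, []):
        linked.update(event_nodes[event])
    linked.discard(loc)
    return sorted(linked)


print(linked_locations('1x 2y'))
```

Because every edge is stored on both sides, a two-hop lookup like this only touches the lists of the nodes involved, which is the point of the sparse representation.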
Here is the code to save the information back to csv files
# Note: on Python 3, also pass newline='' to open() so that csv.writer
# does not produce blank rows on Windows
with open('/path/to/location_node.csv', 'w') as loc_file:
    writer = csv.writer(loc_file)
    for location in location_nodes:
        # One row per node: the node first, then its edges
        location_list = [location]
        location_list.extend(location_nodes[location])
        writer.writerow(location_list)

with open('/path/to/event_node.csv', 'w') as event_file:
    writer = csv.writer(event_file)
    for event in event_nodes:
        event_list = [event]
        event_list.extend(event_nodes[event])
        writer.writerow(event_list)
And here is what the files look like
location_node.csv
3x 5y, taskforce_observing, king_arthur_call
5x 6y, maxim11_flying, maxim12_flying, cleared_hot
3x 4y, hvi_hiding, target_strike
3x 3y, chow_food, drink_illegal_alchohol
1x 2y, mission_go, chow_food, drink_illegal_alchohol
event_node.csv
cleared_hot,5x 6y
drink_illegal_alchohol,1x 2y,3x 3y
maxim11_flying,5x 6y
mission_go,1x 2y
chow_food,1x 2y,3x 3y
maxim12_flying,5x 6y
taskforce_observing,3x 5y
target_strike,3x 4y
hvi_hiding,3x 4y
king_arthur_call,3x 5y
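Since each row holds a node followed by a variable number of edges, reading these files back takes only a few lines. The following is a rough sketch in Python 3, assuming the row layout shown above. The helper name load_nodes is hypothetical, and the temporary file is there only so the example runs on its own.

```python
import collections
import csv
import os
import tempfile


def load_nodes(path):
    """Hypothetical helper: rebuild a node dict from a file laid out as above."""
    nodes = collections.defaultdict(list)
    with open(path, newline='') as node_file:
        for row in csv.reader(node_file):
            if row:  # skip blank lines
                # First column is the node, remaining columns are its edges
                nodes[row[0]].extend(row[1:])
    return nodes


# Round trip through a temporary file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), 'event_node.csv')
with open(path, 'w', newline='') as event_file:
    writer = csv.writer(event_file)
    writer.writerow(['chow_food', '1x 2y', '3x 3y'])
    writer.writerow(['cleared_hot', '5x 6y'])

event_nodes = load_nodes(path)
print(dict(event_nodes))
```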
I would need to see the actual data to be more specific about the munging.