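I wrote a function that projects a bipartite edge list into a one-mode edge list, and it works fine. However, my current approach is to append all of those edges to a list, load that list into a pandas DataFrame, filter it by edge weight to create new DataFrames, and then write those DataFrames to CSV.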

This worked well until my data became too large to hold in RAM.

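I'm thinking that, instead of appending the one-mode edges to the folded list, I could write them straight to a CSV file and skip building the list altogether. I'd also like to filter what gets written, keeping only rows with a weight greater than or equal to 3.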

Data:

E1,Brenda Rogers
E1,Evelyn Jefferson
E1,Laura Mandeville
E10,Nora Fayette
E10,Helen Lloyd
E10,Katherina Rogers
E10,Myra Liddel
E10,Sylvia Avondale
E11,Flora Price
E11,Nora Fayette
E11,Helen Lloyd
E11,Olivia Carleton
E12,Nora Fayette
E12,Verne Sanderson
E12,Helen Lloyd
E12,Katherina Rogers
E12,Myra Liddel
E12,Sylvia Avondale
E13,Nora Fayette
E13,Katherina Rogers
E13,Sylvia Avondale
E14,Nora Fayette
E14,Katherina Rogers
E14,Sylvia Avondale
E2,Evelyn Jefferson
E2,Laura Mandeville
E2,Theresa Anderson
E3,Brenda Rogers
E3,Charlotte McDowd
E3,Frances Anderson
E3,Evelyn Jefferson
E3,Laura Mandeville
E3,Theresa Anderson
E4,Brenda Rogers
E4,Charlotte McDowd
E4,Evelyn Jefferson
E4,Theresa Anderson
E5,Brenda Rogers
E5,Charlotte McDowd
E5,Frances Anderson
E5,Evelyn Jefferson
E5,Ruth DeSand
E5,Eleanor Nye
E5,Laura Mandeville
E5,Theresa Anderson
E6,Brenda Rogers
E6,Nora Fayette
E6,Frances Anderson
E6,Evelyn Jefferson
E6,Eleanor Nye
E6,Laura Mandeville
E6,Pearl Oglethorpe
E6,Theresa Anderson
E7,Brenda Rogers
E7,Charlotte McDowd
E7,Nora Fayette
E7,Verne Sanderson
E7,Ruth DeSand
E7,Helen Lloyd
E7,Eleanor Nye
E7,Laura Mandeville
E7,Sylvia Avondale
E7,Theresa Anderson
E8,Brenda Rogers
E8,Verne Sanderson
E8,Frances Anderson
E8,Dorothy Murchison
E8,Evelyn Jefferson
E8,Ruth DeSand
E8,Helen Lloyd
E8,Eleanor Nye
E8,Katherina Rogers
E8,Laura Mandeville
E8,Myra Liddel
E8,Pearl Oglethorpe
E8,Sylvia Avondale
E8,Theresa Anderson
E9,Flora Price
E9,Nora Fayette
E9,Verne Sanderson
E9,Dorothy Murchison
E9,Evelyn Jefferson
E9,Ruth DeSand
E9,Olivia Carleton
E9,Katherina Rogers
E9,Myra Liddel
E9,Pearl Oglethorpe
E9,Sylvia Avondale
E9,Theresa Anderson

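How can I change my code to write directly to CSV and skip appending edges to the folded list, while writing only edges with a weight greater than or equal to 3?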

Here is the code as it stands; it appends all edges to a list and then writes the list to CSV:

import csv
import networkx as nx
from networkx.algorithms import bipartite

def fold_network(input_file):

    # load text file into a dict with head as keys
    header = ['Event','Name']        
    rawData = [{key: value for (key, value) in zip(header, line.strip().split(','))} for line in open(input_file)]

    # create edgelist for Name -x- Event relationships
    edgelist = []
    for i in rawData:
        edgelist.append(
        (i['Event'],
        i['Name'])    
        )

    # create a unique list of Name and Event for nodes
    Event = sorted(set([i['Event'] for i in rawData]))
    Name = sorted(set([i['Name'] for i in rawData]))

    # add nodes and edges to a graph
    B = nx.Graph()
    B.add_nodes_from(Event, bipartite=0)
    B.add_nodes_from(Name, bipartite=1)
    B.add_edges_from(edgelist)

    # create bipartite projection graph
    name_nodes, event_nodes = bipartite.sets(B)
    event_nodes = set(n for n,d in B.nodes(data=True) if d['bipartite']==0)
    name_nodes = set(B) - event_nodes

    # project graph and write projected graph's edgelist to a list
    seen = set()
    folded = []
    for u in name_nodes:
    #    seen=set([u]) # print both u-v, and v-u
        seen.add(u) # don't print v-u
        unbrs = set(B[u])
        nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - seen
        for v in nbrs2:
            vnbrs = set(B[v])
            common = unbrs & vnbrs
            weight = len(common)
            row = u, v, weight
            folded.append(row)

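    # write only edges with weight greater than or equal to 3 to CSV
    # (open the file once; reopening it in write mode for each row would overwrite it)
    with open('outfile.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for i in folded:
            if i[2] >= 3:
                writer.writerow(i)  # one (u, v, weight) tuple per CSV row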

1 Answer

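Well, the answer to the main question (there is a good reason to limit each post to a single question) is very simple: all you have to do is change this bit of code: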

    for v in nbrs2:
        vnbrs = set(B[v])
        common = unbrs & vnbrs
        weight = len(common)
        row = u, v, weight
        folded.append(row)

into something like this:

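    for v in nbrs2:
        vnbrs = set(B[v])
        common = unbrs & vnbrs
        weight = len(common)
        if weight >= 3:  # keep only the edges that should reach the CSV
            # append one comma-separated line instead of storing the row in a list
            f = open('outfile.csv', 'a')
            f.write('%s,%s,%d\n' % (u, v, weight))
            f.close()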

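Of course, the row has to be formatted as a string before it is written, and you probably don't need to open and close the file handle for every row, but with this approach you never build up a large amount of unneeded data in memory.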
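As a rough sketch of the "don't reopen the file for every row" variant, the file can be opened once around the loops and the rows handed to `csv.writer`, which takes care of the formatting. The function name `write_folded_edges`, the `min_weight` parameter, and the small `adj` dictionary are made up for illustration; `adj` is a plain dict of neighbour sets standing in for the networkx graph `B`, which is indexed the same way (`B[u]`) in the question's code.

```python
import csv
import os
import tempfile


def write_folded_edges(adj, name_nodes, out_path, min_weight=3):
    """Stream projected edges with weight >= min_weight to a CSV file.

    adj        -- mapping of node -> set of neighbours (stand-in for B[u])
    name_nodes -- the nodes to project onto
    Returns the number of rows written.
    """
    seen = set()
    written = 0
    # Open the file once, outside the loops. newline='' is what the csv
    # module documentation recommends for files passed to csv.writer.
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        for u in name_nodes:
            seen.add(u)  # skip v-u once u-v has been handled
            unbrs = set(adj[u])
            nbrs2 = set(n for nbr in unbrs for n in adj[nbr]) - seen
            for v in nbrs2:
                weight = len(unbrs & set(adj[v]))
                if weight >= min_weight:
                    # written immediately, so no list of edges is kept
                    writer.writerow([u, v, weight])
                    written += 1
    return written


# Tiny hypothetical example: two events, three names.
adj = {
    'E1': {'Ann', 'Bob', 'Cat'},
    'E2': {'Ann', 'Bob'},
    'Ann': {'E1', 'E2'},
    'Bob': {'E1', 'E2'},
    'Cat': {'E1'},
}
out_path = os.path.join(tempfile.mkdtemp(), 'outfile_demo.csv')
rows_written = write_folded_edges(adj, ['Ann', 'Bob', 'Cat'], out_path, min_weight=2)
```

Opening in `'w'` mode here means each run starts from an empty file, whereas the `'a'` mode used above keeps appending to whatever an earlier run left behind. Only `seen` and the neighbour sets of the current pair are held in memory at any time.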

Answered 2014-08-21T02:06:13.587