Sample input file (the actual input file contains about 50,000 entries):

615 146 
615 180 
615 53  
615 42  
615 52  
615 52  
615 51  
615 45  
615 49
616 34
616 44
616 42
616 41
616 42
617 42
617 43
617 42
685 33
685 33
685 33
686 33
686 33
687 47
687 68
737 449
737 41
737 1138
738 46
738 53  

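I have to compare each value in the first column with the identical values next to it (e.g. 615, 615, 615) and group those rows into one cluster. The cluster must contain the column 1 values (the second column), such as 146, 180, ..., 45, 49. Then the cluster must break, and the next run of identical values (616, 616, 616) must form another cluster, and so on.

The code I wrote is: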

from __future__ import division
from sys import exit
h = 0
historyjobs = []
targetjobs = []


def quickzh(zhlistsub,
            targetjobs=targetjobs, num=0, denom=0):

    li = []; ji = []
    j = 0
    for i in zhlistsub:
        x1 = targetjobs[j][0]

        x = targetjobs[i][0]

        num += x
        denom += 1
        if x1 >= 0.9 * (num/denom):  # to group all items with same value in column 0
            li.append(targetjobs[i][1])
        else:
            break
    return li


def filewr(listli):
    global h
    s = open("newout1", "a")
    if(len(listli) != 0):
        h += 1
        s.write("cluster: %d" % h)
        s.write("\n")
        s.write(str(listli))
        s.write("\n\n")
    else:
        print "0"


def new(inputfile,
        historyjobs=historyjobs, targetjobs=targetjobs):
    zhlistsub = []; zhlist = []
    k = 0

    with open(inputfile, 'r') as f:
        for line in f:
            job = map(int, line.split())
            targetjobs.append(job)
        while True:
            if len(targetjobs) != 0:

                zhlistsub = [i for i, element in enumerate(targetjobs)]

                if zhlistsub:
                    listrun = quickzh(zhlistsub)
                    filewr(listrun)
                historyjobs.append(targetjobs.pop(0))
                k += 1
            else:
                break

new('newfinal1')

The output I get is:

 cluster: 1
 [146, 180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]

 cluster: 2
 [180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]

 cluster: 3
 [53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53]
 ... and so on

But the output I need is:

  cluster: 1
  [146, 180, 53, 42, 52, 52, 51, 45, 49]
  cluster: 2
  [34, 44, 42, 41, 42]
  cluster: 3
  [42, 43, 42]
  ... and so on

Can anyone suggest what changes I should make to the condition to get the desired result? It would be really helpful.

2 Answers

I haven't tested this answer, but follow this concept:

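from collections import defaultdict

# maps each first-column value to the list of its second-column values
cluster = defaultdict(list)

with open(inputfile, 'r') as f:
    for line in f:
        clus, val = line.split()
        cluster[int(clus)].append(int(val))

# a dict must be iterated with .items() to get (key, values) pairs;
# sorted() gives a predictable key order
for clus, vals in sorted(cluster.items()):
    print("cluster " + str(clus))
    print(str(vals) + "\n")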
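
Note that a dictionary keyed on the first column merges every row that shares a key, even when those rows are not adjacent in the file. If a cluster should instead break whenever the first-column value changes, as the question describes, a single pass that compares each row's key with the previous one may be enough. A minimal Python 3 sketch; the function name `cluster_consecutive` and the in-memory sample are illustrative assumptions, not part of the original code:

```python
def cluster_consecutive(rows):
    """Group second-column values of consecutive rows that share a key.

    `rows` is an iterable of (key, value) pairs. The name and signature
    are hypothetical, chosen only for this illustration.
    """
    clusters = []
    previous_key = object()  # sentinel that never equals a real key
    for key, value in rows:
        if key != previous_key:
            clusters.append([])  # key changed: start a new cluster
            previous_key = key
        clusters[-1].append(value)
    return clusters


sample = [(615, 146), (615, 180), (616, 34), (616, 44), (617, 42), (615, 7)]
print(cluster_consecutive(sample))
# → [[146, 180], [34, 44], [42], [7]]
```

The trailing `(615, 7)` row lands in its own cluster here, whereas the dictionary approach would append it to the first 615 group.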
answered 2013-09-27T03:32:15.277

Try this. groupby takes care of creating the clusters, and all that is left to do is build the lists:

import itertools as it
[[y[1] for y in x[1]] for x in it.groupby(data, key=lambda x:x[0])]

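The above assumes that data holds your input, and that it has already been filtered and sorted by the first column. For the example in the question, it would look like this: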

data = [[615, 146], [615, 180], [615, 53] ... ]
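
To go from raw input lines to the output layout the question asks for, the groupby idea could be wrapped roughly as follows. This is a minimal Python 3 sketch, not a tested drop-in replacement: the helper names `parse_pairs` and `format_clusters` are hypothetical, and it assumes rows with equal first-column values are adjacent in the input, as they are in the sample.

```python
import itertools
from operator import itemgetter


def parse_pairs(lines):
    """Yield (key, value) integer pairs from whitespace-separated lines."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            yield int(fields[0]), int(fields[1])


def format_clusters(pairs):
    """Return text with one numbered cluster per run of equal keys."""
    parts = []
    groups = itertools.groupby(pairs, key=itemgetter(0))
    for number, (_, group) in enumerate(groups, start=1):
        values = [value for _, value in group]
        parts.append("cluster: %d\n%s\n" % (number, values))
    return "".join(parts)


sample_lines = ["615 146 ", "615 180 ", "616 34", "616 44", "617 42"]
print(format_clusters(parse_pairs(sample_lines)))
```

Because both helpers work lazily on any iterable of lines, an open file object can be passed directly, for example `with open('newfinal1') as f: text = format_clusters(parse_pairs(f))`, so the roughly 50,000 rows are processed in one pass.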
answered 2013-09-27T03:23:58.213