I have a file containing rows of data. Each row starts with an id, followed by a fixed set of comma-separated attributes.

123,2,kent,...,
123,2,bob,...,
123,2,sarah,...,
123,8,may,...,

154,4,sheila,...,
154,4,jeff,...,

175,3,bob,...,

249,2,jack,...,
249,5,bob,...,
249,3,rose,...,

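I want to extract an attribute when a condition is met: if 'bob' appears within an id, get the value of the second attribute from the rows that follow it under the same id.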

For example:

id: 123
values returned: 2, 8

id: 249
values returned: 3

In Java I could use a double loop for this, but I'd like to try it in Python. Any suggestions would be great.

3 Answers

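I came up with a (perhaps) more Pythonic solution that uses groupby and dropwhile. It produces the same result as the approach further down, but I think it's prettier.. :) Flags, "curr_id" and similar things aren't very Pythonic and should be avoided if possible!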

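import csv
from itertools import groupby, dropwhile

goal = 'bob'
ids = {}

with open('my_data.csv') as ifile:
    reader = csv.reader(ifile)
    # groupby only merges adjacent rows, so rows sharing an id must be contiguous
    for key, rows in groupby(reader, key=lambda r: r[0]):
        # skip rows until the first 'bob'; the result starts with that row
        matched_rows = list(dropwhile(lambda r: r[2] != goal, rows))
        if len(matched_rows) > 1:
            # keep the second column of every row after the 'bob' row
            ids[key] = [row[1] for row in matched_rows[1:]]

print(ids)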
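
To make it easier to see what groupby and dropwhile each contribute, here is a minimal in-memory sketch of the same idea. The function name `values_after` and the shortened rows (the trailing "..." attributes are left out) are assumptions made for illustration, not part of the original data.

```python
from itertools import dropwhile, groupby

# Sample rows from the question, already split into fields
# (trailing attributes omitted for brevity).
rows = [
    ["123", "2", "kent"],
    ["123", "2", "bob"],
    ["123", "2", "sarah"],
    ["123", "8", "may"],
    ["175", "3", "bob"],
    ["249", "2", "jack"],
    ["249", "5", "bob"],
    ["249", "3", "rose"],
]


def values_after(rows, goal):
    """Map each id to the second-column values of the rows that follow `goal`."""
    result = {}
    # groupby only merges *consecutive* rows with the same key.
    for key, group in groupby(rows, key=lambda r: r[0]):
        # Skip rows until the first one named `goal`; keep it and the rest.
        tail = list(dropwhile(lambda r: r[2] != goal, group))
        if len(tail) > 1:  # at least one row follows the matching row
            result[key] = [r[1] for r in tail[1:]]
    return result


print(values_after(rows, "bob"))
```

Since groupby starts a new group whenever the key changes, this only works as intended if all rows for a given id sit next to each other, as they do in the sample.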

(My first solution is below.)

from collections import defaultdict
import csv

curr_id = None
found = False
goal = 'bob'
ids = defaultdict(list)

with open('my_data.csv') as ifile:
    for row in csv.reader(ifile):
        # a new id starts a new group, so reset the flag
        if row[0] != curr_id:
            found = False
            curr_id = row[0]
        if found:
            # 'bob' was already seen for this id: collect the second column
            ids[curr_id].append(row[1])
        elif row[2] == goal:
            found = True

print(dict(ids))

Output:

{'123': ['2', '8'], '249': ['3']}
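
One caveat: if the real file contains the blank separator lines shown in the question, csv.reader yields an empty list for each of them and `row[0]` would raise an IndexError. A minimal sketch of the flag-based version that skips such rows follows. The function name `collect` and the in-memory `sample` (with the trailing "..." attributes left out) are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for the file, including blank separator lines.
sample = """123,2,kent
123,2,bob
123,2,sarah
123,8,may

175,3,bob

249,2,jack
249,5,bob
249,3,rose
"""


def collect(lines, goal="bob"):
    """Collect second-column values of rows that follow `goal` within each id."""
    ids = defaultdict(list)
    curr_id, found = None, False
    for row in csv.reader(lines):
        if not row:  # csv.reader yields [] for a blank line
            continue
        if row[0] != curr_id:  # new id: reset the flag
            curr_id, found = row[0], False
        if found:
            ids[curr_id].append(row[1])
        elif row[2] == goal:
            found = True
    return dict(ids)


print(collect(io.StringIO(sample)))
```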
answered 2014-02-20T06:00:18.913

Just set a flag or something while you loop:

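name = 'bob'
target_id = '123'  # renamed from `id` to avoid shadowing the builtin
found = False

# 'my_data.csv' is an assumed filename; the original left the file object undefined
with open('my_data.csv') as f:
    for line in f:
        fields = line.strip().split(',')
        if fields[0] == target_id:
            if found:
                # only rows after the 'bob' row are printed
                print(fields[1])
            elif fields[2] == name:
                found = True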
answered 2014-02-20T05:37:23.717
import collections
import csv
import io

s = '''123,2,kent,...,
123,2,bob,...,
123,2,sarah,...,
123,8,may,...,
154,4,sheila,...,
154,4,jeff,...,
175,3,bob,...,
249,2,jack,...,
249,5,bob,...,
249,3,rose,...,'''

filelikeobject = io.StringIO(s)
dd = collections.defaultdict(list)
cr = csv.reader(filelikeobject)
for line in cr:
    if line[2] == 'bob':
        dd[line[0]]  # touching the key creates an empty list for this id
        continue
    if line[0] in dd:
        # 'bob' was already seen for this id, so record the second column
        dd[line[0]].append(line[1])

Result:

>>> dd
defaultdict(<class 'list'>, {'123': ['2', '8'], '175': [], '249': ['3']})
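
Note that id 175 shows up with an empty list because 'bob' is its last row. If only ids with at least one value are wanted, as in the question's example, the empty entries can be filtered out afterwards. A small sketch, where `dd` is rebuilt by hand as a stand-in for the result above and `non_empty` is a name chosen for illustration:

```python
from collections import defaultdict

# Hand-built stand-in for the `dd` produced above.
dd = defaultdict(list, {'123': ['2', '8'], '175': [], '249': ['3']})

# Keep only the ids that actually collected values after 'bob'.
non_empty = {key: values for key, values in dd.items() if values}
print(non_empty)
```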
answered 2014-02-20T05:58:45.530