**My goal is to avoid importing the csv module.**

I am working on a script that runs through an extremely large csv file and selectively writes rows to a new csv file.

I have the following two lines:

with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
    for row in ifile: 

and then this, a few nested-if statements down:

line = list(ifile)[row]
ofile.write(line)

I know that isn't right--I took a stab at it and was hoping someone here could shed some light on how to correctly go about this. The essence of this question is how to reference the row that I am in so that I can write it out to the new csv file using 'ofile'. Please let me know if any further clarifications are necessary. Thanks!

EDIT: Full Code included in pastebin link - http://pastebin.com/a0jx85xR

2 Answers

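Just to add to jrd1's answer: I rarely use the csv module; I just use the split and join methods on strings. I usually end up with something like this (if there is only one input and one output, I usually just use standard input and standard output).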

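import sys

for row in sys.stdin:
  # Strip the trailing newline first, because print() adds its own.
  # The delimiter could be "\t" or whatever; split() with no argument splits on whitespace.
  fields = row.rstrip("\n").split(",")

  # Process the fields in some way (0-based indexing)
  fields[0] = str(int(fields[0]) + 55)
  fields[7] = new_date_format(fields[7])  # new_date_format is a placeholder for your own function
  if some_condition_is_met:  # placeholder for your own condition
    print(",".join(fields))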

Of course, if your csv file starts getting funky entries with quotes, embedded commas and so on, this approach becomes a lot less fun.
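To illustrate that caveat: a plain split(",") cuts a quoted field in two when it contains a comma. The following is a minimal sketch of a quote-aware splitter that still avoids the csv module. The name split_csv_line is hypothetical, and the sketch assumes double-quote quoting, "" as an escaped quote, and no newlines inside fields. It is not a full CSV parser.

```python
def split_csv_line(line, delimiter=",", quote='"'):
    """Split one CSV line, honoring double-quoted fields (hypothetical helper)."""
    fields = []
    current = []
    in_quotes = False
    line = line.rstrip("\r\n")
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    current.append(quote)  # doubled quote means a literal quote
                    i += 1
                else:
                    in_quotes = False      # closing quote
            else:
                current.append(ch)         # delimiters inside quotes are kept
        elif ch == quote:
            in_quotes = True               # opening quote
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


# Made-up sample line with a comma inside a quoted field
line = 'AAIR,"Smith, John",0.1046\n'
naive = line.rstrip("\n").split(",")  # the quoted field is split in two
aware = split_csv_line(line)          # the quoted field stays whole
```

Once the input gets more irregular than this, the csv module is likely the safer choice.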

answered 2013-10-28T03:41:15.147

You're close. This is all you have to do:

with open(sys.argv[1]) as ifile, open(sys.argv[2], mode='w') as ofile:
    for row in ifile:
        # ...
        # Define the condition each row has to meet (replace this with your own).
        # E.g.: the number of non-empty entries in the row is greater than 5:
        if len([term for term in row.split(',') if term.strip() != '']) > 5:
            ofile.write(row)
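The same loop can be wrapped in a small function so the condition is easy to swap out. This is only a sketch: copy_matching_rows and keep_row are hypothetical names, and io.StringIO objects stand in for the real files so the example runs on its own.

```python
import io


def copy_matching_rows(ifile, ofile, keep_row):
    """Write each line of ifile to ofile when keep_row(terms) is true."""
    written = 0
    for row in ifile:  # 'row' is already the current line, newline included
        terms = [term for term in row.split(",") if term.strip() != ""]
        if keep_row(terms):
            ofile.write(row)  # write the original line unchanged
            written += 1
    return written


# In-memory stand-ins for open(sys.argv[1]) and open(sys.argv[2], mode='w')
source = io.StringIO("a,b,c,d,e,f\na,b\n1,2,3,4,5,6,7\n")
target = io.StringIO()
count = copy_matching_rows(source, target, lambda terms: len(terms) > 5)
```

Because the file is read one line at a time, the whole input never has to fit in memory, which matters for a very large csv file.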

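UPDATE:

To answer the OP's question about splitting lines:

You split a line in Python by supplying a delimiter. Since this is a CSV file, you split on a comma (,). Example:

If this is a line (a string):

0, 1, 2, 3, 4, 5

and you apply:

line.split(',')

you will get a list (note that every term after the first keeps its leading space):

['0', ' 1', ' 2', ' 3', ' 4', ' 5']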
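When the values are separated by a comma and a space, as in the sample line, split(',') leaves a leading space on every term after the first. A short sketch of cleaning that up with strip() before converting the terms:

```python
line = "0, 1, 2, 3, 4, 5"

raw_terms = line.split(",")                          # terms keep their leading spaces
clean_terms = [term.strip() for term in raw_terms]   # surrounding whitespace removed
numbers = [int(term) for term in clean_terms]        # convert once the text is clean
```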

UPDATE 2:

import sys

if __name__ == '__main__':
    ticker = sys.argv[3]
    allTypes = bool(int(sys.argv[4])) #argv[4] is a string, you have to convert it to an int, then to a bool

    with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
        all_timestamps = [] #this is an empty list
        n_rows = 0
        for row in ifile:
            #This splits the line into constituent terms as described earlier
            #SAMPLE LINE:
            #A,1,12884902522,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,
            #After applying this bit of code, the line should be split into this:
            #['A', '1', '12884902522', 'B', 'B', '4900', 'AAIR', '0.1046', '28800', '390', 'B', 'AARCA']
            #NOW, you can make comparisons against those terms. :)

            terms = [term for term in row.split(',') if term.strip() != '']
            current_timestamp = int(terms[2])

            #compare the current against the previous
            #starting from row 2: (index 1)
            if n_rows > 0:
                #Python supports negative indices, hence: -1 means the value at the last index
                #That is, the previous time_stamp. Now perform the comparison and do something if that criterion is met:
                if current_timestamp - all_timestamps[-1] >= 0:
                    pass #the pass keyword means to do nothing. You'll have to replace it with whatever code you want

            #increment n_rows every time:
            n_rows += 1

            #always append the current timestamp to all the time_stamps
            all_timestamps.append(current_timestamp)


            if (terms[6] == ticker):
                # add something to make sure chronological order hasn't been broken
                if (allTypes == 1):
                    ofile.write(row)
            #I don't know if this was a bad indent or not, but you should know
            #where this goes
            elif (terms[0] == "A" or terms[0] == "M" or terms[0] == "D"):
                print(row)
                ofile.write(row)

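My original guess was correct. You weren't splitting the rows into their CSV components. So when you made comparisons against the rows, you didn't get the right results, and hence you got no output. This should work now (with slight modifications to suit your goals). :)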
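Since the input file is extremely large, one possible refinement is to remember only the previous timestamp instead of appending every timestamp to a list. The sketch below assumes the timestamp is the third field (index 2), as in the sample line. The function name rows_in_order is hypothetical, and the timestamps in the sample rows are made up.

```python
def rows_in_order(rows, timestamp_index=2):
    """Yield (row, in_order) pairs; in_order is False when a timestamp goes backwards."""
    previous = None
    for row in rows:
        terms = [term for term in row.split(",") if term.strip() != ""]
        current = int(terms[timestamp_index])
        # The first row has nothing to compare against, so it counts as in order.
        in_order = previous is None or current >= previous
        yield row, in_order
        previous = current  # only one timestamp is kept in memory


# Made-up rows in the same layout as the sample line
sample = [
    "A,1,100,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,\n",
    "A,2,105,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,\n",
    "A,3,101,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,\n",
]
flags = [in_order for _, in_order in rows_in_order(sample)]
```

Any iterable of lines works here, including an open file object.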

answered 2013-10-28T02:24:29.430