
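I've been at this for a while, and I think asking the experts for advice is in my best interest. I know I'm not writing this in the best way possible, and I've gone down a rabbit hole and confused myself.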

I have a .csv. A bunch of them, actually. That part isn't the problem.

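The rows at the top of the CSV aren't really CSV data, but they do contain one important piece of information: the date the data is valid for. For some report types it is on one line, and for others it is on a different line.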

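My data starts some number of lines down from the top, usually 10 or 11, but I can't always be sure. I do know that the first column always has the same information (the headings for the data table).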

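I want to pull the report date from the preceding rows and then, for file type A, do stuffA, and for file type B, do stuffB, and write the row to a new file. I'm having trouble incrementing the rows, and I don't know what I'm doing wrong.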

Sample data:

"Attribute ""OPSURVEYLEVEL2_O"" [Category = ""Retail v1""]"
Date exported: 2/16/13
Exported by user: William
Project: 
Classification: Online Retail v1
Report type: Attributes
Date range: from 12/14/12 to 12/14/12
"Filter OpSurvey Level 2(mine):  [ LEVEL:SENTENCE TYPE:KEYWORD {OPSURVEYLEVEL2_O:""gift certificate redemption"", OPSURVEYLEVEL2_O:""combine accounts"", OPSURVEYLEVEL2_O:""cancel account"", OPSURVEYLEVEL2_O:""saved project moved to purchased project"", OPSURVEYLEVEL2_O:""unlock account"", OPSURVEYLEVEL2_O:""affiliate promotions"", OPSURVEYLEVEL2_O:""print to store coupons"", OPSURVEYLEVEL2_O:""disclaimer not clear"", OPSURVEYLEVEL2_O:""prepaid issue"", OPSURVEYLEVEL2_O:""customer wants to use coupons for print to store"", OPSURVEYLEVEL2_O:""customer received someone else's order"", OPSURVEYLEVEL2_O:""hi-res images unavailable"", OPSURVEYLEVEL2_O:""how to re-order"", OPSURVEYLEVEL2_O:""missing items"", OPSURVEYLEVEL2_O:""missing envelopes: print to store"", OPSURVEYLEVEL2_O:""missing envelopes: mail order"", OPSURVEYLEVEL2_O:""group rooms"", OPSURVEYLEVEL2_O:""print to store"", OPSURVEYLEVEL2_O:""print to store coupons"", OPSURVEYLEVEL2_O:""publisher: card not available for print to store"", OPSURVEYLEVEL2_O:publisher}]"
Total: 905
OPSURVEYLEVEL2_O,Distinct Document,% of Document,Sentiment Score
PRINT TO STORE,297,32.82,-0.1
...

Sample code:

#!/usr/bin/python

import csv, os, glob, sys, errno

path = '/path/to/Downloads'
for infile in glob.glob(os.path.join(path,'report_ATTRIBUTE_OP*.csv')):
    if 'OPSURVEYLEVEL2' in infile:
        prime_column = 'ops2'
    elif 'OPSURVEYLEVEL3' in infile:
        prime_column = 'ops3'
    else:
        sys.exit(errno.ENOENT)
    with open(infile, "r") as csvfile:
        reader = csv.reader(csvfile)
        report_date = 'DATE NOT FOUND'
        # import pdb; pdb.set_trace()
        for row in reader:
            foo = 0
            while foo < 1: 
                if row[0][0:].find('OPSURVEYLEVEL') == 0:
                    foo = 1
                if "Date range" in row:
                    report_date = row[0][-8:]
                break
            if foo >= 1:
                if row[0][0:].find('OPSURVEYLEVEL') == 0:
                    break
                if 'ops2' in prime_column:
                    dup_col = row[0]
                    row.insert(0,dup_col)
                    row.append(report_date)
                elif 'ops3' in prime_column:
                    row.append(report_date)
                with open('report_merge.csv', 'a') as outfile:
                    outfile.write(row)
            reader.next()

1 Answer


I can see two problems in this code.

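The first is that the code never finds the string "Date range" in row. Here row is a list of fields, and in on a list only matches an element that equals the whole string. The line: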

if "Date range" in row:

... should be:

if "Date range" in row[0]:
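
The difference is easy to check in isolation. This is a minimal sketch, assuming Python 3 and using the sample's "Date range" line fed through csv.reader from an in-memory string rather than a real file:

```python
import csv
import io

# One line from the sample report header, parsed the same way the script does.
line = "Date range: from 12/14/12 to 12/14/12\n"
row = next(csv.reader(io.StringIO(line)))  # a list holding a single field

# `in` on a list looks for an element equal to the string, so this is False.
in_list = "Date range" in row

# `in` on a string looks for a substring, so this is True.
in_first_field = "Date range" in row[0]

# The last eight characters of the field are the end date of the range.
report_date = row[0][-8:]
```

The slice only works while the date stays in the M/D/YY form shown in the sample with eight characters; a differently formatted date would need real parsing.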

The second is that the code:

if row[0][0:].find('OPSURVEYLEVEL') == 0:
    break

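... breaks out of the for loop as soon as it reaches the header row of the data table, because the for loop is the nearest enclosing loop. I suspect an earlier version of this code had another while loop somewhere around it.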
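
To see the effect, here is a stripped-down stand-in for the question's loop, assuming Python 3. It uses plain strings instead of csv rows, and the names rows_processed and found are hypothetical, chosen only for this illustration:

```python
def rows_processed(rows):
    """Mimic the question's control flow and return the rows that get handled."""
    processed = []
    for row in rows:
        found = False
        while not found:
            if row.startswith("OPSURVEYLEVEL"):
                found = True
            break  # inside the while: leaves only the while loop
        if found:
            if row.startswith("OPSURVEYLEVEL"):
                break  # not inside any while: leaves the for loop
            processed.append(row)
    return processed


rows = [
    "Total: 905",
    "OPSURVEYLEVEL2_O,Distinct Document",
    "PRINT TO STORE,297",
]
result = rows_processed(rows)  # the data row is never reached
```

The second break fires on the header row itself, so the loop ends before any data row is handled and result stays empty.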

The code is simpler (and free of these bugs) if you use an if and else statement instead of the while and if, as follows:

    foo = 0  # set once before the loop; resetting it inside would never reach the else branch
    for row in reader:
        if foo < 1: 
            if row[0][0:].find('OPSURVEYLEVEL') == 0:
                foo = 1
            if "Date range" in row[0]:  # Changed this line
                print("found report date")
                report_date = row[0][-8:]
        else:
            print(row)
            if row[0][0:].find('OPSURVEYLEVEL') == 0:
                break
            if 'ops2' in prime_column:
                dup_col = row[0]
                row.insert(0,dup_col)
                row.append(report_date)
            elif 'ops3' in prime_column:
                row.append(report_date)
            with open('report_merge.csv', 'a') as outfile:
                outfile.write(','.join(row)+'\n')
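
One possible variation on the same idea, offered as a sketch rather than a drop-in replacement: it assumes Python 3, writes through csv.writer so that fields containing commas stay quoted, skips blank lines, and leaves out the second header check. The names merge_rows and in_table are hypothetical, and io.StringIO objects stand in for the real input and output files.

```python
import csv
import io


def merge_rows(lines, prime_column, out):
    """Copy the data rows of one report to `out`, appending the report date."""
    writer = csv.writer(out, lineterminator="\n")
    report_date = "DATE NOT FOUND"
    in_table = False  # becomes True once the table header row has been seen
    for row in csv.reader(lines):
        if not row:
            continue  # blank line: nothing to inspect
        if not in_table:
            if row[0].startswith("Date range"):
                report_date = row[0][-8:]
            elif row[0].startswith("OPSURVEYLEVEL"):
                in_table = True  # header row; data starts on the next row
            continue
        if prime_column == "ops2":
            row.insert(0, row[0])  # duplicate the first column, as in the question
        row.append(report_date)
        writer.writerow(row)


# In-memory stand-ins for a report file and the merged output (assumption).
sample = io.StringIO(
    "Date range: from 12/14/12 to 12/14/12\n"
    "Total: 905\n"
    "OPSURVEYLEVEL2_O,Distinct Document,% of Document,Sentiment Score\n"
    "PRINT TO STORE,297,32.82,-0.1\n"
)
merged = io.StringIO()
merge_rows(sample, "ops2", merged)
```

With real files, the output could be opened once with open('report_merge.csv', 'a', newline='') outside the row loop and passed in as out, instead of being reopened for every row.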
answered 2013-02-18T04:17:54.013