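I have some code that reads a csv file and then adds up values based on user input. The problem I'm having is removing entries that contain an invalid value. The specific invalid value is M, which I use in the csv file to denote missing data. Basically, I want the user to enter a beginning and an ending month, and then have the code add up the precipitation values for that range. However, if the range contains an M, I do not want to include that row of data. A partial sample is included below.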

Station,        Stat,       lat,      lon,      JAN,  FEB,  MAR,  APR,  MAY,  JUN,  JUL,  AUG  
Bainville 6 NE, 24-0408-06, 48.14065, -104.267, 0.10, 0.01, 0.12, 1.23, 0.02, 0.34, M,    0.00  
Brockton 20S,   24-1164-06, 47.5075,  -104.324, M,    0.08, 0.13, 1.54, 2.43, 1.23, 1.12, 0.9  
Cohagen,        24-1875-06, 47.564,   -106.7,   0.3,  0.37, M,    0.76, 1.55, 1.69, 0.35, 0.41  
Sidney,         24-7560-06, 47.36,    -104.47,  0.1,  0.21, 0.05, 1.21, M,    1.25, 2.75, 0.89

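Now, if the user were to select the months jan through mar, I would want the Brockton row (jan) and the Cohagen row (mar) to be omitted because of the M value. However, if the user selected apr through may, the row to be omitted would be Sidney.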

I hope this makes sense. I know this post is already long, but I'll include my code as well.

##################################################
import csv
import array
import decimal
decimal.getcontext().prec = 4
from time import gmtime, strftime
print strftime("%Y-%m-%d %H:%M:%S", gmtime())

# Create an unordered MON to column number dictionary and get user data
mdict = {'MAR': 11, 'FEB': 10, 'AUG': 16, 'SEP': 17, 'APR': 12, 'JUN': 14,
         'JUL': 15, 'JAN': 9, 'MAY': 13, 'NOV': 19, 'DEC': 20, 'OCT': 18}

month_start = raw_input('Input the 3 letter ID of beginning month: ')
month_end = raw_input('Input the 3 letter ID of ending month: ')
month_start = month_start.upper()
month_end = month_end.upper()
mon_layer_name = month_start + ' through ' +month_end
user_month = '[' + mon_layer_name + ']'
start_num = mdict[month_start]
end_num = mdict[month_end]+1
new_list = [['Station', 'Lat', 'Long', 'mysum']]

with open('R:\\COOP\\temp\\COOP_rainfall2.csv', 'rb') as csvfile:
    filereader = csv.reader(csvfile)
    filereader.next() # this is to skip header
    for row in filereader:
        #print row
        sta = row[0]
        lat = row[2]
        lon = row[3]
        tot = decimal.Decimal(0)
        for x in row[start_num:end_num]:
            print 'now in line 34 in code'
            if x == '': x = 0
            elif x == 'M': # I think this is where I need to do something, just not sure how to accomplish it.
                x = 0
                print row
            tot = tot + decimal.Decimal(x)
        if tot == 0: continue
        else: new_list.append([sta, lat, lon, str(tot)])

with open('R:\\COOP\\temp\\output.csv', 'wb') as csvout:        
    print 'Now in file writer code block'
    filewriter = csv.writer(csvout)
    for line in new_list:
          filewriter.writerow(line)

Rex = 'R:\\COOP\\temp\\output.csv'
Precip=[] #creating an array named Precip
inp = open (Rex,"r") 
for line in inp.readlines():
 line.split(',')
 Precip.append(line)
file.close(inp)
print 'End of code'

Any help is greatly appreciated.

2 Answers

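There are a few issues to sort out before your code will work. I'm not sure exactly what you are trying to accomplish, but here are some suggestions: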

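For the sample data you provided, the mdict values are incorrect: in the sample header row, JAN is the fifth column (index 4, counting from zero), not index 9.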

Ideally, you would make the lookup programmatic by parsing the first row of the CSV rather than discarding it. You could try this...

header_row = [h.strip().upper() for h in filereader.next()]  # normalize case and padding of the header names
start_num = header_row.index(month_start.upper())
end_num = header_row.index(month_end.upper()) + 1  # +1 so the slice includes the ending month
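
The header lookup can be wrapped in a small helper. This is only a sketch under a few assumptions: it is written for Python 3 (the question's code is Python 2), `month_slice` is a hypothetical name, and the header list is copied from the sample in the question rather than read from the real file.

```python
def month_slice(header_row, month_start, month_end):
    """Return (start, end) so that row[start:end] covers both months."""
    # Normalize the header so that ' jan' or 'Jan' still matches 'JAN'.
    names = [name.strip().upper() for name in header_row]
    start = names.index(month_start.strip().upper())
    end = names.index(month_end.strip().upper()) + 1  # slice end is exclusive
    if start >= end:
        raise ValueError("ending month comes before beginning month")
    return start, end


# Header copied from the sample data in the question.
header = ["Station", "Stat", "lat", "lon", "JAN", "FEB", "MAR", "APR",
          "MAY", "JUN", "JUL", "AUG"]
start_num, end_num = month_slice(header, "jan", "mar")
print(start_num, end_num)  # 4 7
```

`list.index` raises `ValueError` for a month that is not in the header, so a mistyped month ID fails immediately instead of silently summing the wrong columns.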

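Once that is fixed, you can check the list for 'M' and use sum() to do the addition. Since I'm not sure what you want to do with the 'good' rows, I'll just get you to a point where you can work out the rest...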

# first, see if 'M' is any of the values in the range you are looking for
if 'M' in row[start_num:end_num]:
    # skip this row
    pass
# if 'M' is not one of the values, do the math with the built-in sum() instead of
#   writing the loop yourself (the generator expression replaces your inner loop)
else:
    total = sum(decimal.Decimal(x) for x in row[start_num:end_num])
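
Putting the pieces together, the whole read, filter and sum step might look roughly like the sketch below. It assumes Python 3, inlines a few rows of the question's sample through `io.StringIO` instead of opening the real file, and uses a hypothetical function name, `sum_months`. Empty cells are counted as zero, as in the question's code. Unlike the question's code, it does not drop rows whose total is zero.

```python
import csv
import io
from decimal import Decimal

# Rows from the question's sample, inlined so the sketch runs without a file.
SAMPLE = """Station,Stat,lat,lon,JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG
Bainville 6 NE,24-0408-06,48.14065,-104.267,0.10,0.01,0.12,1.23,0.02,0.34,M,0.00
Brockton 20S,24-1164-06,47.5075,-104.324,M,0.08,0.13,1.54,2.43,1.23,1.12,0.9
Cohagen,24-1875-06,47.564,-106.7,0.3,0.37,M,0.76,1.55,1.69,0.35,0.41
Sidney,24-7560-06,47.36,-104.47,0.1,0.21,0.05,1.21,M,1.25,2.75,0.89
"""


def sum_months(csv_text, month_start, month_end):
    """Sum precipitation per station, skipping rows with 'M' in the range."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = [name.strip().upper() for name in next(reader)]
    start = header.index(month_start.upper())
    end = header.index(month_end.upper()) + 1  # slice end is exclusive
    results = [["Station", "Lat", "Long", "mysum"]]
    for row in reader:
        values = [v.strip() for v in row[start:end]]
        if "M" in values:
            continue  # at least one month is missing: drop the whole row
        # Empty cells are skipped, which is the same as counting them as zero.
        total = sum((Decimal(v) for v in values if v), Decimal(0))
        results.append([row[0], row[2], row[3], str(total)])
    return results


for line in sum_months(SAMPLE, "jan", "mar"):
    print(line)
```

For jan through mar this keeps Bainville 6 NE and Sidney and drops Brockton 20S and Cohagen, which matches the behavior described in the question.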

With a bit of clarification on what you need, I may be able to help you further...

Tom

answered 2013-04-26T18:58:55.543

You can use code like the following to confirm whether a cell contains a float (valid) rather than a character:

def is_number(s):
    try:
        float(s)  # for int, long and float
        return True
    except ValueError:
        return False
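
One way such a check could be applied to the month range is sketched below, assuming Python 3. `row_is_complete` is a hypothetical helper, and the column indexes 4 and 7 (JAN through MAR) are taken from the sample header in the question. Note that an empty cell also fails `float()`, so this variant rejects empty strings as well as 'M'.

```python
def is_number(s):
    """True if s parses as a float, False for markers such as 'M'."""
    try:
        float(s)
        return True
    except ValueError:
        return False


def row_is_complete(row, start, end):
    """True only if every cell in row[start:end] is numeric."""
    return all(is_number(cell) for cell in row[start:end])


# Two rows from the question's sample, truncated after MAR.
rows = [
    ["Brockton 20S", "24-1164-06", "47.5075", "-104.324", "M", "0.08", "0.13"],
    ["Sidney", "24-7560-06", "47.36", "-104.47", "0.1", "0.21", "0.05"],
]
good = [row for row in rows if row_is_complete(row, 4, 7)]
print([row[0] for row in good])  # ['Sidney']
```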

Once the "problem" row has been identified, you can remove it by slicing, with code similar to the following:

>>> print(a)
[[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 'M', 3], [4, 4, 4], [5, 5, 5]]
>>> problemRow =3 # would really use is_number here to identify row 3
>>> a[0:problemRow]
[[0, 0, 0], [1, 1, 1], [2, 2, 2]]
>>> a[problemRow+1:] # nothing after the colon - extend to end
[[4, 4, 4], [5, 5, 5]]
>>> b = a[0:problemRow]
>>> b.extend(a[problemRow+1:]) # extend, not append, so the rows are not nested
>>> print(b)
[[0, 0, 0], [1, 1, 1], [2, 2, 2], [4, 4, 4], [5, 5, 5]]
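
As an alternative to slicing around a known index, a list comprehension can drop every row that contains the marker in one pass. This is a minimal sketch in Python 3 that reuses the list `a` from the session above. The names `b`, `c` and `problem_row` are illustrative only. Keep in mind that `append` adds its argument as a single element, so joining two slices needs `+` or `extend`.

```python
a = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 'M', 3], [4, 4, 4], [5, 5, 5]]

# Keep only the rows that do not contain the missing-value marker.
b = [row for row in a if 'M' not in row]
print(b)

# Slicing works too, as long as the two halves are concatenated.
problem_row = 3
c = a[:problem_row] + a[problem_row + 1:]
print(c == b)
```

The comprehension does not need the index of the problem row at all, which helps when several rows contain 'M'.
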
answered 2013-04-26T20:36:17.877