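I am trying to classify a large number of .CSV files generated by our equipment, but I am stuck on the classification part. Each file has more than 30 columns and can have any number of rows. What I need is a way to check for events that span several rows in several columns at once. For example, I need to check whether any of the following occurs:
- In column "Test_Res_1", the value is less than 12 for 15 consecutive tests
- In column "Test_Res_2", the value is less than 5 for 10 consecutive tests
- In column "Test_Div", the value is less than 15 for 20 consecutive tests
- In column "Test_time", the value is less than 60 for 10 consecutive tests
- ... some other conditions on consecutive tests ...
Then, if one or more of the conditions are met, I simply write the name of that file to a .txt file. I implemented code suggested by a user of this forum, and the script runs fine. However, I just copy a whole block and rerun the check every time I want to test another condition. I am sure there is a better way to implement this and shrink the huge script I currently have.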
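One way the checks listed above could be expressed without copying a block per condition is sketched below. This is only an illustration, not the script currently in use: the helper names `has_run_below` and `conditions_met` and the `CONDITIONS` table are hypothetical, and the sketch assumes each column has already been read into a list of floats.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def has_run_below(values: Iterable[float], threshold: float, run_length: int) -> bool:
    """Return True if `run_length` consecutive values are all below `threshold`."""
    streak = 0
    for value in values:
        # Extend the current streak, or reset it when a value is not below the threshold.
        streak = streak + 1 if value < threshold else 0
        if streak >= run_length:
            return True
    return False


# (column name, threshold, number of consecutive tests), copied from the list above.
CONDITIONS: List[Tuple[str, float, int]] = [
    ("Test_Res_1", 12, 15),
    ("Test_Res_2", 5, 10),
    ("Test_Div", 15, 20),
    ("Test_time", 60, 10),
]


def conditions_met(columns: Dict[str, Sequence[float]],
                   conditions: List[Tuple[str, float, int]] = CONDITIONS) -> List[str]:
    """Return the names of the columns whose condition holds.

    `columns` maps a column name to its values (an assumed in-memory layout).
    A column missing from `columns` is treated as not meeting its condition.
    """
    return [name for name, threshold, run_length in conditions
            if has_run_below(columns.get(name, []), threshold, run_length)]
```

Adding another condition then means adding one tuple to the table rather than another copy of the loop.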
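Here is an example of the file:
I tried several suggestions I found on this forum, but none of them worked. Some of them work for a single condition, but I need to check the several conditions I mentioned. I know how to open the files and how to save their names to a .txt file when a condition is met; I just don't know how to check multiple conditions across multiple columns and multiple rows. Checking a single row is easy, but checking several of them is giving me a lot of trouble.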
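For checking several conditions across several columns in a single pass over the rows, something along the following lines might work. It is a sketch under assumptions: the header row is assumed to contain the column names used in the conditions, rows are read with `csv.DictReader`, and `check_rows` is a hypothetical helper. The sample data and the short run lengths are made up purely for illustration.

```python
import csv
import io


def check_rows(rows, conditions):
    """Scan `rows` once and return the conditions that were met.

    rows       -- iterable of dicts mapping column name to text (e.g. csv.DictReader)
    conditions -- list of (column, threshold, run_length) tuples
    """
    streaks = [0] * len(conditions)   # current run length per condition
    met = [False] * len(conditions)   # whether each condition has been satisfied
    for row in rows:
        for i, (column, threshold, run_length) in enumerate(conditions):
            if met[i]:
                continue  # already satisfied, nothing more to track
            if float(row[column]) < threshold:
                streaks[i] += 1
                met[i] = streaks[i] >= run_length
            else:
                streaks[i] = 0  # run broken, start counting again
        if all(met):
            break  # every condition found, no need to read further
    return [cond for cond, ok in zip(conditions, met) if ok]


# Made-up sample with a header row; real files have 30+ columns.
sample = io.StringIO(
    "Test_Res_1,Test_Res_2\n"
    "11,9\n"
    "10,4\n"
    "9,3\n"
)
demo_conditions = [("Test_Res_1", 12, 3), ("Test_Res_2", 5, 3)]
result = check_rows(csv.DictReader(sample), demo_conditions)
```

Because one counter is kept per condition, the file is read only once no matter how many conditions are listed.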
import csv
import datetime
import os
from collections import deque

f = open("test.txt", "w")
flagtotal = []
path = "datafiles/"  # insert the path to the directory of interest
dirList = os.listdir(path)
for filename in dirList:
    if filename.endswith(".csv"):
        flag = []  # messages for the conditions this file meets (was never initialised)
        reader = csv.reader(open(os.path.join(path, filename), newline=""))
        # I GOT STUCK HERE!!!! Although the code seems to work just fine,
        # I create a completely new reader instance every time I want to add a new condition.
        next(reader)  # skip header row
        GROUP_SIZE = 5
        THRESHOLD = 0.5
        cond_deque = deque(maxlen=GROUP_SIZE)  # *maxlen* requires Python version 2.6+
        linenum = 0
        while len(cond_deque) < GROUP_SIZE - 1:
            try:
                row = next(reader)
                linenum += 1
                col0, col1, col4, col5, col6, col23, col24, col25 = (
                    float(row[i]) for i in (0, 1, 4, 5, 6, 23, 24, 25))
                cond_deque.append(col1 < THRESHOLD)
            except StopIteration:
                print('less than {} rows of data in file'.format(GROUP_SIZE))
                break
        # then process any remaining lines
        for row in reader:
            col0, col1, col4, col5, col6, col23, col24, col25 = (
                float(row[i]) for i in (0, 1, 4, 5, 6, 23, 24, 25))
            linenum += 1
            cond_deque.append(col1 < THRESHOLD)
            if cond_deque.count(True) == GROUP_SIZE:
                str1 = 'Condition 1 in cycles {}-{} had {} consecutive cycles < {}'.format(
                    linenum - GROUP_SIZE + 1, linenum, GROUP_SIZE, THRESHOLD)
                # print(str1)
                flag.append(str1)
                break  # stop looking

        # checking for the second condition
        reader = csv.reader(open(os.path.join(path, filename), newline=""))
        next(reader)  # skip header row
        GROUP_SIZE = 2
        THRESHOLD = 20
        cond_deque = deque(maxlen=GROUP_SIZE)  # *maxlen* requires Python version 2.6+
        linenum = 0
        while len(cond_deque) < GROUP_SIZE - 1:
            try:
                row = next(reader)
                linenum += 1
                col0, col1, col4, col5, col6, col23, col24, col25 = (
                    float(row[i]) for i in (0, 1, 4, 5, 6, 23, 24, 25))
                # compare the same column as the main loop below (was col1 < THRESHOLD)
                cond_deque.append(col5 < THRESHOLD / 60)
            except StopIteration:
                # print('less than {} rows of data in file'.format(GROUP_SIZE))
                break
        # then process any remaining lines
        for row in reader:
            col0, col1, col4, col5, col6, col23, col24, col25 = (
                float(row[i]) for i in (0, 1, 4, 5, 6, 23, 24, 25))
            linenum += 1
            cond_deque.append(col5 < THRESHOLD / 60)
            if cond_deque.count(True) == GROUP_SIZE:
                str1 = 'Condition 2 {}-{} had {} consecutive cycles < {} minutes'.format(
                    linenum - GROUP_SIZE + 1, linenum, GROUP_SIZE, THRESHOLD)
                # print(str1)
                flag.append(str1)
                break  # stop looking

        # assumed step (not shown in the snippet): record the file name once any condition matched
        if flag:
            flagtotal.append(filename)

today = datetime.date.today()
datestring = 'Date of testing: ' + today.strftime('%m/%d/%Y')
if len(flagtotal) > 0:
    flagtotal.insert(0, datestring)
    flagtotal.insert(1, 'The following files met the criteria.\n--------------------------------------------')
f.write("\n".join(map(str, flagtotal)))
f.close()
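The directory scan and the report writing could also be kept separate from the checks themselves, so that the per-file logic can change without touching the rest. The following is a sketch only: `write_report` and its `file_is_flagged` callback are hypothetical names, and the callback stands in for whatever per-file check ends up being used. The report layout mirrors the one in the script above.

```python
import datetime
import os


def write_report(path, file_is_flagged, report_name="test.txt"):
    """Write the names of the .csv files in `path` that meet the criteria.

    file_is_flagged -- hypothetical callback taking a full file path and
                       returning True when the file meets any condition
    Returns the list of flagged file names.
    """
    flagged = [name for name in sorted(os.listdir(path))
               if name.endswith(".csv")
               and file_is_flagged(os.path.join(path, name))]

    lines = []
    if flagged:
        # Same header as the original script: date, caption, separator.
        lines.append("Date of testing: " + datetime.date.today().strftime("%m/%d/%Y"))
        lines.append("The following files met the criteria.")
        lines.append("-" * 44)
        lines.extend(flagged)

    # The context manager closes the report even if writing fails.
    with open(report_name, "w") as report:
        report.write("\n".join(lines))
    return flagged
```

A call such as `write_report("datafiles/", my_check)` would then replace the loop and the final write, where `my_check` is whatever function decides whether a single file is flagged.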