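I have built a regex pattern for my target: a string containing data read from a CSV file. I am almost a complete newcomer to programming and I am really stuck at this step. I believe a regular expression is the best option for my problem, which is to search for data in CSV files that differ from one another in some details but follow a pattern defined by a formal protocol (MIAME files, from the bioinformatics field). Here is my code: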
import re
ficheiro=open(raw_input('write the name of the file (formato CSV):'), 'r')
lista_file=ficheiro.readlines()
str_file=str(lista_file)
list_spr=[]
value_spr=[]
for a in str_file:
    regex_spr = re.search(r"(spr[0-9]{4})[^\t.]*\t([0-9.]+)", a, re.I|re.M)
    print regex_spr.group()
    list_spr +=regex_spr.group(1)
    value_spr +=regex_spr.group(2)
But the result is always a 'NoneType' error, such as:
Traceback (most recent call last):
  File "C:\EDPython27\test\put_words_in_dict.py", line 112, in <module>
    print regex_spr.group()
AttributeError: 'NoneType' object has no attribute 'group'
Below is an excerpt of str_file that I used to test the pattern:
('Reporter Identifier\tVALUE\n', 'spr0320060100000320\t4.784064198\n', 'spr0963060100000963\t3.646246197\n', 'spr1586060100001584\t5.755770215\n', 'spr1102060100001101\t5.794439261\n', 'spr1727060100001725\t6.452100774\n', 'spr0552060100000552\t6.816527711\n', 'spr0807060100000807\t3.185267941\n', 'spr0322060100000322\t5.889496662\n', 'spr0971060100000971\t3.112604228\n', 'spr0490060100000490\t6.608164616\n', 'spr0471060100000471\t6.807244139\n', 'spr0321060100000321\t5.331036948\n', 'spr1070060100001069\t6.408937689\n', 'spr1585060100001583\t6.157044216\n', 'spr1189060100001188\t3.481847857\n', 'spr1191060100001190\t3.523784616\n', 'spr1081060100001080\t6.708517655\n', 'spr1071060100001070\t7.092586967\n', 'spr1101060100001100\t6.294650154\n', 'spr0561060100000561\t7.52495517\n', 'spr0802060100000802\t8.299020685\n', 'spr1195060100001194\t6.143485258\n', 'spr0470060100000470\t5.869271803\n', 'spr1944060100001941\t7.060765363\n', 'spr0968060100000968\t6.276636704\n', 'spr1072060100001071\t7.267895537\n', 'spr0972060100000972\t5.535911422\n', 'spr1821060100001819\t7.660640949\n', 'spr0316060100000316\t6.399083059\n', 'spr0129060100000129\t6.693897057\n', 'spr0966060100000966\t6.208969299\n', 'spr0323060100000323\t6.230187159\n', 'spr1466060100001465\t7.609506586\n', 'spr0964060100000964\t6.286528191\n', 'spr1665060100001663\t5.597969101\n', 'spr0969060100000969\t5.122425278\n', 'spr1394060100001393\t7.310099682\n', 'spr0683060100000683\t7.397780719\n', 'spr1649060100001647\t6.121430945\n', 'spr0536060100000536\t7.936838283\n', 'spr1020060100001020\t7.339227818\n', 'spr0682060100000682\t7.435907739\n', 'spr0606060100000606\t6.251491879\n', 'spr0491060100000491\t5.400560984\n', 'spr0939060100000939\t6.928170725\n', 'spr1492060100001491\t7.451461913\n', 'spr0965060100000965\t5.610110186\n', 'spr1188060100001187\t3.384989187\n', 'spr1296060100001295\t5.927021756\n')
Thanks in advance to everyone who can offer advice.
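
A minimal sketch of one likely cause and a possible fix, not a definitive answer: `str(lista_file)` turns the list of lines into a single string, so `for a in str_file` visits one character at a time, and `re.search` returns `None` for a single character. The version below assumes the pattern should instead be applied to each line, skips lines that do not match (such as the header), and uses `append` so that each match is stored as a whole string. The names `extract_spr`, `SPR_PATTERN`, and `sample` are illustrative; the three sample lines are copied from the excerpt above. It is written to run on Python 3 as well as Python 2.7.

```python
import re

# Pattern from the question: "spr" plus four digits, the rest of the
# identifier, a tab, then a number made of digits and dots.
SPR_PATTERN = re.compile(r"(spr[0-9]{4})[^\t.]*\t([0-9.]+)", re.I)


def extract_spr(lines):
    """Return (identifiers, values) for every line the pattern matches."""
    list_spr = []
    value_spr = []
    for line in lines:
        match = SPR_PATTERN.search(line)
        if match is None:
            # No match on this line (for example the header row): skip it
            # instead of calling .group() on None.
            continue
        list_spr.append(match.group(1))   # e.g. 'spr0320'
        value_spr.append(match.group(2))  # e.g. '4.784064198'
    return list_spr, value_spr


# Lines as readlines() would return them (assumed tab-separated).
sample = [
    'Reporter Identifier\tVALUE\n',
    'spr0320060100000320\t4.784064198\n',
    'spr0963060100000963\t3.646246197\n',
]

ids, values = extract_spr(sample)
print(ids)     # ['spr0320', 'spr0963']
print(values)  # ['4.784064198', '3.646246197']
```

With a real file, the list returned by `ficheiro.readlines()` could be passed to `extract_spr` directly, without the `str()` conversion. The values are kept as strings here; converting them with `float()` would be a separate step.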