
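I have built a regex pattern for my target: a string containing data taken from a CSV file. I am almost a complete newbie at programming and I am really stuck at this step. I keep working at it because regex is (I think...) the best option for my problem, which is to search for data in CSV files that differ somewhat from one another but follow a pattern defined by a formal protocol (MIAME files, from the bioinformatics field). Here is my code: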

import re
ficheiro=open(raw_input('write the name of the file (formato CSV):'), 'r')
lista_file=ficheiro.readlines()
str_file=str(lista_file)
list_spr=[]
value_spr=[]
for a in str_file:
    regex_spr = re.search(r"(spr[0-9]{4})[^\t.]*\t([0-9.]+)", a, re.I|re.M)
    print regex_spr.group()
    list_spr +=regex_spr.group(1)
    value_spr +=regex_spr.group(2)

But the result always fails with a 'NoneType' error, like this:

Traceback (most recent call last):
  File "C:\EDPython27\test\put_words_in_dict.py", line 112, in <module>
    print regex_spr.group()
AttributeError: 'NoneType' object has no attribute 'group'

Here is a sample of str_file that I used to test the pattern:

('Reporter Identifier\tVALUE\n', 'spr0320060100000320\t4.784064198\n', 'spr0963060100000963\t3.646246197\n', 'spr1586060100001584\t5.755770215\n', 'spr1102060100001101\t5.794439261\n', 'spr1727060100001725\t6.452100774\n', 'spr0552060100000552\t6.816527711\n', 'spr0807060100000807\t3.185267941\n', 'spr0322060100000322\t5.889496662\n', 'spr0971060100000971\t3.112604228\n', 'spr0490060100000490\t6.608164616\n', 'spr0471060100000471\t6.807244139\n', 'spr0321060100000321\t5.331036948\n', 'spr1070060100001069\t6.408937689\n', 'spr1585060100001583\t6.157044216\n', 'spr1189060100001188\t3.481847857\n', 'spr1191060100001190\t3.523784616\n', 'spr1081060100001080\t6.708517655\n', 'spr1071060100001070\t7.092586967\n', 'spr1101060100001100\t6.294650154\n', 'spr0561060100000561\t7.52495517\n', 'spr0802060100000802\t8.299020685\n', 'spr1195060100001194\t6.143485258\n', 'spr0470060100000470\t5.869271803\n', 'spr1944060100001941\t7.060765363\n', 'spr0968060100000968\t6.276636704\n', 'spr1072060100001071\t7.267895537\n', 'spr0972060100000972\t5.535911422\n', 'spr1821060100001819\t7.660640949\n', 'spr0316060100000316\t6.399083059\n', 'spr0129060100000129\t6.693897057\n', 'spr0966060100000966\t6.208969299\n', 'spr0323060100000323\t6.230187159\n', 'spr1466060100001465\t7.609506586\n', 'spr0964060100000964\t6.286528191\n', 'spr1665060100001663\t5.597969101\n', 'spr0969060100000969\t5.122425278\n', 'spr1394060100001393\t7.310099682\n', 'spr0683060100000683\t7.397780719\n', 'spr1649060100001647\t6.121430945\n', 'spr0536060100000536\t7.936838283\n', 'spr1020060100001020\t7.339227818\n', 'spr0682060100000682\t7.435907739\n', 'spr0606060100000606\t6.251491879\n', 'spr0491060100000491\t5.400560984\n', 'spr0939060100000939\t6.928170725\n', 'spr1492060100001491\t7.451461913\n', 'spr0965060100000965\t5.610110186\n', 'spr1188060100001187\t3.384989187\n', 'spr1296060100001295\t5.927021756\n')

Thanks in advance for any advice.


1 Answer


From the docs for re.search():

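Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern.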
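
To make the quoted behaviour concrete, here is a minimal sketch using the pattern from the question and one line from the sample data. The names `pattern`, `hit`, and `miss` are chosen for illustration only, and the `print` calls are written so they should run on both Python 2.7 and Python 3. A single character, which is what a loop over a string yields, is included because it cannot satisfy the pattern.

```python
import re

# Same pattern as in the question, compiled once for reuse.
pattern = re.compile(r"(spr[0-9]{4})[^\t.]*\t([0-9.]+)", re.I)

# A complete line from the sample data matches.
hit = pattern.search("spr0320060100000320\t4.784064198\n")

# A single character is too short to match, so search() returns None.
miss = pattern.search("s")

print(hit is not None)   # True
print(hit.group(1))      # spr0320
print(hit.group(2))      # 4.784064198
print(miss is None)      # True
```

Calling `.group()` on `miss` would raise the same AttributeError shown in the traceback.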

So the fix here is to check whether regex_spr is None:

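# Loop over the list of lines (lista_file), not str_file: iterating over a
# string yields single characters, which can never match the pattern.
for a in lista_file:
    regex_spr = re.search(r"(spr[0-9]{4})[^\t.]*\t([0-9.]+)", a, re.I|re.M)
    if regex_spr is not None:
        print regex_spr.group()
        # append() adds the whole string; += would add it character by character
        list_spr.append(regex_spr.group(1))
        value_spr.append(regex_spr.group(2))
    else:
        pass  # no match on this line (e.g. the header row); do something else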
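
For a version that can be run without the CSV file, the following is a rough sketch under a few assumptions: the helper name `extract_spr` and the three-line `sample` list are made up for illustration (the sample lines are copied from the question), and the values are kept as strings as in the original code. It is written so it should run on both Python 2.7 and Python 3. It goes through the lines one by one, skips any line where `re.search()` returns None, and stores each captured group as a whole string.

```python
import re

# Pattern taken from the question; re.I makes "spr" case-insensitive.
SPR_PATTERN = re.compile(r"(spr[0-9]{4})[^\t.]*\t([0-9.]+)", re.I)


def extract_spr(lines):
    """Return (identifiers, values) collected from an iterable of lines.

    Hypothetical helper, not part of the original code.
    """
    list_spr = []
    value_spr = []
    for line in lines:
        match = SPR_PATTERN.search(line)
        if match is None:
            continue  # no match on this line, e.g. the header row
        list_spr.append(match.group(1))
        value_spr.append(match.group(2))
    return list_spr, value_spr


# A few lines copied from the sample in the question.
sample = [
    "Reporter Identifier\tVALUE\n",
    "spr0320060100000320\t4.784064198\n",
    "spr0963060100000963\t3.646246197\n",
]

ids, values = extract_spr(sample)
print(ids)     # ['spr0320', 'spr0963']
print(values)  # ['4.784064198', '3.646246197']
```

With the real file, the result of `ficheiro.readlines()` (or the open file object itself) could be passed in place of `sample`.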
answered 2013-01-20T03:24:19.103