Objective:
Extract the string data, the currency values, the [currency type], and the date.
File content:
[["1234567890","Your previous month subscription point is <RS|$|QR|#> 5,200.33.Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game is <RS|$|QR|#> 11520 your this year subscription will expire on 19-04-2013. 9. Back"],["1234567890","Your previous month subscription point is <RS|$|QR|#> 5,200.33.Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game is <RS|$|QR|#> 11520 your this year subscription will expire on 19-04-2013. 9. Back"]]
What I have done so far:
import re

def read_file():
    # Each record is [id, message]; keep only the message text.
    fp = open('D:\\ReadData2.txt', 'rb')
    content = fp.read()
    fp.close()
    data = eval(content)
    l1 = ["%s" % x[1] for x in data]
    return l1

def check_currency(l2):
    # Removes the thousands separators from a matched value.
    remove_commas = re.compile(r',(?=\d+)*?')
    for newstr2 in l2:
        val_currency = []
        # Problem: this pattern also picks up the parts of the dd-mm-yyyy date.
        val_currency.extend(re.findall(r'([+-]?\d+(?:\,\d+)*?\d+(?:\.\d+)?)', newstr2))
        print(" List %s " % val_currency)
        for val2 in val_currency:
            val3 = remove_commas.sub('', val2)
            print(val3)

def main():
    check_currency(read_file())

if __name__ == "__main__":
    main()
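As a side note on read_file: calling eval on file content executes whatever the file contains. A possible alternative, assuming the file always holds a plain list-of-lists literal like the one shown above, is ast.literal_eval from the standard library. This is only a sketch; `parse_records` is a hypothetical helper name, and it takes the already-decoded text rather than opening the file itself.

```python
import ast

def parse_records(text):
    """Parse a list-of-lists literal and return only the message strings."""
    # literal_eval only accepts Python literals (lists, strings, numbers, ...),
    # so arbitrary code in the file is rejected instead of executed.
    data = ast.literal_eval(text)
    return [str(record[1]) for record in data]

# Shortened stand-in for the real file content (assumption for illustration).
sample_text = '[["1234567890", "point is 5,200.33"], ["1234567890", "point is 11520"]]'
messages = parse_records(sample_text)
```

The returned list can then be passed to check_currency in place of the result of read_file.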
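Edit (update): I am able to extract the currency values, but negative (-ve) currency values conflict with the date format (dd-mm-yyyy). Also, while extracting the string values, the characters [.|,|] are extracted as well. How can I avoid reading these characters?
Output of check_currency: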
>List ['5,200.33', '1,15,200.33', '5589965.26', '11520', '19', '-04', '-2013']
>5200.33
>115200.33
>5589965.26
>11520
>19
>-04
>-2013
Expected output of check_currency:
>List ['5,200.33', '1,15,200.33', '5589965.26', '11520']
>5200.33
>115200.33
>5589965.26
>11520
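One possible way to keep the date out of the currency list is to pull the dd-mm-yyyy date out of the message first and only then search for amounts. The following is a minimal sketch under that assumption, not a definitive fix: `DATE_RE`, `AMOUNT_RE` and `extract_values` are names made up for illustration, the amount pattern is the one already used in check_currency, and the date is assumed to always be written as two digits, two digits and four digits separated by hyphens.

```python
import re

# Assumed date layout: dd-mm-yyyy, e.g. 19-04-2013.
DATE_RE = re.compile(r'\b\d{2}-\d{2}-\d{4}\b')
# Same amount pattern as in check_currency (needs at least two digits).
AMOUNT_RE = re.compile(r'[+-]?\d+(?:,\d+)*?\d+(?:\.\d+)?')

def extract_values(message):
    """Return (raw amounts, amounts without commas, dates) for one message."""
    dates = DATE_RE.findall(message)
    # Blank out the dates so their parts cannot be read as signed amounts.
    without_dates = DATE_RE.sub(' ', message)
    amounts = AMOUNT_RE.findall(without_dates)
    cleaned = [amount.replace(',', '') for amount in amounts]
    return amounts, cleaned, dates

sample = ("Your previous month subscription point is <RS|$|QR|#> 5,200.33."
          "Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, "
          "Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game "
          "is <RS|$|QR|#> 11520 your this year subscription will expire on "
          "19-04-2013. 9. Back")

amounts, cleaned, dates = extract_values(sample)
print(" List %s " % amounts)
for value in cleaned:
    print(value)
```

On the sample message this gives the four amounts from the expected output, with 19-04-2013 returned separately in `dates`. A negative amount such as -5,200.33 is still matched with its sign, since only text that looks like a full dd-mm-yyyy date is removed beforehand.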