python - Extract dates and currency values (comma-separated) from a file

Question

Objective:

Extract the string data, the currency value, the [currency type], and the date.

File content:

[["1234567890","Your previous month subscription point is <RS|$|QR|#> 5,200.33.Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game is <RS|$|QR|#> 11520 your this year subscription will expire on 19-04-2013. 9. Back"],["1234567890","Your previous month subscription point is <RS|$|QR|#> 5,200.33.Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game is <RS|$|QR|#> 11520 your this year subscription will expire on 19-04-2013. 9. Back"]]

What I have done so far:

import re

def read_file():
    fp = open('D:\\ReadData2.txt', 'rb')
    content = fp.read()
    fp.close()
    # the file holds a list of [id, message] pairs
    data = eval(content)
    l1 = ["%s" % x[1] for x in data]
    return l1

def check_currency(l2):
    for i in range(len(l2)):
        newstr2 = l2[i]
        val_currency = []
        val_currency.extend(re.findall(r'([+-]?\d+(?:\,\d+)*?\d+(?:\.\d+)?)', newstr2))
        print(" List %s " % val_currency)
        for j in range(len(val_currency)):
            val2 = val_currency[j]
            remove_commas = re.compile(r',(?=\d+)*?')
            val3 = remove_commas.sub('', val2)
            print(val3)

def main():
    check_currency(read_file())

if __name__ == "__main__":
    main()

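Edit (update): I am able to extract the currency values, but negative (-ve) currency values conflict with the date format (dd-mm-yyyy). Also, while extracting the string values, it picks up the characters [.|,|] as well. How can I avoid reading those characters?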

Output of check_currency:

>List ['5,200.33', '1,15,200.33', '5589965.26', '11520', '19', '-04', '-2013'] 
>5200.33
>115200.33
>5589965.26
>11520
>19
>-04
>-2013

Expected output of check_currency:

>List ['5,200.33', '1,15,200.33', '5589965.26', '11520']
>5200.33
>115200.33
>5589965.26
>11520

score 0 · Accepted Answer

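I added <RS|$|QR|#>\s* to the start of the regular expression so that it acts as a prefix for the currency values you want to match.

You can change your code to this: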

import re

def check_currency(l2):
    for i in range(len(l2)):
        newstr2 = l2[i]
        val_currency = []
        # Note: the unescaped | and $ in the prefix act as alternation and an
        # end-of-string anchor, so only the '#>\s*' branch captures a number;
        # the other branches produce the empty strings filtered out below.
        val_currency.extend(re.findall(r'<RS|$|QR|#>\s*([+-]?\d+(?:\,\d+)*?\d+(?:\.\d+)?)', newstr2))
        # skip empty strings and remove comma characters
        val_currency = [v.replace(',', '') for v in val_currency if v]
        print(" List %s " % val_currency)
        for val in val_currency:
            # commas were already stripped above
            print(val)

Output:

List ['5200.33', '115200.33', '5589965.26', '11520']
5200.33
115200.33
5589965.26
11520
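
As a side note, the characters | and $ are regex operators when left unescaped, so the prefix is not matched as one literal string. The following is a minimal sketch, assuming Python 3 and assuming the placeholder <RS|$|QR|#> appears literally in the text as it does in the sample data. The names PREFIX, AMOUNT_RE and extract_amounts are made up for illustration and are not part of the original code.

```python
import re

# The literal placeholder from the sample text, escaped so that | and $ are
# matched as ordinary characters rather than as regex operators.
PREFIX = re.escape("<RS|$|QR|#>")

# Optional sign, digits with optional comma groups, optional decimal part.
AMOUNT_RE = re.compile(PREFIX + r"\s*([+-]?\d+(?:,\d+)*(?:\.\d+)?)")

def extract_amounts(text):
    """Return the amounts that follow the placeholder, with commas removed."""
    return [m.replace(",", "") for m in AMOUNT_RE.findall(text)]

sample = ("Your previous month subscription point is <RS|$|QR|#> 5,200.33."
          "Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, "
          "Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game "
          "is <RS|$|QR|#> 11520 your this year subscription will expire on 19-04-2013. 9. Back")

print(extract_amounts(sample))
# ['5200.33', '115200.33', '5589965.26', '11520']
```

Because the whole prefix must match before a number is captured, the date 19-04-2013 is never picked up, and no empty strings need to be filtered out.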

Additions to the code:

val_currency.extend(re.findall(r'<RS|$|QR|#>\s*([+-]?\d+(?:\,\d+)*?\d+(?:\.\d+)?)',newstr2))
val_currency = [v.replace(',', '') for v in val_currency if v]
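
The question also asks for the date, which the code above does not cover. One possible approach is sketched below, assuming Python 3, assuming dates always appear as dd-mm-yyyy, and assuming the file content is valid JSON (the sample shown in the question is). The helper names extract_dates and load_messages are hypothetical.

```python
import json
import re
from datetime import datetime

# Two digits, two digits, four digits, separated by hyphens.
DATE_RE = re.compile(r"\b(\d{2}-\d{2}-\d{4})\b")

def extract_dates(text):
    """Return the dd-mm-yyyy dates found in text as datetime.date objects."""
    dates = []
    for raw in DATE_RE.findall(text):
        try:
            dates.append(datetime.strptime(raw, "%d-%m-%Y").date())
        except ValueError:
            pass  # matches the shape of a date but is not a valid calendar day
    return dates

def load_messages(raw_json):
    """Parse the [[id, message], ...] structure and return the message strings."""
    return [row[1] for row in json.loads(raw_json)]

raw = '[["1234567890", "your this year subscription will expire on 19-04-2013. 9. Back"]]'
for message in load_messages(raw):
    print(extract_dates(message))
# [datetime.date(2013, 4, 19)]
```

Using json.loads instead of eval avoids executing the file content as Python code, which matters if the file comes from an untrusted source.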
