python - Find and replace logic in Python

Question

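In Python I need some logic to handle the scenario below; I used the split function for it. I have strings containing input like the following:

"ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988."

"ID909900000 25-01-1986 hello 10 minutes."

The output should look like this, with each date replaced by "date" and each time replaced by "time":

"ID674021384 date heloo hi thanks time and date."

"ID909900000 date hello time."

I also need to count the dates and times for each ID, like this:

ID674021384 DATE:2 TIME:1

ID909900000 DATE:1 TIME:1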

score 2 · Accepted Answer

>>> import re
>>> from collections import defaultdict
>>> lines = ["ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.", "ID909900000 25-01-1986 hello 10 minutes."]
>>> pattern = r'(?P<date>\d{1,2}[/-]\d{1,2}[/-]\d{4})|(?P<time>\d+ minutes)'
>>> num_occurrences = {line: defaultdict(int) for line in lines}
>>> def repl(matchobj):
...     # count the named group that matched ('date' or 'time') for this line
...     num_occurrences[matchobj.string][matchobj.lastgroup] += 1
...     # replace the match with the name of that group
...     return matchobj.lastgroup
...
>>> for line in lines:
...     text_id = line.split()[0]
...     new_text = re.sub(pattern, repl, line)
...     print(new_text)
...     print('{0} DATE:{1[date]} Time:{1[time]}'.format(text_id, num_occurrences[line]))
...     print()
...


ID674021384 date heloo hi thanks time and date.
ID674021384 DATE:2 Time:1

ID909900000 date hello time.
ID909900000 DATE:1 Time:1
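A hedged variant of the same approach, written as a reusable function: it uses the same pattern and `lastgroup` trick, but keys the counts by the ID instead of by the whole line, so identical lines or several lines with the same ID are tallied together. The helper name `replace_and_count` is hypothetical, and it assumes the ID is always the first space-separated field.

```python
import re
from collections import Counter, defaultdict

# Same pattern as above: one named group for dates, one for "N minutes"
PATTERN = re.compile(r'(?P<date>\d{1,2}[/-]\d{1,2}[/-]\d{4})|(?P<time>\d+ minutes)')


def replace_and_count(lines):
    """Return (rewritten lines, {id: Counter of 'date'/'time' matches}).

    Hypothetical helper; assumes the ID is the first whitespace-separated field.
    """
    counts = defaultdict(Counter)
    rewritten = []
    for line in lines:
        text_id = line.split()[0]

        def repl(match):
            # lastgroup is the name of the alternative that matched
            counts[text_id][match.lastgroup] += 1
            return match.lastgroup

        rewritten.append(PATTERN.sub(repl, line))
    return rewritten, counts


lines = [
    "ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.",
    "ID909900000 25-01-1986 hello 10 minutes.",
]
new_lines, counts = replace_and_count(lines)
for line in new_lines:
    text_id = line.split()[0]
    print(line)
    # a Counter returns 0 for a key that never occurred
    print('{0} DATE:{1} TIME:{2}'.format(text_id, counts[text_id]['date'], counts[text_id]['time']))
```

Keying by ID rather than by line is a design choice here; if each line really is unique, both give the same result.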

score 1 · Accepted Answer

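To parse lines of text like these, log files for example, I usually use regular expressions via the re module. split() also works well for separating fields that contain no spaces, and for splitting the parts of a date, but a regular expression additionally lets you check that each line has the format you expect and, if needed, warn you about an odd-looking input line.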
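A minimal sketch of the validation idea, assuming each line is `ID` plus digits, one space, then free text (the pattern `LINE_RE` and the helper `parse_line` are hypothetical): a full-line regex rejects lines that don't fit and emits a warning, which a bare split() would not do.

```python
import re
import warnings

# Hypothetical line layout: "ID" + digits, one space, then free text
LINE_RE = re.compile(r'(?P<id>ID\d+) (?P<text>.*)')


def parse_line(line):
    """Return (id, text) for a well-formed line, or None after warning."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        warnings.warn('unexpected input line: {!r}'.format(line))
        return None
    return match.group('id'), match.group('text')


print(parse_line('ID909900000 25-01-1986 hello 10 minutes.'))
print(parse_line('909900000 hello'))  # no "ID" prefix: emits a warning and prints None
```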

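With regular expressions you can capture the individual fields of the date and the time and build date or datetime objects from them (both from the datetime module). Once you have these objects, you can compare them with other such objects, write new entries, and format the dates however you need. I'd suggest parsing the whole input file (assuming you are reading from a file) and writing a completely new output file, rather than trying to modify it in place.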
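For example, assuming dates are written day first as DD/MM/YYYY or DD-MM-YYYY, as in the question (the helper `parse_dates` is hypothetical), the captured fields can be turned into `datetime.date` objects, which can then be compared and re-formatted with `strftime`. Constructing the object also rejects impossible dates such as 31/02/2000 with a `ValueError`.

```python
import re
from datetime import date

# Day, month and year captured separately; "/" or "-" as separator (day-first assumed)
DATE_RE = re.compile(r'(\d{1,2})[/-](\d{1,2})[/-](\d{4})')


def parse_dates(text):
    """Return a datetime.date for every date found in text."""
    return [date(int(year), int(month), int(day))
            for day, month, year in DATE_RE.findall(text)]


found = parse_dates('ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.')
print(found)                                     # [datetime.date(1986, 1, 25), datetime.date(1988, 1, 25)]
print(min(found).isoformat())                    # 1986-01-25
print([d.strftime('%d.%m.%Y') for d in found])   # ['25.01.1986', '25.01.1988']
```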

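As for keeping track of the date and time counts, a dictionary is usually the simplest approach as long as your input isn't too large. When you come across a line with a particular ID, look up the entry for that ID in your dictionary, and add a new one if it doesn't exist yet. That entry can itself be a dictionary keyed by the dates and times, whose values are the number of times each one was encountered.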
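A minimal sketch of that bookkeeping, assuming the ID is the first field and using simple date and "N minutes" patterns (the names `count_by_id`, `DATE_RE` and `TIME_RE` are hypothetical): the outer dictionary is keyed by ID, and each entry maps every date or time string seen on that ID's lines to how often it occurred.

```python
import re

DATE_RE = re.compile(r'\d{1,2}[/-]\d{1,2}[/-]\d{4}')
TIME_RE = re.compile(r'\d+ minutes')


def count_by_id(lines):
    """Map each ID to {date-or-time string: number of occurrences}."""
    counts = {}
    for line in lines:
        text_id = line.split()[0]
        # add a new, empty entry the first time this ID is seen
        entry = counts.setdefault(text_id, {})
        for value in DATE_RE.findall(line) + TIME_RE.findall(line):
            entry[value] = entry.get(value, 0) + 1
    return counts


lines = [
    'ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.',
    'ID909900000 25-01-1986 hello 10 minutes.',
    'ID909900000 25-01-1986 again.',  # hypothetical extra line to show counts accumulating
]
print(count_by_id(lines))
```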

I hope this answer guides you toward a solution, even though it doesn't contain any code of its own.

score 0 · Accepted Answer

You could use a couple of regular expressions:

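import re

txt = 'ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.'

# match the whole "N minutes" phrase, not just the number, so that the
# replacement removes the word "minutes" too and never touches other digits
retime = re.compile(r'[0-9]+ *minutes')
redate = re.compile(r'[0-9]+[/-][0-9]+[/-][0-9]{4}')

print('Original string:')
print(txt)

# find all dates in 'txt'
dates = redate.findall(txt)
print('Found dates:', dates)

# find all times in 'txt'
times = retime.findall(txt)
print('Found times:', times)

# replace dates and times in the original string
newtxt = txt
for adate in dates:
    newtxt = newtxt.replace(adate, 'date')

for atime in times:
    newtxt = newtxt.replace(atime, 'time')

print()
print('New string:')
print(newtxt)

# the ID is the first space-separated field of the line
print()
print('Dates and times found:')
print('{0} DATE:{1} TIME:{2}'.format(txt.split()[0], len(dates), len(times)))

The output looks like this:

Original string:
ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.
Found dates: ['25/01/1986', '25-01-1988']
Found times: ['5 minutes']

New string:
ID674021384 date heloo hi thanks time and date.

Dates and times found:
ID674021384 DATE:2 TIME:1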

Chris
