python - How do I detect recurring error strings when reading multiple files with Python?

Question

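I've just started using Python and I'm trying to run some tests on my environment. The idea is to write a simple script that finds errors recurring within a given time period.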

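Basically, I want to count how many times a server failed in my daily logs. If the failures happen more than a given number of times (say 10) within a given period (say 30 days), I should be able to raise an alert in the log. However, I don't want to simply count how many times the error appears in the 30-day interval. What I really want to count is the cycle of an error occurring, recovering, and then occurring again, so that if a problem persists for several days I avoid reporting it multiple times.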

For example, suppose:

file_2016_Oct_01.txt@hostname@YES
file_2016_Oct_02.txt@hostname@YES
file_2016_Oct_03.txt@hostname@NO
file_2016_Oct_04.txt@hostname@NO
file_2016_Oct_05.txt@hostname@YES
file_2016_Oct_06.txt@hostname@NO
file_2016_Oct_07.txt@hostname@NO

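Given the scenario above, I'd like the script to interpret this as 2 failures rather than 4. A server can sometimes stay in the same state for several days before it recovers, and I want to identify each recurrence of the problem rather than just count the total number of failures.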
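A minimal sketch of the counting rule described above, assuming that NO marks a failed check, that the status is the third "@"-separated field, and that the lines are already in chronological order for a single host. count_failure_episodes and should_alert are hypothetical helper names. itertools.groupby collapses runs of consecutive equal values (much like uniq in the shell), so a failure that lasts several days is counted once.

```python
from itertools import groupby


def count_failure_episodes(statuses, failure="NO"):
    """Count runs of consecutive failures; a multi-day outage counts once."""
    # groupby yields one (value, group) pair per run of equal consecutive values
    return sum(1 for status, _ in groupby(statuses) if status == failure)


def should_alert(statuses, threshold=10, failure="NO"):
    """Hypothetical alert rule: more than `threshold` distinct episodes."""
    return count_failure_episodes(statuses, failure) > threshold


lines = [
    "file_2016_Oct_01.txt@hostname@YES",
    "file_2016_Oct_02.txt@hostname@YES",
    "file_2016_Oct_03.txt@hostname@NO",
    "file_2016_Oct_04.txt@hostname@NO",
    "file_2016_Oct_05.txt@hostname@YES",
    "file_2016_Oct_06.txt@hostname@NO",
    "file_2016_Oct_07.txt@hostname@NO",
]
# Keep only the status field (assumed to be the last "@"-separated part)
statuses = [line.strip().split("@")[2] for line in lines]
print(count_failure_episodes(statuses))  # → 2
print(should_alert(statuses))            # → False
```

The four NO lines form two runs, so the result is 2 rather than 4.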

For the record, this is how I'm going through the files:

import datetime
import glob
import os

# Creates an empty list
history_list = []

# Function to find the files from the last 30 days

def f_findfiles():
    # First define the cut-off day, i.e. how many days back
    # the script will look for the analysis
    cut_off_day = datetime.datetime.now() - datetime.timedelta(days=30)

    # We'll now loop through all history files from the last 30 days
    for path in glob.iglob("/opt/hc/*.txt"):
        filetime = datetime.datetime.fromtimestamp(os.path.getmtime(path))
        if filetime > cut_off_day:
            history_list.append(path)

# Just included the function below to show how I'm going
# through the files, this is where I got stuck...

def f_openfiles(arg):
    for path in arg:
        # Use a separate name for the file handle so it doesn't shadow the path
        with open(path, "r") as handle:
            for line in handle:
                clean_line = line.strip().split("@")

# Main function
def main():
    f_findfiles()
    f_openfiles(history_list)

if __name__ == "__main__":
    main()

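I'm using with to open each file and a for loop to read every line of every file, but I'm not sure how to walk through the data so that I can compare the values from one file with those from the older files.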
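One way to compare each file with the older ones is to keep the last status seen for each host while streaming through the lines in chronological order. This is a minimal sketch, assuming the same "name@host@STATUS" line format as above, that NO means failure, and that file modification time gives the right chronological order; count_episodes_per_host and lines_from_files are hypothetical names, not part of the original script.

```python
import os
from collections import defaultdict


def count_episodes_per_host(lines, failure="NO"):
    """Count failure episodes per host from 'name@host@STATUS' lines.

    Lines must arrive in chronological order (oldest first).
    """
    previous = {}                  # last status seen for each host
    episodes = defaultdict(int)    # host -> number of failure episodes
    for line in lines:
        parts = line.strip().split("@")
        if len(parts) != 3:
            continue               # skip blank or malformed lines
        _, host, status = parts
        # A new episode starts when a host fails and was not already failing
        if status == failure and previous.get(host) != failure:
            episodes[host] += 1
        previous[host] = status
    return dict(episodes)


def lines_from_files(paths):
    """Yield every line of every file, oldest file (by mtime) first."""
    # glob does not guarantee any order, so sort before comparing
    for path in sorted(paths, key=os.path.getmtime):
        with open(path) as handle:
            yield from handle
```

With the question's helpers this would be called as count_episodes_per_host(lines_from_files(history_list)), and the resulting per-host counts can then be compared against the alert threshold. Only one previous status per host is stored, so memory stays small no matter how many files are read.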

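I've tried putting all the data into a dictionary, into a list, or just enumerating and comparing, but I failed with every one of these approaches :-(

Any hints on the best approach here? Thanks!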

score 0 · Accepted Answer

I'd rather use a shell utility (e.g. uniq) for this, but as long as you prefer Python:

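With minimal effort you can handle it by creating a suitable dict whose keys are the strings themselves (such as 'file_2016_Oct_01.txt@hostname@YES'). While iterating over the logs, check whether the corresponding key already exists in the dict (using if 'file_2016_Oct_01.txt@hostname@YES' in my_log_dict), and then either assign or increment the dict value accordingly.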

A short example:

data_log = {}

lookup_string = 'foobar'
if lookup_string in data_log:
    data_log[lookup_string] += 1
else:
    data_log[lookup_string] = 1

Or as a one-liner (though these usually look ugly in Python; I've edited it to split across lines with visible continuation characters):

data_log[lookup_string] = data_log[lookup_string] + 1 \
    if lookup_string in data_log \
    else 1
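The standard library already packages the "assign 1 or increment" pattern above: collections.Counter returns 0 for missing keys, and a plain dict can do the same with data_log.get(lookup_string, 0) + 1. A minimal sketch using made-up sample lines; note that counting identical strings like this gives total occurrences, not the separate failure episodes the question asks about.

```python
from collections import Counter

# Hypothetical sample lines in the question's format
log_lines = [
    "file_2016_Oct_01.txt@hostname@YES",
    "file_2016_Oct_03.txt@hostname@NO",
    "file_2016_Oct_03.txt@hostname@NO",
]

# Counter does the membership check and increment for every line at once
data_log = Counter(line.strip() for line in log_lines)

print(data_log["file_2016_Oct_03.txt@hostname@NO"])  # → 2
# Missing keys read as 0 instead of raising KeyError
print(data_log["missing key"])                       # → 0
```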
