windows - Searching for multiple strings in huge log files

Question

PowerShell question

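I currently have 5-10 log files, each roughly 20-25 GB, and I need to search every one of them for matches against 900 different search parameters. I wrote a basic PowerShell script that searches an entire log file for one search parameter at a time. When there is a match, it dumps the result to a separate text file. The problem is that it is very slow. I would like to know whether there is a way to speed this up by searching for all 900 parameters at once, so that each log is read only once. Any help would be appreciated, even if it only improves the script.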

Basic overview:

1 csv file listing all 900 items under an "item" column, 1 log file (.txt), 1 result file (.txt), 1 ps1 file

Here is the PowerShell code in my PS1 file:

$search = "filepath to csv file"
$log = "filepath to log file"
$result = "file path to result text file"
$list = Import-Csv $search

foreach ($address in $list) {
    # Re-reads the whole log once per item, which is why this is slow
    Get-Content $log | Select-String $address.item | Add-Content $result

    # Below is just for displaying a rudimentary counter of how far through searching it is
    $i = $i + 1
    echo $i
}

score 0 · Accepted Answer

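900 search terms is quite a large set. Can you reduce its size by using regular expressions? A simple solution is to read the file line by line and look for matches. Set up a collection containing regular expressions or literal strings for the search terms. Like so,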

$terms = @("Keyword[12]", "KeywordA", "KeyphraseOne") # Array of regexps
$src = "path-to-some-huge-file" # Path to the file
$reader = new-object IO.StreamReader($src) # Stream reader to file

while(($line = $reader.ReadLine()) -ne $null){ # Read one row at a time

    foreach($t in $terms) { # For each search term...
        if($line -match $t) { # check if the line read is a match...
            $("Hit: {0} ({1})" -f $line, $t) # and print match
        }
    }
}
$reader.Close() # Close the reader
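
If the 900 items are literal strings rather than patterns, one way to test each line only once is to fold them into a single alternation. The following is a minimal sketch under that assumption, not a tuned implementation: the function name `Search-LogOnce` and its parameters are hypothetical, it assumes PowerShell 5 or later (for `::new`), it matches case-insensitively as `Select-String` does by default, and it appends to the result file like the `Add-Content` call in the question. Pass full paths, because the .NET stream classes resolve relative paths against the process working directory rather than the current PowerShell location.

```powershell
# Hypothetical helper: scans the log once and writes every line that
# contains at least one of the given terms.
function Search-LogOnce {
    param(
        [string]   $LogPath,     # full path to the log file
        [string[]] $Terms,       # literal search terms (assumption: not regexes)
        [string]   $ResultPath   # full path to the result file
    )

    # Drop empty terms, escape the rest so they match literally, join with '|'.
    $escaped = @($Terms | Where-Object { $_ } | ForEach-Object { [regex]::Escape($_) })
    if ($escaped.Count -eq 0) { return }   # an empty pattern would match every line

    $options = [System.Text.RegularExpressions.RegexOptions]'Compiled, IgnoreCase'
    $regex   = [regex]::new(($escaped -join '|'), $options)

    $reader = [System.IO.StreamReader]::new($LogPath)
    try {
        # Second argument $true opens the result file in append mode.
        $writer = [System.IO.StreamWriter]::new($ResultPath, $true)
        try {
            while ($null -ne ($line = $reader.ReadLine())) {
                if ($regex.IsMatch($line)) { $writer.WriteLine($line) }
            }
        }
        finally { $writer.Dispose() }
    }
    finally { $reader.Dispose() }
}

# Usage, with the placeholder paths from the question:
# $terms = (Import-Csv "filepath to csv file").item
# Search-LogOnce -LogPath "filepath to log file" -Terms $terms -ResultPath "file path to result text file"
```

Whether one large alternation beats the nested loop depends on the terms and the data, so it is worth timing both on a sample of the log. `Select-String` also accepts an array for `-Pattern` (with `-SimpleMatch` for literal strings), which is another way to read each file only once.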

score 0 · Accepted Answer

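Admittedly, given the file sizes involved, this will be painful for whatever parser you use. But if your log files are in a standard format (IIS log files, for example), you could consider using a log-parsing application such as Log Parser Studio instead of PowerShell.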
