regex - PowerShell 3: identify and display the last n lines of an ASCII file

Question

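I thought this would be simple. I write xcopy log output to a plain-text file, and before each daily backup I append a daily separator (literally "++++++++++++++++++++Tue 07/03/2018 0900 PM") to the log file. So the last few lines of the file typically look like this:

Each new day appends a new separator line, and so on.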

I want to display the LAST separator and the lines that follow it, up to EOF.

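The approaches I tried, Get-Content and Select-String -Context 0,20, did not work.

PowerShell says that my search string ++++++++++++++++++++++ is not a valid regular expression, that the path is not recognized, and so on. Any help?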

Memory and time are not an issue. Sorry if this is too simple.

score 2 · Accepted Answer

mjsqu's helpful answer explains the need to escape + characters as \+ in a regex so that they are treated as literals.

Thus, the regex that matches a header line (20 + characters at the start of a line, ^) is: ^\+{20}
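A quick way to check this pattern is a sketch like the following, which tests it with the -match operator. The sample strings are assumptions made up for illustration, modeled on the separator shown in the question:

```powershell
# Header-line pattern from the answer: 20 literal '+' characters at the start of a line.
$headerRegex = '^\+{20}'

# Hypothetical sample lines, modeled on the log format in the question.
$samples = @(
  '++++++++++++++++++++Tue 07/03/2018 0900 PM'   # header line: 20 '+' at line start
  '0 File(s) copied'                             # ordinary log line
  '  ++++++++++++++++++++ indented'              # '+' run not at line start
  '+++++ too short'                              # fewer than 20 '+'
)

# With a scalar left-hand side, -match returns $true or $false.
$results = foreach ($line in $samples) { $line -match $headerRegex }
$results
```

Only the first sample should match, because the pattern requires the run of + characters to begin at the very start of the line.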

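That said, if detecting header lines by their 20 + characters alone is sufficient, Get-Content -Delimiter, which supports only literals as delimiters, offers a simple and efficient solution (PSv3+; assumes input file some.log in the current directory, ./):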

 $headerPrefix = '+' * 20  # -> '++++++++++++++++++++'
 $headerPrefix + (Get-Content ./some.log -Delimiter $headerPrefix -Tail 1)

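-Delimiter uses the specified header-line signature to break the file into "lines" (the text between delimiter instances, which here are blocks of lines), and -Tail 1 returns the last "line" (block) by searching from the end of the file. Hat tip to mjsqu for helping me arrive at this solution.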

The following alternative solutions are regex-based, which allows more sophisticated header-line matching.

Note: While none of the following solutions needs to read the log file into memory as a whole, they do read through the entire file, not just from the end.

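We can use this regex in a switch -regex -file statement to process all lines of the log file, collecting the lines that start with and follow the last ^\+{20} match; the code assumes the input file path ./some.log: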

# Process all lines in the log file and 
# collect each block's lines along the way in 
# array $lastBlockLines, which means that after 
# all lines have been processed, $lastBlockLines contains
# the *last* block's lines.
switch -regex -file ./some.log {
  '^\+{20}' { $lastBlockLines = @($_) } # start of new block, (re)initialize array
  default   { $lastBlockLines += $_ }   # add line to block
}

# Output the last block's lines.
$lastBlockLines

Alternatively, if you are willing to assume a fixed maximum number of lines in a block, a single-pipeline solution with Select-String is possible:

Select-String '^\+{20}' ./some.log -Context 0,100 | Select-Object -Last 1 | 
  ForEach-Object { $_.Line; $_.Context.PostContext }

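Select-String '^\+{20}' ./some.log -Context 0,100 matches all header lines in file ./some.log and, due to -Context 0,100, includes (up to) 100 lines that follow the matching line in the match objects it emits (the 0 means that no lines before the matching line are included).
Select-Object -Last 1 passes only the last match through.
ForEach-Object { $_.Line; $_.Context.PostContext } then outputs the last match's matching line as well as the up to 100 lines that follow it.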

If you don't mind reading the file twice, you can combine Select-String with Get-Content ... | Select-Object -Skip:

Get-Content ./some.log | Select-Object -Skip (
    (Select-String '^\+{20}' ./some.log | Select-Object -Last 1).LineNumber - 1
  )

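This takes advantage of the fact that the match objects emitted by Select-String have a .LineNumber property that reflects the line number at which a given match was found. Subtracting 1 from the last match's line number and passing the result to Get-Content ... | Select-Object -Skip outputs the matching line as well as all subsequent ones.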
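To see the line-number arithmetic in isolation, here is a minimal sketch that runs the same two steps against a small temporary file. The file content and the variable names ($path, $lastHeader, $lastBlock) are made up for illustration and are not part of the original answer:

```powershell
# Create a small hypothetical log in a temporary file.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value @(
  '++++++++++++++++++++Mon 07/02/2018 0900 PM'
  'line A'
  '++++++++++++++++++++Tue 07/03/2018 0900 PM'
  'line B'
  'line C'
)

# .LineNumber is 1-based; the last header in this sample sits on line 3.
$lastHeader = Select-String -Pattern '^\+{20}' -Path $path | Select-Object -Last 1
$lastHeader.LineNumber

# Skipping (LineNumber - 1) lines leaves the header line and everything after it.
$lastBlock = Get-Content -Path $path | Select-Object -Skip ($lastHeader.LineNumber - 1)
$lastBlock

# Clean up the temporary file.
Remove-Item -Path $path
```

Because .LineNumber counts from 1 while -Skip counts lines to discard, subtracting 1 is what keeps the header line itself in the output.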

score 1 · Accepted Answer

TL;DR: escape + in your search, using "\+\+\+" etc.

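Background

Unfortunately, + is a reserved character in the world of regular expressions.

What does + mean in a regex?

It tells the engine to match the preceding search element (a character, a range, or a code representing a class of characters, such as \d for digits) one or more times. You can see more about the resulting error in PowerShell by running: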

[regex]$x = "++++"

Returns:

Cannot convert value "++++" to type "System.Text.RegularExpressions.Regex". Error: "parsing "++++" - Quantifier {x,y} following nothing."
At line:1 char:1
+ [regex]$x = "++++"
+ ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : MetadataError: (:) [], ArgumentTransformationMetadataException
    + FullyQualifiedErrorId : RuntimeException

It is saying that the quantifier (+) follows nothing, that is, there is nothing before it to repeat.

So we need to escape + using \:

[regex]$x = "\+\+\+\+"

$x.Match('++++')

This returns the following, an error-free match:

Groups   : {0}
Success  : True
Name     : 0
Captures : {0}
Index    : 0
Length   : 4
Value    : ++++

Improvement

If you know how many + characters there are, you can match on "\+{20}" if there are 20. Or, adapting the previous example:

[regex]$x = "\+{4}"

$x.Match('++++')
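If you would rather not escape by hand, two standard alternatives may help: the .NET method [regex]::Escape, which escapes regex metacharacters in a literal string, and Select-String's -SimpleMatch switch, which treats the pattern as a literal rather than as a regex. A minimal sketch, assuming a hypothetical two-line in-memory sample instead of the real log file:

```powershell
# Literal separator prefix: 20 '+' characters.
$literal = '+' * 20

# [regex]::Escape backslash-escapes every regex metacharacter in the string.
$escaped = [regex]::Escape($literal)

# Hypothetical sample lines standing in for the log file's content.
$lines = '++++++++++++++++++++Tue 07/03/2018 0900 PM', '0 File(s) copied'

# Regex match with the escaped pattern, anchored to the start of the line.
$viaRegex = $lines | Select-String -Pattern ('^' + $escaped)

# Literal (non-regex) match; -SimpleMatch cannot anchor, so it matches anywhere in the line.
$viaSimple = $lines | Select-String -Pattern $literal -SimpleMatch

$viaRegex.Line
$viaSimple.Line
```

The trade-off is that -SimpleMatch gives up anchoring and quantifiers, so the escaped regex is the better fit when the separator must start the line.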

score 1 · Accepted Answer

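Another approach, using a regex to split the file into sections.

Use Get-Content with the -Raw parameter to get a single string rather than an array of strings.
Use a non-consuming positive lookahead to split the file into sections that begin with 20 + characters, -split '(?=\+{20})', and that are not empty, -ne ''.
Use index [-1] to get the last section.

Sample output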

PS> ((Get-Content '.\LogFile.txt' -raw) -split '(?=\+{20})' -ne '')[-1]
++++++++++++++++++++Mon 07/03/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM

score 0 · Accepted Answer

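Personally, I would change that logging format to make it more object-friendly, so it can be consumed in the normal way.

However, based on what you posted, here is one way to go about it. I'm sure there are more elegant ways, but this is q&d (quick and dirty). Also, as a military vet (20+ years) who still lives and works on military time: 0900 is 9:00 AM, whereas 2100 is 9:00 PM. 8^} ...just saying...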

# Get the lines in the file
($DataSet = Get-Content -Path '.\LogFile.txt')

# Results

++++++++++++++++++++Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM
++++++++++++++++++++Mon 07/03/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM



# Get the last date entry (the last line containing a '+'), using a regex match
($LastDateEntry = (Get-Content -Path '.\LogFile.txt' | %{$_ | Select-String -Pattern '[+].*'}) | Select -Last 1)

# Results

++++++++++++++++++++Mon 07/03/2018 0900 PM


# Get the LastDateEntryIndex (0-based); .Line passes the matched text as a plain string
($DateIndex = (Get-Content -Path '.\LogFile.txt').IndexOf($LastDateEntry.Line))

# Results

5



 # Get the data using the index
ForEach($Line in $DataSet)
{
    If ($Line.ReadCount -ge $DateIndex)
    {
    Get-Content -Path '.\LogFile.txt' | Select-Object -Index ($Line.ReadCount)
    }
}

# Results

++++++++++++++++++++Mon 07/03/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM
