xml - Select-String from pattern A to pattern B

Question

Is there a way to use Select-String to find all the lines between X and Y?

For example, if I have a file with the following content:

[line 157: Time 2015-08-04 11:34:00] 
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Mark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>

I want to find the whole block for every <function>management</function>, so that I end up with:

<employee>
    <Name>Bob Smith</Name>
    <function>management</function>
    <age>39</age>
    <birthday>3rd June</birthday>
    <car>yes</car>
</employee>
<employee>
    <Name>Mark Perkins</Name>
    <function>management</function>
    <age>32</age>
</employee>

If all the groups were the same size, I could use something like:

Select-String -Pattern '<function>management</function>' -CaseSensitive -Context 2,2

In reality, however, they won't all be the same size, so I can't use a fixed number every time.

Really, I need a way of saying: return everything from

2 rows above my search term
until
the following '</employee>' field

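for all matching instances.

Is this possible?

I can't use the standard XML tools in PowerShell because the file I'm reading isn't standard XML, which is why I included [line 157: Time 2015-08-04 11:34:00] in the example. The best way to think of it is as a large number of XML files, all merged into one file and separated by [line . . .] headers.

Additional info: I'm afraid my example was a bit too simple; the actual file looks more like this: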
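Select-String can return a variable-length block if it is given the whole file as one string and a pattern that spans lines. The sketch below is one way to do that, not a method from this thread, and it assumes the file fits in memory and that <employee> elements are never nested. The pattern starts at <employee>, refuses to run past a closing </employee>, and only succeeds if <function>management</function> occurs inside that block.

```powershell
# Sample input from the question. For the real file you could instead use:
#   $text = Get-Content -Path "C:\Path\To\XmlDocs.txt" -Raw   (PowerShell 3.0+)
$text = @'
[line 157: Time 2015-08-04 11:34:00]
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Mark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>
'@

# (?s) lets '.' match line breaks. The tempered token (?:(?!</employee>).)
# never steps over a closing </employee>, so each match stays inside one block.
$pattern = '(?s)<employee>(?:(?!</employee>).)*?<function>management</function>(?:(?!</employee>).)*</employee>'

# Select-String treats the whole string as a single input "line",
# so -AllMatches returns every matching <employee> block.
$blocks = @(
    $text | Select-String -Pattern $pattern -AllMatches -CaseSensitive |
        ForEach-Object { $_.Matches } |
        ForEach-Object { $_.Value }
)

$blocks
```

Because the pattern only looks at text between <employee> and </employee>, the [line ...] headers and <?xml ...?> declarations in the real file should not get in its way.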

[line 157: Time 2015-08-04 11:34:00]
<?xml version="1.0" encoding="utf-8"?>
<other>
    <stuff>
    . . .
    </stuff>
</other>

<?xml version="1.0" encoding="utf-8"?>
<staff>
    <employee>
    ...
    </employee>
</staff> 

<staff>
    <employee>
    ...
    </employee>
</staff>
[line End: Time 2015-08-04 11:34:00]

Additional info: I added code to ignore the <?xml version. . . lines. I also tried adding my own root element:

$first = "<open>"
$last = "</open>"
$a = 0

. . .

if($a -eq 0)
    {
        $XmlFiles[$Index] += $first
        $a++
    } 

. . .

$XmlFiles[$Index] += $last

But this gives the error: Array assignment failed because index '-1' was out of range.

Additional info: the end result looks like this:
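The error is consistent with appending to $XmlFiles[$Index] while $Index is still -1 and $XmlFiles is still empty. The elided parts of the snippet aren't shown, so that ordering is an assumption. Outside Set-StrictMode, reading a missing array index quietly returns $null, but assigning to it fails. A minimal repro, plus the order that works (advance the index and add the new string first, as the later version of the script does):

```powershell
$first = "<open>"

# Failing order: nothing has been added to the array yet, so index -1 does not exist.
$XmlFiles = @()
$Index = -1
$failed = $false
try {
    $XmlFiles[$Index] += $first
}
catch {
    $failed = $true
    Write-Host "Failed as expected: $($_.Exception.Message)"
}

# Working order: move to the next index and create its string before appending.
$XmlFiles = @()
$Index = -1
$Index++
$XmlFiles += ""
$XmlFiles[$Index] += $first

$XmlFiles[0]   # <open>
```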

$FilePath = "C:\Path\To\XmlDocs.txt"
$XmlFiles = @()
$Index = -1

$first = "<open>"
$last = "</open>"

# Go through the file and store the individual xml documents in a string array
$a = 0
Get-Content $FilePath | ForEach-Object {
    if ($_ -match "^\[line\ \d+") {
        if ($a -ne 0) {
            # Not the first header, so close the previous document with </open>
            $XmlFiles[$Index] += $last
        }
        # We've got a boundary, move to next index in array
        $Index++
        # Add a new string to hold the next xml document
        $XmlFiles += ""
        # Start the new document with an <open> tag
        $XmlFiles[$Index] += $first
        $a++
    }
    elseif ($_ -match '^(\<\?xml|\[line End)') {
        # End-of-section marker or XML declaration: do nothing and move on
    }
    elseif ([string]::IsNullOrEmpty($_)) {
        # Blank line: do nothing and move on
    }
    else {
        # Add each line to the string (xml doesn't care about line breaks)
        $XmlFiles[$Index] += $_
    }
}

# Add the final </open> tag
$XmlFiles[$Index] += $last

$Results = foreach ($File in $XmlFiles) {
    # Parse string as an Xml document
    $Xml = [xml]($File.Trim())
    # Use XPath to find the managers
    $Xml.SelectNodes("//employee[function = 'management']") | ForEach-Object { $_ }
}

$Results

It basically ignores the [line. . . headers, the <?xml declarations and any blank lines, and wraps each section in an <open>. . .</open> tag so that it is valid XML.
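SelectNodes returns XmlElement objects rather than text. If what you want is the <employee>...</employee> markup shown at the top of the question, each node's OuterXml property gives it back. A small self-contained sketch, using a hypothetical section string that has already been wrapped in <open>...</open>:

```powershell
# Hypothetical input: one section after preprocessing (headers removed, wrapped in <open>)
$File = '<open><staff><employee><Name>Bob Smith</Name><function>management</function><age>39</age></employee><employee><Name>Sam Jones</Name><function>security</function><age>24</age></employee><employee><Name>Mark Perkins</Name><function>management</function><age>32</age></employee></staff></open>'

$Xml = [xml]$File
$Results = $Xml.SelectNodes("//employee[function = 'management']")

# OuterXml returns each matched node's own markup, i.e. <employee>...</employee>
$Results | ForEach-Object { $_.OuterXml }
```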

score 1 · Accepted Answer

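I think you're overestimating the challenge of parsing the individual XML documents as actual XML. You can read the file line by line and use the "[line ...]" strings as the boundaries between the individual documents: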

$FilePath = "C:\Path\To\XmlDocs.txt"
$XmlFiles = @()
$Index = -1

# Go through the file and store the individual xml documents in a string array
Get-Content $FilePath |%{
    if($_ -match "^\[line\ \d+"){
        # We've got a boundary, move to next index in array
        $Index++
        # Add a new string to hold the next xml document
        $XmlFiles += ""
    } else {
        # Add each line to the string (xml doesn't care about line breaks)
        $XmlFiles[$Index] += $_
    }
}

$Managers = foreach($File in $XmlFiles){
    # Parse string as an Xml document
    $Xml = [xml]$File
    # Use Xpath to find the manager
    $Xml.SelectNodes("//employee[function = 'management']") |% {$_}
}
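A variation on the same idea swaps the line-by-line loop for PowerShell's -split operator on the raw text. This is a sketch, not part of the original answer: it assumes PowerShell 3.0+ (for Get-Content -Raw), and it borrows the asker's later fixes, stripping <?xml ...?> declarations and wrapping each section in an <open> root, so it should also cope with the extended file format from the question (several root elements per section, plus a [line End: ...] marker).

```powershell
# Hypothetical sample in the extended format; in practice:
#   $raw = Get-Content -Path $FilePath -Raw
$raw = @'
[line 157: Time 2015-08-04 11:34:00]
<?xml version="1.0" encoding="utf-8"?>
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
    </employee>
</staff>

<staff>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
    </employee>
</staff>
[line 158: Time 2015-08-06 12:36:30]
<?xml version="1.0" encoding="utf-8"?>
<staff>
    <employee>
        <Name>Rob Smith</Name>
        <function>management</function>
    </employee>
</staff>
[line End: Time 2015-08-06 12:36:30]
'@

# Split on every "[line ...]" header line (including "[line End: ...]")
# and drop the empty pieces before the first and after the last header.
$sections = $raw -split '(?m)^\[line [^\]]*\]\s*$' | Where-Object { $_.Trim() }

$Managers = foreach ($section in $sections) {
    # Remove XML declarations, then give the section a single root element
    $body = $section -replace '<\?xml[^>]*\?>', ''
    $Xml = [xml]"<open>$body</open>"
    $Xml.SelectNodes("//employee[function = 'management']") | ForEach-Object { $_ }
}

$Managers | ForEach-Object { $_.SelectSingleNode('Name').InnerText }
```

Reading the whole file at once keeps the splitting to one expression; for very large files the answer's streaming loop avoids holding everything in memory.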

With a sample file like this (a modified/extended version of your example):

[line 157: Time 2015-08-04 11:34:00] 
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Mark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>
[line 158: Time 2015-08-06 12:36:30] 
<staff>
    <employee>
        <Name>Rob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Cam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Stark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>

$Managers would then contain:

PS C:\> $Managers|Select Name,function,age

Name                               function                          age
----                               --------                          ---
Bob Smith                          management                        39
Mark Perkins                       management                        32
Rob Smith                          management                        39
Stark Perkins                      management                        32
