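I am writing a PowerShell script that splits one large file into multiple files, each containing two pairs of tags. The names of these smaller files must follow a naming convention.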

Sample abcdef123.xml contents:

<parent>
    <child>
        <code1><code1>
        <text1><text1>
    </child>
    <child1>
        <code2><code2>
        <text2><text2>
    </child1>
    <child>
        <code3><code3>
        <text3><text3>
    </child>
    <child1>
        <code4><code4>
        <text4><text4>
    </child1>
    <child>
        <code5><code5>
        <text5><text5>
    </child>
    <child1>
        <code6><code6>
        <text6><text6>
    </child1>
    <child>
        <code7><code7>
        <text7><text7>
    </child>
    <child1>
        <code8><code8>
        <text8><text8>
    </child1>
</parent>

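The PowerShell script should split this large file into multiple files (each containing 2 pairs of <child> & <child1>), meet the criteria below, and accept user input for the file-naming convention (the date-with-milliseconds part may stay the same across all file names, but the variable j should change):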

Criteria:

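  1. Add the header <parent> and the tail </parent> to each file.
  2. The file name should have the format UserinputstringMMDDYYYYHHMMSSMIL_n increment.xml (where MIL is milliseconds and n increment is a counter such as 001, 002, 003, ...).
  3. No two files should have the same file name.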

Sample file split:

File 1; stack_10132020134434789_001.xml contents:

<parent>
    <child>
        <code1><code1>
        <text1><text1>
    </child>
    <child1>
        <code2><code2>
        <text2><text2>
    </child1>
    <child>
        <code3><code3>
        <text3><text3>
    </child>
    <child1>
        <code4><code4>
        <text4><text4>
    </child1>
</parent>

File 2; stack_10132020134434791_002.xml contents:

<parent>
    <child>
        <code5><code5>
        <text5><text5>
    </child>
    <child1>
        <code6><code6>
        <text6><text6>
    </child1>
    <child>
        <code7><code7>
        <text7><text7>
    </child>
    <child1>
        <code8><code8>
        <text8><text8>
    </child1>
</parent>

The script I am trying:

csplit -ksf part. src.xml

n=000

#E.g. Enter beginning of file name :
#User entered-> stack
#read userinput

j=n+1

$date= date +%m%d%Y%H%M%S%3N

filename=$userinput$date_$j.xml

1 Answer

The following assumes that your XML file can be read in full and parsed into an XML DOM using the [xml] type (System.Xml.XmlDocument).

# Simulate parsing the input XML file.
# In your real code, you'd do something like:
#  [xml] $doc = (Get-Content -Raw some.xml)
# The XML here is a condensed and corrected version of the sample XML in your question.
[xml] $doc = @'
<parent><child><code1></code1><text1></text1></child><child1><code2></code2><text2></text2></child1><child><code3></code3><text3></text3></child><child1><code4></code4><text4></text4></child1><child><code5></code5><text5></text5></child><child1><code6></code6><text6></text6></child1><child><code7></code7><text7></text7></child><child1><code8></code8><text8></text8></child1></parent>
'@

# Create the template for the output file names, to be instantiated
# (again) later with the -f operator.
$userInputString = 'stack'  # use Read-Host to prompt the user for this string.
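# Note: 'HH' is the 24-hour clock, matching the HHMMSS in the requested format ('hh' would be 12-hour).
$fileNameTemplate = '{0}_{1}_{{0:000}}.xml' -f $userInputString, (Get-Date -Format 'MMddyyyyHHmmssfff')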

# Create an auxiliary document for creating the output files.
$auxDoc = [xml] '<parent/>'

$batchSize = 4  # Count of child elements per output file.
$fileNum = 1; $offset = 0 # Initialize loop variables.
$children = $doc.parent.ChildNodes # Get all child elements of <parent>
# Loop in batches of $batchSize until all children have been processed.
while ($offset -lt $children.Count) {

  # Make the next $batchSize child elements the content of the aux. document...
  $auxDoc.DocumentElement.InnerXml = -join $(
   foreach ($c in $children[$offset..($offset+$batchSize-1)]) { $c.OuterXml }
  )

  # ... determine the output file name via the current sequence number...
  $fileName = $fileNameTemplate -f $fileNum

  # ...and save.
  # Note: Always use a *full* (absolute) path when calling .NET methods, because
  #       .NET's working dir. differs from PowerShell's.
  $auxDoc.Save("$PWD/$fileName")

  # Prepare for next iteration.
  $offset += $batchSize
  ++$fileNum

}
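
The file-name template above is filled in twice with the -f operator, which can be easy to misread. As a minimal sketch of just that step, the snippet below uses a fixed, hypothetical timestamp string ($stamp) instead of Get-Date so that the result is predictable. The names $prefix, $stamp and $names are illustrative and not part of the script above.

```powershell
# Hypothetical values, for illustration only.
$prefix = 'stack'
$stamp  = '10132020134434789'   # fixed stand-in for the Get-Date output

# Stage 1: fill in {0} and {1}. The doubled braces {{ }} are escapes, so a
# literal {0:000} placeholder survives in the resulting string.
$template = '{0}_{1}_{{0:000}}.xml' -f $prefix, $stamp

# Stage 2: fill in the surviving placeholder with a zero-padded sequence number.
$names = 1..3 | ForEach-Object { $template -f $_ }

$template   # stack_10132020134434789_{0:000}.xml
$names      # stack_10132020134434789_001.xml, ..._002.xml, ..._003.xml
```

Because the timestamp is captured once, before the loop, every file shares it and only the sequence number changes, which keeps the names unique within a run. One caveat: if the user-supplied prefix itself contains { or }, the second -f call would treat them as format syntax, so such input would need to be escaped or rejected first.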
answered 2020-10-28T15:35:42.970