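I have a set of 7 million XML files, ranging in size from a few KB to several MB. In total, that is about 180 GB of XML. The job is to analyze each XML file, determine whether it contains the string <ref>, and, if it does not, move it out of the Chunk folder it currently sits in and into the Referenceless folder.
The script I wrote works, but it is far too slow for my purposes. At roughly 3 files per second, it is projected to take about 24 days to get through all 7 million files. Is there anything I can change in my script to get better performance?
To complicate matters further, I do not have the right permissions on my server to run .PS1 files, so the script has to be runnable from the PowerShell prompt as a single command. I would set the permissions myself if I had the authorization.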
# This script will iterate through the Chunk folders, removing pages that contain no
# references and putting them into the Referenceless folder.

# Change this variable to start the program on a different chunk. This is the first
# command to be run in Windows PowerShell.
$chunknumber = 1

# This while loop is the second command to be run in Windows PowerShell.
# It will stop after completing Chunk 113.
while ($chunknumber -le 113) {
    # Jumps the terminal to the correct folder.
    cd C:\Wiki_Pages
    # Creates an index for the chunk being worked on.
    $items = Get-ChildItem -Path "Chunk_$chunknumber"
    echo "Chunk $chunknumber Indexed"
    # Jumps to the chunk folder.
    cd C:\Wiki_Pages\Chunk_$chunknumber
    # Loops through the index. Each entry is one of the pages.
    foreach ($page in $items) {
        # Creates a variable holding the page's content.
        $content = Get-Content $page
        # If the page has a reference, then it's echoed.
        if ($content | Select-String "<ref>" -quiet) { echo "Referenced!" }
        # If the page doesn't have a reference, it's copied to Referenceless, then deleted.
        else {
            Copy-Item $page C:\Wiki_Pages\Referenceless -force
            Remove-Item $page -force
            echo "Moved to Referenceless!"
        }
    }
    # The chunk number is increased by one and the cycle continues.
    $chunknumber = $chunknumber + 1
}
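For comparison, here is a minimal sketch of one direction that might be faster. It is not a measured result. It assumes most of the time goes into Get-Content splitting each file into lines and piping them to Select-String, so it reads each file in a single .NET call and does a plain substring search instead. The function name Move-ReferencelessPages and its parameters are hypothetical, the folder layout is taken from the script above, and the Referenceless folder is assumed to exist already.

```powershell
# Hypothetical helper: moves every page that lacks the literal text "<ref>"
# from Chunk_<n> into Referenceless. Defaults mirror the script above.
function Move-ReferencelessPages {
    param(
        [string]$Root = 'C:\Wiki_Pages',
        [int]$FirstChunk = 1,
        [int]$LastChunk = 113
    )

    # Assumed to exist already, as in the original script.
    $dest = Join-Path $Root 'Referenceless'

    foreach ($n in $FirstChunk..$LastChunk) {
        $chunk = Join-Path $Root "Chunk_$n"

        # GetFiles returns plain path strings for the whole chunk up front,
        # so moving files afterwards does not disturb the listing.
        foreach ($path in [System.IO.Directory]::GetFiles($chunk)) {
            # Read the file in one call rather than line by line.
            $text = [System.IO.File]::ReadAllText($path)

            # Case-insensitive literal search, matching Select-String's default.
            $hasRef = $text.IndexOf('<ref>', [System.StringComparison]::OrdinalIgnoreCase) -ge 0

            if (-not $hasRef) {
                # A move on the same volume avoids the copy-then-delete pair.
                Move-Item -LiteralPath $path -Destination $dest -Force
            }
        }

        # One progress line per chunk instead of one per file.
        Write-Host "Chunk $n done"
    }
}
```

Pasting the function at the prompt and then calling `Move-ReferencelessPages` (or `Move-ReferencelessPages -FirstChunk 40` to resume at a later chunk) keeps it usable without a .PS1 file. Whether it actually beats 3 files per second depends on the disk, so it is worth timing on a single chunk first.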
I know very little about PowerShell; yesterday was the first time I ever opened the program.