Generally speaking, iRon's advice in a comment on the question is worth heeding (the specific question asked is addressed in the sections further below):
To keep memory usage low, use streaming of objects in the pipeline rather than collecting them all in memory first - if feasible.
That is, instead of doing this:
# !! Collects ALL objects in memory, as an array.
$rows = Import-Csv in.csv
foreach ($row in $rows) { ... }
do this:
# Process objects ONE BY ONE.
# As long as you stream to a *file* or some other output stream
# (as opposed to assigning to a *variable*), memory use should remain constant,
# except for temporarily held memory awaiting garbage collection.
Import-Csv in.csv | ForEach-Object { ... } # pipe to Export-Csv, for instance
However, even then you may run out of memory with very large files - see this question - possibly related to the build-up of memory from no-longer-needed objects that haven't been garbage-collected yet; calling [GC]::Collect() periodically from the ForEach-Object script block may therefore solve the problem.
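For instance, a minimal sketch of such periodic collection (the interval of 10,000 rows is an arbitrary assumption; tune it to your data) could look like this:

$i = 0
Import-Csv in.csv | ForEach-Object {
  # ... per-row processing goes here ...
  # Periodically force a garbage collection to release no-longer-referenced objects.
  if (++$i % 10000 -eq 0) { [GC]::Collect() }
} # pipe to Export-Csv, for instance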
If you do need to collect all of Import-Csv's output objects in memory at once:
The excessive memory use you observe comes from how [pscustomobject] instances (the output type of Import-Csv) are implemented, as discussed in GitHub issue #7603 (emphasis added):
The memory pressure most likely comes from the cost of PSNoteProperty [which is how [pscustomobject] implements its properties]. Each PSNoteProperty has an overhead of 48 bytes, so when you only store a few bytes per property, that becomes massive.
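You can verify this yourself (a minimal sketch, reusing the in.csv sample file from above): every column of an imported row surfaces as a PSNoteProperty member.

(Import-Csv in.csv | Select-Object -First 1).psobject.Properties |
  ForEach-Object { '{0} -> {1}' -f $_.Name, $_.GetType().Name } # each column reports type PSNoteProperty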
The same issue proposes a workaround to reduce memory consumption (as also shown in Wasif Hasan's answer):
Note: This workaround comes at the cost of considerably slower execution.
$csvFile = 'C:\top-1m.csv'
# Dynamically define a custom class derived from the *first* row
# read from the CSV file.
# Note: While this is a legitimate use of Invoke-Expression,
# it should generally be avoided.
"class CsvRow {
  $((Import-Csv $csvFile | Select-Object -first 1).psobject.properties.Name -replace '^', '[string] $$' -join ";")
}" | Invoke-Expression
# Import all rows and convert them from [pscustomobject] instances
# to [CsvRow] instances to reduce memory consumption.
# Note: Casting the Import-Csv call directly to [CsvRow[]] would be noticeably
# faster, but increases *temporary* memory pressure substantially.
$alexaTopMillion = Import-Csv $csvFile | ForEach-Object { [CsvRow] $_ }
A better solution in the long run, which would also be faster, would be for Import-Csv to support outputting the parsed rows with a given output type, e.g., via an -OutputType parameter, as proposed in GitHub issue #8862.
If that is of interest to you, please show your support for the proposal there.
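Purely to illustrate the proposal (the parameter does not exist in current PowerShell versions, so the exact name and syntax below are assumptions), usage might then look something like this:

# HYPOTHETICAL syntax - NOT implemented as of this writing; shown only to sketch
# what a strongly typed import via the proposed parameter could look like.
# $alexaTopMillion = Import-Csv $csvFile -OutputType CsvRow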
Memory-use benchmarks:
The following code compares the memory use of a normal Import-Csv import (array of [pscustomobject]s) with that of the workaround (array of custom-class instances).
The measurements aren't exact, as PowerShell's process working memory is simply queried, which can reflect the impact of background activity, but they give a rough sense of how much less memory using a custom class requires.
Sample output, which shows that the custom-class workaround requires only about one fifth of the memory, based on a sample 10-column CSV input file with about 166,000 rows - the specific ratio depends on the number of input rows and columns:
MB Used Command
------- -------
 384.50 # normal import…
  80.48 # import via custom class…
Benchmark code:
# Create a sample CSV file with 10 columns about 16 MB in size.
$tempCsvFile = [IO.Path]::GetTempFileName()
('"Col1","Col2","Col3","Col4","Col5","Col6","Col7","Col8","Col9","Col10"' + "`n") | Set-Content -NoNewline $tempCsvFile
('"Col1Val","Col2Val","Col3Val","Col4Val","Col5Val","Col6Val","Col7Val","Col8Val","Col9Val","Col10Val"' + "`n") * 1.662e5 |
  Add-Content $tempCsvFile

try {

  { # normal import
    $all = Import-Csv $tempCsvFile
  },
  { # import via custom class
    "class CsvRow {
      $((Import-Csv $tempCsvFile | Select-Object -first 1).psobject.properties.Name -replace '^', '[string] $$' -join ";")
    }" | Invoke-Expression
    $all = Import-Csv $tempCsvFile | ForEach-Object { [CsvRow] $_ }
  } | ForEach-Object {
    [gc]::Collect(); [gc]::WaitForPendingFinalizers() # garbage-collect first.
    $before = (Get-Process -Id $PID).WorkingSet64
    # Execute the command.
    & $_
    # Measure memory consumption and output the result.
    [pscustomobject] @{
      'MB Used' = ('{0,4:N2}' -f (((Get-Process -Id $PID).WorkingSet64 - $before) / 1mb)).PadLeft(7)
      Command = $_
    }
  }

} finally {
  Remove-Item $tempCsvFile
}