
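A somewhat odd question. I have a large JSON file that I need to process. Based on another question, I have to stream the file, because otherwise it causes memory problems: JSON Powershell memory issue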

What I have so far:

get-content -Path largefile.json | ForEach-Object {
    $row = $_ = $_.TrimStart('[').TrimEnd(']')
    if ($_) { $_ | Out-String | ConvertFrom-Json }
    New-Item -Path $($Row.Id).txt
    Set-Content -Path $($Row.Id).txt -Value ($row.Body)
}

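Using $row, I can easily output the most recently processed line of Largefile.json. What I want is to create a file named after the Id of the row currently being processed and write that row's Body column into it. However, when I try to access a specific column with $row.Id, it unfortunately comes back empty.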

Largefile.json is structured like this:

[{"Id":"1","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"data1"}
{"Id":"2","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"data2"}
{"Id":"3","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"data3"}
{"Id":"4","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"data4"}
{"Id":"5","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"data5"}
]

The end result should be 5 files:

  • 1.txt, containing the value: data1

  • 2.txt, containing the value: data2

  • 3.txt, containing the value: data3

  • 4.txt, containing the value: data4

  • 5.txt, containing the value: data5

I'm using PowerShell 7.1.3.

Is there a way to use $row.Id and $row.ParentId the way I would in a normal ForEach?

Thanks for your help.


4 Answers


It seems to me this is what you're looking for:

Get-Content largefile.json | ForEach-Object {
    $row = $_.TrimStart('[').TrimEnd(']') | ConvertFrom-Json
    if ($null -ne $row) {
        # Name the file after the Id with a .txt extension, as the question asks
        Set-Content -Path "$($row.Id).txt" -Value $row.Body
    }
}
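For reference, a minimal sketch of why $row.Id came back empty in the question, assuming one shortened sample line and no Set-StrictMode. In the question, $row is assigned the trimmed string, while the object that ConvertFrom-Json produces goes down the pipeline and is never stored. Piping into ConvertFrom-Json and assigning the result, as this answer does, gives an object with Id, ParentId and Body properties. The variable names $line, $broken and $brokenId are illustrative only.

```powershell
# $line is one hypothetical input line, shaped like the first line of largefile.json.
$line = '[{"Id":"1","ParentId":"parent","Name":"filename","Body":"data1"}'

# Question's pattern: the variable receives the trimmed *string*.
$broken = $line.TrimStart('[').TrimEnd(']')
# A [string] has no Id property, so this is $null (it would throw under Set-StrictMode).
$brokenId = $broken.Id

# Fix: store the object that ConvertFrom-Json emits, not its input.
$row = $line.TrimStart('[').TrimEnd(']') | ConvertFrom-Json

# Properties are now available, just as in an ordinary foreach loop.
Write-Output "Id=$($row.Id) ParentId=$($row.ParentId) -> $($row.Id).txt"
```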
answered 2021-05-04T13:32:40.803

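There's a lot wrong with this question. Assuming the JSON is just missing its commas, and if I understand the question correctly, this is how I'd do it. It should also work with the latest update to the question. I have a more unusual solution elsewhere that streams the JSON with jq: Iterate through huge JSON in powershell. JSON streaming support may be added to PowerShell later: ConvertFrom-JSON high memory consumption #7698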

[{"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"},
 {"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"},
 {"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"},
 {"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"},
{"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"}
]
get-content -Path largefile.json | ForEach-Object {
  $_ = $_.TrimStart('[').TrimEnd(']').TrimEnd(',')
  if ($_) {
    $row = $_ | ConvertFrom-Json
    Set-Content -Path ($Row.Id + '.txt') -Value $row.Body
  }
}
get-content ID.txt

*******
answered 2021-05-04T14:04:22.840

As others have already explained, your JSON example is not valid.

Still, since this is a huge file to process, you can use switch for it.

switch -Regex -File D:\Test\largefile.json {
    '"Id":"(\d+)".*"Body":"(\w+)"' { 
        Set-Content -Path ('D:\Test\{0}.txt' -f $matches[1]) -Value $matches[2]
    }
}

With your example, the result is 5 files named 1.txt..5.txt, each containing a single line, data1..data5.
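A possible variant, sketched under the assumption that every array element sits on its own line as in the question's sample: keep the line-by-line streaming of switch -File, but parse each line with ConvertFrom-Json instead of a regex. The regex above only matches a numeric Id and a Body made entirely of word characters, so a body such as "hello world" produces no file, and JSON escapes like \" are not decoded. The temp folder, file names and sample bodies below are illustrative only.

```powershell
# Illustrative scratch folder and input file (hypothetical names).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'json-split-demo'
$null = New-Item -ItemType Directory -Path $dir -Force
$json = Join-Path $dir 'largefile.json'

# Sample input: one object per line, with the commas valid JSON requires.
@'
[{"Id":"1","Body":"data1"},
{"Id":"2","Body":"hello world"},
{"Id":"3","Body":"say \"hi\""}
]
'@ | Set-Content -Path $json

# switch -File reads the file one line at a time, so the array is never loaded whole.
switch -File $json {
    default {
        # Drop surrounding whitespace, a leading '[', a trailing ']' and any trailing comma.
        $line = $_.Trim().TrimStart('[').TrimEnd(']').TrimEnd(',')
        if ($line) {
            $row = $line | ConvertFrom-Json
            Set-Content -Path (Join-Path $dir "$($row.Id).txt") -Value $row.Body
        }
    }
}
```

Parsing every line is probably somewhat slower than a regex match, but memory use is still bounded by a single line, and the Body value is decoded exactly as ConvertFrom-Json would decode it.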

answered 2021-05-05T15:26:35.670

I'm still not sure what result you expect,
but I think you want to do this:

@'
[{"Id":"1","ParentId":"parent1","Name":"1.txt","OwnerId":"owner","CreatedDate":"date","Body":"Data1"}
{"Id":"2","ParentId":"parent2","Name":"2.txt","OwnerId":"owner","CreatedDate":"date","Body":"Data2"}
{"Id":"3","ParentId":"parent3","Name":"3.txt","OwnerId":"owner","CreatedDate":"date","Body":"Data3"}
{"Id":"4","ParentId":"parent4","Name":"4.txt","OwnerId":"owner","CreatedDate":"date","Body":"Data4"}
{"Id":"5","ParentId":"parent5","Name":"5.txt","OwnerId":"owner","CreatedDate":"date","Body":"Data5"}
]
'@ | Set-Content .\largefile.json

Get-Content .\largefile.json | ForEach-Object {
    $_ = $_.TrimStart('[').TrimEnd(']')
    If ($_) { 
        $Row = ConvertFrom-Json $_
        Set-Content -Path ".\$($Row.Name)" -Value $Row.Body
    }
}
answered 2021-05-04T13:05:10.030