3

Is there any easy way to use PowerShell to only get a list of "folders" from an S3 bucket, without listing every single object and just scripting a compiled list of distinct paths? There are hundreds of thousands of individual objects in the bucket I'm working in, and that would take a very long time.

It's possible this is a really stupid question and I'm sorry if that's the case, but I couldn't find anything on Google or SO to answer this. I've tried adding wildcards to -KeyPrefix and -Key params of Get-S3Object to no avail. That's the only cmdlet that seems like it might be capable of doing what I'm after.

Pointless backstory: I just want to make sure I'm transferring files to the correct, existing folders. I'm a contracted third party, so I don't have console login access and I'm not the person who maintains the AWS account.

I know this is possible using Java and C# and others, but I'm doing everything else involved with this fairly simple project in PS and was hoping to be able to stick with it.

Thanks in advance.

4

4 Answers

3

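You can use the AWS Tools for PowerShell to list the objects in a bucket (via Get-S3Object) and extract the common prefixes from the response object.

Below is a small library that retrieves subdirectories recursively: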

function Get-Subdirectories
{
  param
  (
    [string] $BucketName,
    [string] $KeyPrefix,
    [bool] $Recurse
  )

  # Run the listing only for its side effect: the raw response is recorded in $AWSHistory.
  Get-S3Object -BucketName $BucketName -KeyPrefix $KeyPrefix -Delimiter '/' | Out-Null

  # Capture the prefixes now, because the recursive calls below overwrite $AWSHistory.LastCommand.
  $prefixes = @($AWSHistory.LastCommand.Responses.Last.CommonPrefixes)

  if($prefixes.Count -eq 0)
  {
    return
  }

  $prefixes

  if($Recurse)
  {
    $prefixes | % { Get-Subdirectories -BucketName $BucketName -KeyPrefix $_ -Recurse $Recurse }
  }
}

function Get-S3Directories
{
  param
  (
    [string] $BucketName,
    [bool] $Recurse = $false
  )

  Get-Subdirectories -BucketName $BucketName -KeyPrefix '/' -Recurse $Recurse
}

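This recursive function works by updating the KeyPrefix on each iteration, so that every KeyPrefix passed to it is checked for subdirectories of its own. With the delimiter set to '/', all keys that share the same text between the KeyPrefix and the first occurrence of the delimiter after it are rolled up into a single entry in the CommonPrefixes collection of the last response in $AWSHistory.

To retrieve only the top-level directories in an S3 bucket: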
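To make the rollup concrete, here is a minimal local sketch of what the delimiter does. It does not call AWS at all: Get-CommonPrefixes and the sample keys are hypothetical names made up for illustration, and the function only approximates the grouping that S3 performs on the server side.

```powershell
# Hypothetical helper (not part of AWS Tools): mimics how S3 groups keys
# into common prefixes for a given key prefix and delimiter.
function Get-CommonPrefixes
{
  param
  (
    [string[]] $Keys,
    [string] $KeyPrefix = '',
    [string] $Delimiter = '/'
  )

  $seen = New-Object 'System.Collections.Generic.HashSet[string]'

  foreach ($key in $Keys)
  {
    # Only keys that start with the prefix are considered.
    if (-not $key.StartsWith($KeyPrefix, [System.StringComparison]::Ordinal)) { continue }

    # Look for the first delimiter after the prefix.
    $index = $key.IndexOf($Delimiter, $KeyPrefix.Length, [System.StringComparison]::Ordinal)
    if ($index -lt 0) { continue }   # no delimiter: a plain object, not a "folder"

    # Everything up to and including that delimiter is the common prefix.
    $commonPrefix = $key.Substring(0, $index + $Delimiter.Length)
    if ($seen.Add($commonPrefix)) { $commonPrefix }
  }
}

# Made-up sample keys.
$sampleKeys = @(
  'myprefix/txt/a.txt',
  'myprefix/txt/b.txt',
  'myprefix/img/c.png',
  'myotherprefix/d.txt',
  'readme.md'
)

Get-CommonPrefixes -Keys $sampleKeys                          # myprefix/, myotherprefix/
Get-CommonPrefixes -Keys $sampleKeys -KeyPrefix 'myprefix/'   # myprefix/txt/, myprefix/img/
```

Each call returns one level of "folders" only, which is why the recursive function has to feed every returned prefix back in as the next KeyPrefix.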

PS C:/> Get-S3Directories -BucketName 'myBucket'

To retrieve all directories in an S3 bucket:

PS C:/> Get-S3Directories -BucketName 'myBucket' -Recurse $true

This returns a collection of strings, where each string is a common prefix.

Example output:

myprefix/
myprefix/txt/
myprefix/img/
myotherprefix/
...
answered 2016-04-22T22:04:58.577
1
# Lists every object in the bucket, then reduces the keys to their distinct parent paths.
$objects = Get-S3Object -BucketName $bucketname -ProfileName $profilename -Region $region
$paths = foreach ($object in $objects)
{
    # Split-Path treats the key like a file path; on Windows the parent comes back with backslashes.
    Split-Path $object.Key -Parent
}
$paths = @($paths | Select-Object -Unique)
Write-Host "`nNumber of folders: $($paths.Count)"
Write-Host ($paths -join "`n")
answered 2017-06-03T02:37:09.563
0

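This PowerShell version iterates over more than 1000 keys in a single S3 bucket. AWS limits Get-S3Object to 1000 keys per API call, so a do-while loop is needed to fetch more than 1000 keys (i.e. folders). After the output has been written to the CSV file, remember to sort it and use Excel's Remove Duplicates to drop the duplicate entries. (P.S. Can anyone help with handling the duplicates? I don't think my script deals with them very well.)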

# Main code
$keysPerPage = 1000      # AWS returns at most 1000 keys per Get-S3Object call
$bucketN = 'testBucket'  # Bucket name
$nextMarker = $null
$output = @()
$Start = "S3 Bucket Name : $bucketN"
$End = "- End of Folder List -"

Do
{
  # Fetch one page of up to 1000 keys, starting after the previous page's marker
  $batch = Get-S3Object -BucketName $bucketN -MaxKey $keysPerPage -Marker $nextMarker

  # Keep the first path segment of each key; this is unique within the page only,
  # so the same folder can still show up again in a later page
  $batch2 = $batch.Key | % { $_.Split('/')[0] } | Sort-Object -Unique
  $output += $batch2
  $batch2

  $nextMarker = $AWSHistory.LastServiceResponse.NextMarker
} while ($nextMarker)

# Write the results to a file
$Start | Out-File C:\Output-Result.csv -Append
$output | Out-File C:\Output-Result.csv -Append
$End | Out-File C:\Output-Result.csv -Append
answered 2018-03-20T07:34:27.403
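
On the duplicates question: one possible approach, sketched here rather than tested against a real bucket, is to collect folder names in a HashSet shared across all pages instead of appending each page to an array. Add-TopLevelFolder is a hypothetical helper name, and the two key lists are made-up stand-ins for successive Get-S3Object batches. Unlike the script above, this sketch skips keys that contain no '/', on the assumption that root-level objects should not count as folders.

```powershell
# Hypothetical helper: adds the first path segment of each key to a shared set.
# Keys without a '/' are root-level objects, so they are skipped.
function Add-TopLevelFolder
{
  param
  (
    [string[]] $Keys,
    [System.Collections.Generic.HashSet[string]] $Folders
  )

  foreach ($key in $Keys)
  {
    $index = $key.IndexOf('/')
    if ($index -gt 0)
    {
      # HashSet.Add ignores values it already holds, so later pages cannot add duplicates.
      [void] $Folders.Add($key.Substring(0, $index))
    }
  }
}

$folders = New-Object 'System.Collections.Generic.HashSet[string]'

# Two made-up "pages" of keys standing in for successive Get-S3Object batches.
Add-TopLevelFolder -Keys @('logs/2018/a.txt', 'logs/2018/b.txt', 'images/x.png') -Folders $folders
Add-TopLevelFolder -Keys @('images/y.png', 'docs/readme.md', 'rootfile.txt') -Folders $folders

$folders | Sort-Object   # docs, images, logs
```

Inside the do-while loop, the call would presumably take the place of the `$output += $batch2` line, for example `Add-TopLevelFolder -Keys $batch.Key -Folders $folders`, and the sorted set would be written to the file once the loop ends.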
0

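The accepted answer is correct, but it has one flaw. If you have a large bucket containing many "folders" (more than 1000), you will only get the last 1000 prefixes when you use: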

$AWSHistory.LastCommand.Responses.Last.CommonPrefixes

AWS batches its responses in increments of 1000. If you look at

$AWSHistory.LastCommand.Responses.History 

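you will see multiple entries. Unfortunately, only 5 are kept by default. You can change that behavior with the Set-AWSHistoryConfiguration function.

To increase the number of responses kept in the history, use the -MaxServiceCallHistory parameter.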

Set-AWSHistoryConfiguration -MaxServiceCallHistory 20

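This stores the last 20 service calls for the next (and every subsequent) command.

With the configuration above, you can retrieve up to 20,000 subfolders from a single folder (20 responses of up to 1000 prefixes each).

To retrieve all of the folders, do the following: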

$subFolders = ($AwsHistory.LastCommand.Responses.History).CommonPrefixes
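
The line above relies on PowerShell's member-access enumeration (available since PowerShell 3.0): reading a property on a collection reads it from every element and flattens the results. The sketch below illustrates that with made-up stand-in objects; $fakeHistory is a hypothetical variable, and real history entries are SDK response objects rather than custom objects.

```powershell
# Stand-ins for the response objects kept in $AWSHistory.LastCommand.Responses.History.
# Only the CommonPrefixes property matters for this illustration.
$fakeHistory = @(
  [pscustomobject]@{ CommonPrefixes = @('a/', 'b/') },
  [pscustomobject]@{ CommonPrefixes = @('c/') },
  [pscustomobject]@{ CommonPrefixes = @() }
)

# Member-access enumeration: the property is read from every element and the
# per-response collections are flattened into one list of strings.
$subFolders = @(($fakeHistory).CommonPrefixes)

$subFolders.Count   # 3
$subFolders         # a/, b/, c/
```

A response with no prefixes contributes nothing to the result, so an empty final page does no harm.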

Note: increasing this configuration parameter uses more memory.

answered 2021-12-03T20:12:22.167