Is there an existing solution to delete any files older than x days?

6 Answers


Amazon recently introduced Object Expiration:

Amazon S3 Announces Object Expiration

Amazon S3 announced a new feature, Object Expiration, that allows you to schedule the removal of your objects after a pre-defined time period. Using Object Expiration to schedule periodic removal of objects eliminates the need for you to identify objects for deletion and submit delete requests to Amazon S3.

You can define Object Expiration rules for a set of objects in your bucket. Each Object Expiration rule allows you to specify a prefix and an expiration period in days. The prefix field (e.g. logs/) identifies the objects subject to the expiration rule, and the expiration period specifies the number of days from creation date (i.e. age) after which the objects should be removed. Once objects are past their expiration date, they are queued for deletion. You are not billed for storage of objects on or after their expiration date.
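The rule described above boils down to a simple age check: an object becomes eligible for deletion once the number of days since its creation exceeds the expiration period. A minimal sketch of that check in Python (the function name and dates are illustrative, not part of any S3 API):

```python
from datetime import datetime, timedelta, timezone

def is_expired(last_modified, expiration_days, now=None):
    """Return True once an object's age in days exceeds the expiration period."""
    now = now or datetime.now(timezone.utc)
    return (now - last_modified).days > expiration_days

# An object created 10 days ago, checked against 7-day and 30-day rules:
now = datetime(2024, 1, 11, tzinfo=timezone.utc)
created = now - timedelta(days=10)
print(is_expired(created, 7, now))   # True: age 10 > 7
print(is_expired(created, 30, now))  # False: still within 30 days
```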

Answered 2012-01-24T10:21:10.343

Here is some information on how to do it...

http://docs.amazonwebservices.com/AmazonS3/latest/dev/ObjectExpiration.html

Hope this helps.

Answered 2012-01-24T14:21:11.617

You can use AWS S3 lifecycle rules to expire files and delete them. All you have to do is select the bucket, click on the "Add lifecycle rule" button, and configure it, and AWS will take care of the rest for you.
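If you prefer to script the console steps, the same rule can be sketched with boto3's put_bucket_lifecycle_configuration (the rule ID, the logs/ prefix, and the 30-day period are placeholder values; credentials are assumed to come from the environment):

```python
# Lifecycle configuration equivalent to the console rule described above
# (rule ID, prefix, and 30-day period are placeholder values).
lifecycle_config = {
    "Rules": [
        {
            "ID": "delete-old-files",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Expiration": {"Days": 30},
        }
    ]
}

def apply_rule(bucket_name):
    """Attach the rule to a bucket; needs s3:PutLifecycleConfiguration."""
    import boto3  # assumes credentials are configured in the environment
    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=lifecycle_config,
    )
```

Note that S3 evaluates lifecycle rules roughly once a day, so objects are typically removed some time after they cross the age threshold rather than immediately.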

You can refer to the following blog post by Joe for step-by-step instructions. It is actually quite simple:

https://www.joe0.com/2017/05/24/amazon-s3-how-to-delete-files-older-than-x-days/

Hope it helps!

Answered 2018-10-29T09:03:32.363

Here is how you can implement it using a CloudFormation template:

  JenkinsArtifactsBucket:
    Type: "AWS::S3::Bucket"
    Properties:
      BucketName: !Sub "jenkins-artifacts"
      LifecycleConfiguration:
        Rules:
          - Id: "remove-old-artifacts"
            ExpirationInDays: 3
            NoncurrentVersionExpirationInDays: 3
            Status: Enabled

As @Ravi Bhatt explained, this will create a lifecycle rule on the bucket.

Read more about it here: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-s3-bucket-lifecycleconfig-rule.html

How object lifecycle management works: https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html

Answered 2020-03-31T10:35:52.840

You can use the following PowerShell script to delete objects older than x days:

[CmdletBinding()]
Param(
  [Parameter(Mandatory=$True)]
  [string]$BUCKET_NAME,             # Name of the bucket

  [Parameter(Mandatory=$True)]
  [string]$OBJ_PATH,                # Key prefix of the S3 objects (directory path)

  [Parameter(Mandatory=$True)]
  [int]$EXPIRY_DAYS                 # Age in days after which objects are deleted
)

$CURRENT_DATE = Get-Date
$OBJECTS = Get-S3Object $BUCKET_NAME -KeyPrefix $OBJ_PATH
Foreach($OBJ in $OBJECTS){
    If($OBJ.Key -ne $OBJ_PATH){
        # Delete only objects whose age exceeds the expiry threshold
        If(($CURRENT_DATE - $OBJ.LastModified).Days -gt $EXPIRY_DAYS){
            Write-Host "Deleting object:" $OBJ.Key
            Remove-S3Object -BucketName $BUCKET_NAME -Key $OBJ.Key -Force
        }
    }
}

Answered 2018-12-03T10:01:23.763

Here is a Python script that deletes files that are N days old:

from boto3 import client
from botocore.exceptions import ClientError
from datetime import datetime, timezone
import argparse

if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    
    parser.add_argument('--access_key_id', required=True)
    parser.add_argument('--secret_access_key', required=True)
    parser.add_argument('--delete_after_retention_days', required=False, default=15)
    parser.add_argument('--bucket', required=True)
    parser.add_argument('--prefix', required=False, default="")
    parser.add_argument('--endpoint', required=True)

    args = parser.parse_args()

    access_key_id = args.access_key_id
    secret_access_key = args.secret_access_key
    delete_after_retention_days = int(args.delete_after_retention_days)
    bucket = args.bucket
    prefix = args.prefix
    endpoint = args.endpoint

    # get current date
    today = datetime.now(timezone.utc)

    # create an S3 client (the endpoint here points at Wasabi,
    # but any S3-compatible endpoint works)
    s3_client = client(
        's3',
        endpoint_url=endpoint,
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key)

    try:
        # sanity-check the credentials by listing the account's buckets
        s3_client.list_buckets()
    except ClientError:
        # invalid access keys
        raise Exception("Invalid Access or Secret key")

    # create a paginator for all objects.
    object_response_paginator = s3_client.get_paginator('list_object_versions')
    if len(prefix) > 0:
        operation_parameters = {'Bucket': bucket,
                                'Prefix': prefix}
    else:
        operation_parameters = {'Bucket': bucket}

    # instantiate temp variables.
    delete_list = []
    count_current = 0
    count_non_current = 0

    print("$ Paginating bucket " + bucket)
    for object_response_itr in object_response_paginator.paginate(**operation_parameters):
        for version in object_response_itr.get('Versions', []):
            if version["IsLatest"] is True:
                count_current += 1
            elif version["IsLatest"] is False:
                count_non_current += 1
            if (today - version['LastModified']).days > delete_after_retention_days:
                delete_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})

    # print objects count
    print("-" * 20)
    print("$ Before deleting objects")
    print("$ current objects: " + str(count_current))
    print("$ non-current objects: " + str(count_non_current))
    print("-" * 20)

    # delete objects 1000 at a time
    print("$ Deleting objects from bucket " + bucket)
    for i in range(0, len(delete_list), 1000):
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={
                'Objects': delete_list[i:i + 1000],
                'Quiet': True
            }
        )
        print(response)

    # reset counts
    count_current = 0
    count_non_current = 0

    # paginate and recount
    print("$ Paginating bucket " + bucket)
    for object_response_itr in object_response_paginator.paginate(Bucket=bucket):
        if 'Versions' in object_response_itr:
            for version in object_response_itr['Versions']:
                if version["IsLatest"] is True:
                    count_current += 1
                elif version["IsLatest"] is False:
                    count_non_current += 1

    # print objects count
    print("-" * 20)
    print("$ After deleting objects")
    print("$ current objects: " + str(count_current))
    print("$ non-current objects: " + str(count_non_current))
    print("-" * 20)
    print("$ task complete")

This is how I run it:

python s3_cleanup.py --access_key_id="access-key" --secret_access_key="secret-key-here" --endpoint="https://s3.us-west-1.wasabisys.com" --bucket="ondemand-downloads" --prefix="" --delete_after_retention_days=5

If you want to delete files only from a specific folder, use the prefix parameter.

Answered 2022-01-22T07:35:05.907