I changed the lifecycle for a bunch of my buckets on Amazon S3 so their storage class was set to Glacier. I did this using the online AWS Console. I now need those files again.
I know how to restore them back to S3 per file. But my buckets have thousands of files. I wanted to see if there was a way to restore the entire bucket back to S3, just like there was a way to send the entire bucket to Glacier.
I'm guessing a solution could be scripted, but I wanted to see whether there is a way to do it in the Console, or with another program, or something else I might be missing.
If you use s3cmd, you can use it to restore recursively quite easily:
s3cmd restore --recursive s3://mybucketname/
I have also used it to restore just a folder:
s3cmd restore --recursive s3://mybucketname/folder/
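Depending on your s3cmd version, you can also say how long the restored copies should stay available and which retrieval tier to use. This is only a hedged sketch - check s3cmd --help for the exact option names your version supports:
s3cmd restore --recursive --restore-days=7 --restore-priority=bulk s3://mybucketname/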
If you're using the AWS CLI tool (it's nice, you should), you can do it like this:
aws s3 ls s3://<BUCKET_NAME> --recursive | awk '{print $4}' | xargs -L 1 aws s3api restore-object --restore-request '{"Days":<DAYS>,"GlacierJobParameters":{"Tier":"<TIER>"}}' --bucket <BUCKET_NAME> --key
Replace <BUCKET_NAME> with the bucket name you want, and provide the restore parameters <DAYS> and <TIER>. <DAYS> is the number of days you want to restore your objects for, and <TIER> controls the speed of the restore process and has three levels: Bulk, Standard, or Expedited.
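For example, for a hypothetical bucket named my-archive, a 7-day restore of everything at the cheapest Bulk tier would look like this:
aws s3 ls s3://my-archive --recursive | awk '{print $4}' | xargs -L 1 aws s3api restore-object --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Bulk"}}' --bucket my-archive --key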
The above answers didn't work well for me because my bucket was mixed with objects on Glacier and some that were not. The easiest thing for me was to create a list of all the GLACIER objects in the bucket, then attempt to restore each one individually, ignoring any errors (such as a restore already being in progress, the key not being an object, etc.).
Get a listing of all GLACIER files (keys) in the bucket:
aws s3api list-objects-v2 --bucket <bucketName> --query "Contents[?StorageClass=='GLACIER']" --output text | awk '{print $2}' > glacier-restore.txt
Create a shell script and run it, replacing your "bucketName":
#!/bin/sh
for x in `cat glacier-restore.txt`
do
echo "Begin restoring $x"
aws s3api restore-object --restore-request Days=7 --bucket <bucketName> --key "$x"
echo "Done restoring $x"
done
Credit goes to Josh at http://capnjosh.com/blog/a-client-error-invalidobjectstate-occurred-when-calling-the-copyobject-operation-operation-is-not-valid-for-the-source-objects-storage-class/, a resource I found after trying some of the solutions above.
There's no built-in tool for this. "Folders" in S3 are an illusion for human convenience, based on forward slashes in the object key (path/filename), and every object that migrated to Glacier has to be restored individually, although...
Of course you could write a script to iterate through the hierarchy and send off those restore requests using the SDKs or the REST API in your programming language of choice.
Be sure you understand how restoring from Glacier into S3 works before you proceed. It is always only a temporary restoration, and you choose the number of days that each object will persist in S3 before reverting back to being stored only in Glacier.
Also, be certain that you understand the penalties for restoring too much Glacier data in a short period of time, or you could be in for some unexpected expense. Depending on the urgency, you may want to spread the restore operation out over days or weeks.
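If you do want to pace the requests rather than fire them all off at once, a minimal sketch along those lines (assuming a glacier-restore.txt key list like the one built in the answers below, a placeholder <bucketName>, and an arbitrary one-second pause between requests) could be:
#!/bin/sh
# Read one key per line and pause between restore requests to spread out the retrievals
while read -r key
do
aws s3api restore-object --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Bulk"}}' --bucket <bucketName> --key "$key"
sleep 1
done < glacier-restore.txt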
I recently needed to restore a whole bucket and all its files and folders. You will need the s3cmd and aws cli tools configured with your credentials to run this.
I've found it pretty robust at handling errors for specific objects in the bucket that may already have had a restore request.
#!/bin/sh
# This will give you a nice list of all objects in the bucket with the bucket name stripped out
s3cmd ls -r s3://<your-bucket-name> | awk '{print $4}' | sed 's#s3://<your-bucket-name>/##' > glacier-restore.txt
for x in `cat glacier-restore.txt`
do
echo "restoring $x"
aws s3api restore-object --restore-request Days=7 --bucket <your-bucket-name> --profile <your-aws-credentials-profile> --key "$x"
done
Here is my version of the aws cli interface and how to restore data from Glacier. I modified some of the examples above so that they work when the key of the files to be restored contains spaces.
# Parameters
BUCKET="my-bucket" # the bucket you want to restore, no s3:// no slashes
BPATH="path/in/bucket/" # the objects prefix you wish to restore (mind the `/`)
DAYS=1 # For how many days you wish to restore the data.
# Restore the objects
aws s3 ls s3://${BUCKET}/${BPATH} --recursive | \
awk '{out=""; for(i=4;i<=NF;i++){out=out" "$i}; print out}'| \
xargs -I {} aws s3api restore-object --restore-request Days=${DAYS} \
--bucket ${BUCKET} --key "{}"
It looks like S3 Browser can "restore from Glacier" at the folder level, but not at the bucket level. The only thing is you have to buy the Pro version, so it's not the best solution.
A variation on Dustin's answer using the AWS CLI, but with recursion and piping to sh to skip errors (e.g., if some objects have already had a restore requested...):
BUCKET=my-bucket
BPATH=/path/in/bucket
DAYS=1
aws s3 ls s3://$BUCKET$BPATH --recursive | awk '{print $4}' | xargs -L 1 \
echo aws s3api restore-object --restore-request Days=$DAYS \
--bucket $BUCKET --key | sh
The xargs echo bit generates a list of "aws s3api restore-object" commands, and by piping that to sh, you can continue past errors.
NOTE: The Ubuntu 14.04 aws-cli package is old. In order to use --recursive, you'll need to install it via github.
POSTSCRIPT: Glacier restores can get unexpectedly pricey really quickly. Depending on your use case, you may find the Infrequent Access tier to be more appropriate. AWS has a nice explanation of the different tiers.
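If Infrequent Access is the better fit going forward, a hedged sketch of a lifecycle rule that transitions everything in a placeholder bucket my-bucket to STANDARD_IA after 30 days (instead of Glacier) would be:
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration '{"Rules":[{"ID":"move-to-ia","Status":"Enabled","Filter":{"Prefix":""},"Transitions":[{"Days":30,"StorageClass":"STANDARD_IA"}]}]}'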
This command worked for me:
aws s3api list-objects-v2 \
--bucket BUCKET_NAME \
--query "Contents[?StorageClass=='GLACIER']" \
--output text | \
awk -F $'\t' '{print $2}' | \
tr '\n' '\0' | \
xargs -L 1 -0 \
aws s3api restore-object \
--restore-request Days=7 \
--bucket BUCKET_NAME \
--key
Pro tip: if you run this more than once in a row, the objects that already have a restore in progress will be in the RestoreAlreadyInProgress state, and you'll have to wait for that state to clear before the request succeeds again. The state transition can take a few hours. You'll see this error message if you need to wait: An error occurred (RestoreAlreadyInProgress) when calling the RestoreObject operation
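If you want to check whether a particular object is still in that state before retrying, head-object shows its Restore field (my-bucket and path/to/file are placeholders here):
aws s3api head-object --bucket my-bucket --key path/to/file --query Restore
While the restore is still running this reports ongoing-request="true"; once it completes it also includes an expiry-date.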
I've been through this mill today and came up with the following based on the answers above, having also tried s3cmd. s3cmd does not work for mixed buckets (Glacier and Standard). This will do what you need in two steps - first create a list of the GLACIER files, then fire off the s3 cli restore requests (even if they have already been made). It also keeps track of which have already been requested, so you can restart the script as necessary. The listing output is TAB-separated, which is what the cut command below relies on:
#!/bin/sh
bucket="$1"
glacier_file_list="glacier-restore-me-please.txt"
glacier_file_done="glacier-requested-restore-already.txt"
if [ "X${bucket}" = "X" ]
then
echo "Please supply bucket name as first argument"
exit 1
fi
aws s3api list-objects-v2 --bucket ${bucket} --query "Contents[?StorageClass=='GLACIER']" --output text | cut -f 2 > ${glacier_file_list}
if [ $? -ne 0 ]
then
echo "Failed to fetch list of objects from bucket ${bucket}"
exit 1
fi
echo "Got list of glacier files from bucket ${bucket}"
while read x
do
echo "Begin restoring $x"
aws s3api restore-object --restore-request Days=7 --bucket ${bucket} --key "$x"
if [ $? -ne 0 ]
then
echo "Failed to restore \"$x\""
else
echo "Done requested restore of \"$x\""
fi
# Log those done
#
echo "$x" >> ${glacier_file_done}
done < ${glacier_file_list}
I wrote a program in python to restore folders recursively. The s3cmd command above did not work for me, and neither did the awk command.
You can run it like python3 /home/ec2-user/recursive_restore.py -- restore
and monitor the restore status with python3 /home/ec2-user/recursive_restore.py -- status
import argparse
import sys
from datetime import datetime

import boto3
__author__ = "kyle.bridenstine"
def reportStatuses(
operation,
type,
successOperation,
folders,
restoreFinished,
restoreInProgress,
restoreNotRequestedYet,
restoreStatusUnknown,
skippedFolders,
):
"""
reportStatuses gives a generic, aggregated report for all operations (Restore, Status, Download)
"""
report = 'Status Report For "{}" Operation. Of the {} total {}, {} are finished being {}, {} have a restore in progress, {} have not been requested to be restored yet, {} reported an unknown restore status, and {} were asked to be skipped.'.format(
operation,
str(len(folders)),
type,
str(len(restoreFinished)),
successOperation,
str(len(restoreInProgress)),
str(len(restoreNotRequestedYet)),
str(len(restoreStatusUnknown)),
str(len(skippedFolders)),
)
if (len(folders) - len(skippedFolders)) == len(restoreFinished):
print(report)
print("Success: All {} operations are complete".format(operation))
else:
if (len(folders) - len(skippedFolders)) == len(restoreNotRequestedYet):
print(report)
print("Attention: No {} operations have been requested".format(operation))
else:
print(report)
print("Attention: Not all {} operations are complete yet".format(operation))
def status(foldersToRestore, restoreTTL):
s3 = boto3.resource("s3")
folders = []
skippedFolders = []
# Read the list of folders to process
with open(foldersToRestore, "r") as f:
for rawS3Path in f.read().splitlines():
folders.append(rawS3Path)
s3Bucket = "put-your-bucket-name-here"
maxKeys = 1000
# Remove the S3 Bucket Prefix to get just the S3 Path i.e., the S3 Objects prefix and key name
s3Path = removeS3BucketPrefixFromPath(rawS3Path, s3Bucket)
# Construct an S3 Paginator that returns pages of S3 Object Keys with the defined prefix
client = boto3.client("s3")
paginator = client.get_paginator("list_objects")
operation_parameters = {"Bucket": s3Bucket, "Prefix": s3Path, "MaxKeys": maxKeys}
page_iterator = paginator.paginate(**operation_parameters)
pageCount = 0
totalS3ObjectKeys = []
totalS3ObjKeysRestoreFinished = []
totalS3ObjKeysRestoreInProgress = []
totalS3ObjKeysRestoreNotRequestedYet = []
totalS3ObjKeysRestoreStatusUnknown = []
# Iterate through the pages of S3 Object Keys
for page in page_iterator:
for s3Content in page["Contents"]:
s3ObjectKey = s3Content["Key"]
# Folders show up as Keys but they cannot be restored or downloaded so we just ignore them
if s3ObjectKey.endswith("/"):
continue
totalS3ObjectKeys.append(s3ObjectKey)
s3Object = s3.Object(s3Bucket, s3ObjectKey)
if s3Object.restore is None:
totalS3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
elif "true" in s3Object.restore:
totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
elif "false" in s3Object.restore:
totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
else:
totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
pageCount = pageCount + 1
# Report the total statuses for the folders
reportStatuses(
"restore folder " + rawS3Path,
"files",
"restored",
totalS3ObjectKeys,
totalS3ObjKeysRestoreFinished,
totalS3ObjKeysRestoreInProgress,
totalS3ObjKeysRestoreNotRequestedYet,
totalS3ObjKeysRestoreStatusUnknown,
[],
)
def removeS3BucketPrefixFromPath(path, bucket):
"""
removeS3BucketPrefixFromPath removes "s3a://<bucket name>" or "s3://<bucket name>" from the Path
"""
s3BucketPrefix1 = "s3a://" + bucket + "/"
s3BucketPrefix2 = "s3://" + bucket + "/"
if path.startswith(s3BucketPrefix1):
# remove one instance of prefix
return path.replace(s3BucketPrefix1, "", 1)
elif path.startswith(s3BucketPrefix2):
# remove one instance of prefix
return path.replace(s3BucketPrefix2, "", 1)
else:
return path
def restore(foldersToRestore, restoreTTL):
"""
restore initiates a restore request on one or more folders
"""
print("Restore Operation")
s3 = boto3.resource("s3")
bucket = s3.Bucket("put-your-bucket-name-here")
folders = []
skippedFolders = []
# Read the list of folders to process
with open(foldersToRestore, "r") as f:
for rawS3Path in f.read().splitlines():
folders.append(rawS3Path)
# Skip folders that are commented out of the file
if "#" in rawS3Path:
print("Skipping this folder {} since it's commented out with #".format(rawS3Path))
skippedFolders.append(rawS3Path)
continue
else:
print("Restoring folder {}".format(rawS3Path))
s3Bucket = "put-your-bucket-name-here"
maxKeys = 1000
# Remove the S3 Bucket Prefix to get just the S3 Path i.e., the S3 Objects prefix and key name
s3Path = removeS3BucketPrefixFromPath(rawS3Path, s3Bucket)
print("s3Bucket={}, s3Path={}, maxKeys={}".format(s3Bucket, s3Path, maxKeys))
# Construct an S3 Paginator that returns pages of S3 Object Keys with the defined prefix
client = boto3.client("s3")
paginator = client.get_paginator("list_objects")
operation_parameters = {"Bucket": s3Bucket, "Prefix": s3Path, "MaxKeys": maxKeys}
page_iterator = paginator.paginate(**operation_parameters)
pageCount = 0
totalS3ObjectKeys = []
totalS3ObjKeysRestoreFinished = []
totalS3ObjKeysRestoreInProgress = []
totalS3ObjKeysRestoreNotRequestedYet = []
totalS3ObjKeysRestoreStatusUnknown = []
# Iterate through the pages of S3 Object Keys
for page in page_iterator:
print("Processing S3 Key Page {}".format(str(pageCount)))
s3ObjectKeys = []
s3ObjKeysRestoreFinished = []
s3ObjKeysRestoreInProgress = []
s3ObjKeysRestoreNotRequestedYet = []
s3ObjKeysRestoreStatusUnknown = []
for s3Content in page["Contents"]:
print("Processing S3 Object Key {}".format(s3Content["Key"]))
s3ObjectKey = s3Content["Key"]
# Folders show up as Keys but they cannot be restored or downloaded so we just ignore them
if s3ObjectKey.endswith("/"):
print("Skipping this S3 Object Key because it's a folder {}".format(s3ObjectKey))
continue
s3ObjectKeys.append(s3ObjectKey)
totalS3ObjectKeys.append(s3ObjectKey)
s3Object = s3.Object(s3Bucket, s3ObjectKey)
print("{} - {} - {}".format(s3Object.key, s3Object.storage_class, s3Object.restore))
# Ensure this folder was not already processed for a restore
if s3Object.restore is None:
restore_response = bucket.meta.client.restore_object(
Bucket=s3Object.bucket_name, Key=s3Object.key, RestoreRequest={"Days": restoreTTL}
)
print("Restore Response: {}".format(str(restore_response)))
# Refresh object and check that the restore request was successfully processed
s3Object = s3.Object(s3Bucket, s3ObjectKey)
print("{} - {} - {}".format(s3Object.key, s3Object.storage_class, s3Object.restore))
if s3Object.restore is None:
s3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
totalS3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
print("%s restore request failed" % s3Object.key)
# Instead of failing the entire job continue restoring the rest of the log tree(s)
# raise Exception("%s restore request failed" % s3Object.key)
elif "true" in s3Object.restore:
print(
"The request to restore this file has been successfully received and is being processed: {}".format(
s3Object.key
)
)
s3ObjKeysRestoreInProgress.append(s3ObjectKey)
totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
elif "false" in s3Object.restore:
print("This file has successfully been restored: {}".format(s3Object.key))
s3ObjKeysRestoreFinished.append(s3ObjectKey)
totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
else:
print(
"Unknown restore status ({}) for file: {}".format(s3Object.restore, s3Object.key)
)
s3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
elif "true" in s3Object.restore:
print("Restore request already received for {}".format(s3Object.key))
s3ObjKeysRestoreInProgress.append(s3ObjectKey)
totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
elif "false" in s3Object.restore:
print("This file has successfully been restored: {}".format(s3Object.key))
s3ObjKeysRestoreFinished.append(s3ObjectKey)
totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
else:
print(
"Unknown restore status ({}) for file: {}".format(s3Object.restore, s3Object.key)
)
s3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
# Report the statuses per S3 Key Page
reportStatuses(
"folder-" + rawS3Path + "-page-" + str(pageCount),
"files in this page",
"restored",
s3ObjectKeys,
s3ObjKeysRestoreFinished,
s3ObjKeysRestoreInProgress,
s3ObjKeysRestoreNotRequestedYet,
s3ObjKeysRestoreStatusUnknown,
[],
)
pageCount = pageCount + 1
if pageCount > 1:
# Report the total statuses for the files
reportStatuses(
"restore-folder-" + rawS3Path,
"files",
"restored",
totalS3ObjectKeys,
totalS3ObjKeysRestoreFinished,
totalS3ObjKeysRestoreInProgress,
totalS3ObjKeysRestoreNotRequestedYet,
totalS3ObjKeysRestoreStatusUnknown,
[],
)
def displayError(operation, exc):
"""
displayError displays a generic error message for all failed operation's returned exceptions
"""
print(
'Error! Restore{} failed. Please ensure that you ran the following command "./tools/infra auth refresh" before executing this program. Error: {}'.format(
operation, exc
)
)
def main(operation, foldersToRestore, restoreTTL):
"""
main The starting point of the code that directs the operation to its appropriate workflow
"""
print(
"{} Starting log_migration_restore.py with operation={} foldersToRestore={} restoreTTL={} Day(s)".format(
str(datetime.now().strftime("%d/%m/%Y %H:%M:%S")), operation, foldersToRestore, str(restoreTTL)
)
)
if operation == "restore":
try:
restore(foldersToRestore, restoreTTL)
except Exception as exc:
displayError("", exc)
elif operation == "status":
try:
status(foldersToRestore, restoreTTL)
except Exception as exc:
displayError("-Status-Check", exc)
else:
raise Exception("%s is an invalid operation. Please choose either 'restore' or 'status'" % operation)
def check_operation(operation):
"""
check_operation validates the runtime input arguments
"""
if operation is None or (
str(operation) != "restore" and str(operation) != "status" and str(operation) != "download"
):
raise argparse.ArgumentTypeError(
"%s is an invalid operation. Please choose either 'restore' or 'status' or 'download'" % operation
)
return str(operation)
# To run use sudo python3 /home/ec2-user/recursive_restore.py -- restore
# -l /home/ec2-user/folders_to_restore.csv
if __name__ == "__main__":
# Form the argument parser.
parser = argparse.ArgumentParser(
description="Restore s3 folders from archival using 'restore' or check on the restore status using 'status'"
)
parser.add_argument(
"operation",
type=check_operation,
help="Please choose either 'restore' to restore the list of s3 folders or 'status' to see the status of a restore on the list of s3 folders",
)
parser.add_argument(
"-l",
"--foldersToRestore",
type=str,
default="/home/ec2-user/folders_to_restore.csv",
required=False,
help="The location of the file containing the list of folders to restore. Put one folder on each line.",
)
parser.add_argument(
"-t",
"--restoreTTL",
type=int,
default=30,
required=False,
help="The number of days you want the filess to remain restored/unarchived. After this period the logs will automatically be rearchived.",
)
args = parser.parse_args()
sys.exit(main(args.operation, args.foldersToRestore, args.restoreTTL))