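This is my first time using anything from Amazon. I am trying to upload multiple files to Amazon Glacier using PHP SDK V3. Amazon then needs to combine these files into one.

The files are stored in the cPanel home directory and have to be uploaded to Amazon Glacier by a cron job.

I know I have to use the multipart upload method, but I am not sure which other functions it needs to make it work. I am also not sure whether the way I calculate and pass the variables is correct.

This is the code I have so far: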

<?php
require 'aws-autoloader.php';

use Aws\Glacier\GlacierClient;
use Aws\Glacier\TreeHash;

//############################################
//DEFAULT VARIABLES
//############################################
$key = 'XXXXXXXXXXXXXXXXXXXX';
$secret = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';   
$accountId = '123456789123';
$vaultName = 'VaultName';
$partSize = '4194304';
$fileLocation = 'path/to/files/';

//############################################
//DECLARE THE AMAZON CLIENT
//############################################
$client = new GlacierClient([
    'region' => 'us-west-2',
    'version' => '2012-06-01',
    'credentials' => array(
        'key'    => $key,
        'secret' => $secret,
  )
]);

//############################################
//GET THE UPLOAD ID
//############################################
$result = $client->initiateMultipartUpload([
    'partSize' => $partSize,
    'vaultName' => $vaultName
]);
$uploadId = $result['uploadId'];

//############################################
//GET ALL FILES INTO AN ARRAY
//############################################
$files = scandir($fileLocation);
unset($files[0]);
unset($files[1]);
sort($files);

//############################################
//GET SHA256 TREE HASH (CHECKSUM)
//############################################
$th = new TreeHash();
//GET TOTAL FILE SIZE
foreach($files as $part){
    $filesize = filesize($fileLocation.$part);
    $total = $filesize;
    $th = $th->update(file_get_contents($fileLocation.$part));
}
$totalchecksum = $th->complete();

//############################################
//UPLOAD FILES
//############################################
foreach ($files as $key => $part) {
    //HASH CONTENT
    $filesize = filesize($fileLocation.$part);
    $rangeSize = $filesize-1;
    $range = 'bytes 0-'.$rangeSize.'/*';
    $sourcefile = $fileLocation.$part;

    $result = $client->uploadMultipartPart([
        'accountId' => $accountId,
        'checksum' => '',
        'range' => $range,
        'sourceFile' => $sourcefile,
        'uploadId' => $uploadId,
        'vaultName' => $vaultName
    ]);
}

//############################################
//COMPLETE MULTIPART UPLOAD
//############################################
$result = $client->completeMultipartUpload([
    'accountId' => $accountId,
    'archiveSize' => $total,
    'checksum' => $totalchecksum,
    'uploadId' => $uploadId,
    'vaultName' => $vaultName,
]);
?>

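A new Glacier client does seem to be declared, and I do get an UploadID back, but I am not 100% sure I am doing the rest correctly. The Amazon Glacier vault that the files should be uploaded to and then merged in is still empty, and I am not sure whether the files only show up once completeMultipartUpload has executed successfully.

I also get the following error when running the code: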

Fatal error: Uncaught exception 'Aws\Glacier\Exception\GlacierException' with message 'Error executing "CompleteMultipartUpload" on "https://glacier.us-west-2.amazonaws.com/XXXXXXXXXXXX/vaults/XXXXXXXXXX/multipart-uploads/cTI0Yfk6xBYIQ0V-rhq6AcdHqd3iivRJfyYzK6-NV1yn9GQvJyYCoSrXrrrx4kfyGm6m9PUEAq4M0x6duXm5MD8abn-M"; AWS HTTP error: Client error: 403 InvalidSignatureException (client): The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details. The Canonical String for this request should have been 'POST /XXXXXXXXXXX/vaults/XXXXXXXXX/multipart-uploads/cTI0Yfk6xBYIQ0V-rhq6AcdHqd3iivRJfyYzK6-NV1yn9GQvJyYCoSrXrrrx4kfyGm6m9PUEAq4M0x6duXm5MD8abn-M host:glacier.us-west-2.amazonaws.com x-amz-archive-size: x-amz-date:20151016T081455Z x-amz-glacier-version:2012-06-01 x-amz-sha256-tree-hash:?[ qiuã°²åÁ¹ý+¤Üª¤ [;K×T host;x-amz-archive-size;x-amz-date;x-amz-glacier-version;x-am in /home/XXXXXXXXXXXX/public_html/XXXXXXXXXXXX/Aws/WrappedHttpHandler.php on line 152

Is there a simpler way to do this? I also have full SSH access, if that helps.

3 Answers

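I got this working with PHP SDK V3 (version 3), and I kept running into this question during my research, so I thought I would post my solution too. Use it at your own risk; there is almost no error checking or handling.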

<?php
require 'vendor/autoload.php';

use Aws\Glacier\GlacierClient;
use Aws\Glacier\TreeHash;


// Create the glacier client to connect with
$glacier = new GlacierClient(array(
      'profile' => 'default',
      'region' => 'us-east-1',
      'version' => '2012-06-01'
      ));

$fileName = '17mb_test_file';         // this is the file to upload
$chunkSize = 1024 * 1024 * pow(2,2);  // 1 MB times a power of 2
$fileSize = filesize($fileName);      // we will need the file size (in bytes)

// initiate the multipart upload
// it is dangerous to send the filename without escaping it first
$result = $glacier->initiateMultipartUpload(array(
      'archiveDescription' => 'A multipart-upload for file: '.$fileName,
      'partSize' => $chunkSize,
      'vaultName' => 'MyVault'
      ));

// we need the upload ID when uploading the parts
$uploadId = $result['uploadId'];

// we need to generate the SHA256 tree hash
// open the file so we can get a hash from its contents
$fp = fopen($fileName, 'rb'); // 'b' keeps the read binary-safe on every platform
// This class can generate the hash
$th = new TreeHash();
// feed in all of the data
$th->update(fread($fp, $fileSize));
// generate the hash (this comes out as binary data)...
$hash = $th->complete();
// but the API needs hex (thanks). PHP to the rescue!
$hash = bin2hex($hash);

// reset the file position indicator
fseek($fp, 0);

// the part counter
$partNumber = 0;

print("Uploading: '".$fileName
    ."' (".$fileSize." bytes) in "
    .(ceil($fileSize/$chunkSize))." parts...\n");
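// loop while bytes remain; comparing against $fileSize (not $fileSize + 1) avoids
// sending an empty extra part when the size is an exact multiple of $chunkSize
while ($partNumber * $chunkSize < $fileSize)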
{
  // while we haven't written everything out yet
  // figure out the offset for the first and last byte of this chunk
  $firstByte = $partNumber * $chunkSize;
  // the last byte for this piece is either the last byte in this chunk, or
  // the end of the file, whichever is less
  // (watch for those Obi-Wan errors)
  $lastByte = min((($partNumber + 1) * $chunkSize) - 1, $fileSize - 1);

  // upload the next piece
  $result = $glacier->uploadMultipartPart(array(
        'body' => fread($fp, $chunkSize),  // read the next chunk
        'uploadId' => $uploadId,          // the multipart upload this is for
        'vaultName' => 'MyVault',
        'range' => 'bytes '.$firstByte.'-'.$lastByte.'/*' // weird string
        ));

  // this is where one would check the results for error.
  // This is left as an exercise for the reader ;)

  // onto the next piece
  $partNumber++;
  print("\tpart ".$partNumber." uploaded...\n");
}
print("...done\n");

// and now we can close off this upload
$result = $glacier->completeMultipartUpload(array(
  'archiveSize' => $fileSize,         // the total file size
  'uploadId' => $uploadId,            // the upload id
  'vaultName' => 'MyVault',
  'checksum' => $hash                 // here is where we need the tree hash
));

// this is where one would check the results for error.
// This is left as an exercise for the reader ;)


// get the archive id.
// You will need this to refer to this upload in the future.
$archiveId = $result->get('archiveId');

print("The archive Id is: ".$archiveId."\n");


?>
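
The part-size rule and the byte ranges used above can be checked in isolation, without touching the network. The following is a minimal sketch, not part of the SDK: `isValidPartSize` and `planParts` are hypothetical helper names, and the 1 MB to 4 GB bounds are the limits Glacier documents for multipart part sizes.

```php
<?php
// Hypothetical helpers (not part of the AWS SDK) for planning a multipart upload.

const MB = 1048576;

// Glacier expects a part size of 1 MB multiplied by a power of 2 (1 MB up to 4 GB).
function isValidPartSize(int $partSize): bool
{
    if ($partSize < MB || $partSize > 4096 * MB || $partSize % MB !== 0) {
        return false;
    }
    $mbCount = intdiv($partSize, MB);
    return ($mbCount & ($mbCount - 1)) === 0; // true only for powers of 2
}

// Returns one "bytes first-last/*" range string per part, in upload order.
function planParts(int $fileSize, int $partSize): array
{
    $ranges = [];
    for ($first = 0; $first < $fileSize; $first += $partSize) {
        // the last part may be shorter than $partSize
        $last = min($first + $partSize, $fileSize) - 1;
        $ranges[] = "bytes {$first}-{$last}/*";
    }
    return $ranges;
}

// Example: a 17 MB file split into 4 MB parts (the final part is 1 MB long).
foreach (planParts(17 * MB, 4 * MB) as $range) {
    print($range . "\n");
}
```

Because the loop stops as soon as `$first` reaches the file size, a file whose size is an exact multiple of the part size produces no empty trailing range.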
answered 2015-11-11T19:01:49.290

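I think you have misunderstood uploadMultipartPart. It means that you upload one large file in multiple parts. You then call completeMultipartUpload to signal that you have finished uploading that one file.

From your code, it looks like you are uploading multiple files.

You may not actually need uploadMultipartPart.

Perhaps you can use the regular "uploadArchive" instead?

Reference: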

https://blogs.aws.amazon.com/php/post/Tx7PFHT4OJRJ42/Uploading-Archives-to-Amazon-Glacier-from-PHP
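
As a rough illustration of the single-request route, here is a minimal sketch. `uploadSingleArchive` is a hypothetical wrapper, not an SDK function; it assumes `$client` is an `Aws\Glacier\GlacierClient` (or any object exposing a compatible `uploadArchive()` method) and that the result can be read like an array, as the other snippets in this thread do with `$result['uploadId']`.

```php
<?php
// Sketch only: $client is expected to be an Aws\Glacier\GlacierClient, but any
// object with a compatible uploadArchive() method works, which keeps this testable.
function uploadSingleArchive($client, string $vaultName, string $path): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }

    try {
        // one request uploads the whole file as a single archive
        $result = $client->uploadArchive([
            'vaultName'          => $vaultName,
            'archiveDescription' => 'Upload of ' . basename($path),
            'body'               => $handle,
        ]);
    } finally {
        // the client may already have closed the stream, so check first
        if (is_resource($handle)) {
            fclose($handle);
        }
    }

    // keep the archive id: it is needed to retrieve or delete the archive later
    return $result['archiveId'];
}
```

Calling this once per file stores each file as its own archive. Glacier documents a 4 GB limit for a single-request upload, so larger files still need the multipart calls.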

answered 2015-10-16T17:20:36.170

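Note: this is a solution for multipart uploads with aws-sdk-php v2. I think it would work on v3 with only small changes to how the TreeHash class is used.

Thanks to Neil Vandermeiden's snippet, I accomplished the same task, but with some improvements.

Neil computes a checksum only for the whole file. That has two possible problems:

  • It can consume a lot of memory: remember that we are uploading a large file; hashing it to get the checksum requires opening it and reading all of its contents.
  • We upload the file in several parts: something can go wrong while uploading some of them, leaving corrupted parts on the aws side. If we compute and verify a checksum for each part, we can catch the problem.

In the code below, we compute the checksum of each file part and send it to the aws api together with that part.

Once aws has finished receiving an uploaded part, it computes its own checksum. If that checksum does not match ours, an exception is thrown. If it matches, we know the part was uploaded successfully.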
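
To make the checksum relationship concrete, here is a pure-PHP sketch of a SHA-256 tree hash that does not depend on the SDK. The helper names `combineHashes` and `treeHash` are hypothetical, and the sketch follows the scheme as I understand it: hash each 1 MB leaf, then hash pairs upward until one root remains. It shows that folding the per-part tree hashes gives the same root as hashing the whole payload, provided every part except the last is 1 MB times a power of 2.

```php
<?php
// Pure-PHP illustration of a SHA-256 tree hash; helper names are hypothetical.

const LEAF_SIZE = 1048576; // leaves are 1 MB each

// Folds a list of binary hashes pairwise until a single root hash remains.
function combineHashes(array $hashes): string
{
    if (count($hashes) === 0) {
        return hash('sha256', '', true); // empty input: hash of the empty string
    }
    while (count($hashes) > 1) {
        $next = [];
        foreach (array_chunk($hashes, 2) as $pair) {
            // an unpaired hash at the end moves up to the next level unchanged
            $next[] = count($pair) === 2
                ? hash('sha256', $pair[0] . $pair[1], true)
                : $pair[0];
        }
        $hashes = $next;
    }
    return $hashes[0];
}

// Tree hash (binary) of a string: hash each 1 MB leaf, then fold the leaves.
function treeHash(string $data): string
{
    $leaves = [];
    foreach (str_split($data, LEAF_SIZE) as $leaf) {
        $leaves[] = hash('sha256', $leaf, true);
    }
    return combineHashes($leaves);
}

// Demo: roughly 4.3 MB of data, hashed as a whole and as 2 MB parts.
$partSize = 2 * LEAF_SIZE;
$data = str_repeat('glacier', 650000);

$partHashes = [];
foreach (str_split($data, $partSize) as $part) {
    $partHashes[] = treeHash($part); // what would accompany each uploaded part
}

// bin2hex because an HTTP header needs the hex form, not raw bytes
$fromParts = bin2hex(combineHashes($partHashes));
$fromWhole = bin2hex(treeHash($data));

print(($fromParts === $fromWhole ? 'match' : 'mismatch') . "\n");
```

The demo holds the whole payload in memory only to keep the comparison short; in a real upload each part would be read from the file, hashed, and discarded, and only the small per-part hashes kept for the final fold.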

<?php
use Aws\Common\Hash\TreeHash;
use Aws\Glacier\GlacierClient;

/**
 * upload a file and store it into aws glacier
 */
class UploadMultipartFileToGlacier
{
    // aws glacier
    private $description;
    private $glacierClient;
    private $glacierConfig;
    /*
     * the part size is required to be 1 MB (1024 * 1024 bytes) multiplied by a power of 2 (1 MB, 2 MB, 4 MB, 8 MB, and so on)
     * reference: https://docs.aws.amazon.com/aws-sdk-php/v2/api/class-Aws.Glacier.GlacierClient.html#_initiateMultipartUpload
     **/
    private $partSize;

    // file location
    private $filePath;

    private $errorMessage;
    private $executionDate;

    public function __construct($filePath)
    {
        $this->executionDate = date('Y-m-d H:i:s');
        $this->filePath = $filePath;
    
        // AWS Glacier
        $this->glacierConfig = (object) [
            'vaultId' => 'VAULT_NAME',
            'region' => 'REGION',
            'accessKeyId' => 'ACCESS_KEY',
            'secretAccessKey' => 'SECRET_KEY',
        ];

        $this->glacierClient = GlacierClient::factory(array(
            'credentials' => array(
                'key'    => $this->glacierConfig->accessKeyId,
                'secret' => $this->glacierConfig->secretAccessKey,
            ),
            'region' => $this->glacierConfig->region
        ));

        $this->description = sprintf('Upload file %s at %s', $this->filePath, $this->executionDate);

        $this->partSize = 1024 * 1024 * pow(2, 2); // 4 MB
    }

    public function upload()
    {
        list($success, $data) = $this->uploadFileToGlacier();

        if ($success) {
            // todo: tasks to run once the file has been uploaded successfully to aws glacier
        } else {
            // todo: handle error
            // $this->errorMessage contains the exception message
        }
    }

    private function completeMultipartUpload($uploadId, $fileSize, $checksumParts)
    {
        // with the checksums of all processed file parts, we can compute the checksum of the whole file. It's important
        // to send it as a parameter to the aws api's GlacierClient::completeMultipartUpload. Aws computes the checksum of
        // the assembled archive on their side. If their checksum doesn't match ours, the api throws an exception.
        $checksum = $this->getChecksumFile($checksumParts);

        return $this->glacierClient->completeMultipartUpload([
            'archiveSize' => $fileSize,
            'uploadId' => $uploadId,
            'vaultName' => $this->glacierConfig->vaultId,
            'checksum' => $checksum
        ]);
    }

    private function getChecksumPart($content)
    {
        $treeHash = new TreeHash();
        $mb = 1024 * 1024 * pow(2, 0); // 1 MB (the class TreeHash only allows to process chunks <= 1 MB)
        $buffer = $content;

        while (strlen($buffer) >= $mb) {
            $data = substr($buffer, 0, $mb);
            $buffer = substr($buffer, $mb) ?: '';
            $treeHash->addData($data);
        }
        
        if (strlen($buffer)) {
            $treeHash->addData($buffer);
        }

        return $treeHash->getHash();
    }

    private function getChecksumFile($checksumParts)
    {
        $treeHash = TreeHash::fromChecksums($checksumParts);

        return $treeHash->getHash();
    }

    private function initiateMultipartUpload()
    {
        $result = $this->glacierClient->initiateMultipartUpload([
            'accountId' => '-',
            'vaultName' => $this->glacierConfig->vaultId,
            'archiveDescription' => $this->description,
            'partSize' => $this->partSize,
        ]);

        return $result->get('uploadId');
    }

    private function uploadFileToGlacier()
    {
        $success = true;
        $data = false;

        try {
            $fileSize = filesize($this->filePath);

            $uploadId = $this->initiateMultipartUpload();
            $checksums = $this->uploadMultipartFile($uploadId, $fileSize);
            $model = $this->completeMultipartUpload($uploadId, $fileSize, $checksums);

            $data = (object) [
                'archiveId' => $model->get('archiveId'),
                'executionDate' => $this->executionDate,
                'location' => $model->get('location'),
            ];
        } catch (\Exception $e) {
            $this->errorMessage = $e->getMessage();
            $success = false;
        }

        return [$success, $data];
    }
    
    private function uploadMultipartFile($uploadId, $fileSize)
    {
        $numParts = ceil($fileSize / $this->partSize);
        $fp = fopen($this->filePath, 'rb'); // 'b' keeps the read binary-safe
        $partIdx = 0;
        $checksumParts = [];

        error_log("Uploading: {$this->filePath} ({$fileSize} bytes) in {$numParts} parts...");

        // loop while bytes remain; comparing against $fileSize (not $fileSize + 1) avoids
        // sending an empty extra part when the size is an exact multiple of the part size
        while ($partIdx * $this->partSize < $fileSize) {
            $firstByte = $partIdx * $this->partSize;
            $lastByte = min((($partIdx + 1) * $this->partSize) - 1, $fileSize - 1);
            $content = fread($fp, $this->partSize);
            
            // we compute the checksum of the part we're processing. It's important to send it as a parameter to the
            // aws api's GlacierClient::uploadMultipartPart. Aws computes the checksum of the uploaded part on their side.
            // If their checksum doesn't match ours, the api throws an exception.
            $checksumPart = $this->getChecksumPart($content);

            $result = $this->glacierClient->uploadMultipartPart([
                'body' => $content,
                'uploadId' => $uploadId,
                'vaultName' => $this->glacierConfig->vaultId,
                'checksum' => $checksumPart,
                'range' => "bytes {$firstByte}-{$lastByte}/*"
            ]);

            $checksumParts[] = $result->get('checksum'); // same value as $checksumPart; the api throws an exception if it isn't
            
            $partIdx++;
            error_log("Part {$partIdx} uploaded...");
        }

        fclose($fp); // release the file handle once every part has been sent

        return $checksumParts;
    }
}

$uploadMultipartFileToGlacier = new UploadMultipartFileToGlacier('<FILE_PATH>');

$uploadMultipartFileToGlacier->upload();
answered 2020-12-02T19:22:21.670