15

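The post at http://highscalability.com/amazon-architecture explains Amazon's architecture in general, but I am interested in how Amazon S3 is implemented.

Some of my guesses are

  1. A distributed file system like HDFS http://hadoop.apache.org/core/docs/current/hdfs_design.html
  2. A non-relational persistent database like CouchDB http://couchdb.apache.org/

Is it possible to implement something similar on a smaller scale using a scripting language such as Python or PHP?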

4

3 Answers

6

Amazon S3 is implemented using the architecture described in the Dynamo Paper:

http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

The paper explains consistent hashing, and how and why the guarantee is "eventual consistency".
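To make the consistent hashing idea concrete, here is a minimal Python sketch of a hash ring with virtual nodes. It is only an illustration of the general technique from the paper, not Amazon's code: the class name `HashRing`, the method `preference_list`, the use of MD5, and the defaults (64 virtual nodes, 3 replicas) are all my own assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Sorted list of (position, node) pairs; each node appears
        # `vnodes` times so that load spreads more evenly.
        self._ring = []
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._position(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _position(key):
        # Map any string to an integer position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def preference_list(self, key, replicas=3):
        """Return up to `replicas` distinct nodes, walking clockwise from the key."""
        start = bisect.bisect(self._ring, (self._position(key), ""))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.preference_list("my-bucket/photo.jpg")  # three distinct nodes
```

The point of the ring is that adding or removing a node only moves the keys adjacent to that node's positions; every other key keeps its owner.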

The conflict resolution they talk about for Dynamo is not exposed to users of S3. It is used internally in Amazon's applications, but for S3, the only conflict resolution is last write wins.

Edit: Werner Vogels has said "Dynamo is not directly exposed externally as a web service; however, Dynamo and similar Amazon technologies are used to power parts of our Amazon Web Services, such as S3." http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

I would emphasize that he isn't saying S3 and Dynamo share components; he explicitly says that Dynamo itself is one of the technologies that power S3. Everything I've seen from S3, including the caveats, is accounted for by assuming S3 is a fancy web services wrapper around Dynamo with authentication, accounting, and a last-write-wins conflict resolution that is invisible to the user.

The original question was about the underlying storage mechanism for S3. It is explicitly not a distributed file system like HDFS or a non-relational database like CouchDB. Dynamo fills this role.

answered 2009-04-07T23:24:46.937
4

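Neither the architecture of Amazon S3 nor its implementation has been made public. It therefore cannot be extended or reused to build private clouds of any scale.

There are several papers on the topic of cloud storage architecture that you may find useful. Here is one: CACSS: Towards a Generic Cloud Storage Service

It also details how different technologies can be combined to provide a single cloud storage system that performs well and is highly scalable and reliable. The research gives inexperienced cloud providers a source of knowledge for quickly building their own cloud storage services.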

answered 2012-05-22T12:27:47.860
1

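It is closer to 2, although the content is stored as "BLOBs" and the system does not care about what is inside them, whereas CouchDB does. The back-end storage uses a local database (BDB?) on each of the cluster nodes that hold the multiple copies. Reads can go to any node that has a copy, and so can writes, but writes then have to be reconciled to resolve conflicts. As Kevin mentioned, this guarantees "eventual consistency", but with no strict guarantee of when, or of which write wins (from an external POV; internally it is defined).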
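As a rough small-scale sketch of that behaviour in Python: each replica keeps its own key-to-blob map, a write lands on one replica, and a later reconciliation pass copies the newest version everywhere. Everything here is hypothetical (the names `Replica`, `BlobStore`, `reconcile`, and the single shared counter used in place of timestamps); it shows last-write-wins reconciliation in general, not how S3 actually does it.

```python
import itertools


class Replica:
    """One node's local store: key -> (version, blob)."""

    def __init__(self):
        self.data = {}

    def put(self, key, version, blob):
        current = self.data.get(key)
        # Last write wins: keep whichever copy has the higher version.
        if current is None or version > current[0]:
            self.data[key] = (version, blob)

    def get(self, key):
        return self.data.get(key)


class BlobStore:
    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        # Assumption: one shared counter stands in for write timestamps.
        self._clock = itertools.count(1)

    def write(self, key, blob, via):
        """Accept a write at replica `via` only; the others catch up later."""
        self.replicas[via].put(key, next(self._clock), blob)

    def read(self, key, via):
        """Read from one replica; may return stale data or None."""
        entry = self.replicas[via].get(key)
        return None if entry is None else entry[1]

    def reconcile(self, key):
        """Copy the newest known version of `key` to every replica."""
        entries = [r.get(key) for r in self.replicas]
        entries = [e for e in entries if e is not None]
        if not entries:
            return
        version, blob = max(entries, key=lambda e: e[0])
        for replica in self.replicas:
            replica.put(key, version, blob)


store = BlobStore()
store.write("photo.jpg", b"old bytes", via=0)
store.write("photo.jpg", b"new bytes", via=1)
stale = store.read("photo.jpg", via=0)   # replica 0 still has the first write
store.reconcile("photo.jpg")
fresh = store.read("photo.jpg", via=0)   # after reconciliation: the later write
```

Until `reconcile` runs, a reader can see either version or nothing depending on which replica answers, which is the "eventual" part of eventual consistency.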

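Reading the Dynamo paper helps in understanding many of the concepts, but AFAIK the implementation is different. Dynamo is used internally at Amazon for other purposes. There are open source implementations of both; an interesting one is Project Voldemort. CouchDB is obviously interesting too.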

answered 2009-04-22T18:27:18.057