
I'm facing a problem sharing storage between multiple EC2 instances. I'm going to run heavy jobs, so I'll need a lot of instances to do it. On one side, I have an EBS volume attached to one server instance. On the other side, I have a worker instance. I created an AMI of this worker instance and then launched several instances from that AMI. They are all running in the same VPC. Basically, the server instance sends jobs and the workers execute them. I would like my workers to save their log files to the shared storage while they run the jobs, something like:

worker_1/logfile.log

worker_2/logfile.log

What could be the best solution to do that?

  • I read it's not possible to attach the same EBS volume to multiple instances.
  • I had a look at GlusterFS but here is what I found:

"Before realizing a proof of concept with two servers, in different availability zones, replicating an EBS volume with an ext4 filesystem, we will list the cases where GlusterFS should not be used: Sequential files written simultaneously from multiple servers such as logs. The locking system can lead to serious problems if you store logs within GlusterFS. The ideal solution it’s to store them locally then use S3 to archive them. If necessary we can consolidate multiple server logs before or after storing them in S3."

  • And finally, I've also checked an S3 bucket mounted with s3fs, but I found out it's not a good option either:

"You can't partially update a file with s3fs so changing a single byte will re-upload the entire file" . Then if you want to make small incremental change then its a definite no. You can't use s3fs - S3 Just doesn't work that way you can't incrementally change a file."

So what could be a good solution to my problem that allows my workers to write their log files to shared storage?

Thanks for your help!

Romanzo

4 Answers

Thanks for the answers, but in the end I'm using NFS between the instances and it works pretty well!
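For reference, here's a minimal sketch of what each worker ends up doing once the NFS export is mounted. The /mnt/shared mount point and the use of the hostname as the per-worker directory name are assumptions; adjust them to your actual export/mount setup.

    # Minimal sketch: each worker writes its log under a shared NFS mount.
    # Assumes the server instance exports a directory and every worker has
    # mounted it at MOUNT_POINT (both assumptions, not part of the original setup).
    import logging
    import os
    import socket

    MOUNT_POINT = "/mnt/shared"  # hypothetical NFS mount path on each worker
    worker_dir = os.path.join(MOUNT_POINT, socket.gethostname())  # e.g. worker_1/
    os.makedirs(worker_dir, exist_ok=True)

    logging.basicConfig(
        filename=os.path.join(worker_dir, "logfile.log"),
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    logging.info("worker started, writing logs to the shared NFS volume")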

answered 2013-08-15T05:15:23.797

As is described in this thread and in some of the answers already provided, two common ways to accomplish this goal were to use S3 or NFS to share data access between instances.

On April 9th 2015, Amazon announced Amazon Elastic File System (Amazon EFS), which provides a much better solution to the problem you're trying to solve.
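If you go the EFS route, provisioning comes down to a file system plus a mount target in your VPC. Here's a rough boto3 sketch; the region, subnet ID, and security group ID are placeholders, and the security group must allow NFS traffic (TCP 2049). Once the mount target is available, every instance can mount the same file system over NFSv4 and share a log directory.

    # Rough sketch of provisioning an EFS file system with boto3 and adding
    # a mount target in the subnet where the workers run. IDs are placeholders.
    import boto3

    efs = boto3.client("efs", region_name="us-east-1")  # assumed region

    fs = efs.create_file_system(
        CreationToken="shared-worker-logs",   # idempotency token (any unique string)
        PerformanceMode="generalPurpose",
    )

    efs.create_mount_target(
        FileSystemId=fs["FileSystemId"],
        SubnetId="subnet-xxxxxxxx",           # placeholder: subnet of your workers
        SecurityGroups=["sg-xxxxxxxx"],       # placeholder: must allow NFS (TCP 2049)
    )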

answered 2015-06-03T18:47:14.063

Did you consider having each worker write its logs to a local disk (maybe even on the ephemeral partition), and then having each worker upload its own big log file to S3 after it finishes?

This is somewhat similar to what happens when you use Elastic MapReduce to run some distributed tasks on a Hadoop cluster.

You'd get high write throughput (since each worker writes to a local disk, especially if you use the ephemeral partition), and also high aggregate upload throughput to S3 (since you'd have the bandwidth of many workers available).
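A rough sketch of that approach, assuming boto3 as the upload client; the bucket name and local log path are placeholders, and each worker uses its hostname as a key prefix so the logs land under separate "folders" in the bucket.

    # Sketch of "write locally, upload when done". Bucket name and local path
    # are placeholders; logs end up as s3://<bucket>/logs/<hostname>/logfile.log.
    import socket

    import boto3

    LOCAL_LOG = "/mnt/ephemeral/logfile.log"   # fast local (ephemeral) disk
    BUCKET = "my-worker-logs"                  # placeholder bucket name

    def upload_log_when_finished():
        s3 = boto3.client("s3")
        key = f"logs/{socket.gethostname()}/logfile.log"
        s3.upload_file(LOCAL_LOG, BUCKET, key)  # managed upload (multipart if large)

    if __name__ == "__main__":
        # ... run the heavy job, appending to LOCAL_LOG as it goes ...
        upload_log_when_finished()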

answered 2013-07-04T04:01:03.290

Not entirely sure of the context, but would writing objects directly to a mounted S3 bucket be feasible?

answered 2013-08-14T04:36:09.620