amazon-s3 - Cascading S3 Sink Tap not deleted with SinkMode.REPLACE

Question

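We are running Cascading with a Sink Tap configured to store its output in Amazon S3, and we occasionally get a FileAlreadyExistsException (see [1]). It happens only from time to time (roughly 1 run in 100) and we cannot reproduce it.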

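Digging into the Cascading code, we found that Hfs.deleteResource() is called by (among others) BaseFlow.deleteSinksIfNotUpdate(). By the way, we are quite intrigued by the silently swallowed NPE (comment "hack to get around npe thrown when fs reaches root directory").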

From there, we extended the Hfs Tap with our own Tap to add more logic to the deleteResource() method (see [2]), including a retry mechanism that calls getFileSystem(conf).delete directly.

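The retry mechanism seems to help, but we still get occasional failures (see the example in [3]): HDFS appears to return isDeleted=true, yet when we ask directly whether the folder exists, we get exists=true, which should not happen. When the flow succeeds, the logs also show isDeleted as true or false seemingly at random, which suggests the returned value is meaningless or cannot be trusted.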
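Since the boolean returned by delete does not seem trustworthy here, one option is to ignore it and judge success only by repeated existence checks. The sketch below is a plain-Java helper (the class `DeleteVerifier`, its `Sleeper` interface and all parameters are hypothetical, not part of Cascading or Hadoop). Like the Tap in [2], it polls an `exists` check with exponential back-off, but it additionally requires several consecutive "absent" reads before declaring the resource gone, because a single negative read can itself be stale.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: trusts repeated existence checks, not the result of delete(). */
final class DeleteVerifier {

    /** Abstraction over Thread.sleep so the back-off can be exercised without real waiting. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /** Default sleeper backed by Thread.sleep(long). */
    static final Sleeper REAL_SLEEP = Thread::sleep;

    private DeleteVerifier() {}

    /**
     * Polls {@code exists} until it reports "absent" {@code requiredAbsences} times in a row.
     * The delay starts at {@code initialSleepMs} and doubles after every "present" read.
     * Returns false if the sleep budget {@code maxTotalSleepMs} is exhausted or the thread is interrupted.
     */
    static boolean waitUntilGone(BooleanSupplier exists, int requiredAbsences,
                                 long initialSleepMs, long maxTotalSleepMs, Sleeper sleeper) {
        long sleepMs = initialSleepMs;
        long slept = 0;
        int absences = 0;
        while (true) {
            if (exists.getAsBoolean()) {
                absences = 0;                  // a "present" read resets the streak
            } else if (++absences >= requiredAbsences) {
                return true;                   // seen absent enough times in a row
            }
            if (slept + sleepMs > maxTotalSleepMs) {
                return false;                  // time budget exhausted
            }
            try {
                sleeper.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return false;
            }
            slept += sleepMs;
            if (absences == 0) {
                sleepMs *= 2;                  // back off only while the resource is still visible
            }
        }
    }
}
```

Inside `deleteResource()` the supplier would wrap `getFileSystem(conf).exists(path)`, converting its `IOException` into an unchecked exception. Requiring consecutive absences is a heuristic: it narrows the window for a stale read but cannot rule one out on an eventually consistent store.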

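Has anybody experienced this kind of behavior with S3: "the folder should have been deleted, but it is still there"? We suspect an S3 issue, but could the cause also lie in Cascading or HDFS?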

We are running Hadoop Cloudera-cdh3u5 and Cascading 2.0.1-wip-dev.

[1]

org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory s3n://... already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
    at com.twitter.elephantbird.mapred.output.DeprecatedOutputFormatWrapper.checkOutputSpecs(DeprecatedOutputFormatWrapper.java:75)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:923)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:856)
    at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:104)
    at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:174)
    at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
    at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
    at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.j

[2]

  @Override
  public boolean deleteResource(JobConf conf) throws IOException {
    LOGGER.info("Deleting resource {}", getIdentifier());

    // Let Hfs do its usual recursive delete first. The returned flag has not
    // been reliable on S3, so the loop below re-checks existence directly.
    boolean isDeleted = super.deleteResource(conf);
    LOGGER.info("Hfs Sink Tap isDeleted is {} for {}", isDeleted,
        getIdentifier());

    Path path = new Path(getIdentifier());
    FileSystem fs = getFileSystem(conf);

    int retryCount = 0;
    int cumulativeSleepTime = 0;
    int sleepTime = 1000; // first back-off in ms, doubled after each attempt

    while (fs.exists(path)) {
      LOGGER.info(
          "Resource {} still exists, it should not... - I will continue to wait patiently...",
          getIdentifier());
      try {
        LOGGER.info("Now I will sleep " + sleepTime / 1000
            + " seconds while trying to delete {} - attempt: {}",
            getIdentifier(), retryCount + 1);
        Thread.sleep(sleepTime);
        cumulativeSleepTime += sleepTime;
        sleepTime *= 2;
      } catch (InterruptedException e) {
        // Restore the interrupt flag instead of swallowing it.
        Thread.currentThread().interrupt();
        LOGGER.error(
            "Interrupted while sleeping trying to delete {} with message {}...",
            getIdentifier(), e.getMessage());
        throw new RuntimeException(e);
      }

      // Re-issue the recursive delete on every attempt, not only the first one.
      fs.delete(path, true);
      retryCount++;

      // MAXIMUM_TIME_TO_WAIT_TO_DELETE_MS is a constant defined elsewhere in this class.
      if (cumulativeSleepTime > MAXIMUM_TIME_TO_WAIT_TO_DELETE_MS) {
        break;
      }
    }

    // Final check: fail loudly if the path is still visible.
    if (fs.exists(path)) {
      LOGGER.error(
          "We didn't succeed to delete the resource {}. Throwing now a runtime exception.",
          getIdentifier());
      throw new RuntimeException(
          "Although we waited to delete the resource for "
              + getIdentifier()
              + ' '
              + retryCount
              + " iterations, it still exists - This must be an issue in the underlying storage system.");
    }

    return isDeleted;
  }

[3]

INFO [pool-2-thread-15] (BaseFlow.java:1287) - [...] at least one sink is marked for delete
 INFO [pool-2-thread-15] (BaseFlow.java:1287) - [...] sink oldest modified date: Wed Dec 31 23:59:59 UTC 1969
 INFO [pool-2-thread-15] (HiveSinkTap.java:148) - Now I will sleep 1 seconds while trying to delete s3n://... - attempt: 1
 INFO [pool-2-thread-15] (HiveSinkTap.java:130) - Deleting resource s3n://...
 INFO [pool-2-thread-15] (HiveSinkTap.java:133) - Hfs Sink Tap isDeleted is true for s3n://...
 ERROR [pool-2-thread-15] (HiveSinkTap.java:175) - We didn't succeed to delete the resource s3n://... Throwing now a runtime exception.
 WARN [pool-2-thread-15] (Cascade.java:706) - [...] flow failed: ...
 java.lang.RuntimeException: Although we waited to delete the resource for s3n://... 0 iterations, it still exists - This must be an issue in the underlying storage system.
    at com.qubit.hive.tap.HiveSinkTap.deleteResource(HiveSinkTap.java:179)
    at com.qubit.hive.tap.HiveSinkTap.deleteResource(HiveSinkTap.java:40)
    at cascading.flow.BaseFlow.deleteSinksIfNotUpdate(BaseFlow.java:971)
    at cascading.flow.BaseFlow.prepare(BaseFlow.java:733)
    at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:761)
    at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:710)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

score 2 · Accepted Answer

First, double-check the Cascading compatibility page for the supported distributions.

http://www.cascading.org/support/compatibility/

Note that Amazon EMR is listed because they regularly run the compatibility tests and report the results.

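Second, S3 is an eventually consistent file system. HDFS is not. So assumptions about HDFS behavior do not carry over to data stored in S3. For example, a rename is really a copy followed by a delete, and the copy can take hours. Amazon has patched their internal distribution to accommodate many of these differences.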
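To make the consequence for the question concrete, here is a toy in-memory model of eventual consistency. The class `EventuallyConsistentStore` and its fixed `lag` of stale reads are illustrative assumptions, not how S3 actually schedules visibility. `delete` reports success immediately while `exists` keeps returning true for a few more reads, which is the `isDeleted=true` / `exists=true` combination seen in [3]. `rename` is modeled as a copy followed by a delete, as described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only: a deleted key stays visible to exists() for `lag` more reads. */
final class EventuallyConsistentStore {
    private final Set<String> keys = new HashSet<>();
    // Remaining number of reads for which a deleted key is still reported as present.
    private final Map<String, Integer> staleReads = new HashMap<>();
    private final int lag;

    EventuallyConsistentStore(int lag) {
        this.lag = lag;
    }

    void put(String key) {
        keys.add(key);
        staleReads.remove(key);
    }

    /** Returns true as soon as the delete is accepted, before it is visible to readers. */
    boolean delete(String key) {
        if (!keys.remove(key)) {
            return false;
        }
        if (lag > 0) {
            staleReads.put(key, lag);
        }
        return true;
    }

    boolean exists(String key) {
        if (keys.contains(key)) {
            return true;
        }
        Integer left = staleReads.get(key);
        if (left == null) {
            return false;
        }
        if (left <= 1) {
            staleReads.remove(key);
        } else {
            staleReads.put(key, left - 1);
        }
        return true; // stale read: deleted, but still visible
    }

    /** On an object store a "rename" is a copy to the new key plus a delete of the old one. */
    boolean rename(String from, String to) {
        if (!keys.contains(from)) {
            return false;
        }
        put(to);
        return delete(from);
    }
}
```

With `lag = 2`, a `delete` that returns true is followed by two `exists` calls that still return true; a check like `checkOutputSpecs` running during that window would see the old output.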

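Third, there are no directories in S3. They are a hack, and the various S3 interfaces (jets3t vs s3cmd vs ...) support them differently. Given the previous point, this is bound to cause problems.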
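A sketch of how directories are typically emulated on a flat key space, assuming keys are plain strings separated by `/`. The `_$folder$` marker suffix is modeled on the convention Hadoop's `s3n` connector uses for empty directories; other S3 clients may use different conventions, so treat it as an assumption. The class name `FlatKeyDirectories` is hypothetical. The point is that a recursive delete becomes one request per key, with no atomic "remove directory" step, so an observer can see a partially deleted prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical model of directories on a flat key space: prefixes plus optional marker keys. */
final class FlatKeyDirectories {
    // Assumed marker convention, modeled on Hadoop's s3n connector.
    static final String FOLDER_MARKER = "_$folder$";

    private final TreeSet<String> keys = new TreeSet<>();

    void putFile(String key) {
        keys.add(key);
    }

    /** An empty "directory" only exists as a marker object. */
    void mkdir(String dir) {
        keys.add(dir + FOLDER_MARKER);
    }

    /** A directory "exists" if its marker exists or any key lives under its prefix. */
    boolean dirExists(String dir) {
        if (keys.contains(dir + FOLDER_MARKER)) {
            return true;
        }
        String prefix = dir + "/";
        for (String k : keys) {
            if (k.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Recursive delete: one request per key under the prefix, plus the marker. Not atomic. */
    List<String> deleteDir(String dir) {
        String prefix = dir + "/";
        List<String> victims = new ArrayList<>();
        for (String k : keys) {
            if (k.startsWith(prefix)) {
                victims.add(k);
            }
        }
        if (keys.contains(dir + FOLDER_MARKER)) {
            victims.add(dir + FOLDER_MARKER);
        }
        for (String k : victims) {
            keys.remove(k); // in a real store, each of these is a separate request
        }
        return victims;
    }
}
```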

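Fourth, network latency and reliability matter a great deal, especially when talking to S3. Historically, I have found the Amazon network to perform much better when processing large datasets on S3 from EMR than from standard EC2 instances. I also believe there is a patch in EMR that improves matters here.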

So I would suggest trying the EMR Apache Hadoop distribution to see whether your problem goes away.

score 1 · Accepted Answer

When running any job on Hadoop that uses files in S3, you have to keep the nuances of eventual consistency in mind.

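I have helped troubleshoot many applications whose root problem was a similar delete race condition, whether they were written in Cascading, in Hadoop Streaming, or directly in Java.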

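At one point there was discussion of S3 sending a notification once a given key/value pair had been fully deleted. I have not kept up with where that feature stands. Otherwise, it is best to design the system (again, whether in Cascading or in any other application that uses S3) so that the data consumed or produced by batch workflows is managed in HDFS, HBase, or a key/value framework (Redis, for example, has been used for this). S3 is then used for durable storage, but not for intermediate data.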
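One way to apply this advice is to decide sink locations per step when the flow is assembled: every intermediate sink goes to HDFS and only the final, durable output goes to S3. The helper below is a hypothetical sketch of that rule; the class and method names, the per-run directory layout, and the example URIs are placeholders, not a Cascading API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner: intermediate sinks on HDFS, only the final sink on S3. */
final class SinkPathPlanner {
    private SinkPathPlanner() {}

    /**
     * Returns one sink path per step. Steps 1..steps-1 write under
     * hdfsTmpBase/runId, the last step writes to s3FinalPath.
     */
    static List<String> plan(String hdfsTmpBase, String s3FinalPath, String runId, int steps) {
        if (steps < 1) {
            throw new IllegalArgumentException("steps must be >= 1");
        }
        List<String> paths = new ArrayList<>();
        for (int i = 1; i < steps; i++) {
            // Intermediate data stays on HDFS, where exists/delete are consistent.
            paths.add(hdfsTmpBase + "/" + runId + "/step-" + i);
        }
        // Only the durable, final output goes to S3.
        paths.add(s3FinalPath);
        return paths;
    }
}
```

In a Cascading flow, each returned string would be used as the path of the corresponding Hfs sink, so the delete-and-replace cycle of intermediate sinks never touches S3.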
