pyspark - SageMaker PySpark: kernel dies

Question

I followed the instructions here to set up an EMR cluster and a SageMaker notebook. Everything ran without errors until the last step.

When I open a new notebook in SageMaker, I get the following message:

The kernel appears to have died. It will restart automatically.

Followed by:

        The kernel has died, and the automatic restart has failed.
        It is possible the kernel cannot be restarted.
        If you are not able to restart the kernel, you will still be able to save the
        notebook, but running code will no longer work until the notebook is reopened.

This only happens when I use the pyspark/Sparkmagic kernels. Notebooks opened with a Conda kernel or any other kernel work fine.

My EMR cluster is set up exactly as described in the instructions, with one added configuration:

[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]

I would appreciate any pointers on why this happens and how to debug or fix it.

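PS: I have done this successfully in the past without any problems. When I tried to redo it today, I ran into this issue. I tried recreating both the EMR cluster and the SageMaker notebook, but that did not help.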

score 5 · Accepted Answer

Thank you for using Amazon SageMaker.

The issue here is that Pandas 0.23.0 changed the location of a core class named DataError, and SparkMagic has not yet been updated to import DataError from the correct namespace.
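To see where DataError lives in a given environment, a small probe like the one below may help. It is only a sketch: the helper name locate_symbol and the list of candidate module paths are assumptions for illustration (the answer only states that the class moved, not where to), so adjust the list to whatever your Pandas version actually ships.

```python
import importlib


def locate_symbol(name, candidate_modules):
    """Return the first module path that can be imported and exposes `name`, else None."""
    for module_path in candidate_modules:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Module missing in this environment (or pandas not installed at all).
            continue
        if hasattr(module, name):
            return module_path
    return None


# Assumed candidate locations, not taken from the answer above.
CANDIDATES = ["pandas.core.groupby", "pandas.core.base", "pandas.errors"]

if __name__ == "__main__":
    found = locate_symbol("DataError", CANDIDATES)
    print("DataError importable from:", found)
```

If the printed location differs from the one your SparkMagic build imports from, that mismatch would be consistent with the kernel failing at startup.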

The workaround for this issue is to downgrade Pandas with pip install pandas==0.22.0
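Before downgrading, it can be worth checking which Pandas version the notebook environment actually has. The following is a minimal sketch, assuming Python 3.8+ (for importlib.metadata); the helper names parse_release, is_at_least and installed_version are hypothetical, and the 0.23.0 threshold only applies to SparkMagic builds that predate a fix for the issue.

```python
import re
from importlib import metadata


def parse_release(version):
    """Return the leading numeric components of a version string, padded to three parts."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_at_least(version, minimum):
    """True if `version` is the same as or newer than `minimum`."""
    return parse_release(version) >= parse_release(minimum)


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pandas")
    if version is None:
        print("pandas is not installed in this environment")
    elif is_at_least(version, "0.23.0"):
        print(f"pandas {version} found; consider: pip install pandas==0.22.0")
    else:
        print(f"pandas {version} found; it predates the 0.23.0 change")
```

Run it in the same environment the Sparkmagic kernel uses, since a notebook instance can have several Conda environments with different Pandas versions.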

You can find more information in this open GitHub issue: https://github.com/jupyter-incubator/sparkmagic/issues/458

Let us know if there is any other way we can help.

Thanks,
Neelam
