4

I'm using Databricks notebooks on Azure, and I have a perfectly good PySpark notebook that ran fine all day yesterday. Towards the end of the day, though, I started getting an odd error on code I know had worked before: org.apache.spark.SparkException: Job aborted due to stage failure: Task from application

Since it was late, I left it until today. Today I tried creating a new cluster and running the code, and this time it just keeps saying my jobs were "Cancelled".

In fact, I tried running just one line of code:

filePath = "/SalesData.csv"

and even that gets cancelled.

EDIT:

This is the std error log from Azure:

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
/databricks/python/lib/python3.5/site-packages/IPython/config/loader.py:38: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.
  from IPython.utils.traitlets import HasTraits, List, Any, TraitError
Fri Jan  4 16:51:08 2019 py4j imported
Fri Jan  4 16:51:08 2019 Python shell started with PID  2543  and guid  86405138b8744987a1df085e4454bb5d
Could not launch process The 'config' trait of an IPythonShell instance must be a Config, but a value of class 'IPython.config.loader.Config' (i.e. {'HistoryManager': {'hist_file': ':memory:'}, 'HistoryAccessor': {'hist_file': ':memory:'}}) was specified. Traceback (most recent call last):
  File "/tmp/1546620668035-0/PythonShell.py", line 1048, in <module>
    launch_process()
  File "/tmp/1546620668035-0/PythonShell.py", line 1036, in launch_process
    console_buffer, error_buffer)
  File "/tmp/1546620668035-0/PythonShell.py", line 508, in __init__
    self.shell = self.create_shell()
  File "/tmp/1546620668035-0/PythonShell.py", line 617, in create_shell
    ip_shell = IPythonShell.instance(config=config, user_ns=user_ns)
  File "/databricks/python/lib/python3.5/site-packages/traitlets/config/configurable.py", line 412, in instance
    inst = cls(*args, **kwargs)
  File "/databricks/python/lib/python3.5/site-packages/IPython/terminal/embed.py", line 159, in __init__
    super(InteractiveShellEmbed,self).__init__(**kw)
  File "/databricks/python/lib/python3.5/site-packages/IPython/terminal/interactiveshell.py", line 455, in __init__
    super(TerminalInteractiveShell, self).__init__(*args, **kwargs)
  File "/databricks/python/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 622, in __init__
    super(InteractiveShell, self).__init__(**kwargs)
  File "/databricks/python/lib/python3.5/site-packages/traitlets/config/configurable.py", line 84, in __init__
    self.config = config
  File "/databricks/python/lib/python3.5/site-packages/traitlets/traitlets.py", line 583, in __set__
    self.set(obj, value)
  File "/databricks/python/lib/python3.5/site-packages/traitlets/traitlets.py", line 557, in set
    new_value = self._validate(obj, value)
  File "/databricks/python/lib/python3.5/site-packages/traitlets/traitlets.py", line 589, in _validate
    value = self.validate(obj, value)
  File "/databricks/python/lib/python3.5/site-packages/traitlets/traitlets.py", line 1681, in validate
    self.error(obj, value)
  File "/databricks/python/lib/python3.5/site-packages/traitlets/traitlets.py", line 1528, in error
    raise TraitError(e)
traitlets.traitlets.TraitError: The 'config' trait of an IPythonShell instance must be a Config, but a value of class 'IPython.config.loader.Config' (i.e. {'HistoryManager': {'hist_file': ':memory:'}, 'HistoryAccessor': {'hist_file': ':memory:'}}) was specified.

4 Answers

1

My team and I ran into this problem after installing the azureml['notebooks'] Python package onto our cluster. The installation appeared to succeed, but we then got the "Cancelled" message whenever we tried to run a code cell.

We also got an error in our logs similar to the one in this post:

The 'config' trait of an IPythonShell instance must be a Config, 
  but a value of class 'IPython.config.loader.Config'...

It seems some Python packages can conflict with, or be incompatible with, this Config object. We uninstalled the library, restarted the cluster, and everything worked fine again. Hope this helps someone :)
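
If you want to confirm from a notebook cell which versions are actually on the cluster before uninstalling anything, a quick sanity check like the following can help (a minimal sketch; it only assumes the packages expose a __version__ attribute):

# Sanity-check the packages involved in the traceback above.
import IPython, traitlets

print("IPython:  ", IPython.__version__)
print("traitlets:", traitlets.__version__)

# If azureml was installed as a cluster library, importing it here confirms
# it is still present; an ImportError means it is already gone.
try:
    import azureml
    print("azureml:  ", getattr(azureml, "__version__", "unknown"))
except ImportError:
    print("azureml not installed")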

answered 2019-01-17T15:33:11.997
0

The version of the IPython package being installed appears to be the problem. What solved it for us was downgrading the IPython version:

Clusters (left pane) > click your cluster > Libraries > Install New > PyPI > in the "Package" field, enter: "ipython==3.2.3" > Install

Then restart your cluster.

Databricks also seems to have a similar issue with the NumPy package, which hit us right after fixing IPython. If that happens to you as well, try downgrading to numpy==1.15.0 the same way you did for IPython.
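
On the Databricks runtimes of that era you could also pin these versions from a notebook cell instead of going through the Libraries UI. A minimal sketch, assuming dbutils.library is available on your runtime (it shipped with Databricks Runtime 5.x and has since been deprecated; on newer runtimes use %pip or the Libraries UI instead):

# Pin the versions suggested above from a notebook cell (alternative to the Libraries UI).
dbutils.library.installPyPI("ipython", version="3.2.3")
dbutils.library.installPyPI("numpy", version="1.15.0")
dbutils.library.restartPython()  # restart the Python process so the downgrades take effect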

answered 2019-03-25T09:02:01.007
0

OK, I ended up creating yet another new cluster, and this time it seems to work. The only difference is that on the previous cluster I had set the maximum number of nodes to 5, whereas this time I left it at the default of 8.

But I don't know whether that is really what made the difference, especially given that yesterday's errors happened on a cluster that had previously been working fine, and today's errors came from running a single, very simple line of code.
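
For reference, the maximum-node setting mentioned above corresponds to the autoscale block when a cluster is created through the Databricks REST API, so the same configuration can be reproduced programmatically. A minimal sketch, assuming a personal access token; the host, node type, and runtime version below are placeholders, not values from the original post:

# Create a cluster with an explicit autoscaling range via the Databricks REST API.
import requests

DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                  # placeholder

resp = requests.post(
    DATABRICKS_HOST + "/api/2.0/clusters/create",
    headers={"Authorization": "Bearer " + TOKEN},
    json={
        "cluster_name": "sales-data-cluster",
        "spark_version": "5.2.x-scala2.11",    # example runtime version
        "node_type_id": "Standard_DS3_v2",     # example Azure VM type
        "autoscale": {"min_workers": 2, "max_workers": 8},
    },
)
resp.raise_for_status()
print(resp.json()["cluster_id"])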

answered 2019-01-03T12:03:55.717
0

It sounds like your cluster may have gotten into a bad state and needs to be restarted. Sometimes the underlying VM service also goes down, and you then need to start a new cluster with fresh nodes. If you can't execute any code at all, always start by restarting the cluster.
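
If restarting from the UI is inconvenient, the restart can also be triggered through the REST API. A minimal sketch, assuming a personal access token and a known cluster ID (both are placeholders here):

# Restart an existing cluster through the Databricks REST API.
import requests

DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                  # placeholder
CLUSTER_ID = "<cluster-id>"                                        # placeholder

resp = requests.post(
    DATABRICKS_HOST + "/api/2.0/clusters/restart",
    headers={"Authorization": "Bearer " + TOKEN},
    json={"cluster_id": CLUSTER_ID},
)
resp.raise_for_status()
print("restart requested")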

answered 2019-01-03T18:21:32.230