Try locate -l 1 libhdfs.so to find the library. In my case, the file is located in /opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib.
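If you want to double-check the path before wiring it in, here is a quick sketch (the path below is just the one from my system; substitute your own):

import os

libhdfs_dir = '/opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib'
# Expect True if locate pointed at the right directory
print(os.path.exists(os.path.join(libhdfs_dir, 'libhdfs.so')))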
Then restart your Jupyter server with the ARROW_LIBHDFS_DIR environment variable set to this path. In my case, the command looked like this:
ARROW_LIBHDFS_DIR=/opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib jupyter lab --port 2250 --no-browser
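If you can't restart the server, an alternative sketch is to set the variable from inside the notebook before pyarrow loads libhdfs (pa.hdfs.connect here is the legacy pyarrow HDFS API, which was deprecated in later pyarrow releases):

import os

# Must be set before pyarrow tries to load libhdfs.so
os.environ['ARROW_LIBHDFS_DIR'] = '/opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib'

import pyarrow as pa
fs = pa.hdfs.connect()  # should now locate libhdfs.so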
Next, when you create the YarnCluster, pass this variable as part of the worker environment:
from dask_yarn import YarnCluster

# Create a cluster where each worker has two cores and eight GiB of memory
cluster = YarnCluster(
    worker_vcores=2,
    worker_memory='8GiB',
    worker_env={
        # See https://github.com/dask/dask-yarn/pull/30#issuecomment-434001858
        'ARROW_LIBHDFS_DIR': '/opt/mapr/hadoop/hadoop-0.20.2/c++/Linux-amd64-64/lib',
    },
)
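To confirm the variable actually reached the workers, a quick check via the Dask client (client.run executes a function on every worker and returns the results):

import os
from dask.distributed import Client

client = Client(cluster)
# Each worker should report the libhdfs path, not None
print(client.run(lambda: os.environ.get('ARROW_LIBHDFS_DIR')))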
This solved the problem for me.
(Inspired by https://gist.github.com/priancho/357022fbe63fae8b097a563e43dd885b)