
I am trying to use the pyarrow Filesystem interface with HDFS. I get a libhdfs.so not found error when calling the fs.HadoopFileSystem constructor, even though libhdfs.so is clearly present at the specified location.

from pyarrow import fs
hfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)

OSError: Unable to load libhdfs: /hadoop-3.3.1/lib/native/libhdfs.so: cannot open shared object file: No such file or directory

I have tried different Python and pyarrow versions, and I have also set ARROW_LIBHDFS_DIR, as sketched just below. For testing, I use the Dockerfile further down on Linux Mint.
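For ARROW_LIBHDFS_DIR, I pointed it at the native library directory that appears in the error message, roughly like this (as an extra line in the Dockerfile, or as an export before running the script):

ENV ARROW_LIBHDFS_DIR=/hadoop-3.3.1/lib/native

That did not change the error.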

FROM openjdk:11

RUN apt-get update &&\
  apt-get install wget -y

RUN wget -nv https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1-aarch64.tar.gz &&\
  tar -xf hadoop-3.3.1-aarch64.tar.gz

ENV PATH=/miniconda/bin:${PATH}
RUN wget -nv https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh &&\
  bash miniconda.sh -b -p /miniconda &&\
  conda init 

RUN conda install -c conda-forge python=3.9.6
RUN conda install -c conda-forge pyarrow=4.0.1

ENV JAVA_HOME=/usr/local/openjdk-11
ENV HADOOP_HOME=/hadoop-3.3.1  

RUN  printf 'from pyarrow import fs\nhfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)\n' > test_arrow.py

# 'python test_arrow.py' fails with ... 
# OSError: Unable to load libhdfs: /hadoop-3.3.1/lib/native/libhdfs.so: cannot open shared object file: No such file or directory
RUN python test_arrow.py || true

CMD ["/bin/bash"]
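To be concrete about "libhdfs.so is clearly present at the specified location": a check like the following, run inside the built image, does find the file at exactly the path shown in the error message:

ls -l /hadoop-3.3.1/lib/native/libhdfs.so

So the file exists, yet loading it still fails with "No such file or directory".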
