I'm trying to access a file on HDFS using pyarrow, but I can't get it to work. The code is below. Thank you very much.
[rxie@cedgedev03 code]$ python
Python 2.7.12 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import pyarrow
>>> import os
>>> os.environ["JAVA_HOME"]="/usr/java/jdk1.8.0_121"
>>> from pyarrow import hdfs
>>> fs = hdfs.connect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/pyarrow/hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/pyarrow/hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs
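In case it helps narrow this down, here is a small diagnostic sketch that reports the environment variables involved and whether libhdfs.so exists where it would be expected. The locations checked are an assumption based on the pyarrow HDFS documentation, which mentions HADOOP_HOME, ARROW_LIBHDFS_DIR and CLASSPATH alongside JAVA_HOME; the actual loader may look in other places too. The helper names libhdfs_candidates and report are made up for illustration and are not part of pyarrow.

```python
import os

# Environment variables that the pyarrow HDFS documentation describes as
# relevant for locating libhdfs and starting the JVM.
RELEVANT_VARS = ("JAVA_HOME", "HADOOP_HOME", "ARROW_LIBHDFS_DIR", "CLASSPATH")


def libhdfs_candidates(env):
    """Return the paths where libhdfs.so would be expected (assumed layout)."""
    dirs = []
    # An explicit override directory, if set, is listed first.
    if env.get("ARROW_LIBHDFS_DIR"):
        dirs.append(env["ARROW_LIBHDFS_DIR"])
    # Conventional location inside a Hadoop distribution.
    if env.get("HADOOP_HOME"):
        dirs.append(os.path.join(env["HADOOP_HOME"], "lib", "native"))
    return [os.path.join(d, "libhdfs.so") for d in dirs]


def report(env):
    """Build one line per variable, then one line per candidate path."""
    lines = []
    for name in RELEVANT_VARS:
        lines.append("{0} = {1}".format(name, env.get(name, "<not set>")))
    for path in libhdfs_candidates(env):
        status = "found" if os.path.isfile(path) else "missing"
        lines.append("{0}: {1}".format(path, status))
    return lines


if __name__ == "__main__":
    print("\n".join(report(os.environ)))
```

In the session above only JAVA_HOME is set from Python, so a script like this would show whether the other variables are missing from the environment.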