I want to read from and write to HDFS using pyarrow.
I installed Hadoop on my Windows 10 64-bit system by following https://github.com/MuhammadBilalYar/Hadoop-On-Window/wiki/Step-by-step-Hadoop-2.8.0-installation-on-Window-10 and installed pyarrow with pip.
However, when I try to connect to HDFS from Python, I get the following error:
Python 3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> pyarrow.hdfs.connect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\TIKI_git\ai-core-python\venv\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "C:\TIKI_git\ai-core-python\venv\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libjvm
I checked my path variables as described in http://wesmckinney.com/blog/python-hdfs-interfaces/.
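A minimal sketch of such a check, in case it helps to reproduce it. The variable list (JAVA_HOME, HADOOP_HOME, ARROW_LIBHDFS_DIR, CLASSPATH) is my reading of that blog post, and the assumption that "libjvm" corresponds to a file named jvm.dll on Windows is mine as well; the helper names `find_jvm_libraries` and `describe_hdfs_environment` are hypothetical and not part of pyarrow.

```python
import os

# Environment variables that, as far as I understand the blog post,
# the libhdfs driver relies on (assumption, not checked against pyarrow's source).
HDFS_ENV_VARS = ("JAVA_HOME", "HADOOP_HOME", "ARROW_LIBHDFS_DIR", "CLASSPATH")

# Usual file names of the JVM shared library on Windows, Linux and macOS.
JVM_LIBRARY_NAMES = ("jvm.dll", "libjvm.so", "libjvm.dylib")


def find_jvm_libraries(java_home):
    """Return the sorted paths of all JVM shared libraries found under java_home."""
    matches = []
    for root, _dirs, files in os.walk(java_home):
        for name in files:
            if name.lower() in JVM_LIBRARY_NAMES:
                matches.append(os.path.join(root, name))
    return sorted(matches)


def describe_hdfs_environment(environ=None):
    """Map each relevant variable name to its value, or None if it is unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in HDFS_ENV_VARS}


if __name__ == "__main__":
    env = describe_hdfs_environment()
    for name in HDFS_ENV_VARS:
        print("{} = {!r}".format(name, env[name]))

    java_home = env["JAVA_HOME"]
    if java_home and os.path.isdir(java_home):
        found = find_jvm_libraries(java_home)
        print("JVM libraries under JAVA_HOME:", found or "none found")
    else:
        print("JAVA_HOME is unset or does not point to a directory")
```

Running the script prints the current value of each variable and lists any JVM library located under JAVA_HOME; an empty result there would at least be consistent with the "Unable to load libjvm" error.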
What can I do to solve this problem? Is it even possible to use the pyarrow.hdfs.connect function on Windows?
Thanks for your help!