How do I install PyCaret correctly in AWS Glue?

What I have tried:

I am using Glue 2.0. I used --additional-python-modules and set it to pycaret, as shown in the image. (Image: parameter settings)

Then I got this error log.

INFO    2021-07-05 18:12:15,107 18690   com.amazonaws.services.glue.PythonModuleInstaller   [main]  Collecting pycaret  Downloading https://files.pythonhosted.org/packages/da/99/18f151991b0f06107af9723417c64e304ae2133587f85ea734a90136b4ae/pycaret-2.3.1-py3-none-any.whl (261kB)Collecting numpy==1.19.5 (from pycaret)  Downloading https://files.pythonhosted.org/packages/b1/e1/8c4c5632adaffc18dba4e03e97458dc1cb00583811e6982fc620b9d88515/numpy-1.19.5-cp37-cp37m-manylinux1_x86_64.whl (13.4MB)Requirement already satisfied: matplotlib in /home/spark/.local/lib/python3.7/site-packages (from pycaret)Collecting pandas-profiling>=2.8.0 (from pycaret)  Downloading https://files.pythonhosted.org/packages/3b/a3/34519d16e5ebe69bad30c5526deea2c3912634ced7f9b5e6e0bb9dbbd567/pandas_profiling-3.0.0-py2.py3-none-any.whl (248kB)Collecting wordcloud (from pycaret)  Downloading https://files.pythonhosted.org/packages/1b/06/0516bdba2ebdc0d5bd476aa66f94666dd0ad6b9abda723fdf28e451db919/wordcloud-1.8.1-cp37-cp37m-manylinux1_x86_64.whl (366kB)Collecting lightgbm>=2.3.1 (from pycaret)  Downloading https://files.pythonhosted.org/packages/18/b2/fff8370f48549ce223f929fe8cab4ee6bf285a41f86037d91312b48ed95b/lightgbm-3.2.1-py3-none-manylinux1_x86_64.whl (2.0MB)Collecting plotly>=4.4.1 (from pycaret)  Downloading https://files.pythonhosted.org/packages/95/8d/ac1560f7ccc2ace85cd1e9619bbec1975b5d2d92e6c6fdbbdaa994c6ab4d/plotly-5.1.0-py2.py3-none-any.whl (20.6MB)Collecting umap-learn (from pycaret)  Downloading https://files.pythonhosted.org/packages/75/69/85e7f950bb75792ad5d666d86c5f3e62eedbb942848e7e3126513af9999c/umap-learn-0.5.1.tar.gz (80kB)Collecting scikit-plot (from pycaret)  Downloading https://files.pythonhosted.org/packages/7c/47/32520e259340c140a4ad27c1b97050dd3254fdc517b1d59974d47037510e/scikit_plot-0.3.7-py3-none-any.whlCollecting Boruta (from pycaret)  Downloading 
https://files.pythonhosted.org/packages/b2/11/583f4eac99d802c79af9217e1eff56027742a69e6c866b295cce6a5a8fc2/Boruta-0.3-py3-none-any.whl (56kB)Collecting pyod (from pycaret)  Downloading https://files.pythonhosted.org/packages/71/8a/faa04a753bc32aeef00b9acf8e23d0b914b03844b89dcc6062b28e7ab1c5/pyod-0.9.0.tar.gz (105kB)Collecting yellowbrick>=1.0.1 (from pycaret)  Downloading https://files.pythonhosted.org/packages/3a/15/58feb940b6a2f52d3335cccf9e5d00704ec5ba62782da83f7e2abeca5e4b/yellowbrick-1.3.post1-py3-none-any.whl (271kB)Collecting cufflinks>=0.17.0 (from pycaret)  Downloading https://files.pythonhosted.org/packages/1a/18/4d32edaaf31ba4af9745dac676c4a28c48d3fc539000c29e855bd8db3b86/cufflinks-0.17.3.tar.gz (81kB)Collecting spacy<2.4.0 (from pycaret)  Downloading https://files.pythonhosted.org/packages/79/1c/7c5f7541eb883181b564a8c8ba15d21b2d7b8a38ae32f31763575cf8857d/spacy-2.3.7.tar.gz (5.8MB)    
Complete output from command python setup.py egg_info:    
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-mrzlr566/blis/    
Traceback (most recent call last):      
File "/home/spark/.local/lib/python3.7/site-packages/setuptools/installer.py", line 128, in fetch_build_egg        
subprocess.check_call(cmd)      
File "/usr/lib64/python3.7/subprocess.py", line 363, in check_call        
raise CalledProcessError(retcode, cmd)    
subprocess.CalledProcessError: 
Command '['/usr/bin/python3', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmp__iwgkr5', '--quiet', 'blis<0.8.0,>=0.4.0']' returned non-zero exit status 1.        
During handling of the above exception, another exception occurred:        
Traceback (most recent call last):      
File "<string>", line 1, in <module>      
File "/tmp/pip-build-mafqizyu/spacy/setup.py", line 252, in <module>        setup_package()      
File "/tmp/pip-build-mafqizyu/spacy/setup.py", line 247, in setup_package        cmdclass={"build_ext": build_ext_subclass},      
File "/home/spark/.local/lib/python3.7/site-packages/setuptools/__init__.py", line 143, in setup        _install_setup_requires(attrs)      
File "/home/spark/.local/lib/python3.7/site-packages/setuptools/__init__.py", line 138, in _install_setup_requires        
dist.fetch_build_eggs(dist.setup_requires)      
File "/home/spark/.local/lib/python3.7/site-packages/setuptools/dist.py", line 721, in fetch_build_eggs        
replace_conflicting=True,      
File "/home/spark/.local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 783, in resolve        
replace_conflicting=replace_conflicting      
File "/home/spark/.local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1066, in best_match        
return self.obtain(req, installer)      
File "/home/spark/.local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1078, in obtain        
return installer(requirement)      
File "/home/spark/.local/lib/python3.7/site-packages/setuptools/dist.py", line 777, in fetch_build_egg        
return fetch_build_egg(self, req)      
File "/home/spark/.local/lib/python3.7/site-packages/setuptools/installer.py", line 130, in fetch_build_egg        
raise DistutilsError(str(e))    
distutils.errors.DistutilsError: Command '['/usr/bin/python3', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmp__iwgkr5', '--quiet', 'blis<0.8.0,>=0.4.0']' 
returned non-zero exit status 1.        ----------------------------------------
INFO    2021-07-05 18:12:15,108 18691   com.amazonaws.services.glue.PythonModuleInstaller   [main]  
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-mafqizyu/spacy/

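I tried to exclude spacy from the dependency list by downloading the PyCaret source code, removing spacy from requirements.txt, packaging the source into a whl file, and installing PyCaret from that whl file. I then got an error message saying: Failed building wheel for numba Failed building wheel for llvmlite Failed building wheel

Log: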

[truncated because of the limit of number of characters]copying numba/_hashtable.h -> build/lib.linux-x86_64-3.7/numba  copying numba/_typeof.h -> build/lib.linux-x86_64-3.7/numba  copying numba/_devicearray.h -> build/lib.linux-x86_64-3.7/numba  copying numba/_numba_common.h -> build/lib.linux-x86_64-3.7/numba  copying numba/typed/py.typed -> build/lib.linux-x86_64-3.7/numba/typed  copying numba/misc/cmdlang.gdb -> build/lib.linux-x86_64-3.7/numba/misc  copying numba/pycc/modulemixin.c -> build/lib.linux-x86_64-3.7/numba/pycc  copying numba/cext/dictobject.c -> build/lib.linux-x86_64-3.7/numba/cext  copying numba/cext/listobject.c -> build/lib.linux-x86_64-3.7/numba/cext  copying numba/cext/utils.c -> build/lib.linux-x86_64-3.7/numba/cext  copying numba/cext/listobject.h -> build/lib.linux-x86_64-3.7/numba/cext  copying numba/cext/cext.h -> build/lib.linux-x86_64-3.7/numba/cext  copying numba/cext/dictobject.h -> build/lib.linux-x86_64-3.7/numba/cext  copying numba/core/runtime/_nrt_pythonmod.c -> build/lib.linux-x86_64-3.7/numba/core/runtime  copying numba/core/runtime/nrt.c -> build/lib.linux-x86_64-3.7/numba/core/runtime  copying numba/core/runtime/_nrt_python.c -> build/lib.linux-x86_64-3.7/numba/core/runtime  copying numba/core/runtime/nrt.h -> build/lib.linux-x86_64-3.7/numba/core/runtime  copying numba/core/runtime/nrt_external.h -> build/lib.linux-x86_64-3.7/numba/core/runtime  copying numba/core/annotations/template.html -> build/lib.linux-x86_64-3.7/numba/core/annotations  copying numba/cuda/tests/cudadrv/data/jitlink.ptx -> build/lib.linux-x86_64-3.7/numba/cuda/tests/cudadrv/data  running build_ext  building 'numba._dynfunc' extension  Warning: Can't read registry to find the necessary compiler setting  Make sure that Python modules winreg, win32api or win32con are installed.  
C compiler: gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC    creating build/temp.linux-x86_64-3.7  creating build/temp.linux-x86_64-3.7/numba  compile options: '-I/usr/include/python3.7m -c'  gcc: numba/_dynfuncmo
d.c  error: Command "gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python3.7m -c numba/_dynfuncmod.c -o build/temp.linux-x86_64-3.7/numba/_dynfuncmod.o" failed with exit status 127    ----------------------------------------  Running setup.py clean for numba  Running setup.py bdist_wheel for future: started  Running setup.py bdist_wheel for future: finished with status 'done'  Stored in directory: /home/spark/.cache/pip/wheels/8b/99/a0/81daf51dcd359a9377b110a8a886b3895921802d2fc1b2397e  Running setup.py bdist_wheel for sklearn: started  Running setup.py bdist_wheel for sklearn: finished with status 'done'  Stored in directory: /home/spark/.cache/pip/wheels/76/03/bb/589d421d27431bcd2c6da284d5f2286c8e3b2ea3cf1594c074  Running setup.py bdist_wheel for pynndescent: started  Running setup.py bdist_wheel for pynndescent: finished with status 'done'  Stored in directory: /home/spark/.cache/pip/wheels/ba/52/4e/4c28d04d144a28f89e2575fb63628df6e6d49b56c5ddd0c74e  Running setup.py bdist_wheel for htmlmin: started  Running setup.py bdist_wheel for htmlmin: finished with status 'done'  Stored in directory: /home/spark/.cache/pip/wheels/43/07/ac/7c5a9d708d65247ac1f94066cf1db075540b85716c30255459  Running setup.py bdist_wheel for phik: started  Running setup.py bdist_wheel for phik: finished with status 'done'  Stored in directory: /home/spark/.cache/pip/wheels/c0/a3/b0/f27b1cfe32ea131a3715169132ff6d85653789e80e966c3bf6  Running setup.py bdist_wheel for prometheus-flask-exporter: started  Running setup.py bdist_wheel for prometheus-flask-exporter: finished with status 'done'  Stored in directory: /home/spark/.cache/pip/wheels/c0/e2/9c/4f3ee23964802940f81a8b476d0b9be6fb6348cb12df2e2226  Running setup.py bdist_wheel for alembic: started  Running setup.py bdist_wheel for 
alembic: finished with status 'done'  Stored in directory: /home/spark/.cache/pip/wheels/84/07/f7/12f7370ca47a66030c2edeedcc23dec26ea0ac22dcb4c4a0f3  Running setup.py bdist_wheel for databricks-cli: started  Running setup.py bdist_wheel for databricks-cli: finished with status 'done'  Stored in directory: /home/spark/.cache/pip/wheels/5b/24/f3/34d8e3964dac4ba849d844273c49a679111b00d5799ebb934a  Running setup.py bdist_wheel for llvmlite: started  Running setup.py bdist_wheel for llvmlite: finished with status 'error'  Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-ws60mqho/llvmlite/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpoy9cphk5pip-wheel- --python-tag cp37:  running bdist_wheel  /usr/bin/python3 /tmp/pip-build-ws60mqho/llvmlite/ffi/build.py  LLVM version... Traceback (most recent call last):    File "/tmp/pip-build-ws60mqho/llvmlite/ffi/build.py", line 220, in <module>      main()    File "/tmp/pip-build-ws60mqho/llvmlite/ffi/build.py", line 210, in main      main_posix('linux', '.so')    File "/tmp/pip-build-ws60mqho/llvmlite/ffi/build.py", line 134, in main_posix      raise RuntimeError(msg) from None  RuntimeError: Could not find a `llvm-config` binary. There are a number of reasons this could occur, please see: https://llvmlite.readthedocs.io/en/latest/admin-guide/install.html#using-pip for help.  
error: command '/usr/bin/python3' failed with exit status 1    ----------------------------------------  Running setup.py clean for llvmlite  Running setup.py bdist_wheel for bottleneck: started  Running setup.py bdist_wheel for bottleneck: finished with status 'error'  Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-ws60mqho/bottleneck/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpfy0tfce1pip-wheel- --python-tag cp37:  running bdist_wheel  running build  running build_py  creating build  creating build/lib.linux-x86_64-3.7  creating build/lib.linux-x86_64-3.7/bottleneck  copying bottleneck/__init__.py -> build/lib.linux-x86_64-3.7/bottleneck  copying bottleneck/_pytesttester.py -> build/lib.linux-x86_64-3.7/bottleneck  copying bottleneck/_version.py -> build/lib.linux-x86_64-3.7/bottleneck  creating build/lib.linux-x86_64-3.7/bottleneck/tests  copying bottleneck/tests/__init__.py -> build/lib.linux-x86_64-3.7/bottleneck/tests  copying bottleneck/tests/util.py -> build/lib.linux-x86_64-3.7/bottleneck/tests  copying bottleneck/tests/input_modification_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests  copying bottleneck/tests/list_input_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests  copying bottleneck/tests/move_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests  copying bottleneck/tests/nonreduce_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests  copying bottleneck/tests/reduce_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests  copying bottleneck/tests/scalar_input_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests  copying bottleneck/tests/memory_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests  copying bottleneck/tests/nonreduce_axis_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests  creating build/lib.linux-x86_64-3.7/bottleneck/src  copying 
bottleneck/src/bn_config.py -> build/lib.linux-x86_64-3.7/bottleneck/src  copying bottleneck/src/__init__.py -> build/lib.linux-x86_64-3.7/bottleneck/src  copying bottleneck/src/bn_template.py -> build/lib.linux-x86_64-3.7/bottleneck/src  creating build/lib.linux-x86_64-3.7/bottleneck/benchmark  copying bottleneck/benchmark/bench.py -> build/lib.linux-x86_64-3.7/bottleneck/benchmark  copying bottleneck/benchmark/autotimeit.py -> build/lib.linux-x86_64-3.7/bottleneck/benchmark  copying bottleneck/benchmark/__init__.py -> build/lib.linux-x86_64-3.7/bottleneck/benchmark  copying bottleneck/benchmark/bench_detailed.py -> build/lib.linux-x86_64-3.7/bottleneck/benchmark  creating build/lib.linux-x86_64-3.7/bottleneck/slow  copying bottleneck/slow/nonreduce.py -> build/lib.linux-x86_64-3.7/bottleneck/slow  copying bottleneck/slow/move.py -> build/lib.linux-x86_64-3.7/bottleneck/slow  copying bottleneck/slow/__init__.py -> build/lib.linux-x86_64-3.7/bottleneck/slow  copying bottleneck/slow/reduce.py -> build/lib.linux-x86_64-3.7/bottleneck/slow  copying bottleneck/slow/nonreduce_axis.py -> build/lib.linux-x86_64-3.7/bottleneck/slow  UPDATING build/lib.linux-x86_64-3.7/bottleneck/_version.py  set build/lib.linux-x86_64-3.7/bottleneck/_version.py to '1.3.2'  running build_ext  running config  compiling '_configtest.c':    #pragma GCC diagnostic error "-Wattributes"    int __attribute__((optimize("O3"))) have_attribute_optimize_opt_3(void*);    int main(void)  {      return 0;  }    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -c _configtest.c -o _configtest.o  unable to execute 'gcc': No such file or directory  failure.  
removing: _configtest.c _configtest.o  compiling '_configtest.c':    #ifndef __cplusplus  static inline int static_func (void)  {      return 0;  }  inline int nostatic_func (void)  {      return 0;  }  #endif  int main(void) {      int r1 = static_func();      int r2 = nostatic_func();      return r1 + r2;  }    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -c _configtest.c -o _configtest.o  unable to execute 'gcc': No such file or directory  failure.  removing: _configtest.c _configtest.o  compiling '_configtest.c':    #ifndef __cplusplus  static __inline__ int static_func (void)  {      return 0;  }  __inline__ int nostatic_func (void)  {      return 0;  }  #endif  int main(void) {      int r1 = static_func();      int r2 = nostatic_func();      return r1 + r2;  }    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -c _configtest.c -o _configtest.o  unable to execute 'gcc': No such file or directory  failure.  removing: _configtest.c _configtest.o  compiling '_configtest.c':    #ifndef __cplusplus  static __inline int static_func (void)  {      return 0;  }  __inline int nostatic_func (void)  {      return 0;  }  #endif  int main(void) {      int r1 = static_func();      int r2 = nostatic_func();      return r1 + r2;  }    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -c _configtest.c -o _configtest.o  unable to execute 'gcc': No such file or directory  failure.  
removing: _configtest.c _configtest.o  building 'bottleneck.reduce' extension  creating build/temp.linux-x86_64-3.7  creating build/temp.linux-x86_64-3.7/bottleneck  creating build/temp.linux-x86_64-3.7/bottleneck/src  gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/home/spark/.local/lib/python3.7/site-packages/numpy/core/include -I/usr/include/python3.7m -Ibottleneck/src -c bottleneck/src/reduce.c -o build/temp.linux-x86_64-3.7/bottleneck/src/reduce.o -O2  unable to execute 'gcc': No such file or directory  error: command 'gcc' failed with exit status 1    ----------------------------------------  Running setup.py clean for bottleneck  Running setup.py bdist_wheel for pandocfilters: started  Running setup.py bdist_wheel for pandocfilters: finished with status 'done'  Stored in directory: /home/spark/.cache/pip/wheels/93/9a/79/b2c3567908fd6209e4674ca23d9fcf005aae5fe89148913727Successfully built pyod pyLDAvis cufflinks umap-learn future sklearn pynndescent htmlmin phik prometheus-flask-exporter alembic databricks-cli pandocfiltersFailed to build numba llvmlite bottleneckInstalling collected packages: tenacity, plotly, numpy, threadpoolctl, scikit-learn, mlxtend, jupyterlab-widgets, webencodings, packaging, bleach, mistune, ipython-genutils, traitlets, pygments, jupyter-core, testpath, entrypoints, pyrsistent, zipp, typing-extensions, importlib-metadata, attrs, jsonschema, nbformat, nest-asyncio, async-generator, tornado, pyzmq, jupyter-client, nbclient, pandocfilters, defusedxml, jupyterlab-pygments, MarkupSafe, jinja2, nbconvert, ptyprocess, terminado, pickleshare, backcall, pexpect, matplotlib-inline, parso, jedi, wcwidth, prompt-toolkit, decorator, IPython, debugpy, ipykernel, prometheus-client, Send2Trash, pycparser, cffi, argon2-cffi, notebook, widgetsnbextension, 
ipywidgets, llvmlite, numba, pyod, lightgbm, scikit-plot, smart-open, gensim, numexpr, future, funcy, sklearn, pyLDAvis, colorlover, cufflinks, yellowbrick, Boruta, pynndescent, umap-learn, textblob, pillow, wordcloud, seaborn, requests, htmlmin, phik, pydantic, networkx, bottleneck, tangled-up-in-unicode, multimethod, PyWavelets, imagehash, visions, missingno, pandas-profiling, kmodes, imbalanced-learn, querystring-parser, greenlet, sqlalchemy, cloudpickle, gunicorn, smmap, gitdb, gitpython, protobuf, Werkzeug, itsdangerous, Flask, prometheus-flask-exporter, Mako, python-editor, alembic, sqlparse, websocket-client, docker, tabulate, databricks-cli, mlflow, pycaret  Found existing installation: numpy 1.18.1    Uninstalling numpy-1.18.1:      Successfully uninstalled numpy-1.18.1  Found existing installation: scikit-learn 0.22.1    Uninstalling scikit-learn-0.22.1:      Successfully uninstalled scikit-learn-0.22.1  Running setup.py install for llvmlite: started    Running setup.py install for llvmlite: finished with status 'error'    Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-ws60mqho/llvmlite/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-r7xtmu3s-record/install-record.txt --single-version-externally-managed --compile --user --prefix=:    running install    running build    got version from file /tmp/pip-build-ws60mqho/llvmlite/llvmlite/_version.py {'version': '0.36.0', 'full': 'e6bb8d137d922bec8beeb01a237254778759becd'}    running build_ext    /usr/bin/python3 /tmp/pip-build-ws60mqho/llvmlite/ffi/build.py    LLVM version... 
Traceback (most recent call last):      File "/tmp/pip-build-ws60mqho/llvmlite/ffi/build.py", line 220, in <module>        main()      File "/tmp/pip-build-ws60mqho/llvmlite/ffi/build.py", line 210, in main        main_posix('linux', '.so')      File "/tmp/pip-build-ws60mqho/llvmlite/ffi/build.py", line 134, in main_posix        raise RuntimeError(msg) from None    RuntimeError: Could not find a `llvm-config` binary. There are a number of reasons this could occur, please see: https://llvmlite.readthedocs.io/en/latest/admin-guide/install.html#using-pip for help.    error: command '/usr/bin/python3' failed with exit status 1        ----------------------------------------
INFO    2021-07-05 17:36:34,742 81650   com.amazonaws.services.glue.PythonModuleInstaller   [main]    Failed building wheel for numba  Failed building wheel for llvmlite  Failed building wheel for bottleneckCommand "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-ws60mqho/llvmlite/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-r7xtmu3s-record/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-build-ws60mqho/llvmlite/

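I tried to install PyCaret by setting the Python library path, as shown in the image below. It does not work well, because installing Python modules through the Python library path does not install their dependencies automatically. I tried providing the paths to the PyCaret whl file and the whl files of its dependencies. It kept asking me for whl files that are not listed in PyCaret's requirements.txt, so I stopped trying.

(Image: Python library path setting)

Resources I have already checked:

I have spent a lot of time on this now. I don't know how to solve my problem. Any suggestions or help would be appreciated.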

2 Answers

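[Edit]

I was not able to import PyCaret in AWS Glue. I followed the instructions described in the reply below, but they did not work. I got an error message saying ImportError: Missing optional dependency 'jinja2'. DataFrame.style requires jinja2. Use pip or conda to install jinja2

If anyone finds a proper solution to this problem, please let me know.


I contacted AWS Support. Meghana handled the case.

Here is the reply: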

I [Meghana] was able to successfully replicate the error at my end as well using the below steps. Also, please find the workaround mentioned below to avoid the same.

- Create a Glue 2.0 job and configure the Job Parameters as below
===
--additional-python-modules  pycaret==2.3.2,spacy==3.0.6
===

On running the Glue job, I see that it tried to run a pip3 install. Please find the same below.
===
 INFO   2021-07-07 05:23:15,813 0   com.amazonaws.services.glue.PythonModuleInstaller   [main]  pip3 install  --user pycaret==2.3.2 spacy==3.0.6 
===

pip install can be called against a PyPI package, a local project, or a wheel hosted via HTTPS. We can use this functionality to install packages publicly hosted on PyPI as well as those that are not publicly available. This gives us the ability to install the majority of packages to use with Glue, including those which are C-based. There is, however, a subset that will fail (like spacy): packages that require root privileges to install, or that must be compiled during installation, will fail. Glue does not give root access, and no exception is made for package installation. What can be done in this case is pre-compiling the binaries into a wheel compatible with Glue and installing that wheel.
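To see why a source build fails in a given environment, it can help to log what the interpreter actually has available. The following is a minimal sketch, not part of the AWS reply: the function name `build_environment_report` is hypothetical, and it only checks for `gcc` and `llvm-config` because those are the two tools the error logs in the question report as missing.

```python
import platform
import shutil
import sys


def build_environment_report():
    """Collect facts that decide whether pip can compile a source package here."""
    return {
        # Interpreter version that a wheel's Python tag must match (e.g. cp37 for 3.7).
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        # CPU architecture that a wheel's platform tag must match.
        "machine": platform.machine(),
        # None means the tool is not on PATH, so a build that needs it will fail.
        "gcc": shutil.which("gcc"),
        "llvm_config": shutil.which("llvm-config"),
    }


if __name__ == "__main__":
    for name, value in build_environment_report().items():
        print(f"{name}: {value}")
```

Running this at the top of a job script (an assumption about where you would place it) writes the values to the job log, which makes it easier to compare the job's environment with the one used to build the wheels.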

Please find below the error that I observed while installing “pycaret==2.3.2,spacy==3.0.6”. Importing them in the Glue job failed because the compilation failed while replicating.
==
INFO    2021-07-07 05:23:15,813 0   com.amazonaws.services.glue.PythonModuleInstaller   [main]  pip3 install  --user pycaret==2.3.2 spacy==3.0.6 
INFO    2021-07-07 05:23:26,751 10938   com.amazonaws.services.glue.PythonModuleInstaller   [main]  Collecting pycaret==2.3.2  Downloading https://files.pythonhosted.org/packages/bc/b6/9d620a23a038b3abdc249472ffd9be217f6b1877d2d952bfb3f653622a28/pycaret-2.3.2-py3-none-any.whl  (263kB)Collecting spacy==3.0.6  Downloading https://files.pythonhosted.org/packages/6d/0d/4379e9aa35a444b6440ffe1af4c612533460e0d5ac5c7dca1f96ff6f2e23/spacy-3.0.6.tar.gz  (7.1MB)    Complete output from command python setup.py egg_info:        Error compiling Cython file:    ------------------------------------------------------------    ...    from libc.stdint cimport int64_t    from libcpp.vector cimport vector    from libcpp.set cimport set    from cymem.cymem cimport Pool    ^    ------------------------------------------------------------        spacy/strings.pxd:4:0: 'cymem/cymem.pxd' not found        Error compiling Cython file:    ------------------------------------------------------------    ...    from libc.stdint cimport int64_t    from libcpp.vector cimport vector    from libcpp.set cimport set    from cymem.cymem cimport Pool    ^    ------------------------------------------------------------        spacy/strings.pxd:4:0: 'cymem/…
…
..
INFO    2021-07-07 05:23:26,753 10940   com.amazonaws.services.glue.PythonModuleInstaller   [main]  Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-ahnegkpn/spacy/
==

Thus, the import of the above failed during the compilation stage. The error that you received (below) is due to a Python version mismatch. Here the tar file spacy-2.3.7.tar.gz is compiled using Python 3.7 and the Glue job is using 3.6, and hence it failed with the below error. However, even if you provide it as mentioned above, it still has C dependencies and still fails. Please find the workaround to avoid the same.
—
INFO    2021-07-05 17:01:53,986 17142   com.amazonaws.services.glue.PythonModuleInstaller   [main]  Collecting setuptools  Downloading …

7c5f7541eb883181b564a8c8ba15d21b2d7b8a38ae32f31763575cf8857d/spacy-2.3.7.tar.gz (5.8MB)  
  Complete output from command python setup.py egg_info:    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-_e6ejo9m/blis/    Traceback (most recent call last):      File "/home/spark/.local/lib/python3.7/site-packages/setuptools/installer.py", line 128, in fetch_build_egg        subprocess.check_call(cmd)      File "/usr/lib64/python3.7/subprocess.py", line 363, in check_call        raise CalledProcessError(retcode, cmd)    subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpo5fqe7du', '--quiet', 'blis<0.8.0,>=0.4.0']' returned non-zero exit status 1.        During handling of the above exception, another exception occurred:        Traceback (most recent call last):      File "<string>", line 1, in <module>      File "/tmp/pip-build-xoyv9lar/spacy/setup.py", line 252, in <module>        setup_package()      File "/tmp/pip-build-xoyv9lar/spacy/setup.py", line 247, in setup_package        cmdclass={"build_ext": build_ext_subclass},      File "/home/spark/.local/lib/…

      File "/home/spark/.local/lib/python3.7/site-packages/setuptools/installer.py", line 130, in fetch_build_egg        raise DistutilsError(str(e))    distutils.errors.DistutilsError: Command '['/usr/bin/python3', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpo5fqe7du', '--quiet', 'blis<0.8.0,>=0.4.0']' returned non-zero exit status 1.        ----------------------------------------
—

###Workaround 

- To compile a library in a C-based language, the compiler must know the target OS and processor architecture. 
- If the library is compiled against a different OS or processor architecture, then the wheel will fail to install in Glue. 
- Because Glue is a managed service, we do not give users cluster access to develop these dependencies.
- Below I will walk you through using a Docker image to prepare an environment you can use to compile wheels that will be compatible with Glue. For this example we will be compiling pycaret and spacy, which require GCC to be installed on the target device as root.
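The point about OS and processor architecture shows up in the wheel filename itself: the last three dash-separated fields are the Python tag, the ABI tag, and the platform tag. The sketch below is an illustration only (the helper names `parse_wheel_tags` and `is_pure_python` are hypothetical); it reads those tags so you can tell which wheels are platform-independent and which were compiled for a specific target.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    python: str    # e.g. "py3" or "cp37"
    abi: str       # e.g. "none" or "cp37m"
    platform: str  # e.g. "any" or "linux_x86_64"


def parse_wheel_tags(filename: str) -> WheelTags:
    """Read the compatibility tags from the end of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    stem = filename[: -len(".whl")]
    # The tags are always the last three dash-separated fields.
    parts = stem.rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected wheel filename: {filename}")
    _, python, abi, platform = parts
    return WheelTags(python, abi, platform)


def is_pure_python(filename: str) -> bool:
    """True when the wheel contains no compiled, platform-specific code."""
    tags = parse_wheel_tags(filename)
    return tags.abi == "none" and tags.platform == "any"


if __name__ == "__main__":
    for name in ("Boruta-0.3-py3-none-any.whl",
                 "Bottleneck-1.3.2-cp37-cp37m-linux_x86_64.whl"):
        print(name, parse_wheel_tags(name), is_pure_python(name))
```

A wheel tagged `none-any` installs anywhere a matching Python exists, while one tagged `cp37-cp37m-linux_x86_64` only installs on CPython 3.7 on 64-bit Linux, which is why the build environment has to match the Glue runtime.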

Step-1: Launch an m5.xlarge EC2 instance with Amazon Linux (2) and enough volume space for your libs. 

Step-2: Install Docker on the instance, set up non-sudo access, and start it
    1. sudo yum install docker -y
    2. sudo usermod -a -G docker ec2-user
    3. sudo service docker start

Step-3: Create a Dockerfile as below 
$ vi docfile
—
# Base for Glue
FROM amazonlinux
RUN yum update -y
RUN yum install shadow-utils.x86_64 -y
RUN yum install -y java-1.8.0-openjdk.x86_64
RUN yum install -y python3
RUN yum install -y gcc autoconf automake libtool zlib-devel openssl-devel maven wget protobuf-compiler cmake make gcc-c++

# Additional components needed to build pycaret and spacy
WORKDIR /root
RUN yum install python3-devel -y
RUN yum install python-devel -y
RUN pip3 install wheel

# Install pycaret and spacy
RUN pip3 install pycaret
RUN pip3 install spacy

# Create a directory for the wheel
RUN mkdir wheel_dir 

# create the wheel
RUN pip3 wheel pycaret -w wheel_dir
RUN pip3 wheel spacy -w wheel_dir
—

Step-4: Run docker build to build your Dockerfile 
==
restart the docker daemon 
[ec2-user@ip-xxx ~]$ sudo service docker restart
[ec2-user@ip-xxx ~]$ docker build -f docfile .

[ec2-user@ip-xxx ~]$ docker build -f docfile .
Sending build context to Docker daemon  16.38kB
Step 1/17 : FROM amazonlinux
 ---> 7443854fbdb0
Step 2/17 : RUN yum update -y
 ---> Using cache
…
Removing intermediate container xxx
 ---> xxx
Successfully built xx
==

[ec2-user@ip-xxx~]$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

[ec2-user@ip-xxx~]$ docker image ls
REPOSITORY          TAG       IMAGE ID       CREATED          SIZE
<none>              <none>    xxx   24 seconds ago   4.28GB
<none>              <none>    xxx   43 minutes ago   3.94GB

[ec2-user@ip-xxx ~]$ docker ps -a
CONTAINER ID   IMAGE               COMMAND                  CREATED          STATUS                      PORTS     NAMES

Xxx                  yyy        "/bin/bash"              37 minutes ago   Exited (0) 37 minutes ago             brave_meninsky


Step-5: Extract the whl from the Docker container
    - Get the container ID
docker ps (get container ID)
    - Run the container and keep it from exiting
docker run -dit <image>
    - Verify the location of the wheel file (and get its filename)
docker exec -t -i <container_id> ls /root/wheel_dir/
    - Copy the wheel out of Docker to EC2
docker cp <containerID>:/root/wheel_dir/<wheelFile> .
===
[ec2-user@ip-xxx ~]$ docker run -dite e0e1f71b8fad
9f0b1aff06dd959f3744edd3804512e73b68aaeef178962f2c0c063b290dbf78

[ec2-user@ip-xxx~]$ docker ps
CONTAINER ID   IMAGE          COMMAND       CREATED          STATUS          PORTS     NAMES
9f0b1aff06dd   e0e1f71b8fad   "/bin/bash"   49 seconds ago   Up 48 seconds             quirky_bose
[ec2-user@ip-xxx ~]$ docker ps
CONTAINER ID   IMAGE          COMMAND       CREATED         STATUS         PORTS     NAMES
52ab3413c962   e0e1f71b8fad   "/bin/bash"   3 seconds ago   Up 2 seconds             fervent_williamson
9f0b1aff06dd   e0e1f71b8fad   "/bin/bash"   2 minutes ago   Up 2 minutes             quirky_bose
[ec2-user@ip-xxx~]$ docker exec -t -i 52ab3413c962 ls /root/wheel_dir/
Boruta-0.3-py3-none-any.whl
Bottleneck-1.3.2-cp37-cp37m-linux_x86_64.whl
…
[ec2-user@ip-xxx ~]$ aws s3 cp pycaret-2.3.2-py3-none-any.whl  s3://cxx/
upload: ./pycaret-2.3.2-py3-none-any.whl to s3://cxx/pycaret-2.3.2-py3-none-any.whl

===

Step-6: Upload the wheel to S3
aws s3 cp <wheelFile> s3://path/to/wheel/

Step-7: Pass the S3 URL to Glue

Edit your Job
Expand “Security configuration, script libraries, and job parameters (optional)”
In “Job parameters”, enter a new Key-Value pair
--
Key: --additional-python-modules
Value: <s3URI>
--
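Because `pip3 wheel` in the Dockerfile writes one wheel per dependency into wheel_dir, the value for this key will usually be more than one S3 URI. The sketch below builds that value as a comma-separated list, which is the format the parameter uses for multiple modules. The function name, bucket, and prefix are hypothetical placeholders, not values from the support reply.

```python
def additional_modules_argument(bucket, prefix, wheel_files):
    """Build the job parameter for a set of wheels uploaded under one S3 prefix."""
    prefix = prefix.strip("/")
    base = f"s3://{bucket}/{prefix}/" if prefix else f"s3://{bucket}/"
    # One S3 URI per wheel, joined with commas and no spaces.
    uris = [base + name for name in wheel_files]
    return {"--additional-python-modules": ",".join(uris)}


if __name__ == "__main__":
    # "my-bucket" and "wheels" are made-up names for illustration.
    argument = additional_modules_argument(
        "my-bucket",
        "wheels",
        ["Boruta-0.3-py3-none-any.whl", "pycaret-2.3.2-py3-none-any.whl"],
    )
    print(argument["--additional-python-modules"])
```

Order may matter if pip installs the wheels in the order given, so listing dependencies before pycaret is a reasonable precaution rather than a documented requirement.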
Answered 2021-07-08T17:01:07.477

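Not sure whether this is still relevant for you (it may be for someone else), but we had a similar issue with Glue 2.0. After switching/upgrading to Glue 3.0, we could successfully install packages that need gcc to compile their dependencies.

We used the --additional-python-modules job property to specify the required dependencies.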
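For anyone defining the job in code rather than in the console, the same setup can be expressed as a job definition. This is a sketch under assumptions: the function name `glue_job_definition`, the job name, role ARN, and script location are hypothetical, and the dictionary keys follow the parameter names of the boto3 Glue `create_job` call as I understand them.

```python
def glue_job_definition(name, role_arn, script_location, modules):
    """Assemble keyword arguments for a Glue 3.0 Spark job with extra Python modules."""
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "3.0",
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Comma-separated list of pip requirements.
        "DefaultArguments": {"--additional-python-modules": ",".join(modules)},
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    definition = glue_job_definition(
        "pycaret-job",
        "arn:aws:iam::123456789012:role/GlueJobRole",
        "s3://my-bucket/scripts/job.py",
        ["pycaret==2.3.2"],
    )
    print(definition)
```

With boto3 installed and credentials configured, the result could be passed on as `boto3.client("glue").create_job(**definition)`; that call is left out of the sketch so it runs without AWS access.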

Answered 2021-10-15T11:25:24.753