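Hi, I'm running papermill inside Google Cloud Composer (managed Airflow), using PythonVirtualenvOperator. The source notebook lives in Google Cloud Storage, and I need the executed notebook to be written to Google Cloud Storage as well. When I run papermill this way, I get the error: Unexpected keyword argument 'min'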

Here is the code snippet:

def getGCSObjects():
  import papermill as pm
  pm.execute_notebook(
    'gs://BUCKET/inputs/add.ipynb',
    'gs://BUCKET/inputs/add_out.ipynb',
    parameters=dict(alpha=0.6, ratio=0.1)
  )

list_gcs_files = PythonVirtualenvOperator(
  task_id='list_gcs_files',
  system_site_packages=True,
  python_version='3.6',
  requirements=[
   'gcsfs>=0.2.0'
   'papermill',
  ],
  dag=dag,
  python_callable=getGCSObjects,
)
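
A side note on the snippet above: there is no comma after 'gcsfs>=0.2.0', and Python joins adjacent string literals, so the requirements list holds one item instead of two. That matches the pip command in the log, which receives the single argument 'gcsfs>=0.2.0papermill'. This is probably not the cause of the 'min' error (the traceback shows papermill being loaded from the system site-packages), but it does mean papermill is never installed into the virtualenv. A minimal sketch of the difference, using hypothetical variable names:

```python
# Adjacent string literals are joined by the Python parser, so a missing
# comma silently merges two list items into one.
requirements_as_posted = [
    'gcsfs>=0.2.0'
    'papermill',
]

# With the comma in place, pip would receive two separate requirements.
requirements_with_comma = [
    'gcsfs>=0.2.0',
    'papermill',
]

print(requirements_as_posted)
print(requirements_with_comma)
```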

Error output:

[2021-06-30 09:14:17,905] {taskinstance.py:902} INFO - Executing <Task(PythonVirtualenvOperator): list_gcs_files> on 2021-06-30T00:00:00+00:00
[2021-06-30 09:14:19,489] {python_operator.py:316} INFO - Executing cmd
['virtualenv', '/tmp/venvoyf919ht', '--system-site-packages', '--python=python3.6']
[2021-06-30 09:14:19,828] {python_operator.py:321} INFO - Got output
b'created virtual environment CPython3.6.10.final.0-64 in 235ms\n  creator CPython3Posix(dest=/tmp/venvoyf919ht, clear=False, no_vcs_ignore=False, global=True)\n  seeder FromAppData(download=False, pip=bundle, wheel=bundle, setuptools=bundle, via=copy, app_data_dir=/home/airflow/.local/share/virtualenv)\n    added seed packages: pip==20.2.4, setuptools==50.3.2, wheel==0.35.1\n  activators PythonActivator,FishActivator,XonshActivator,CShellActivator,PowerShellActivator,BashActivator\n'
[2021-06-30 09:14:19,831] {python_operator.py:316} INFO - Executing cmd
['/tmp/venvoyf919ht/bin/pip', 'install', 'gcsfs>=0.2.0papermill']
[2021-06-30 09:14:27,079] {python_operator.py:321} INFO - Got output
b'Requirement already satisfied: gcsfs>=0.2.0papermill in /opt/python3.6/lib/python3.6/site-packages (2021.6.1)\nRequirement already satisfied: aiohttp in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (3.7.4.post0)\nRequirement already satisfied: fsspec==2021.06.1 in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (2021.6.1)\nRequirement already satisfied: google-auth>=1.2 in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (1.24.0)\nRequirement already satisfied: google-auth-oauthlib in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (0.4.2)\nRequirement already satisfied: requests in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (2.25.0)\nRequirement already satisfied: decorator in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (5.0.9)\nRequirement already satisfied: yarl<2.0,>=1.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (1.6.3)\nRequirement already satisfied: chardet<5.0,>=2.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (3.0.4)\nRequirement already satisfied: async-timeout<4.0,>=3.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (3.0.1)\nRequirement already satisfied: typing-extensions>=3.6.5 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (3.7.4.3)\nRequirement already satisfied: attrs>=17.3.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (20.3.0)\nRequirement already satisfied: idna-ssl>=1.0; python_version < "3.7" in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (1.1.0)\nRequirement already satisfied: multidict<7.0,>=4.5 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (5.1.0)\nRequirement already satisfied: setuptools>=40.3.0 in 
/tmp/venvoyf919ht/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (50.3.2)\nRequirement already satisfied: rsa<5,>=3.1.4; python_version >= "3.6" in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (4.6)\nRequirement already satisfied: pyasn1-modules>=0.2.1 in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (0.2.8)\nRequirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (4.1.1)\nRequirement already satisfied: six>=1.9.0 in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (1.15.0)\nRequirement already satisfied: requests-oauthlib>=0.7.0 in /opt/python3.6/lib/python3.6/site-packages (from google-auth-oauthlib->gcsfs>=0.2.0papermill) (1.3.0)\nRequirement already satisfied: idna<3,>=2.5 in /opt/python3.6/lib/python3.6/site-packages (from requests->gcsfs>=0.2.0papermill) (2.8)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/python3.6/lib/python3.6/site-packages (from requests->gcsfs>=0.2.0papermill) (2020.11.8)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/python3.6/lib/python3.6/site-packages (from requests->gcsfs>=0.2.0papermill) (1.25.11)\nRequirement already satisfied: pyasn1>=0.1.3 in /opt/python3.6/lib/python3.6/site-packages (from rsa<5,>=3.1.4; python_version >= "3.6"->google-auth>=1.2->gcsfs>=0.2.0papermill) (0.4.8)\nRequirement already satisfied: oauthlib>=3.0.0 in /opt/python3.6/lib/python3.6/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib->gcsfs>=0.2.0papermill) (3.1.0)\n'
[2021-06-30 09:14:27,200] {python_operator.py:316} INFO - Executing cmd
['/tmp/venvoyf919ht/bin/python', '/tmp/venvoyf919ht/script.py', '/tmp/venvoyf919ht/script.in', '/tmp/venvoyf919ht/script.out', '/tmp/venvoyf919ht/string_args.txt']
[2021-06-30 09:14:28,919] {python_operator.py:323} INFO - Got error output
b'Input notebook does not contain a cell with tag \'parameters\'\n\rExecuting:   0%|          | 0/4 [00:00<?, ?cell/s]Traceback (most recent call last):\n  File "/tmp/venvoyf919ht/script.py", line 16, in <module>\n    res = getGCSObjects(*args, **kwargs)\n  File "/tmp/venvoyf919ht/script.py", line 13, in getGCSObjects\n    parameters=dict(alpha=0.6, ratio=0.1)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/execute.py", line 118, in execute_notebook\n    **engine_kwargs\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 49, in execute_notebook_with_engine\n    return self.get_engine(engine_name).execute_notebook(nb, kernel_name, **kwargs)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 341, in execute_notebook\n    nb_man.notebook_start()\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 69, in wrapper\n    return func(self, *args, **kwargs)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 198, in notebook_start\n    self.save()\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 69, in wrapper\n    return func(self, *args, **kwargs)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 139, in save\n    write_ipynb(self.nb, self.output_path)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/iorw.py", line 397, in write_ipynb\n    papermill_io.write(nbformat.writes(nb), path)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/iorw.py", line 128, in write\n    return self.get_handler(path).write(buf, path)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/iorw.py", line 316, in write\n    multiplier=self.RETRY_MULTIPLIER, min=self.RETRY_DELAY, max=self.RETRY_MAX_DELAY\nTypeError: __init__() got an unexpected keyword argument \'min\'\n\rExecuting:   0%|          | 0/4 [00:00<?, ?cell/s]\n'
[2021-06-30 09:14:28,970] {taskinstance.py:1152} ERROR - Command '['/tmp/venvoyf919ht/bin/python', '/tmp/venvoyf919ht/script.py', '/tmp/venvoyf919ht/script.in', '/tmp/venvoyf919ht/script.out', '/tmp/venvoyf919ht/string_args.txt']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 985, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 307, in execute_callable
    string_args_filename))
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 319, in _execute_in_subprocess
    close_fds=True)
  File "/opt/python3.6/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/opt/python3.6/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/tmp/venvoyf919ht/bin/python', '/tmp/venvoyf919ht/script.py', '/tmp/venvoyf919ht/script.in', '/tmp/venvoyf919ht/script.out', '/tmp/venvoyf919ht/string_args.txt']' returned non-zero exit status 1.
[2021-06-30 09:14:28,974] {taskinstance.py:1196} INFO - Marking task as FAILED. dag_id=papermill_run_notebook_v0.1, task_id=list_gcs_files, execution_date=20210630T000000, start_date=20210630T091417, end_date=20210630T091428
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/usr/local/lib/airflow/airflow/bin/airflow", line 37, in <module>
    args.func(args)
  File "/usr/local/lib/airflow/airflow/utils/cli.py", line 233, in wrapper
    func(args)
  File "/usr/local/lib/airflow/airflow/utils/cli.py", line 81, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/airflow/airflow/bin/cli.py", line 814, in test
    ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
  File "/usr/local/lib/airflow/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 1109, in run
    session=session)
  File "/usr/local/lib/airflow/airflow/utils/db.py", line 70, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 985, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 307, in execute_callable
    string_args_filename))
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 319, in _execute_in_subprocess
    close_fds=True)
  File "/opt/python3.6/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/opt/python3.6/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/tmp/venvoyf919ht/bin/python', '/tmp/venvoyf919ht/script.py', '/tmp/venvoyf919ht/script.in', '/tmp/venvoyf919ht/script.out', '/tmp/venvoyf919ht/string_args.txt']' returned non-zero exit status 1.

ERROR: (gcloud.composer.environments.run) kubectl returned non-zero status code.

Any help would be greatly appreciated, thanks.

1 Answer

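I just had something similar happen to me. In my case the error came from invalid paths for the input and output notebooks. Once I created a separate folder in the bucket that contains my DAGs and moved my notebooks there, it worked. You should be able to change the execute block to something like this: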

pm.execute_notebook(
    r"/home/airflow/gcs/notebooks/notebook.ipynb",
    r"/home/airflow/gcs/notebooks/notebook.ipynb",
    parameters=dict(alpha=0.6, ratio=0.1)
)

Here /home/airflow/gcs/dags contains your DAGs; you would create the notebooks directory next to it and move your notebook there.
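
If it helps, here is a rough sketch of translating the gs:// URIs from the question into paths under that directory. It assumes the notebooks sit in the environment's own bucket and that the bucket is visible on the workers at /home/airflow/gcs, as the paths above suggest. The helper to_mounted_path and the constant GCS_MOUNT are made-up names for illustration, not part of papermill or Airflow.

```python
from pathlib import PurePosixPath

# Assumption: the environment's bucket is visible on the workers at this path.
GCS_MOUNT = PurePosixPath("/home/airflow/gcs")


def to_mounted_path(gcs_uri, bucket):
    """Map gs://<bucket>/<object> to the matching path under GCS_MOUNT.

    Hypothetical helper for illustration only.
    """
    prefix = "gs://{}/".format(bucket)
    if not gcs_uri.startswith(prefix):
        raise ValueError("{} is not inside bucket {}".format(gcs_uri, bucket))
    # Keep the object name as-is and hang it off the mount point.
    return str(GCS_MOUNT / gcs_uri[len(prefix):])


input_path = to_mounted_path("gs://BUCKET/notebooks/add.ipynb", "BUCKET")
output_path = to_mounted_path("gs://BUCKET/notebooks/add_out.ipynb", "BUCKET")
print(input_path)
print(output_path)
```

The two resulting strings can then be passed to pm.execute_notebook in place of the gs:// URIs. Keep in mind that a callable run by PythonVirtualenvOperator cannot reference names defined outside it, so either define the helper inside the callable or paste the resulting paths in directly.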

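As someone commented, this looks like a duplicate of "Airflow Error - got an unexpected keyword argument 'min'". Hopefully this explains it a bit better and solves your problem.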
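
For anyone who wants to dig further: the last papermill frame in the traceback is in iorw.py, where a retry-wait object is constructed with multiplier=, min= and max=. My guess is that this is tenacity's wait_exponential and that the tenacity version installed in the environment predates the min argument, but the log does not confirm either point. A small sketch for checking whether a callable accepts a given keyword; accepts_keyword and the two stand-in classes are hypothetical names used only for the demonstration:

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    for param in inspect.signature(func).parameters.values():
        # A **kwargs parameter accepts any keyword.
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Stand-ins for an older and a newer constructor signature (hypothetical).
class WaitWithoutMin:
    def __init__(self, multiplier=1, max=10):
        self.multiplier, self.max = multiplier, max


class WaitWithMin:
    def __init__(self, multiplier=1, max=10, min=0):
        self.multiplier, self.max, self.min = multiplier, max, min


if __name__ == "__main__":
    print(accepts_keyword(WaitWithoutMin, "min"))
    print(accepts_keyword(WaitWithMin, "min"))
    try:
        import tenacity  # only checked if it happens to be installed
    except ImportError:
        print("tenacity is not installed in this interpreter")
    else:
        print(accepts_keyword(tenacity.wait_exponential, "min"))
```

Running the same check inside the task's Python environment would show whether the installed library accepts min at all, which is a quicker test than reading version pins.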

Answered 2021-07-19T21:20:24.123