当我远程运行我的数据管道时会引发 PicklingError:数据管道是使用 Beam SDK for Python 编写的,并且我在 Google Cloud Dataflow 之上运行它。当我在本地运行管道时,它工作正常。
以下代码生成 PicklingError:这应该会重现问题
import apache_beam as beam
from apache_beam.transforms import pvalue
from apache_beam.io.fileio import _CompressionType
from apache_beam.utils.options import PipelineOptions
from apache_beam.utils.options import GoogleCloudOptions
from apache_beam.utils.options import SetupOptions
from apache_beam.utils.options import StandardOptions
if __name__ == "__main__":
pipeline_options = PipelineOptions()
pipeline_options.view_as(StandardOptions).runner = 'BlockingDataflowPipelineRunner'
pipeline_options.view_as(SetupOptions).save_main_session = True
google_cloud_options = pipeline_options.view_as(GoogleCloudOptions)
google_cloud_options.project = "project-name"
google_cloud_options.job_name = "job-name"
google_cloud_options.staging_location = 'gs://path/to/bucket/staging'
google_cloud_options.temp_location = 'gs://path/to/bucket/temp'
p = beam.Pipeline(options=pipeline_options)
p.run()
以下是 Traceback 开头和结尾的示例:
WARNING: Could not acquire lock C:\Users\ghousains\AppData\Roaming\gcloud\credentials.lock in 0 seconds
WARNING: The credentials file (C:\Users\ghousains\AppData\Roaming\gcloud\credentials) is not writable. Opening in read-only mode. Any refreshed credentials will only be valid for this run.
Traceback (most recent call last):
File "formatter_debug.py", line 133, in <module>
p.run()
File "C:\Miniconda3\envs\beam\lib\site-packages\apache_beam\pipeline.py", line 159, in run
return self.runner.run(self)
....
....
....
File "C:\Miniconda3\envs\beam\lib\sitepackages\apache_beam\runners\dataflow_runner.py", line 172, in run
self.dataflow_client.create_job(self.job))
StockPickler.save_global(pickler, obj)
File "C:\Miniconda3\envs\beam\lib\pickle.py", line 754, in save_global (obj, module, name))
pickle.PicklingError: Can't pickle <class 'apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum'>: it's not found as apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum