We use the Python SDK for Apache Beam on Google Dataflow. It's a great tool, but we are concerned about the privacy of these jobs, because it looks like workers are run with public IPs. Our questions are:
- Do we still have to worry about public IPs even if we specify a network and subnetwork?
- What exactly are the performance and security differences of restricting public IPs?
- How do we set up Dataflow so that all workers are created with private IPs? In theory, the template below should already configure the pipeline to disallow that behavior according to the documentation (yet it still happens)!
Our job template looks like this:
import apache_beam as beam
from apache_beam.options.pipeline_options import (
    PipelineOptions,
    GoogleCloudOptions,
    StandardOptions,
    WorkerOptions,
)

options = PipelineOptions(flags=['--requirements_file', './requirements.txt'])

# Google Cloud options
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = PROJECT
google_cloud_options.job_name = job_name
google_cloud_options.staging_location = 'gs://{}/staging'.format(BUCKET)
google_cloud_options.temp_location = 'gs://{}/temp'.format(BUCKET)
google_cloud_options.region = REGION

# Worker options
worker_options = options.view_as(WorkerOptions)
worker_options.subnetwork = NETWORK
worker_options.max_num_workers = 25

options.view_as(StandardOptions).runner = RUNNER
### Note that we set worker_options.subnetwork to our own subnetwork. However, once we run the job, it still looks like it creates workers on public IPs.
### The code runs like this in the end:
p = beam.Pipeline(options=options)
...
...
...
run = p.run()
run.wait_until_finish()
Thanks!