google-cloud-dataflow - Unable to connect from GCP Dataflow to GCP Memorystore

Question

I am trying to use GCP Memorystore to handle session IDs for an event streaming job running on GCP Dataflow. The job fails with a timeout when it tries to connect to Memorystore:

redis.clients.jedis.exceptions.JedisConnectionException: Failed connecting to host 10.0.0.4:6379
    at redis.clients.jedis.Connection.connect(Connection.java:207)
    at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:101)
    at redis.clients.jedis.Connection.sendCommand(Connection.java:126)
    at redis.clients.jedis.Connection.sendCommand(Connection.java:117)
    at redis.clients.jedis.Jedis.get(Jedis.java:155)

My Memorystore instance has the following properties:

Version is 4.0
Authorized network is default-auto
Master is in us-central1-b. Replica is in us-central1-a.
Connection properties: IP address: 10.0.0.4, Port number: 6379 
> gcloud redis instances list --region us-central1
INSTANCE_NAME  VERSION    REGION       TIER         SIZE_GB  HOST      PORT  NETWORK       RESERVED_IP  STATUS  CREATE_TIME
memorystore    REDIS_4_0  us-central1  STANDARD_HA  1        10.0.0.4  6379  default-auto  10.0.0.0/29  READY   2019-07-15T11:43:14

My Dataflow job has the following properties:

runner: org.apache.beam.runners.dataflow.DataflowRunner
zone: us-central1-b
network: default-auto
> gcloud dataflow jobs list   
JOB_ID                                    NAME                        TYPE       CREATION_TIME        STATE      REGION
2019-06-17_02_01_36-3308621933676080017   eventflow                   Streaming  2019-06-17 09:01:37  Running    us-central1

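My "default" network could not be used because it is a legacy network, which Memorystore does not accept. I could not find a way to upgrade the default network from legacy to auto, and I do not want to delete the existing default network, since that would mean interfering with production services. Instead, I created a new auto-mode network, "default-auto", with the same firewall rules as the default network. The rule I believe is relevant to my Dataflow job is this one: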

Name: default-auto-internal
Type: Ingress
Targets: Apply to all   
Filters: IP ranges: 10.0.0.0/20
Protocols/ports: 
  tcp:0-65535
  udp:0-65535
  icmp
Action: Allow
Priority: 65534

I can connect to Memorystore from a Compute Engine instance using "telnet 10.0.0.4 6379".

Things I have tried that did not change anything:
- Switched the Redis library from Jedis 2.9.3 to Lettuce 5.1.7
- Deleted and recreated the Memorystore instance

Shouldn't Dataflow be able to connect to Memorystore, or am I missing something?

score 3 · Accepted Answer

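Figured it out. I was trying to connect to Memorystore from code that was called directly from the main method of my Dataflow job. Connecting from code that runs inside a Dataflow step works. On second thought (well, actually more like 1002nd thought), this makes sense: main() runs on the driver machine (my desktop in this case), whereas the steps of the Dataflow graph run on GCP. I confirmed this theory by connecting to Memorystore at localhost:6379 from my main(). That works because I have an SSH tunnel to Memorystore running on port 6379 (using this trick).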
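One way to check which machine a given piece of code runs on is a small reachability probe that you call from both places. The sketch below is not from the original post: the class RedisReachability and its methods are hypothetical names, and it uses only the Java standard library, so it checks plain TCP connectivity (much like the telnet test) rather than speaking the Redis protocol.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

/** Hypothetical helper (an assumption, not part of Beam or Jedis). */
final class RedisReachability {

    private RedisReachability() {}

    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections, and unresolvable hosts.
            return false;
        }
    }

    /** Builds a one-line report that names the machine which ran the check. */
    static String describe(String host, int port, int timeoutMillis) {
        String localName;
        try {
            localName = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            localName = "unknown-host";
        }
        return localName + " -> " + host + ":" + port
                + " reachable=" + canConnect(host, port, timeoutMillis);
    }
}
```

Logging RedisReachability.describe("10.0.0.4", 6379, 2000) once from main() and once from inside a pipeline step (for example from a DoFn method annotated with @Setup) should show two different host names if, as described above, main() runs on the desktop and the step runs on a Dataflow worker. Only the worker would be expected to report the private Memorystore address as reachable.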
