I suddenly ran into a strange error in an ETL pipeline that migrates data from Oracle to Neo4j.

The ETL is implemented as a docker-compose setup with three images:

  • Pentaho PDI
  • source Oracle image
  • target Neo4j image

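The main pipeline in PDI loads data from Oracle, converts it to CSV files, and stores them on the Neo4j host, where they are processed further. At some point, the SFTP transfer of the zip containing the CSV files started failing with the following error: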

2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip - Started FTP job to ${remote_server}
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip - ERROR (version 8.3.0.0-371, build 8.3.0.0-371 from 2019-06-11 11.09.08 by buildguy) : Error getting files from FTP : There was a problem while connecting to neo4j:22
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip - ERROR (version 8.3.0.0-371, build 8.3.0.0-371 from 2019-06-11 11.09.08 by buildguy) : java.io.IOException: There was a problem while connecting to neo4j:22
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.Connection.connect(Connection.java:791)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.Connection.connect(Connection.java:563)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.entries.ftpdelete.JobEntryFTPDelete.SSHConnect(JobEntryFTPDelete.java:966)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.entries.ftpdelete.JobEntryFTPDelete.execute(JobEntryFTPDelete.java:746)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:686)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:827)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.Job.execute(Job.java:565)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at org.pentaho.di.job.entries.job.JobEntryJobRunner.run(JobEntryJobRunner.java:69)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at java.lang.Thread.run(Thread.java:748)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip - Caused by: java.io.IOException: Key exchange was not finished, connection is closed.
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.transport.KexManager.getOrWaitForConnectionInfo(KexManager.java:92)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.transport.TransportManager.getConnectionInfo(TransportManager.java:230)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.Connection.connect(Connection.java:743)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  ... 14 more
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip - Caused by: java.io.IOException: Cannot negotiate, proposals do not match.
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.transport.KexManager.handleMessage(KexManager.java:413)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:754)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:469)
2021/08/30 12:21:51 - Cleanup remote deko-etl-import.zip -  ... 1 more

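The error is hard to search for. There are several similar questions (1, 2, 3, 4, 5), but none of them explains why it started failing suddenly.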

I suspect this is related to the SSH key exchange, but I don't know SSH well enough to understand what is going on.
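
For context on the innermost cause in the trace above ("Cannot negotiate, proposals do not match"): during SSH key exchange each side sends a list of the algorithms it supports, and, in simplified form, RFC 4253 picks the first algorithm on the client's list that the server also offers. The sketch below shows only that rule. It is not the trilead-ssh2 code, the function name `negotiate` is made up for illustration, and the algorithm lists are illustrative assumptions rather than the real proposals of any particular client or server build.

```python
from typing import Optional, Sequence


def negotiate(client: Sequence[str], server: Sequence[str]) -> Optional[str]:
    """Return the first client algorithm that the server also offers, else None."""
    offered = set(server)
    for name in client:
        if name in offered:
            return name
    return None


# Illustrative lists only (assumption): an older client that offers SHA-1 based
# Diffie-Hellman methods, and a server that no longer offers them.
old_client = ["diffie-hellman-group-exchange-sha1", "diffie-hellman-group14-sha1"]
new_server = ["curve25519-sha256", "ecdh-sha2-nistp256", "diffie-hellman-group14-sha256"]
updated_client = ["curve25519-sha256"] + old_client

print(negotiate(old_client, new_server))      # None
print(negotiate(updated_client, new_server))  # curve25519-sha256
```

When no common algorithm exists, the connection cannot proceed, which would be consistent with a failure that appears only after one side changes its list.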

1 Answer

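My colleague later noticed that the neo4j:4.2.3 image had been re-pushed and that the new build is based on Debian Bullseye. By comparing with a machine that had not yet pulled the new neo4j image, we found that the openssh version had changed from 7.9p1-10 to 8.4p1-5 (dpkg --list | grep openssh). We could then easily reproduce the error locally and show that PDI works against the old Neo4j image but not against the new one.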
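
A minimal sketch of how such a comparison could be automated, assuming the output of `dpkg --list` has been captured from both images. The helper names and the two sample lines are hypothetical; the samples are built from the version numbers mentioned above and are not real captured output.

```python
from typing import Dict, Tuple


def openssh_versions(dpkg_output: str) -> Dict[str, str]:
    """Map each installed openssh package name to its version string."""
    versions = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Installed packages are listed as: ii <name> <version> <arch> <description>
        if len(fields) >= 3 and fields[0] == "ii" and fields[1].startswith("openssh"):
            versions[fields[1]] = fields[2]
    return versions


def changed_packages(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return packages present in both listings whose version differs."""
    return {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }


# Hypothetical sample lines (assumption), one per image.
old_sample = "ii  openssh-server  7.9p1-10  amd64  secure shell (SSH) server"
new_sample = "ii  openssh-server  8.4p1-5   amd64  secure shell (SSH) server"

print(changed_packages(openssh_versions(old_sample), openssh_versions(new_sample)))
```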

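One option was to modify the updated Neo4j image and force a downgrade of openssh to the previous version. That would probably work, but it would close the door on any upgrades and patches and leave little room to maneuver if anything went wrong. We therefore decided that the right solution was to upgrade the client.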

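Our version of Pentaho PDI (specified by the customer) uses the trilead-ssh2 library, build 213. Unfortunately, newer versions (we tried 217 and the latest, 222) failed as well. Replacing the library with build 217 of the Jenkins fork finally made the SSH communication work again. The key to the success seems to be pull request #60, which adds new KEX algorithms. The fork requires two dependencies (eddsa, which is in the Maven Central repository, and jbcrypt, which I could not find in either Maven Central or the Spring repository but which can be found here); these must also be copied to the data-integration/lib directory of Pentaho PDI.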
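
The library swap can be sketched as a small script. This is only an illustration under stated assumptions: the function name `swap_ssh_library`, the glob pattern for the old jar, and the backup directory are hypothetical, and the actual jar file names depend on the builds you download.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def swap_ssh_library(
    lib_dir: Path,
    backup_dir: Path,
    new_jars: Iterable[Path],
    old_pattern: str = "trilead-ssh2*.jar",  # assumed naming of the bundled jar
) -> List[Path]:
    """Move jars matching old_pattern out of lib_dir, then copy new_jars in."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Materialize the matches first so moving files does not disturb the iteration.
    for old in sorted(lib_dir.glob(old_pattern)):
        shutil.move(str(old), str(backup_dir / old.name))
    copied = []
    for jar in new_jars:
        target = lib_dir / jar.name
        shutil.copy2(jar, target)
        copied.append(target)
    return copied
```

It would be called with the path of the data-integration/lib directory, a backup location outside that directory, and the paths of the fork jar plus its two dependencies. Keeping the old jar in a backup directory rather than deleting it makes it easy to roll back.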

answered 2021-09-01T13:31:58.367