
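Our system (the client), running on an AWS EC2 instance, hits an SFTP problem when it sends large files to a customer's remote SFTP server ("160.xxx.xxx.35.bc.googleusercontent.com"). Small files transfer fine. For files of about 1 GB or larger, however, only 1068392448 bytes (about 1018.9 MiB, slightly less than 1 GiB) reach the server's SFTP site. When we send the same large file with the same code from the same environment to our own remote SFTP server, which is not hosted on googleusercontent (only the URL, username, and password differ), the transfer succeeds and all data arrives correctly.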

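The problem started after the customer changed their server side by adding a load balancer. The customer investigated on their side and adjusted the load balancer timeout, but that did not fix the issue. According to them, the client stops sending data after 1068392448 bytes, and the server waits for the allowed idle time (~50 seconds) and then drops the connection.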
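
For context on the connection states described next: when the server, or the load balancer in front of it, closes its end after the idle timeout, the client's TCP stack receives a FIN and the client socket moves to CLOSE_WAIT, where it stays until the local application itself closes the socket. A blocking Java program typically notices that FIN only when a read() on the socket returns -1. The minimal sketch below reproduces this on localhost; the class name PeerCloseDemo is hypothetical, and a loopback server stands in for the real SFTP server. While the client socket is still open after the -1, a tool such as netstat would typically list it in CLOSE_WAIT.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PeerCloseDemo {
    /** Returns the client's first read() result after the server side has closed the connection. */
    static int readAfterServerClose() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            client.setSoTimeout(5000); // avoid hanging forever if no FIN ever arrives

            // Server side: accept and close right away, which sends a FIN to the client.
            Socket accepted = server.accept();
            accepted.close();

            // Client side: read() returns -1 once the peer's FIN has arrived.
            // Until client.close() runs, the client end of the connection sits in CLOSE_WAIT.
            InputStream in = client.getInputStream();
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("client read() after server close: " + readAfterServerClose());
    }
}
```

In the scenario above, the CLOSE_WAIT state on the client therefore indicates that a FIN reached the client from the server side of the connection, and the socket lingers because nothing on the client reads the -1 and closes it.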

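Our investigation shows that the large source file is read from AWS S3 and saved locally correctly. When the data written to the server's SFTP site reaches 1068392448 bytes (consistent across all test runs), the TCP socket changes from ESTABLISHED to CLOSE_WAIT after the server's allowed idle time (about 50 seconds). The process then stays in this state forever, until it is stopped or killed manually. While the connection is in CLOSE_WAIT, the thread dump shows the data-transfer thread stuck in a wait loop in the awaitSpace() method of java.io.PipedInputStream: the buffer is reported as full and is waiting to be written to the server side. Below is the code of that wait loop (from the JDK's java.io.PipedInputStream):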

private void checkStateForReceive() throws IOException {
    if (!connected) {
        throw new IOException("Pipe not connected");
    } else if (closedByWriter || closedByReader) {
        throw new IOException("Pipe closed");
    } else if (readSide != null && !readSide.isAlive()) {
        throw new IOException("Read end dead");
    }
}
private void awaitSpace() throws IOException {
    while (in == out) {
        checkStateForReceive();

        /* full: kick any waiting readers */
        notifyAll();
        try {
            wait(1000);
        } catch (InterruptedException ex) {
            throw new java.io.InterruptedIOException();
        }
    }
}
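
The blocking behavior of this loop can be reproduced with the JDK alone. In the sketch below (the class PipeBackpressureDemo and its parameters are illustrative, not taken from JSch), a writer thread pushes more bytes into a PipedOutputStream than the connected PipedInputStream's buffer can hold. With no thread reading, write() stays parked in awaitSpace(), which is where the JSch session thread sits in the dump, and it only completes once a reader starts draining the pipe.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

public class PipeBackpressureDemo {
    /** Outcome of one run: whether the writer was stuck, and how many bytes the reader got. */
    static final class Result {
        final boolean writerBlocked;
        final long bytesRead;

        Result(boolean writerBlocked, long bytesRead) {
            this.writerBlocked = writerBlocked;
            this.bytesRead = bytesRead;
        }
    }

    static Result run(int pipeSize, int payloadSize, long observeMillis) throws Exception {
        PipedInputStream in = new PipedInputStream(pipeSize);
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            try {
                // Once pipeSize bytes are buffered, write() loops in awaitSpace() until someone reads.
                out.write(new byte[payloadSize]);
                out.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "writer");
        writer.setDaemon(true);
        writer.start();

        writer.join(observeMillis);          // nobody reads during this window
        boolean blocked = writer.isAlive();  // still alive => stuck on a full pipe

        // Start reading: freeing space lets the writer finish and close its end.
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        writer.join();
        return new Result(blocked, total);
    }

    public static void main(String[] args) throws Exception {
        Result r = run(1024, 4096, 2000);
        System.out.println("writer blocked while no one read: " + r.writerBlocked);
        System.out.println("bytes read after draining: " + r.bytesRead);
    }
}
```

With a payload smaller than the pipe buffer, the writer finishes without ever blocking, which is why only transfers that outrun the reader exhibit this wait.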

Here is the thread dump:

    Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode):

"Connect thread files.liveramp.com session" #13 daemon prio=5 os_prio=0 tid=0x000000001e711800 nid=0x63c0 in Object.wait() [0x000000001f81f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.io.PipedInputStream.awaitSpace(PipedInputStream.java:273)
    at java.io.PipedInputStream.receive(PipedInputStream.java:231)
    - locked <0x00000006c1c944c8> (a com.jcraft.jsch.Channel$MyPipedInputStream)
    at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
    at com.jcraft.jsch.IO.put(IO.java:64)
    at com.jcraft.jsch.Channel.write(Channel.java:438)
    at com.jcraft.jsch.Session.run(Session.java:1459)
    at java.lang.Thread.run(Thread.java:745)

"org.apache.commons.vfs2.cache.SoftRefFilesCache$SoftRefReleaseThread" #11 daemon prio=5 os_prio=0 tid=0x000000001ea2d000 nid=0x3078 in Object.wait() [0x000000001efcf000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x00000006c1c24710> (a java.lang.ref.ReferenceQueue$Lock)
    at org.apache.commons.vfs2.cache.SoftRefFilesCache$SoftRefReleaseThread.run(SoftRefFilesCache.java:74)

"Service Thread" #10 daemon prio=9 os_prio=0 tid=0x000000001dfc5000 nid=0x1944 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread2" #9 daemon prio=9 os_prio=2 tid=0x000000001df12800 nid=0x6848 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #8 daemon prio=9 os_prio=2 tid=0x000000001df07800 nid=0x5720 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #7 daemon prio=9 os_prio=2 tid=0x000000001df04800 nid=0x6358 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Monitor Ctrl-Break" #6 daemon prio=5 os_prio=0 tid=0x000000001df56000 nid=0x6910 runnable [0x000000001e1cf000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    - locked <0x00000006c1c24f88> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    - locked <0x00000006c1c24f88> (a java.io.InputStreamReader)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at com.intellij.rt.execution.application.AppMainV2$1.run(AppMainV2.java:64)

"Attach Listener" #5 daemon prio=5 os_prio=2 tid=0x000000001c69e800 nid=0x4880 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=2 tid=0x000000001c69d800 nid=0x2534 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=1 tid=0x000000001c67e800 nid=0x4908 in Object.wait() [0x000000001d9df000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000006c1c26660> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x00000006c1c26660> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=2 tid=0x00000000030b5000 nid=0x6340 in Object.wait() [0x000000001d8df000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000006c1c26818> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x00000006c1c26818> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x0000000002fc4000 nid=0x56dc in Object.wait() [0x0000000002e0f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at com.jcraft.jsch.Session.write(Session.java:1269)
    - locked <0x00000006c1c943d0> (a com.jcraft.jsch.ChannelSftp)
    at com.jcraft.jsch.ChannelSftp.sendWRITE(ChannelSftp.java:2646)
    at com.jcraft.jsch.ChannelSftp.access$100(ChannelSftp.java:36)
    at com.jcraft.jsch.ChannelSftp$1.write(ChannelSftp.java:806)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    - eliminated <0x00000006c1c27a88> (a org.apache.commons.vfs2.provider.sftp.SftpFileObject$SftpOutputStream)
    at org.apache.commons.vfs2.util.MonitorOutputStream.write(MonitorOutputStream.java:123)
    - locked <0x00000006c1c27a88> (a org.apache.commons.vfs2.provider.sftp.SftpFileObject$SftpOutputStream)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    - eliminated <0x00000006c1c26a50> (a org.apache.commons.vfs2.provider.DefaultFileContent$FileContentOutputStream)
    at org.apache.commons.vfs2.util.MonitorOutputStream.write(MonitorOutputStream.java:123)
    - locked <0x00000006c1c26a50> (a org.apache.commons.vfs2.provider.DefaultFileContent$FileContentOutputStream)
    at org.apache.commons.vfs2.provider.DefaultFileContent.write(DefaultFileContent.java:805)
    at org.apache.commons.vfs2.provider.DefaultFileContent.write(DefaultFileContent.java:784)
    at org.apache.commons.vfs2.provider.DefaultFileContent.write(DefaultFileContent.java:755)
    at org.apache.commons.vfs2.provider.DefaultFileContent.write(DefaultFileContent.java:771)
    at org.apache.commons.vfs2.FileUtil.copyContent(FileUtil.java:37)
    at org.apache.commons.vfs2.provider.AbstractFileObject.copyFrom(AbstractFileObject.java:295)
    at com.merkleinc.dat.sftptran.cli.SftpVfs2App.main(SftpVfs2App.java:88)

"VM Thread" os_prio=2 tid=0x000000001c657000 nid=0x28b4 runnable 

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x0000000002fda800 nid=0x2b54 runnable 

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x0000000002fdc000 nid=0x4528 runnable 

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x0000000002fdd800 nid=0x4850 runnable 

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x0000000002fdf800 nid=0x7f0 runnable 

"VM Periodic Task Thread" os_prio=2 tid=0x000000001e023000 nid=0x5d58 waiting on condition 

JNI global references: 225

Heap
 PSYoungGen      total 49664K, used 14491K [0x000000076b400000, 0x000000076ed00000, 0x00000007c0000000)
  eden space 49152K, 29% used [0x000000076b400000,0x000000076c21efd0,0x000000076e400000)
  from space 512K, 6% used [0x000000076e500000,0x000000076e508000,0x000000076e580000)
  to   space 4608K, 0% used [0x000000076e880000,0x000000076e880000,0x000000076ed00000)
 ParOldGen       total 175104K, used 2879K [0x00000006c1c00000, 0x00000006cc700000, 0x000000076b400000)
  object space 175104K, 1% used [0x00000006c1c00000,0x00000006c1ecfc70,0x00000006cc700000)
 Metaspace       used 11230K, capacity 11448K, committed 11648K, reserved 1058816K
  class space    used 1265K, capacity 1318K, committed 1408K, reserved 1048576K
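
A dump like the one above is a single snapshot. To watch the threads of interest repeatedly during a long transfer from inside the application, one option is the standard ThreadMXBean API. The sketch below (the class StuckThreadProbe and its prefix filter are assumptions for illustration) lists every live thread that has at least one stack frame from a given package, such as com.jcraft.jsch, together with its state and top frame.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

public class StuckThreadProbe {
    /**
     * Returns "name [STATE] at topFrame" for every live thread whose stack contains
     * a frame from a class whose fully qualified name starts with classPrefix.
     */
    static List<String> threadsInPackage(String classPrefix) {
        List<String> hits = new ArrayList<>();
        // false/false: skip monitor and synchronizer details to keep the dump cheap.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            for (StackTraceElement frame : stack) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    hits.add(info.getThreadName() + " [" + info.getThreadState() + "] at " + stack[0]);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String line : threadsInPackage("com.jcraft.jsch")) {
            System.out.println(line);
        }
    }
}
```

Calling threadsInPackage("com.jcraft.jsch") periodically (for example from a ScheduledExecutorService) while the upload runs would show when the JSch session thread and main become stuck and in which states.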

We currently use vfs-s3 version 2.4.2 from com.github with the corresponding Apache Commons VFS2 version 2.1, and JSch version 0.1.54 from com.jcraft for the SFTP data transfer. Our basic code is as follows:

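import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.Selectors;
import org.apache.commons.vfs2.impl.StandardFileSystemManager;

// getFileObject, createConnectionString and createOptions are our own helpers (not shown).
public void upload(String sourceUri, String targetUri) throws IOException, URISyntaxException {
    StandardFileSystemManager fsManager = new StandardFileSystemManager();
    fsManager.init();
    try (FileObject sourceFile = getFileObject(sourceUri, fsManager);
         FileObject remoteFile = fsManager.resolveFile(createConnectionString(targetUri), createOptions())) {
        // Copy the S3 file to the SFTP server
        remoteFile.copyFrom(sourceFile, Selectors.SELECT_SELF);
    } finally {
        fsManager.close();
    }
}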
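
copyFrom() gives no visibility into how far the copy got before it stalled. One way to see the exact offset is to perform the copy manually with a counting loop. The helper below is a hypothetical sketch (ProgressCopy.copy and its parameters are not part of VFS): it copies an InputStream to an OutputStream with a fixed buffer and reports the running total after every write.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.LongConsumer;

public class ProgressCopy {
    /**
     * Copies in to out using a buffer of bufferSize bytes, calls onProgress with the
     * running total after each write, and returns the total. Does not close either stream.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize, LongConsumer onProgress)
            throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // with an SFTP target, this is the call that would hang on a stall
            total += n;
            onProgress.accept(total);
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 4096,
                total -> System.out.println("copied " + total + " bytes"));
        System.out.println("total " + copied + ", sink size " + sink.size());
    }
}
```

In upload(), the streams could come from sourceFile.getContent().getInputStream() and remoteFile.getContent().getOutputStream(), with a callback that logs every few megabytes; the last value logged before the hang roughly marks where the SFTP output stream stopped accepting data. This goes through the same VFS SFTP output stream as copyFrom(), so it is a diagnostic aid, not a fix.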

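We tried the same code in a local Windows environment. Sometimes we see the same symptoms as on the EC2 instance, but most of the time the problem does not occur (the behavior is inconsistent). We also sent the large file to the customer's SFTP site from other Linux systems, with no problem.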

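On the AWS EC2 instance we tried upgrading to vfs-s3 versions 3.0.0 and 4.0.0 from com.github.abashev, together with the matching vfs2 versions. The same large file reproduced the same problem. We also tried setting loadOpenSSHConfig to true, with config-file values intended to keep the connection alive, but that did not help in this case.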

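We also tested SFTP directly with the sftp "put" command, sending the large file from the EC2 instance (and from other Windows/Linux platforms) to the remote server, and the large-file transfer always succeeded.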

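So the question is: where is the likely root cause? Why does the server stop receiving data after 1068392448 bytes, or why can the rest of the data not be sent to the server? Is there a hard limit in our EC2 environment that blocks the transfer (we checked some limits, but it is still unclear)? Or is the server wrongly sending a connection-close "FIN" to the client (and how can we prove that)? We appreciate any suggestions for a possible solution.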
