
I am trying to troubleshoot large delays on outgoing messages that appear to be related to socket flushing behaviour. I have been taking packet captures of outgoing FIX messages from a QuickFIX/J initiator to an acceptor.

To summarise the environment: a Java initiator makes a socket connection to a ServerSocket on another server. Both servers run Red Hat Enterprise Linux 5.10. The MSS reported by netstat for the interfaces is 0. The MTU on the NICs is 1500 in each case (effectively unlimited on the loopback interface, I believe). On the application side, messages are encoded by QuickFIX/J into a byte array and written to the socket. The socket is configured with TCP_NODELAY enabled.
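
For reference, here is a minimal sketch of that socket setup using plain java.net.Socket rather than the actual QuickFIX/J initiator code; the host and port are placeholders:

    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Minimal sketch of the socket options described above; not the actual
    // QuickFIX/J initiator. Host and port are placeholders.
    public class InitiatorSocketSketch {
        public static void main(String[] args) throws Exception {
            try (Socket socket = new Socket()) {
                socket.setTcpNoDelay(true); // TCP_NODELAY: disable Nagle's algorithm
                socket.connect(new InetSocketAddress("10.0.0.1", 6082));
                System.out.println("TCP_NODELAY=" + socket.getTcpNoDelay()
                        + " SO_SNDBUF=" + socket.getSendBufferSize());
            }
        }
    }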

I am fairly certain I can rule out the application as the cause of the delay, because there is no sender delay when the acceptor (ServerSocket) runs on the same server as the initiator over the loopback interface. Here is a sample of packet capture entries over loopback:

"No.","Time","Source","Destination","Protocol","Length","SendingTime (52)","MsgSeqNum (34)","Destination Port","Info","RelativeTime","Delta","Push"
"0.001606","10:23:29.223638","127.0.0.1","127.0.0.1","FIX","1224","20150527-09:23:29.223","5360","6082","MarketDataSnapshotFullRefresh","0.001606","0.000029","Set"
"0.001800","10:23:29.223832","127.0.0.1","127.0.0.1","FIX","1224","20150527-09:23:29.223","5361","6082","MarketDataSnapshotFullRefresh","0.001800","0.000157","Set"
"0.001823","10:23:29.223855","127.0.0.1","127.0.0.1","FIX","1224","20150527-09:23:29.223","5362","6082","MarketDataSnapshotFullRefresh","0.001823","0.000023","Set"
"0.002105","10:23:29.224137","127.0.0.1","127.0.0.1","FIX","825","20150527-09:23:29.223","5363","6082","MarketDataSnapshotFullRefresh","0.002105","0.000282","Set"
"0.002256","10:23:29.224288","127.0.0.1","127.0.0.1","FIX","2851","20150527-09:23:29.224,20150527-09:23:29.224,20150527-09:23:29.224","5364,5365,5366","6082","MarketDataSnapshotFullRefresh","0.002256","0.000014","Set"
"0.002327","10:23:29.224359","127.0.0.1","127.0.0.1","FIX","825","20150527-09:23:29.224","5367","6082","MarketDataSnapshotFullRefresh","0.002327","0.000071","Set"
"0.287124","10:23:29.509156","127.0.0.1","127.0.0.1","FIX","1079","20150527-09:23:29.508","5368","6082","MarketDataSnapshotFullRefresh","0.287124","0.284785","Set"

The main points of interest are: 1/ the PUSH flag is set on every packet regardless of packet length (the largest here is 2851); 2/ the latency measure I am using is the difference between the "SendingTime" set on the message just before encoding and the packet capture "Time". The packet capture is taken on the same server as the initiator that is sending the data. Over a capture of 10,000 packets there is no significant difference between "SendingTime" and "Time" when using loopback. For that reason I believe I can rule out the application as the cause of the send delay.
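
As an illustration of that metric only, here is a hypothetical helper (not part of the capture tooling) that parses tag 52 and compares it with the capture timestamp; the capture "Time" column in these traces appears to be one hour ahead of the UTC SendingTime, hence the adjustment:

    import java.time.Duration;
    import java.time.LocalDateTime;
    import java.time.LocalTime;
    import java.time.format.DateTimeFormatter;

    // Hypothetical helper illustrating the latency metric: the gap between the
    // FIX SendingTime (tag 52, UTC) and the capture "Time" column. The capture
    // clock here appears to be UTC+1, hence the one-hour adjustment.
    public class SendDelayMetric {
        private static final DateTimeFormatter TAG_52 =
                DateTimeFormatter.ofPattern("yyyyMMdd-HH:mm:ss.SSS");

        public static void main(String[] args) {
            LocalDateTime sendingTime = LocalDateTime.parse("20150527-09:23:29.223", TAG_52);
            LocalTime captureTime = LocalTime.parse("10:23:29.223638");
            Duration delta = Duration.between(sendingTime.toLocalTime().plusHours(1), captureTime);
            System.out.println("send delay ~ " + delta.toMillis() + " ms"); // sub-millisecond over loopback
        }
    }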

When the acceptor is moved to another server on the LAN, the send delay gets much worse for packets larger than the MTU size. Here is a snippet from such a capture:

"No.","Time","Source","Destination","Protocol","Length","SendingTime (52)","MsgSeqNum (34)","Destination Port","Info","RelativeTime","Delta","Push"
"68.603270","10:35:18.820635","10.XX.33.115","10.XX.33.112","FIX","1223","20150527-09:35:18.820","842","6082","MarketDataSnapshotFullRefresh","68.603270","0.000183","Set"
"68.603510","10:35:18.820875","10.XX.33.115","10.XX.33.112","FIX","1223","20150527-09:35:18.820","843","6082","MarketDataSnapshotFullRefresh","68.603510","0.000240","Set"
"68.638293","10:35:18.855658","10.XX.33.115","10.XX.33.112","FIX","1514","20150527-09:35:18.821","844","6082","MarketDataSnapshotFullRefresh","68.638293","0.000340","Not set"
"68.638344","10:35:18.855709","10.XX.33.115","10.XX.33.112","FIX","1514","20150527-09:35:18.821","845","6082","MarketDataSnapshotFullRefresh","68.638344","0.000051","Not set"

The significant thing here is that when the packets are smaller than the MSS (derived from the MTU), the PUSH flag is set and there is no sender delay. That is expected, since disabling Nagle's algorithm causes PUSH to be set on these smaller packets. When the packet size is larger than the MSS (a packet size of 1514 in this case), the difference between the capture time and the SendingTime has jumped to 35 ms.

It seems unlikely that this 35 ms delay is caused by the application encoding the message, because messages with large packet sizes were sent in under 1 ms over the loopback interface. The capture is also taken on the sending side, so MTU fragmentation does not appear to be the cause either. The most likely explanation, as I see it, is that because the PUSH flag is not set (the packet being larger than the MSS), the OS-level socket and/or TCP stack does not decide to flush it until 35 ms later. The test acceptor on the other server is not a slow consumer and is on the same LAN, so ACKs are timely.

Can anyone give any pointers as to what might cause this socket delay when sending packets larger than the MSS? Against a real counterparty in the US, this sender delay is as high as 300 ms. I had assumed that if the packet size is larger than the MSS it would be sent immediately regardless of previously received ACKs (as long as the socket buffer size is not exceeded). Netstat generally shows socket queue and window sizes of 0, and the problem appears to happen on all packets larger than the MSS, even right from startup. It looks as though the socket is deciding not to flush immediately for some reason, but I am unsure what could be causing that.

EDIT: As pointed out by EJP, there is no flush in Linux. As I understand it, a socket send puts the data into the Linux kernel's network buffers. For these non-push packets, the kernel appears to be waiting for an acknowledgement of the previous packet before delivering the next one. That is not what I expected; in TCP I would expect the packet to still be delivered until the socket buffers fill up.
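
To make that point concrete, here is a small illustrative sketch (placeholder host and port, not the QuickFIX/J code) of why application-side timing cannot see this: write() returns as soon as the bytes are copied into the kernel send buffer, regardless of when the segments actually go out on the wire:

    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Illustrative only: write() returns once the kernel has accepted the bytes
    // into its send buffer, so the application sees no delay even if the kernel
    // holds the data waiting for ACKs. Host and port are placeholders.
    public class KernelBufferingSketch {
        public static void main(String[] args) throws Exception {
            try (Socket socket = new Socket()) {
                socket.setTcpNoDelay(true);
                socket.connect(new InetSocketAddress("10.0.0.1", 6082));
                OutputStream out = socket.getOutputStream();
                byte[] message = new byte[4000]; // larger than one MSS, like the frames above
                long start = System.nanoTime();
                out.write(message); // returns when the kernel buffers the data, not when it is sent
                System.out.printf("write() returned after %.3f ms%n",
                        (System.nanoTime() - start) / 1_000_000.0);
            }
        }
    }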


1 Answer


This is not a comprehensive answer, since TCP behaviour varies with many factors, but in this case it is the reason behind the problem we were facing.

In the TCP congestion control implementation, the congestion window allows more and more packets to be sent without an acknowledgement, as long as no signs of congestion, i.e. retransmissions, are detected. Generally speaking, when retransmissions do occur, the congestion algorithm resets the congestion window, limiting how many packets can be sent before an ACK has to arrive. This manifests itself in the sender delays we witnessed, because packets are held in the kernel buffers waiting for acknowledgements of earlier packets. No directive of the TCP_NODELAY or TCP_CORK kind overrides congestion control behaviour in this respect.

In our case the situation was made worse by a long round-trip time to the other venue. However, since it is a dedicated line with very little packet loss per day, retransmissions were not the reason congestion control kicked in. In the end it appears to have been resolved by disabling the following flag in Linux. It also causes the congestion window to be reset, but by detecting an idle period rather than packet loss:

"tcp_slow_start_after_idle - BOOLEAN 如果设置,则提供 RFC2861 行为并在空闲期后超时拥塞窗口。空闲期在当前 RTO 中定义。如果未设置,则在空闲期后拥塞窗口不会超时。默认值:1

(Note that if you are seeing these problems, it is also worth looking into congestion control algorithms other than the one your kernel is currently set to.)

answered 2015-07-02T13:28:50.557