c# - How do I detect when a Protocol Buffers message has been fully received?

Question

This is a spin-off from another question of mine. Feel free to read it, but it isn't required.

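Basically, I realized that to use C#'s BeginReceive() effectively on large messages, I need to either (a) read the packet length first and then read exactly that many bytes, or (b) use an end-of-packet delimiter. My question is: does either of these exist in Protocol Buffers? I haven't used them yet, but from the documentation there doesn't seem to be a length header or a delimiter.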

If not, what should I do? Should I just build the message and then prefix or suffix it with a length header or EOP delimiter?

score 15 · Accepted Answer

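You need to include either the size or an end marker in your protocol. Stream-based sockets (TCP/IP) have nothing built in beyond support for an indefinite stream of octets, arbitrarily broken up into separate packets (and packets can be split in transit as well).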

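A simple approach is to give each "message" a fixed-size header containing the protocol version, the payload size, and any other fixed data, followed by the message content (the payload).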
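A minimal sketch of such a header, written in Python for brevity (the byte layout carries over to C# unchanged). The layout, a 2-byte version followed by a 4-byte payload length in network byte order, and the names `HEADER`, `encode_message` and `decode_header` are assumptions for illustration only.

```python
import struct
from typing import Tuple

# Hypothetical header layout: uint16 protocol version, uint32 payload length,
# big-endian ("!" also disables padding, so the header is always 6 bytes).
HEADER = struct.Struct("!HI")

def encode_message(version: int, payload: bytes) -> bytes:
    """Prefix the payload with the fixed-size header."""
    return HEADER.pack(version, len(payload)) + payload

def decode_header(header: bytes) -> Tuple[int, int]:
    """Return (version, payload_length) from exactly HEADER.size bytes."""
    return HEADER.unpack(header)
```

Because the header is always `HEADER.size` bytes, the receiver knows how much to read before it knows anything else about the message.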

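Optionally, you can add a fixed-size message footer carrying a checksum or even a cryptographic signature, depending on your reliability and security requirements.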
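A sketch of the checksum variant in Python, assuming a 4-byte big-endian CRC-32 (computed with the standard `zlib.crc32`) appended after the payload. The footer layout and function names are hypothetical.

```python
import struct
import zlib

FOOTER = struct.Struct("!I")  # assumed: 4-byte big-endian CRC-32 of the payload

def add_footer(payload: bytes) -> bytes:
    """Append a CRC-32 footer to the payload."""
    return payload + FOOTER.pack(zlib.crc32(payload))

def check_footer(body_and_footer: bytes) -> bytes:
    """Strip and verify the footer; raise ValueError if it is missing or wrong."""
    if len(body_and_footer) < FOOTER.size:
        raise ValueError("too short to contain a footer")
    payload = body_and_footer[:-FOOTER.size]
    (expected,) = FOOTER.unpack(body_and_footer[-FOOTER.size:])
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch")
    return payload
```

A CRC only catches accidental corruption. For the "cryptographic signature" option, a keyed MAC such as `hmac.new(key, payload, hashlib.sha256).digest()`, checked with `hmac.compare_digest`, would take the CRC's place.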

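Knowing the payload size lets you keep reading until you have enough bytes for the rest of the message. If a read completes with fewer bytes, issue another read for the remainder, and repeat until the whole message has arrived.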
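The "read again until you have it all" loop, sketched with blocking Python sockets. In C# the same logic would live in the BeginReceive callback, which re-issues BeginReceive for the remaining count. `recv_exactly` is a hypothetical helper name.

```python
import socket

def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes arrive; one recv may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # empty read: the peer closed the connection mid-message
            raise ConnectionError(f"connection closed with {remaining} of {n} bytes outstanding")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```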

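An end-of-message indicator also works, but then you need to define how to handle a message whose content contains that same octet sequence...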
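One common answer to the "same octet sequence" problem is byte stuffing, in the style of HDLC/PPP framing. The choice of 0x7E as the delimiter and 0x7D as the escape byte (with the escaped byte XORed with 0x20) is borrowed from that convention and is an assumption here. The sketch is in Python.

```python
DELIM = 0x7E  # assumed end-of-message marker
ESC = 0x7D    # assumed escape byte

def stuff(payload: bytes) -> bytes:
    """Escape any DELIM/ESC bytes inside the payload, then append DELIM."""
    out = bytearray()
    for b in payload:
        if b in (DELIM, ESC):
            out += bytes((ESC, b ^ 0x20))  # replace with escape + modified byte
        else:
            out.append(b)
    out.append(DELIM)
    return bytes(out)

def unstuff(frame: bytes) -> bytes:
    """Reverse stuff(); the frame must end with DELIM and contain no other raw DELIM."""
    if not frame or frame[-1] != DELIM:
        raise ValueError("frame must end with the delimiter")
    out = bytearray()
    escaped = False
    for b in frame[:-1]:
        if escaped:
            out.append(b ^ 0x20)
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == DELIM:
            raise ValueError("unescaped delimiter inside frame")
        else:
            out.append(b)
    if escaped:
        raise ValueError("frame ends with a dangling escape byte")
    return bytes(out)
```

The cost is that the encoded size depends on the content and the receiver must inspect every byte, which is part of why the other answers prefer a length header for binary formats such as Protocol Buffers.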

score 6 · Accepted Answer

Apologies for arriving late at the party. I am the author of protobuf-net, one of the C# implementations. For network usage, you should consider the "[De]SerializeWithLengthPrefix" methods - that way, it will automatically handle the lengths for you. There are examples in the source.
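The "Base128" length prefix that these methods can use (protobuf-net's `PrefixStyle.Base128`, and the varint prefix Google's `writeDelimitedTo` writes) is a protobuf varint holding the message length, followed by the message bytes. Below is a Python sketch of that framing without any protobuf library. The helper names are hypothetical, and `message_bytes` stands for an already-serialized message.

```python
import io

def write_varint(stream: io.BufferedIOBase, value: int) -> None:
    """Write a non-negative int as a base-128 varint: 7 bits per byte, low group first."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes((byte | 0x80,)))  # high bit set: more bytes follow
        else:
            stream.write(bytes((byte,)))
            return

def read_varint(stream: io.BufferedIOBase) -> int:
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended inside a varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7
        if shift > 63:
            raise ValueError("varint longer than 64 bits")

def write_delimited(stream: io.BufferedIOBase, message_bytes: bytes) -> None:
    write_varint(stream, len(message_bytes))
    stream.write(message_bytes)

def read_delimited(stream: io.BufferedIOBase) -> bytes:
    length = read_varint(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("stream ended inside a message")
    return data
```

On a socket, `sock.makefile("rb")` gives a buffered stream whose `read(n)` blocks until n bytes or EOF, so the same functions apply.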

I won't go into huge detail on an old post, but if you want to know more, add a comment and I'll get back to you.

score 3 · Accepted Answer

I agree with Matt that a header is better than a footer for Protocol Buffers, for the primary reason that as PB is a binary protocol it's problematic to come up with a footer that would not also be a valid message sequence. A lot of footer-based protocols (typically EOL ones) work because the message content is in a defined range (typically 0x20 - 0x7F ASCII).

A useful approach is to have your lowest level code just read buffers off of the socket and present them up to a framing layer that assembles complete messages and remembers partial ones (I present an async approach to this (using the CCR) here, albeit for a line protocol).
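A synchronous Python sketch of such a framing layer. Unlike the linked CCR example it uses a length-prefixed format rather than a line protocol: the 4-byte big-endian prefix and the `max_message` safety limit are assumptions, and the class name is hypothetical. The socket code calls `feed()` with whatever each read returned, and the framer keeps any partial message until the rest arrives.

```python
import struct

class LengthPrefixedFramer:
    """Accumulates raw reads and returns complete messages.

    Assumed framing: 4-byte big-endian length, then that many payload bytes.
    """
    PREFIX = struct.Struct("!I")

    def __init__(self, max_message: int = 16 * 1024 * 1024):
        self._buffer = bytearray()
        self._max = max_message

    def feed(self, data: bytes) -> list:
        """Append newly received bytes; return every message now complete (maybe none)."""
        self._buffer += data
        messages = []
        while len(self._buffer) >= self.PREFIX.size:
            (length,) = self.PREFIX.unpack_from(self._buffer, 0)
            if length > self._max:
                raise ValueError(f"declared length {length} exceeds limit {self._max}")
            end = self.PREFIX.size + length
            if len(self._buffer) < end:
                break  # partial message: keep it buffered until more data arrives
            messages.append(bytes(self._buffer[self.PREFIX.size:end]))
            del self._buffer[:end]
        return messages
```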

For consistency, you could always define your message as a PB message with three fields: a fixed-int as the length, an enum as the type, and a byte sequence that contains the actual data. This keeps your entire network protocol transparent.
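A hand-encoded Python sketch of that envelope in the Protocol Buffers wire format. It assumes field numbers 1 (fixed32 length), 2 (enum type) and 3 (bytes data), and that the length counts the envelope bytes after the first five. Because a fixed32 field is always a 1-byte tag plus 4 bytes, the receiver can read those five bytes first. In practice you would declare the envelope in a .proto file and use generated code. Note that the wire format does not guarantee field order, and proto3 omits fields holding default values, so this only works if the sender really does emit field 1 first.

```python
import struct

def _varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_envelope(msg_type: int, data: bytes) -> bytes:
    """Encode fields 2 and 3, then prepend field 1 holding their byte count."""
    rest = bytes((0x10,)) + _varint(msg_type)           # field 2, wire type 0 (varint)
    rest += bytes((0x1A,)) + _varint(len(data)) + data  # field 3, wire type 2 (length-delimited)
    # field 1, wire type 5 (fixed32, little-endian)
    return bytes((0x0D,)) + struct.pack("<I", len(rest)) + rest

HEAD_SIZE = 5  # 1 tag byte + 4 fixed32 bytes, always the same width

def remaining_length(head: bytes) -> int:
    """Given the first HEAD_SIZE bytes, return how many more bytes the envelope has."""
    if len(head) != HEAD_SIZE or head[0] != 0x0D:
        raise ValueError("expected field 1 (fixed32) at the start of the envelope")
    return struct.unpack("<I", head[1:])[0]
```

The full byte string is still an ordinary protobuf message, so a generated class with matching field definitions could parse it once all of it has arrived.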

score 1 · Accepted Answer

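TCP/IP and UDP packets do contain some reference to their size. The IP header contains a 16-bit field that specifies the length of the IP header plus data, in bytes. The TCP header contains a 4-bit field that specifies the size of the TCP header in 32-bit words. The UDP header contains a 16-bit field that specifies the length of the UDP header plus data, in bytes.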
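For illustration only, here is how those fields combine for a raw IPv4 packet carrying TCP, sketched in Python. It also needs the IP header's own 4-bit IHL field, and it shows that TCP has no payload-length field: the payload size is derived from the IP total length. Such raw bytes are only visible to packet-capture tools, not through an ordinary socket. The function name is hypothetical.

```python
import struct

def ipv4_tcp_lengths(packet: bytes) -> dict:
    """Derive header and payload sizes from a raw IPv4 packet carrying TCP."""
    ip_header_len = (packet[0] & 0x0F) * 4                  # IHL, in 32-bit words
    (total_length,) = struct.unpack_from("!H", packet, 2)   # IP header + data, bytes
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4  # TCP data offset, 32-bit words
    return {
        "ip_total": total_length,
        "ip_header": ip_header_len,
        "tcp_header": tcp_header_len,
        "tcp_payload": total_length - ip_header_len - tcp_header_len,
    }
```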

Here's the thing.

Using the standard run-of-the-mill sockets in Windows, whether you're using the System.Net.Sockets namespace in C# or the native Winsock stuff in Win32, you never see the IP/TCP/UDP headers. These headers are stripped off so that what you get when you read the socket is the actual payload, i.e., the data that was sent.

The typical pattern from everything I've ever seen and done using sockets is that you define an application-level header that precedes the data you want to send. At a minimum, this header should include the size of the data to follow. This will allow you to read each "message" in its entirety without having to guess as to its size. You can get as fancy as you want with it, e.g., sync patterns, CRCs, version, type of message, etc., but the size of the "message" is all you really need.
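A minimal Python sketch of the size-only header described here, sending and receiving whole messages over a connected socket. The 4-byte big-endian size and the function names are assumptions. With Protocol Buffers, `payload` would be the output of a generated class's `SerializeToString()`, and the receiver would pass the result to `ParseFromString()`.

```python
import socket
import struct

SIZE_HEADER = struct.Struct("!I")  # assumed: 4-byte big-endian payload size

def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send the size header followed by the payload."""
    sock.sendall(SIZE_HEADER.pack(len(payload)) + payload)

def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Loop until n bytes are read; a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return bytes(buf)

def recv_message(sock: socket.socket) -> bytes:
    """Read the fixed-size header, then exactly the number of bytes it announces."""
    (size,) = SIZE_HEADER.unpack(_recv_exactly(sock, SIZE_HEADER.size))
    return _recv_exactly(sock, size)
```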

And for what it's worth, I would suggest using a header instead of an end-of-packet delimiter. I'm not sure whether there is a significant disadvantage to the EOP delimiter, but the header is the approach used by most IP protocols I've seen. In addition, it just seems more intuitive to me to process a message from the beginning rather than wait for some pattern to appear in my stream to indicate that my message is complete.

EDIT: I have only just become aware of the Google Protocol Buffers project. From what I can tell, it is a binary serialization/de-serialization scheme for WCF (I'm sure that's a gross oversimplification). If you are using WCF, you don't have to worry about the size of the messages being sent because the WCF plumbing takes care of this behind the scenes, which is probably why you haven't found anything related to message length in the Protocol Buffers documentation. However, in the case of sockets, knowing the size will help out tremendously as discussed above. My guess is that you will serialize your data using the Protocol Buffers and then tack on whatever application header you come up with before sending it. On the receive side, you'll pull off the header and then de-serialize the remainder of the message.
