c# - Improving protobuf-net serialisation times for large'ish' responses

Question

I'm implementing a WCF request/response service using protobuf-net for serialisation and TCP bindings. I'm running a test with both server and client on the same machine and am seeing a round-trip time of around 500 ms for a response object of around 1.5 MB.

When I serialise the same object to a memory stream, after receiving the response on the client, it takes around 115 ms, and deserialisation takes around 330 ms.
This roughly adds up to the overall round-trip time, considering the overhead of querying the data from the database, etc.

I've seen it written that this is probably at the message size limit that protocol buffers should be used for, but are these the kind of serialisation/deserialisation times I can expect? Does protobuf-net have any size/speed trade-off options? Thanks

Here's the model currently...

public class BaseResponse
{
    public bool Success {get;set;}
    public string Error {get;set;}
}

public class SourceTableResponse : BaseResponse
{
    public Dictionary<string, Dictionary<string, string>> FieldValuesByTableName { get; set; }
}

score 4 · Accepted Answer

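There are some tricks that can help here, yes. The most common of these is to use "grouped" data as much as possible. To explain: "groups" are a feature of the protobuf specification that Google doesn't use much - they recommend defaulting to length-prefix notation for sub-objects - but length-prefix is relatively expensive to write. In most cases this is as simple as adding DataFormat = DataFormat.Group to a few annotations, but it is not that simple when you have Dictionary<string,Dictionary<string,string>>, because KeyValuePair<,> is protected by protobuf-net to prevent it being broken by well-meaning users: it doesn't allow you to change the format. We can still do it, but we need to write our own model rather than using the bare dictionary, which is a bit of a pain.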

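Other tricks:

use "packed" encoding on lists of primitives - not applicable here; you don't have any
make use of interning for heavily repeated string values - looking at your model, I'm guessing that the keys of the inner dictionary are field names, so they are probably re-used many times between tables; we could try interning those, but again that isn't something we can do with Dictionary<string,string>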

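But fundamentally: your data is currently going to be dominated by UTF-8, both in terms of storage and in terms of processing. There isn't much I can do about that, since you are storing everything as strings. Personally, I would say that this very loose model isn't well suited to getting the best out of protobuf-net. All I can do is tighten it up as much as possible within those limits. For example, this works forwards-only (no buffering):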

[ProtoContract]
[ProtoInclude(3, typeof(CustomSourceTableResponse), DataFormat = DataFormat.Group)]
public class CustomBaseResponse
{
    [ProtoMember(1)]
    public bool Success { get; set; }
    [ProtoMember(2)]
    public string Error { get; set; }
}
[ProtoContract]
public class CustomSourceTableResponse : CustomBaseResponse
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<FieldTable> FieldValuesByTableName { get { return fieldValuesByTableName; } }
    private readonly List<FieldTable> fieldValuesByTableName = new List<FieldTable>();
}
[ProtoContract]
public class FieldTable
{
    public FieldTable() { }
    public FieldTable(string tableName)
    {
        TableName = tableName;
    }
    [ProtoMember(1)]
    public string TableName { get; set; }
    [ProtoMember(2, DataFormat = DataFormat.Group)]
    public List<FieldValue> FieldValues { get { return fieldValues; } }
    private readonly List<FieldValue> fieldValues = new List<FieldValue>();
}
[ProtoContract]
public class FieldValue
{
    public FieldValue() { }
    public FieldValue(string name, string value)
    {
        Name = name;
        Value = value;
    }
    [ProtoMember(1)]
    public string Name { get; set; }
    [ProtoMember(2)]
    public string Value { get; set; }
}
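
To show how an existing nested dictionary might be moved into a model of this shape, here is a minimal sketch. It assumes the protobuf-net NuGet package is referenced, and it repeats trimmed copies of FieldTable and FieldValue so that it compiles on its own. The names TableSet and TableSetConverter are hypothetical and are not part of the answer above or of protobuf-net.

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf; // assumption: the protobuf-net NuGet package is referenced

[ProtoContract]
public class FieldValue
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public string Value { get; set; }
}

[ProtoContract]
public class FieldTable
{
    [ProtoMember(1)] public string TableName { get; set; }

    // Group encoding avoids writing a length prefix for each FieldValue
    [ProtoMember(2, DataFormat = DataFormat.Group)]
    public List<FieldValue> FieldValues { get; } = new List<FieldValue>();
}

// Hypothetical root type standing in for the response class
[ProtoContract]
public class TableSet
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<FieldTable> Tables { get; } = new List<FieldTable>();
}

public static class TableSetConverter
{
    // Copies the nested dictionary into the list-based model
    public static TableSet FromDictionary(
        Dictionary<string, Dictionary<string, string>> source)
    {
        var result = new TableSet();
        foreach (var table in source)
        {
            var fieldTable = new FieldTable { TableName = table.Key };
            foreach (var field in table.Value)
            {
                fieldTable.FieldValues.Add(
                    new FieldValue { Name = field.Key, Value = field.Value });
            }
            result.Tables.Add(fieldTable);
        }
        return result;
    }

    // Serialises to memory and reads the payload back
    public static TableSet RoundTrip(TableSet set)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, set);
            ms.Position = 0;
            return Serializer.Deserialize<TableSet>(ms);
        }
    }
}
```

Whether this is faster than the bare dictionary for a given payload is something to measure rather than assume; the sketch only shows the conversion and a round trip.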

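If you are expecting a lot of repeated values of FieldValue.Name (for example, there are lots of rows and every row has the same fields), then... well, frankly, I would suggest using a proper type-based model, i.e.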

class SomeRow {
    public int Id {get;set;}
    public string Name {get;set;}
    public DateTime DateOfBirth {get;set;}
}
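
As a rough sketch of what such a typed row could look like once annotated for protobuf-net (assuming the protobuf-net NuGet package is referenced): the member numbers 1 to 3 and the helper class SomeRowDemo are illustrative choices, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf; // assumption: the protobuf-net NuGet package is referenced

[ProtoContract]
public class SomeRow
{
    // Field numbers, not names, identify members on the wire
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
    [ProtoMember(3)] public DateTime DateOfBirth { get; set; }
}

// Hypothetical helper, for illustration only
public static class SomeRowDemo
{
    // Serialises the rows to memory and reads them back
    public static List<SomeRow> RoundTrip(List<SomeRow> rows)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, rows);
            ms.Position = 0;
            return Serializer.Deserialize<List<SomeRow>>(ms);
        }
    }
}
```

With this shape, a value such as DateOfBirth is written as a field number plus a binary value, so neither the field name nor a textual date needs to appear in the payload.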

But if that isn't possible, then I guess you can still avoid having "DateOfBirth" appear in the data 200 times:

[ProtoMember(1, AsReference=true)]
public string Name { get; set; }
[ProtoMember(2)]
public string Value { get; set; }

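But note: this really is a lot more expensive than using a typed model, for a number of reasons:

all that text is expensive
the names need to be stored (part of the advantage of protobuf is that it avoids storing names entirely)
all that text is expensive (yes, I already said that - but it really is important)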
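
To get a feel for how much of a payload is text, a small sketch like the following can total the UTF-8 bytes taken by field names and by field values in the original nested dictionary. It uses only the .NET base class library; the class name TextCost is hypothetical, and the figures ignore protobuf's own framing overhead.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only
public static class TextCost
{
    // Returns the UTF-8 byte counts of all field names and all field values
    public static (long NameBytes, long ValueBytes) Measure(
        Dictionary<string, Dictionary<string, string>> tables)
    {
        long nameBytes = 0, valueBytes = 0;
        foreach (var table in tables.Values)
        {
            foreach (var field in table)
            {
                nameBytes += Encoding.UTF8.GetByteCount(field.Key);
                // Treat a null value as empty so it contributes zero bytes
                valueBytes += Encoding.UTF8.GetByteCount(field.Value ?? string.Empty);
            }
        }
        return (nameBytes, valueBytes);
    }
}
```

If NameBytes turns out to be a large share of the total, that supports moving to a typed model or to string re-use; if ValueBytes dominates, the remaining cost is the string values themselves.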
