java - Simple data serialization in C

Question

I am currently redesigning an application and ran into a problem with serializing some data.

Say I have an array of size m x n

double **data;

that I want to serialize into a

char *dataSerialized

using simple delimiters (one for rows, one for elements).
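For reference, a string in such a format and the delimiter-counting step that recovers its dimensions might look like the sketch below. It assumes hypothetical delimiters (';' between rows, ',' between elements) and no trailing delimiter. `count_dimensions` is an illustrative name, not part of any library.

```c
#include <stddef.h>

/* Hypothetical delimiters: ';' separates rows, ',' separates elements. */
#define ROW_SEP  ';'
#define ELEM_SEP ','

/* Derive the dimensions of a delimited matrix string such as "1,2;3,4".
   Assumes no trailing delimiters. Returns 0 on success, or -1 for an empty
   string or when the separator counts cannot describe a rectangular matrix. */
int count_dimensions(const char *s, size_t *m, size_t *n)
{
    size_t row_seps = 0, elem_seps = 0;

    if (s == NULL || *s == '\0')
        return -1;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == ROW_SEP)
            row_seps++;
        else if (*p == ELEM_SEP)
            elem_seps++;
    }
    *m = row_seps + 1;
    /* Every row contains (n - 1) element separators. */
    if (elem_seps % *m != 0)
        return -1;
    *n = elem_seps / *m + 1;
    return 0;
}
```

Once m and n are known, the reader can allocate the rows and parse each token with `strtod`.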

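Deserialization is fairly simple: count the delimiters and allocate room for the data to be stored. But what about the serialization function, say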

serialize_matrix(double **data, int m, int n, char **dataSerialized);

What is the best strategy for determining the required size of the char array and allocating the appropriate memory for it?
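One common approach to the sizing problem is to measure first and write second. `snprintf(NULL, 0, ...)` (C99) returns the number of characters a conversion would produce without writing anything. The sketch below fills in the `serialize_matrix` signature from the question. Its conventions are assumptions and do not come from the question: ',' and ';' as delimiters, `%.17g` formatting, and a 0/-1 return value.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical delimiters: ',' between elements, ';' between rows. */
int serialize_matrix(double **data, int m, int n, char **dataSerialized)
{
    size_t total = 0;
    char *out, *p;

    if (m <= 0 || n <= 0)
        return -1;

    /* Pass 1: measure. snprintf(NULL, 0, ...) returns the number of
       characters it would have written. Each element also needs one more
       byte: a delimiter, or, for the very last element, the final NUL. */
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            total += (size_t) snprintf(NULL, 0, "%.17g", data[i][j]) + 1;

    out = malloc(total);
    if (out == NULL)
        return -1;

    /* Pass 2: write the same text into the exactly sized buffer. */
    p = out;
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            p += sprintf(p, "%.17g", data[i][j]);
            if (j < n - 1)
                *p++ = ',';
            else if (i < m - 1)
                *p++ = ';';
        }
    }
    *p = '\0';

    *dataSerialized = out;
    return 0;
}
```

The caller frees `*dataSerialized`. `%.17g` takes the decimal point from the current locale (it is '.' in the default "C" locale). It also prints infinities and NaNs as `inf`/`nan`, which Java's `Double.parseDouble` rejects (it expects `Infinity`/`NaN`), so those values would need special handling.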

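Maybe use some fixed-width exponential representation of the doubles in the string? Is it possible to convert all the bytes of a double into chars and end up with a sizeof(double)-aligned char array? How would I keep the precision of the numbers intact?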
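You can check directly whether a text representation keeps full precision. For IEEE 754 doubles, 17 significant decimal digits are enough for `strtod` to recover the exact value. That covers both `%.17g` and the fixed-layout exponential `%.16e`. C99's hexadecimal `%a` is exact by construction, but the default 6 digits of `%g` are not. The helper `roundtrips` below is hypothetical. It assumes a C library with correctly rounded conversions (glibc, for example) and finite inputs, since NaN never compares equal to itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if printing d with the given printf format and parsing the text
   back with strtod yields exactly the same double, else 0. */
int roundtrips(double d, const char *fmt)
{
    char buf[64];

    snprintf(buf, sizeof buf, fmt, d);
    return strtod(buf, NULL) == d;
}
```

For example, `roundtrips(1.0 / 3.0, "%.17g")` is 1, while `roundtrips(1.0 / 3.0, "%g")` is 0 because only `0.333333` survives.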

Notes:

I need the data in a char array: not as a binary file, and not in a file at all.

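The serialized data will be sent over the network with ZeroMQ between a C server and a Java client. Given the array dimensions and sizeof(double), can the data always be reconstructed exactly on the other side?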
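Java's `double` is always IEEE 754 binary64, so exact reconstruction depends on the C side using the same format. The sketch below performs that check at compile time. It assumes a C11 compiler for `_Static_assert`, and `double_is_declared_iec559` is a hypothetical helper.

```c
#include <float.h>
#include <limits.h>

/* Compile-time checks (C11) that this platform's double has the width,
   precision and range of IEEE 754 binary64, the format of Java's double. */
_Static_assert(sizeof(double) * CHAR_BIT == 64, "double is not 64 bits wide");
_Static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
               "double does not have binary64 precision and range");

/* Returns 1 if the implementation promises IEC 60559 (IEEE 754) semantics
   by defining __STDC_IEC_559__, else 0. Many platforms use IEEE 754
   doubles without defining it, so 0 is not proof of a different format. */
int double_is_declared_iec559(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}
```

These checks say nothing about byte order, which the wire format still has to fix.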

score 3 · Accepted Answer

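Java has good support for reading raw bytes and converting them into whatever you want. You can decide on a simple wire format, serialize to it in C, and deserialize it in Java.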

Here is an example of a very simple format, with code for both serialization and deserialization.

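I wrote a slightly larger test program that I can dump somewhere if you like. It creates a random data array in C, serializes it, and writes the serialized string base64-encoded to stdout. A much smaller Java program then reads, decodes, and deserializes it.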

C code for serialization:

/*
I'm using this format:
32-bit signed int                    32-bit signed int                    see below
[number of elements in outer array] [number of elements in inner array] [elements]

[elements] is built like
[element(0,0)][element(0,1)]...[element(0,y-1)][element(1,0)]...

Each element is sent as a 64-bit IEEE 754 "double". If your C compiler/architecture does something different with its doubles, look forward to hours of fun :)

I'm using a couple of non-standard byte-swapping functions here (htobe32/htobe64). They originally come from BSD but are present in glibc >= 2.9.
*/

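#define _DEFAULT_SOURCE   /* glibc: expose htobe32()/htobe64() from <endian.h> */
#include <endian.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Calculate the bytes required to store a message of x*y doubles */
size_t calculate_size(size_t x, size_t y)
{
    /* The two dimensions of the array, 32 bits each (2 * 4) */
    size_t sz = 8;
    /* A 64-bit IEEE 754 double is by definition 8 bytes long :) */
    sz += x * y * 8;
    /* and a trailing NUL */
    sz++;
    return sz;
}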

/* Helpers */
static char* write_int32(int32_t, char*);
static char* write_double(double, char*);

/* Actual conversion. That wasn't so hard, was it?
   dst must point to at least calculate_size(x, y) bytes. The result is
   binary and may contain NUL bytes, so send it with its explicit length,
   not as a C string. */
void convert_data(double** src, size_t x, size_t y, char* dst)
{
    dst = write_int32((int32_t) x, dst);
    dst = write_int32((int32_t) y, dst);

    for (size_t i = 0; i < x; i++) {
        for (size_t j = 0; j < y; j++) {
            dst = write_double(src[i][j], dst);
        }
    }
    *dst = '\0';
}


static char* write_int32(int32_t num, char* c)
{
    /* Convert to network (big-endian) byte order */
    uint32_t be = htobe32((uint32_t) num);
    memcpy(c, &be, sizeof be);
    return c + sizeof be;
}

static char* write_double(double d, char* c)
{
    /* Here I'm assuming your C program uses IEEE 754 double precision natively.
       If it doesn't, you should be able to convert into this format; a helper library most likely already exists for your platform.
       Note that IEEE 754 doesn't define endianness, but in practice normal platforms use the same byte order as they do for integers.
    */
    uint64_t num;
    /* Copy the bits with memcpy; casting &d to uint64_t* breaks strict aliasing */
    memcpy(&num, &d, sizeof num);
    /* Convert to network (big-endian) byte order */
    num = htobe64(num);
    memcpy(c, &num, sizeof num);
    return c + sizeof num;
}
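Before bringing Java into the picture, it can help to parse the same wire format back in C and compare the result with the input. The sketch below assumes IEEE 754 doubles on the reading side. It rebuilds the big-endian values with shifts instead of `be32toh`/`be64toh`, so it needs no non-standard headers. For brevity it returns a flat row-major array rather than `double**`. `parse_data`, `read_be32` and `read_be64` are illustrative names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assemble big-endian integers with shifts; works on any host byte order. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
    return ((uint64_t) read_be32(p) << 32) | read_be32(p + 4);
}

/* Parse the wire format (int32 x, int32 y, x*y big-endian doubles) into a
   newly allocated row-major array; element (i, j) is at out[i * y + j].
   Returns NULL on malformed input or allocation failure. */
double *parse_data(const unsigned char *buf, size_t len, size_t *x, size_t *y)
{
    uint32_t rx, ry;
    size_t count;
    double *out;

    if (len < 8)
        return NULL;
    rx = read_be32(buf);
    ry = read_be32(buf + 4);
    if (rx > INT32_MAX || ry > INT32_MAX)   /* the sender writes signed ints */
        return NULL;
    count = (size_t) rx * ry;
    if (ry != 0 && count / ry != rx)        /* size_t overflow (32-bit hosts) */
        return NULL;
    if (count > (len - 8) / 8)              /* buffer too short */
        return NULL;

    out = malloc(count > 0 ? count * sizeof *out : 1);
    if (out == NULL)
        return NULL;
    for (size_t k = 0; k < count; k++) {
        uint64_t bits = read_be64(buf + 8 + 8 * k);
        memcpy(&out[k], &bits, sizeof out[k]);   /* assumes IEEE 754 double */
    }
    *x = rx;
    *y = ry;
    return out;
}
```

Trailing bytes, such as the NUL that `convert_data` appends, are ignored.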

Java code for deserialization:

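import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/* The raw char array from C has been read into the byte[] `bytes` in Java.
   DataInputStream reads big-endian ints and IEEE 754 doubles, matching the C side. */
DataInputStream stream = new DataInputStream(new ByteArrayInputStream(bytes));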

int dim_x; int dim_y;
double[][] data;

try {   
    dim_x = stream.readInt();
    dim_y = stream.readInt();
    data = new double[dim_x][dim_y];
    for(int i = 0; i < dim_x; ++i) {
        for(int j = 0; j < dim_y; ++j) {
            data[i][j] = stream.readDouble();
        }
    }

    System.out.println("Client:");
    System.out.println("Dimensions: "+dim_x+" x "+dim_y);
    System.out.println("Data:");
    for(int i = 0; i < dim_x; ++i) {
        for(int j = 0; j < dim_y; ++j) {
            System.out.print(" "+data[i][j]);
        }
        System.out.println();
    }


} catch(IOException e) {
    System.err.println("Error reading input");
    System.err.println(e.getMessage());
    System.exit(1);
}

score 1 · Accepted Answer

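If you are writing a binary file, you should think of a good way to serialize your doubles. The options range from writing the contents of the double directly to the file (mind the byte order) to a more elaborate, normalizing serialization scheme (for example, one with a well-defined representation of NaN). It's really up to you. If you expect to stay within essentially homogeneous architectures, a direct memory dump may be enough.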
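Here is a sketch of the normalizing variant. It writes the double in a fixed (big-endian) byte order and collapses every NaN to one canonical bit pattern. It assumes IEEE 754 doubles, and `put_double_be` is a hypothetical name. Java's `Double.doubleToLongBits` collapses NaNs to the same `0x7ff8000000000000L`, and `DataOutputStream.writeDouble` writes that value high byte first, so the two sides agree.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Write d as 8 big-endian bytes. Every NaN is replaced by one canonical
   quiet NaN, so NaN sign and payload bits never reach the wire format. */
void put_double_be(double d, unsigned char out[8])
{
    uint64_t bits;

    if (isnan(d))
        bits = UINT64_C(0x7FF8000000000000);   /* canonical quiet NaN */
    else
        memcpy(&bits, &d, sizeof bits);        /* assumes IEEE 754 double */
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char) (bits >> (56 - 8 * i));   /* high byte first */
}
```

Because the bytes are extracted with shifts, the output is the same on little- and big-endian hosts.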

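If you want to write a text file and are looking for an ASCII representation, I strongly advise against a decimal representation of the numbers. Instead, you can convert the raw 64-bit data to ASCII with base64 or something similar.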
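Below is a minimal base64 encoder for the raw bytes. It uses the standard alphabet and '=' padding from RFC 4648. `base64_encode` is an illustrative name, not a library function. On the Java side, `java.util.Base64.getDecoder().decode(...)` (Java 8+) reverses it.

```c
#include <stddef.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes from in as standard base64 with '=' padding.
   out needs room for 4 * ((len + 2) / 3) + 1 chars; it is NUL-terminated.
   Returns the number of characters written, excluding the NUL. */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i += 3) {
        /* Pack up to three input bytes into a 24-bit group. */
        unsigned long v = (unsigned long) in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long) in[i + 1] << 8;
        if (i + 2 < len)
            v |= in[i + 2];
        /* Emit four 6-bit digits, padding the missing ones with '='. */
        out[o++] = B64[(v >> 18) & 0x3F];
        out[o++] = B64[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? B64[(v >> 6) & 0x3F] : '=';
        out[o++] = (i + 2 < len) ? B64[v & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}
```

For example, encoding the 8 big-endian bytes of 1.0 (3F F0 00 00 00 00 00 00) yields `P/AAAAAAAAA=`.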

You really want to keep all the precision of your doubles!
