java - How to use ByteBuffer to serialize a byte array so that it follows Big Endian format?

Question

I need to write a byte array value into Cassandra using Java code. My C++ program will then retrieve that byte array data from Cassandra and deserialize it.

The byte array I will write into Cassandra is made up of three values, as described below:

short employeeId = 32767;
long lastModifiedDate = 1379811105109L;
byte[] attributeValue = os.toByteArray();

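Now I will write employeeId, lastModifiedDate and attributeValue together into a single byte array and write the resulting byte array into Cassandra. My C++ program will then retrieve that byte array data from Cassandra and deserialize it to extract employeeId, lastModifiedDate and attributeValue.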

I am not sure whether I should use Big Endian in my Java code when writing to Cassandra, so that the C++ code is simpler when it reads the data back.

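I tried the following on the Java side to make sure everything follows a specific format (Big Endian) while writing it into a single byte array, which will then be written to Cassandra, but I'm not sure whether this is correct: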

public static void main(String[] args) throws Exception {

    String os = "Byte Array Test";
    byte[] attributeValue = os.getBytes("UTF-8"); // explicit charset so the C++ side knows how the text is encoded

    long lastModifiedDate = 1379811105109L;
    short employeeId = 32767;

    ByteArrayOutputStream byteOsTest = new ByteArrayOutputStream();
    DataOutputStream outTest = new DataOutputStream(byteOsTest);

    // merging everything into one Byte Array here
    outTest.writeShort(employeeId);
    outTest.writeLong(lastModifiedDate);
    outTest.writeInt(attributeValue.length);
    outTest.write(attributeValue);

    byte[] allWrittenBytesTest = byteOsTest.toByteArray();

    // initially I was writing allWrittenBytesTest into Cassandra...

    ByteBuffer bb = ByteBuffer.wrap(allWrittenBytesTest).order(ByteOrder.BIG_ENDIAN);

    // now what value should I write into Cassandra?
    // or does this even look right?

    // And now how do I deserialize it?

}

Can anyone help me with this ByteBuffer problem? Thanks.

I may be missing some details about ByteBuffer here, since this is my first time using it.

First, should I be using ByteBuffer in my use case at all?
Second, if so, what is the best way to use it in my use case?

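The only thing I want to make sure of is that I write to Cassandra correctly in Big Endian byte order, so that on the C++ side I don't run into any problems deserializing that byte array.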
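To make the C++ concern concrete: once the format is fixed as big endian, a reader can decode each field with shifts and masks, without caring what byte order the host CPU uses. The sketch below does this in Java to keep one language (the class and method names are hypothetical); the same shift-and-mask arithmetic carries over to C++ on unsigned char data. It assumes the layout from the code above: a 2-byte short, an 8-byte long, a 4-byte int length, then the raw bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ManualBigEndianDecode {

    // Reads a big-endian 16-bit value starting at off (high byte first).
    static short readShortBE(byte[] b, int off) {
        return (short) (((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF));
    }

    // Reads a big-endian 32-bit value starting at off.
    static int readIntBE(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24)
                | ((b[off + 1] & 0xFF) << 16)
                | ((b[off + 2] & 0xFF) << 8)
                | (b[off + 3] & 0xFF);
    }

    // Reads a big-endian 64-bit value starting at off, most significant byte first.
    static long readLongBE(byte[] b, int off) {
        long result = 0;
        for (int i = 0; i < 8; i++) {
            result = (result << 8) | (b[off + i] & 0xFF);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] value = "Byte Array Test".getBytes(StandardCharsets.UTF_8);
        ByteBuffer bb = ByteBuffer.allocate(2 + 8 + 4 + value.length).order(ByteOrder.BIG_ENDIAN);
        bb.putShort((short) 32767).putLong(1379811105109L).putInt(value.length).put(value);
        byte[] data = bb.array();

        // Offsets follow from the fixed sizes: 0 (short), 2 (long), 10 (int), 14 (bytes).
        short employeeId = readShortBE(data, 0);
        long lastModifiedDate = readLongBE(data, 2);
        int length = readIntBE(data, 10);
        byte[] attributeValue = Arrays.copyOfRange(data, 14, 14 + length);

        System.out.println(employeeId);       // 32767
        System.out.println(lastModifiedDate); // 1379811105109
        System.out.println(new String(attributeValue, StandardCharsets.UTF_8)); // Byte Array Test
    }
}
```

Because the byte order is part of the format rather than a property of the machine, the reader never needs to detect its own endianness.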

score 3 · Accepted Answer

Rather than manually serializing ByteBuffers for Thrift, use Cassandra's native CQL driver: http://github.com/datastax/java-driver

score 1 · Accepted Answer

First of all, I have never used Cassandra, so I will only answer the ByteBuffer part.

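You should put everything into the ByteBuffer before sending the bytes. ByteBuffer lets you choose explicitly the byte order of what is stored, and that is exactly the point of using it.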
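One caveat worth checking against the question's code: DataOutputStream is specified to write multi-byte values high byte first, so the question's output is already big endian. What ByteBuffer adds is the ability to pick the order. The sketch below (class and method names are hypothetical) checks that DataOutputStream output matches a BIG_ENDIAN ByteBuffer and differs from a LITTLE_ENDIAN one.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ByteOrderComparison {

    // Serializes the header fields with DataOutputStream, which always
    // writes multi-byte values high byte first (big endian).
    static byte[] withDataOutputStream(short id, long date, int len) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(id);
            out.writeLong(date);
            out.writeInt(len);
        } catch (IOException e) {
            // ByteArrayOutputStream does not really fail, but the API declares IOException
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Serializes the same fields with a ByteBuffer in the requested byte order.
    static byte[] withByteBuffer(short id, long date, int len, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.allocate(2 + 8 + 4).order(order);
        bb.putShort(id).putLong(date).putInt(len);
        return bb.array(); // allocated to the exact size, so the backing array is the whole result
    }

    public static void main(String[] args) {
        short id = 32767;
        long date = 1379811105109L;
        int len = 15;
        byte[] dos = withDataOutputStream(id, date, len);
        byte[] big = withByteBuffer(id, date, len, ByteOrder.BIG_ENDIAN);
        byte[] little = withByteBuffer(id, date, len, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.equals(dos, big));    // true
        System.out.println(Arrays.equals(dos, little)); // false
    }
}
```

So for a big-endian format either class works; ByteBuffer is the one to reach for when you need a different order or an NIO API.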

To send the bytes, use:

int size = 2 + 8 + 4 + attributeValue.length; // short is 2 bytes, long 8 and int 4

ByteBuffer bbuf = ByteBuffer.allocate(size); 
bbuf.order(ByteOrder.BIG_ENDIAN);

bbuf.putShort(employeeId);
bbuf.putLong(lastModifiedDate);
bbuf.putInt(attributeValue.length);
bbuf.put(attributeValue);

bbuf.rewind(); // move the position back to 0 so the contents can be read out

// this is a bad approach because if you modify the returned array
// you are directly modifying the ByteBuffer's internal array:
// byte[] bytesToStore = bbuf.array();

// the better approach is to copy the internal buffer
byte[] bytesToStore = new byte[size];
bbuf.get(bytesToStore);

Now you can store bytesToStore by sending it to Cassandra.

To read them back:

byte[] allWrittenBytesTest = magicFunctionToAcquireDataFromCassandra(); // placeholder for your Cassandra read

ByteBuffer bb = ByteBuffer.wrap(allWrittenBytesTest);
bb.order(ByteOrder.BIG_ENDIAN);
bb.rewind();

int size = allWrittenBytesTest.length - 14; // 14 = 2 (short) + 8 (long) + 4 (int length)
short employeeId = bb.getShort();
long lastModifiedDate = bb.getLong();
int attributeValueLen = bb.getInt(); // should equal size
byte[] attributeValue = new byte[size];
bb.get(attributeValue); // read attributeValue from the remaining buffer

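You don't even need to store the attribute value length, because it can be recovered from allWrittenBytesTest.length. With the length field kept, as above, the value occupies allWrittenBytesTest.length - 14 bytes (14 being the sum of the other field sizes, 2 + 8 + 4); if you drop the int length field entirely, subtract 10 (2 + 8) instead.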
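Putting the write and read halves together, a round trip might look like the sketch below. EmployeeRecordCodec, EmployeeRecord, encode and decode are hypothetical names. It keeps the 4-byte length prefix but checks it against the bytes actually remaining, which catches a truncated or misaligned record early instead of silently reading garbage.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class EmployeeRecordCodec {

    // Fixed header: short (2) + long (8) + int length prefix (4) = 14 bytes.
    static final int HEADER_SIZE = 2 + 8 + 4;

    // Simple holder for the decoded fields (hypothetical type for this sketch).
    static final class EmployeeRecord {
        final short employeeId;
        final long lastModifiedDate;
        final byte[] attributeValue;

        EmployeeRecord(short employeeId, long lastModifiedDate, byte[] attributeValue) {
            this.employeeId = employeeId;
            this.lastModifiedDate = lastModifiedDate;
            this.attributeValue = attributeValue;
        }
    }

    static byte[] encode(short employeeId, long lastModifiedDate, byte[] attributeValue) {
        ByteBuffer bb = ByteBuffer.allocate(HEADER_SIZE + attributeValue.length)
                .order(ByteOrder.BIG_ENDIAN);
        bb.putShort(employeeId);
        bb.putLong(lastModifiedDate);
        bb.putInt(attributeValue.length);
        bb.put(attributeValue);
        // The buffer is local and exactly sized, so handing out its backing array is safe here.
        return bb.array();
    }

    static EmployeeRecord decode(byte[] data) {
        ByteBuffer bb = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        short employeeId = bb.getShort();
        long lastModifiedDate = bb.getLong();
        int length = bb.getInt();
        // Cross-check the stored length against what is left in the buffer.
        if (length != bb.remaining()) {
            throw new IllegalArgumentException(
                    "length prefix " + length + " != remaining " + bb.remaining());
        }
        byte[] attributeValue = new byte[length];
        bb.get(attributeValue);
        return new EmployeeRecord(employeeId, lastModifiedDate, attributeValue);
    }

    public static void main(String[] args) {
        byte[] value = "Byte Array Test".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encode((short) 32767, 1379811105109L, value);
        EmployeeRecord r = decode(stored);
        System.out.println(stored.length);      // 29
        System.out.println(r.employeeId);       // 32767
        System.out.println(r.lastModifiedDate); // 1379811105109
        System.out.println(new String(r.attributeValue, StandardCharsets.UTF_8)); // Byte Array Test
    }
}
```

The C++ reader has to consume the fields in exactly this order and with exactly these widths; the byte order alone is not enough to define the format.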

Edit: fixed some typos in the code.

score 1 · Accepted Answer

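Endianness has no meaning for a byte array itself, so as long as Cassandra does not try to interpret your data, you can use either big endian or little endian. Byte order only matters for multi-byte values.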
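To illustrate the point: a short such as employeeId has two bytes whose order depends on the chosen ByteOrder, while a plain byte array comes out identical either way. The helper names in this sketch are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EndiannessDemo {

    // Returns the two bytes of a short in the given byte order.
    static byte[] shortBytes(short value, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putShort(value).array();
    }

    // Passes a raw byte array through a buffer set to the given byte order.
    static byte[] rawBytes(byte[] value, ByteOrder order) {
        return ByteBuffer.allocate(value.length).order(order).put(value).array();
    }

    public static void main(String[] args) {
        short employeeId = 32767; // 0x7FFF
        System.out.println(Arrays.toString(shortBytes(employeeId, ByteOrder.BIG_ENDIAN)));    // [127, -1]
        System.out.println(Arrays.toString(shortBytes(employeeId, ByteOrder.LITTLE_ENDIAN))); // [-1, 127]

        byte[] text = "Test".getBytes(StandardCharsets.UTF_8);
        // Single bytes have no internal order, so both results are identical.
        System.out.println(Arrays.equals(rawBytes(text, ByteOrder.BIG_ENDIAN),
                rawBytes(text, ByteOrder.LITTLE_ENDIAN))); // true
    }
}
```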

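If you plan to consume the data from different clients, possibly on different platforms, I suggest settling on a convention (for example, big endian) and using the same byte order in every client. For example, the Java client code would look like this: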

ByteBuffer bb = ByteBuffer.allocate(attributeValue.length + 14).order(ByteOrder.BIG_ENDIAN);
bb.putShort(employeeId);
bb.putLong(lastModifiedDate);
bb.putInt(attributeValue.length);
bb.put(attributeValue);

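You must use ByteBuffer if you are working with an API that requires it. For example, NIO channels operate on ByteBuffers, so if you connect with a SocketChannel, you will be using a ByteBuffer. You can also use ByteBuffer to lay out your multi-byte values correctly. For example, with the code above you can take the byte array from the buffer and send it over a socket, with the first three fields packed in big-endian order: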

sendByteArray(bb.array());
...
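With an NIO channel you can skip bb.array() and hand the buffer to the channel directly, after calling flip() to switch it from filling to draining. The sketch below assumes a hypothetical sendToStream helper and uses Channels.newChannel over a ByteArrayOutputStream as a stand-in for a real SocketChannel, so it runs without a network; sendByteArray from the answer is not reproduced.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class ChannelWriteSketch {

    // A single write() call is not guaranteed to consume every byte,
    // so keep writing until the buffer has nothing left.
    static void writeFully(WritableByteChannel channel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    static byte[] sendToStream(short employeeId, long lastModifiedDate, byte[] attributeValue) {
        ByteBuffer bb = ByteBuffer.allocate(attributeValue.length + 14).order(ByteOrder.BIG_ENDIAN);
        bb.putShort(employeeId);
        bb.putLong(lastModifiedDate);
        bb.putInt(attributeValue.length);
        bb.put(attributeValue);
        bb.flip(); // limit = bytes written, position = 0: ready to be drained

        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a socket
        try (WritableByteChannel channel = Channels.newChannel(sink)) {
            writeFully(channel, bb);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] value = "Byte Array Test".getBytes(StandardCharsets.UTF_8);
        byte[] sent = sendToStream((short) 32767, 1379811105109L, value);
        System.out.println(sent.length); // 29
    }
}
```

Swapping the ByteArrayOutputStream channel for a connected SocketChannel leaves writeFully unchanged.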
