java - Searching a file for encoded tags

Question

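I have a file that I need to search for encoded tags and retrieve the data they identify. The tags are 4 bytes long and identify either a variable-length ASCII string or a two-byte integer value encoded as little-endian.

The tags all appear to lie on 4-byte boundaries, and all of them are within the first 2000 bytes of the file. I have tried various ways of searching the file. The only one that worked was a byte-by-byte comparison using decimal integer values.

I found a solution on SO that was not written for exactly this problem: indexOfSubList(). I tried the test below, but the result is -1.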

byte[] needle = {68,73,67,77};
byte[] hayStack = {00, 01, 68,73,67,77, 11, 45};
location = Collections.indexOfSubList(Arrays.asList(hayStack), Arrays.asList(needle));

I am by no means tied to this code and would appreciate any other ideas or solutions.

score 2 · Accepted Answer

Your question is a bit vague. Do you mean something like this?

// simplified way of identifying tag by first byte of it,
// make it more complex as needed
byte startOfTag = 65;

// for loop assumes tags start at even 4 byte boundary, if not, modify loop
for(int i = 0; i <= data.length-4 ; i += 4) {
    if (data[i] == startOfTag) {
        myTagHandlerMethod(data[i], data[i+1], data[i+2], data[i+3]);
    }
}
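The loop above checks only the first byte of each tag. As a slightly fuller sketch, the version below compares all four tag bytes at each 4-byte boundary and decodes a two-byte little-endian value. The class and method names (TagScanner, findTag, readUInt16LE) are hypothetical, and placing the value directly after the tag is an assumption, since the question does not describe the layout.

```java
import java.nio.charset.StandardCharsets;

class TagScanner {
    // Returns the offset of the first occurrence of tag that starts on a
    // 4-byte boundary within the first `limit` bytes of data, or -1.
    static int findTag(byte[] data, byte[] tag, int limit) {
        int end = Math.min(data.length, limit) - tag.length;
        for (int i = 0; i <= end; i += 4) {
            if (matchesAt(data, i, tag)) {
                return i;
            }
        }
        return -1;
    }

    // True if every byte of tag equals the bytes of data starting at offset.
    private static boolean matchesAt(byte[] data, int offset, byte[] tag) {
        for (int j = 0; j < tag.length; j++) {
            if (data[offset + j] != tag[j]) {
                return false;
            }
        }
        return true;
    }

    // Decodes two bytes as an unsigned little-endian integer (low byte first).
    static int readUInt16LE(byte[] data, int offset) {
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    public static void main(String[] args) {
        // Sample data: 4 filler bytes, the tag "DICM", then 0x1234 little-endian.
        byte[] data = {0, 1, 2, 3, 68, 73, 67, 77, 0x34, 0x12, 0, 0};
        byte[] tag = "DICM".getBytes(StandardCharsets.US_ASCII);
        int pos = findTag(data, tag, 2000);
        // ASSUMPTION: the value starts immediately after the 4-byte tag.
        System.out.println(pos + " " + readUInt16LE(data, pos + tag.length));
    }
}
```

The `& 0xFF` masks matter because Java bytes are signed; without them, a byte of 0x80 or above would be sign-extended and corrupt the decoded value.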

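You get -1 from Collections.indexOfSubList because Arrays.asList does not work the way you expect on a byte[]: it returns a List<byte[]> with a single element, not a List<Byte>. Collections must hold object references, so unboxed primitive types are not allowed. This should work: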

Byte[] needle = {68,73,67,77};
Byte[] hayStack = {00, 01, 68,73,67,77, 11, 45};
int location = Collections.indexOfSubList(Arrays.asList(hayStack), Arrays.asList(needle));
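To make the difference concrete, here is a small sketch that compares what Arrays.asList produces for a byte[] and for a Byte[]. The class and method names (AsListDemo, primitiveListSize, boxedListSize) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class AsListDemo {
    // With a primitive array, the whole array becomes the single list element.
    static int primitiveListSize() {
        byte[] hayStack = {0, 1, 68, 73, 67, 77, 11, 45};
        List<byte[]> wrapped = Arrays.asList(hayStack);
        return wrapped.size();
    }

    // With a Byte[] array, each array element becomes one list element.
    static int boxedListSize() {
        Byte[] hayStack = {0, 1, 68, 73, 67, 77, 11, 45};
        List<Byte> wrapped = Arrays.asList(hayStack);
        return wrapped.size();
    }

    public static void main(String[] args) {
        System.out.println(primitiveListSize()); // one element: the array itself
        System.out.println(boxedListSize());     // eight elements: the bytes
    }
}
```

In the failing version, both lists hold one array each, and the two arrays are different objects, so indexOfSubList finds no match.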

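If you want to avoid reinventing the wheel when working with arrays of primitive types, you can use Google's Guava library. For example, it has an indexOf method that you could use here.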
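For readers who cannot add a dependency, the following is a hand-rolled sketch of a sub-array search over primitive byte[] values. It is not Guava's implementation, and the names ByteSearch and indexOf are chosen here only for illustration.

```java
class ByteSearch {
    // Returns the index of the first occurrence of needle in hayStack,
    // or -1 if needle does not occur. Works on primitive bytes, no boxing.
    static int indexOf(byte[] hayStack, byte[] needle) {
        outer:
        for (int i = 0; i <= hayStack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hayStack[i + j] != needle[j]) {
                    continue outer; // mismatch, try the next start position
                }
            }
            return i; // every byte of needle matched at position i
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] needle = {68, 73, 67, 77};
        byte[] hayStack = {0, 1, 68, 73, 67, 77, 11, 45};
        System.out.println(indexOf(hayStack, needle));
    }
}
```

Unlike the 4-byte-boundary loop, this checks every start position, so it also finds the needle at offset 2 in the question's test data.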

score 1 · Accepted Answer

If you change the byte arrays to Byte arrays, you get the result you want:

Byte[] needle = { 68, 73, 67, 77 };
Byte[] hayStack = { 00, 01, 68, 73, 67, 77, 11, 45 };
int location = Collections.indexOfSubList(Arrays.asList(hayStack),
    Arrays.asList(needle));

// location now equals 2

This is because Arrays.asList does not operate on a byte[] the way you might think. It returns a List<byte[]> rather than a List<Byte>.
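Bytes read from a file normally arrive as a primitive byte[], so the Byte[] literals above do not apply directly. One possible approach, sketched here with hypothetical names (BoxedSearch, box, indexOf), is to box the bytes into a List<Byte> before calling Collections.indexOfSubList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BoxedSearch {
    // Copies a primitive byte[] into a List<Byte>, boxing each element.
    static List<Byte> box(byte[] bytes) {
        List<Byte> boxed = new ArrayList<>(bytes.length);
        for (byte b : bytes) {
            boxed.add(b);
        }
        return boxed;
    }

    // Finds needle in hayStack using the standard collections search.
    static int indexOf(byte[] hayStack, byte[] needle) {
        return Collections.indexOfSubList(box(hayStack), box(needle));
    }

    public static void main(String[] args) {
        byte[] needle = {68, 73, 67, 77};
        byte[] hayStack = {0, 1, 68, 73, 67, 77, 11, 45};
        System.out.println(indexOf(hayStack, needle));
    }
}
```

Boxing allocates one list entry per byte, which is acceptable for a 2000-byte header but is worth avoiding for large files.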
