java - Uploading a RAMDirectory to AzureCloud creates an EOF exception

Question

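I'm currently trying to get Azure Blob Storage working with Lucene. To do that I created a new Directory, and to avoid too much latency I use a RAMDirectory as a cache (this might not be the best solution, but it seemed easy to do, and I'm open to suggestions). Everything seems to work fine, except that writing .nrm files to the cloud always raises an EOFException when they are uploaded to the blob.

I'll quickly explain how the directory works, since it helps in understanding the problem: I created a new IndexOutput, BlobOutputStream, which essentially wraps a RAMOutputStream, but when it is closed it uploads everything to Azure Blob Storage. Here is how that is done: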

String fname = name;
output.flush();
long length = output.length();
output.close();
System.out.println("Size of the upload: " + length);
InputStream bStream = directory.openCachedInputAsStream(fname);
System.out.println("Uploading cache version of: " + fname);
blob.upload(bStream, length);
System.out.println("PUT finished for: " + fname);

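blob is a CloudBlockBlob and output is a RAMOutputStream. directory.openCachedInputAsStream opens a new InputStream on an IndexInput.

So everything mostly works, except for .nrm files, which always raise an EOFException while they are being uploaded. I checked, though: when there is only one document in the index they are 5 bytes long and contain "NRM-1 and the norm for that document".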

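I really don't understand why Azure tries to upload more than what exists in the file when I specify the size of the stream in the upload call.

Sorry if I'm not being clear; this is hard to explain. If you need more code, let me know and I'll make everything available on GitHub or elsewhere.

Thanks for your answers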

Edit

So maybe the code of my InputStream will show the problem:

public class StreamInput extends InputStream {
    public IndexInput input;

    public StreamInput(IndexInput openInput) {
        input = openInput;
    }

    @Override
    public int read() throws IOException {
        System.out.println("Attempt to read byte: "+ input.getFilePointer());
        int b = input.readByte();
        System.out.println(b);
        return b;
    }
}

Here is the trace I get:


Size of the upload: 5
Uploading cache version of: _0.nrm
Attempt to read byte: 0
78
Attempt to read byte: 1
82
Attempt to read byte: 2
77
Attempt to read byte: 3
-1
Attempt to read byte: 4
114
Attempt to read byte: 5
Attempt to read byte: 1029
java.io.EOFException: read past EOF: RAMInputStream(name=_0.nrm)
    at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:100)
    at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:73)
    at org.lahab.clucene.core.StreamInput.read(StreamInput.java:18)
    at java.io.InputStream.read(InputStream.java:151)
    at com.microsoft.windowsazure.services.core.storage.utils.Utility.writeToOutputStream(Utility.java:1024)
    at com.microsoft.windowsazure.services.blob.client.BlobOutputStream.write(BlobOutputStream.java:560)
    at com.microsoft.windowsazure.services.blob.client.CloudBlockBlob.upload(CloudBlockBlob.java:455)
    at com.microsoft.windowsazure.services.blob.client.CloudBlockBlob.upload(CloudBlockBlob.java:374)
    at org.lahab.clucene.core.BlobOutputStream.close(BlobOutputStream.java:92)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:141)
    at org.apache.lucene.index.NormsWriter.flush(NormsWriter.java:172)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:71)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3587)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3376)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3485)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3467)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3451)
    at org.lahab.clucene.server.IndexerNode.addDocuments(IndexerNode.java:139)

It really looks like the upload reads too far…

score 0 · Accepted Answer

So the problem was my InputStream, and the fact that I can't read documentation or convert bytes ;). My read function should have been:

System.out.println("file:" + input.getFilePointer() + "/" + input.length());
if (input.getFilePointer() >= input.length()) {
    return -1;
}
System.out.println("Attempt to read byte: "+ input.getFilePointer());
int b = (int) input.readByte() & 0xff;
System.out.println(b);
return b;

The javadoc for InputStream.read() says:

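Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.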
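To see that contract in isolation, here is a minimal sketch that runs without Lucene or Azure. ByteSource and ArraySource are hypothetical stand-ins for the three IndexInput methods used above (getFilePointer, length, readByte), and drain is an assumed generic consumer that reads until it sees -1; none of these names come from either library.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadContractSketch {

    /** Hypothetical stand-in for the parts of Lucene's IndexInput used here. */
    interface ByteSource {
        long getFilePointer();
        long length();
        byte readByte() throws IOException;
    }

    /** In-memory ByteSource backed by a byte array (illustration only). */
    static final class ArraySource implements ByteSource {
        private final byte[] data;
        private int pos = 0;

        ArraySource(byte[] data) { this.data = data; }

        public long getFilePointer() { return pos; }
        public long length() { return data.length; }

        public byte readByte() throws IOException {
            // Like RAMInputStream, fail loudly when read past the end.
            if (pos >= data.length) throw new EOFException("read past EOF");
            return data[pos++];
        }
    }

    /** InputStream adapter that follows the read() contract. */
    static final class SourceStream extends InputStream {
        private final ByteSource input;

        SourceStream(ByteSource input) { this.input = input; }

        @Override
        public int read() throws IOException {
            if (input.getFilePointer() >= input.length()) {
                return -1;                  // signal end of stream instead of throwing
            }
            return input.readByte() & 0xff; // map the signed byte to 0..255
        }
    }

    /** Drains a stream the way a generic consumer would: until read() returns -1. */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] nrm = { 78, 82, 77, -1, 114 }; // the five bytes shown in the trace
        byte[] copy = drain(new SourceStream(new ArraySource(nrm)));
        System.out.println(copy.length);      // prints 5
    }
}
```

With the original read(), the same loop would stop after three bytes, because the fourth byte (0xFF) comes back as -1 and looks like end of stream.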

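The & 0xff then masks off the sign extension: readByte() returns a signed byte, and widening it to an int unmasked turns 0xFF into -1, which the caller takes for end of stream.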
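As a small illustration of the masking step on its own (SignMaskDemo and toUnsigned are hypothetical names, not part of the code above):

```java
public class SignMaskDemo {

    /** Converts a signed Java byte to the 0..255 value that InputStream.read() must return. */
    static int toUnsigned(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xFF;       // same value as the byte at position 3 in the trace
        int widened = raw;            // implicit widening sign-extends the byte
        int masked = toUnsigned(raw); // masking keeps only the low 8 bits
        System.out.println(widened + " " + masked); // prints "-1 255"
    }
}
```

Since b & 0xff already promotes the byte to int, the explicit (int) cast in the corrected read function is harmless but not required.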
