hadoop - How to get an InputStream when opening an IgnitePath (returns HadoopIgfsSecondaryFileSystemPositionedReadable)?

Question

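Usually, when working with Hadoop and Flink, opening or reading a file from a distributed file system returns a Source object (the counterpart of Sink) that extends java.io.InputStream.

In Apache Ignite, however, IgfsSecondaryFileSystem, and more specifically IgniteHadoopIgfsSecondaryFileSystem, returns an object of type HadoopIgfsSecondaryFileSystemPositionedReadable when its "open" method is called with an IgfsPath.

HadoopIgfsSecondaryFileSystemPositionedReadable provides a "read" method, but it requires details about where the data to be read is located, such as the position in the input stream.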

/**
 * Read up to the specified number of bytes, from a given position within a file, and return the number of bytes
 * read.
 *
 * @param pos Position in the input stream to seek.
 * @param buf Buffer into which data is read.
 * @param off Offset in the buffer from which stream data should be written.
 * @param len The number of bytes to read.
 * @return Total number of bytes read into the buffer, or -1 if there is no more data (EOF).
 * @throws IOException In case of any exception.
 */
public int read(long pos, byte[] buf, int off, int len) throws IOException;

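How can I determine these details before calling the read method?

I am quite new to these frameworks, so maybe there is a different way to get an InputStream from an IgfsPath that points to a file stored in a Hadoop file system?

I am trying to implement what is described here: https://apacheignite-fs.readme.io/docs/secondary-file-system

Thanks in advance for any hints!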

score 0 · Accepted Answer

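The IgfsSecondaryFileSystem interface should not be used directly. Its purpose is to let you configure a Hadoop cluster as a secondary FS that IGFS uses for read and write operations.

An IgfsSecondaryFileSystem should only be specified in the configuration, as the FileSystemConfiguration#secondaryFileSystem property.

You should use the IgniteFileSystem interface instead. You can obtain an instance of it by calling the Ignite#fileSystem(...) method. To get an InputStream for an IGFS path, use the IgniteFileSystem#open(...) method.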
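
On the question of where the pos argument comes from: with a positioned read of the shape quoted in the question, the position is just a byte offset that the caller tracks, starting at 0 and advancing by the number of bytes each call returns. The sketch below illustrates that idea in plain Java. PositionedReadable, PositionedInputStream, inMemory and demo are hypothetical names invented for this illustration; the interface only mirrors the quoted read signature and is not Ignite's class. For IGFS itself, IgniteFileSystem#open(...) remains the intended route.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PositionedReadDemo {
    /** Hypothetical stand-in that mirrors only the read signature quoted in the question. */
    interface PositionedReadable {
        int read(long pos, byte[] buf, int off, int len) throws IOException;
    }

    /** Adapts a positioned reader to a sequential InputStream by tracking the byte offset. */
    static final class PositionedInputStream extends InputStream {
        private final PositionedReadable src;
        private long pos; // offset of the next byte to read; starts at 0

        PositionedInputStream(PositionedReadable src) {
            this.src = src;
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            int n;
            do {
                n = read(one, 0, 1); // retry if the source returned 0 bytes
            } while (n == 0);
            return n < 0 ? -1 : (one[0] & 0xFF);
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int n = src.read(pos, buf, off, len);
            if (n > 0) {
                pos += n; // advance by the number of bytes actually read
            }
            return n;
        }
    }

    /** In-memory positioned reader, used here only to exercise the adapter. */
    static PositionedReadable inMemory(byte[] data) {
        return (pos, buf, off, len) -> {
            if (pos >= data.length) {
                return -1; // EOF
            }
            int n = (int) Math.min(len, data.length - pos);
            System.arraycopy(data, (int) pos, buf, off, n);
            return n;
        };
    }

    /** Reads the whole text back through the adapter in chunks of the given size. */
    static String demo(String text, int chunk) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new PositionedInputStream(inMemory(data))) {
            byte[] buf = new byte[chunk];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(demo("hello igfs", 4));
    }
}
```

The adapter never needs to know the file length up front: it keeps calling read with the current offset until the source reports -1.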
