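Hi, is it possible to see the number of triples stored while a TDB store is being created with the Java API? I am running TDBFactory on a RAR file containing Turtle data, but while the files are being created in my directory I cannot see how many triples have been stored. How can I solve this?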

1 Answer

You can access the bulk loader from Java code (and watch the triples as they are loaded) as follows:

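final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
try( final InputStream in = /*get input stream for your large file*/) {
    // The bulk loader works on the underlying (non-transactional) TDB dataset graph.
    // The last argument (showProgress = true) makes the loader report progress while triples are added.
    TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , in, true);
}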

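If your archive contains multiple files (for simplicity I will use zip rather than rar), then, according to an answer to this question, you can get better performance by concatenating the files into a single stream before passing them to the bulk loader. The improvement comes from deferring index creation until all triples have been loaded. I am sure other formats are supported, but I have only tested N-TRIPLES.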
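To illustrate why plain concatenation works for N-TRIPLES: the format is line-oriented, so several inputs can be joined byte for byte as long as each one ends with a line break. Below is a minimal sketch of that idea using only the JDK's `SequenceInputStream`. It is a simpler alternative to piped streams, not something TDB requires, and the class and method names (`NTriplesConcat`, `concat`, `readAll`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NTriplesConcat {

    /** Joins the parts into one stream, appending a newline after every part. */
    public static InputStream concat(List<InputStream> parts) {
        final List<InputStream> withSeparators = new ArrayList<>();
        for (InputStream part : parts) {
            withSeparators.add(part);
            // Guards against a part whose last line has no trailing newline;
            // an extra blank line is harmless in N-TRIPLES.
            withSeparators.add(new ByteArrayInputStream("\n".getBytes(StandardCharsets.UTF_8)));
        }
        return new SequenceInputStream(Collections.enumeration(withSeparators));
    }

    /** Helper for the demonstration only: drains a stream into a UTF-8 string. */
    public static String readAll(InputStream in) {
        try (InputStream src = in; ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
            final byte[] chunk = new byte[8192];
            int n;
            while ((n = src.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The stream returned by `concat` could be handed to the loader in place of a single file stream. This only holds for formats where concatenation is valid; Turtle files with their own prefix declarations would generally need converting first.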

The following example uses IOUtils from commons-io to copy the streams:

final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
final PipedOutputStream concatOut = new PipedOutputStream();
final PipedInputStream concatIn = new PipedInputStream(concatOut);

final ExecutorService workers = Executors.newFixedThreadPool(2);
final Future<Long> submitter = workers.submit(new Callable<Long>(){
    @Override
    public Long call() throws Exception {
        long filesLoaded = 0;
        try( final ZipFile zipFile = new ZipFile( /* Archive Location */ ) ) {
            final Enumeration< ? extends ZipEntry> zipEntries = zipFile.entries();
            while( zipEntries.hasMoreElements() ) {
                final ZipEntry entry = zipEntries.nextElement();
                try( final InputStream singleIn = zipFile.getInputStream(entry) ) {
                    // If your file is in a supported format already
                    IOUtils.copy(singleIn, concatOut);
                    /* Otherwise, convert it to N-TRIPLES first:
                    final Model m = ModelFactory.createDefaultModel();
                    m.read(singleIn, null, "lang");
                    m.write(concatOut, "N-TRIPLES");*/
                }
                filesLoaded++;
            }
        }
        concatOut.close();
        return filesLoaded;
    }});

final Future<Void> committer = workers.submit(new Callable<Void>(){
    @Override
    public Void call() throws Exception {
        // Reads the concatenated stream from the pipe; showProgress = true reports progress during the load
        TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , concatIn, true);
        return null;
    }});

workers.shutdown();
System.out.println("submitted "+submitter.get()+" input files for processing");
committer.get();
System.out.println("completed processing");
workers.awaitTermination(1, TimeUnit.SECONDS); // NOTE this wait is redundant
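
The writer and the loader above run on separate threads because a `PipedInputStream` and its `PipedOutputStream` should not be used from the same thread (the JDK documentation warns that this may deadlock). If you also want your own count of what went through the pipe, here is a rough sketch of the same producer/consumer layout using only the JDK, with the loader replaced by a line counter. It assumes N-TRIPLES input, where each non-blank, non-comment line holds one triple. The class `PipedLineCount` and its method are hypothetical and not part of Jena.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PipedLineCount {

    /** Writes the chunks into a pipe on one thread and counts statement lines on another. */
    public static long countLines(final List<String> chunks) {
        final ExecutorService workers = Executors.newFixedThreadPool(2);
        try {
            final PipedOutputStream out = new PipedOutputStream();
            final PipedInputStream in = new PipedInputStream(out);

            // Producer: stands in for the thread that copies the zip entries.
            final Future<Object> producer = workers.submit(() -> {
                // Closing the pipe (also on failure) signals end-of-input to the reader.
                try (PipedOutputStream o = out) {
                    for (String chunk : chunks) {
                        o.write(chunk.getBytes(StandardCharsets.UTF_8));
                    }
                }
                return null;
            });

            // Consumer: stands in for the bulk loader; here it only counts lines.
            final Future<Long> consumer = workers.submit(() -> {
                long statements = 0;
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        final String trimmed = line.trim();
                        if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                            statements++;
                        }
                    }
                }
                return statements;
            });

            producer.get();
            return consumer.get();
        } catch (IOException | InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            workers.shutdown();
        }
    }
}
```

This counts lines in the input, not triples actually stored, so duplicates or parse errors would make the two numbers differ.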
answered 2014-09-18T14:11:37.773