
I'm trying to use Blazegraph to run graph algorithms on ConceptNet, but first I have to import the data. The data will be write-once, read-many, so I don't need any incremental writes.

I installed Blazegraph 2.1.1 from its .deb file. I also downloaded blazegraph.jar so that I could follow instructions that involve running commands against blazegraph.jar.

The file assoc.nt is in N-Triples format and contains about 25 million edges. Here are some from the beginning:

</c/af/a_foei_tog/r> </r/SenseOf> </c/af/a_foei_tog> .
</c/af/a_foei_tog/r> </r/Synonym> </c/af/jammer> .
</c/af/a_foei_tog/r> </r/Synonym> </c/af/ongelukkig> .
</c/af/a_foei_tog/r> </r/RelatedTo> </c/fr/malheureusement> .
</c/af/a_foe%C4%B1_tog/r> </r/SenseOf> </c/af/a_foe%C4%B1_tog> .
</c/af/a_foe%C4%B1_tog/r> </r/Synonym> </c/af/jammer> .
</c/af/a_foe%C4%B1_tog/r> </r/Synonym> </c/af/ongelukk%C4%B1g> .
</c/af/a_foe%C4%B1_tog/r> </r/RelatedTo> </c/fr/malheureusement> .
</c/af/a_ja_a/r> </r/SenseOf> </c/af/a_ja_a> .
</c/af/a_ja_a/r> </r/Synonym> </c/af/seker> .
</c/af/a_ja_a/r> </r/Synonym> </c/af/sekerlik> .

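I got fastload.properties from the Blazegraph samples on GitHub, but then changed the end of it:

  • I added com.bigdata.journal.AbstractJournal.file=blazegraph.jnl, because otherwise it told me the property was missing.

  • I changed bufferMode from DiskRW to Disk, because someone's properties file suggested this would give me write-once, read-many semantics, which is what I want.

Here is my final fastload.properties: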

# This configuration turns off incremental inference for load and retract, so
# you must explicitly force these operations if you want to compute the closure
# of the knowledge base.  Forcing the closure requires punching through the SAIL
# layer.  Of course, if you are not using inference then this configuration is
# just the ticket and is quite fast.

# set the initial and maximum extent of the journal
com.bigdata.journal.AbstractJournal.initialExtent=209715200
com.bigdata.journal.AbstractJournal.maximumExtent=209715200

# turn off automatic inference in the SAIL
com.bigdata.rdf.sail.truthMaintenance=false

# don't store justification chains, meaning retraction requires full manual
# re-closure of the database
com.bigdata.rdf.store.AbstractTripleStore.justify=false

# turn off the statement identifiers feature for provenance
com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false

# turn off the free text index
com.bigdata.rdf.store.AbstractTripleStore.textIndex=false

com.bigdata.journal.AbstractJournal.bufferMode=Disk
com.bigdata.journal.AbstractJournal.file=blazegraph.jnl

I ran the command:

java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader -namespace conceptnet fastload.properties ~/conceptnet5/data/assoc/assoc.nt

It spun the CPU for a few minutes, but in the end it appeared to add nothing. Here is the output I got:

WARN : ServiceProviderHook.java:171: Running.
INFO: com.bigdata.util.config.LogUtil: Configure: jar:file:/home/rspeer/src/blazegraph/blazegraph.jar!/log4j.properties

BlazeGraph(TM) Graph Engine

                   Flexible
                   Reliable
                  Affordable
      Web-Scale Computing for the Enterprise

Copyright SYSTAP, LLC DBA Blazegraph 2006-2016.  All rights reserved.

[my hostname appeared here]
Mon Jun 13 13:36:05 EDT 2016
Linux/3.13.0-83-generic amd64
Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz Family 6 Model 62 Stepping 4, GenuineIntel #CPU=4
Oracle Corporation 1.8.0_74
freeMemory=1002354744
buildVersion=2.1.1
gitCommit=90d9e8232969a8afdc830e856643e5416bb50d0a

Dependency         License                                                                 
ICU                http://source.icu-project.org/repos/icu/icu/trunk/license.html          
bigdata-ganglia    http://www.apache.org/licenses/LICENSE-2.0.html                         
blueprints-core    https://github.com/tinkerpop/blueprints/blob/master/LICENSE.txt         
colt               http://acs.lbl.gov/software/colt/license.html                           
commons-codec      http://www.apache.org/licenses/LICENSE-2.0.html                         
commons-fileupload http://www.apache.org/licenses/LICENSE-2.0.html                         
commons-io         http://www.apache.org/licenses/LICENSE-2.0.html                         
commons-logging    http://www.apache.org/licenses/LICENSE-2.0.html                         
dsiutils           http://www.gnu.org/licenses/lgpl-2.1.html                               
fastutil           http://www.apache.org/licenses/LICENSE-2.0.html                         
flot               http://www.opensource.org/licenses/mit-license.php                      
high-scale-lib     http://creativecommons.org/licenses/publicdomain                        
httpclient         http://www.apache.org/licenses/LICENSE-2.0.html                         
httpclient-cache   http://www.apache.org/licenses/LICENSE-2.0.html                         
httpcore           http://www.apache.org/licenses/LICENSE-2.0.html                         
httpmime           http://www.apache.org/licenses/LICENSE-2.0.html                         
jackson-core       http://www.apache.org/licenses/LICENSE-2.0.html                         
jetty              http://www.apache.org/licenses/LICENSE-2.0.html                         
jquery             https://github.com/jquery/jquery/blob/master/MIT-LICENSE.txt            
jsonld             https://raw.githubusercontent.com/jsonld-java/jsonld-java/master/LICENCE
log4j              http://www.apache.org/licenses/LICENSE-2.0.html                         
lucene             http://www.apache.org/licenses/LICENSE-2.0.html                         
nanohttp           http://elonen.iki.fi/code/nanohttpd/#license                            
rexster-core       https://github.com/tinkerpop/rexster/blob/master/LICENSE.txt            
river              http://www.apache.org/licenses/LICENSE-2.0.html                         
semargl            https://github.com/levkhomich/semargl/blob/master/LICENSE               
servlet-api        http://www.apache.org/licenses/LICENSE-2.0.html                         
sesame             http://www.openrdf.org/download.jsp                                     
slf4j              http://www.slf4j.org/license.html                                       
zookeeper          http://www.apache.org/licenses/LICENSE-2.0.html                         

Reading properties: fastload.properties
Will load from: /home/rspeer/conceptnet5/data/assoc/assoc.nt
Journal file: blazegraph.jnl
Load: 0 stmts added in 171.173 secs, rate= 0, commitLatency=0ms, {failSet=0,goodSet=1}
Total elapsed=172015ms

1 Answer


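I believe I've found the answer to my own problem.

When Blazegraph imports N-Triples data, it skips relative URIs. Having relative URIs was my mistake; it appears that only absolute URIs are allowed in N-Triples. Still, it would be nice if Blazegraph told me so instead of failing silently.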
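Since the loader gives no warning, a pre-flight check on the input can catch this before a multi-minute load. The following is only a minimal sketch: the function name find_relative_iris is made up for illustration, and it assumes a file like assoc.nt where every term is an IRI in angle brackets (a literal containing angle brackets would confuse it).

```python
import re

# Any token written in angle brackets, e.g. </c/af/jammer> or <http://example.org/x>.
IRI_TOKEN = re.compile(r"<([^<>\s]*)>")
# An absolute IRI starts with a scheme such as "http:".
HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def find_relative_iris(lines):
    """Yield (line_number, iri) for every IRI that lacks a scheme.

    Hypothetical helper; assumes no literals containing '<' or '>'.
    """
    for number, line in enumerate(lines, start=1):
        for iri in IRI_TOKEN.findall(line):
            if not HAS_SCHEME.match(iri):
                yield number, iri


sample = [
    "</c/af/a_ja_a/r> </r/SenseOf> </c/af/a_ja_a> .",
    "<http://api.conceptnet.io/c/af/a_ja_a/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/seker> .",
]
problems = list(find_relative_iris(sample))
print(len(problems), "relative IRIs found")
```

To check a real file, pass an open file object instead of the sample list, for example find_relative_iris(open("assoc.nt", encoding="utf-8")).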

I prefixed all of my URIs with http:// and a domain name, and now the data loads. Here is what my data looks like now:

<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/SenseOf> <http://api.conceptnet.io/c/af/a_foei_tog> .
<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/jammer> .
<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/ongelukkig> .
<http://api.conceptnet.io/c/af/a_foei_tog/r> <http://api.conceptnet.io/r/RelatedTo> <http://api.conceptnet.io/c/fr/malheureusement> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/SenseOf> <http://api.conceptnet.io/c/af/a_foe%C4%B1_tog> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/jammer> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/ongelukk%C4%B1g> .
<http://api.conceptnet.io/c/af/a_foe%C4%B1_tog/r> <http://api.conceptnet.io/r/RelatedTo> <http://api.conceptnet.io/c/fr/malheureusement> .
<http://api.conceptnet.io/c/af/a_ja_a/r> <http://api.conceptnet.io/r/SenseOf> <http://api.conceptnet.io/c/af/a_ja_a> .
<http://api.conceptnet.io/c/af/a_ja_a/r> <http://api.conceptnet.io/r/Synonym> <http://api.conceptnet.io/c/af/seker> .
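The prefixing step can be sketched roughly as follows. This is not the script I actually used; the names BASE and absolutize_line are illustrative, and the regular expression assumes every term is an IRI in angle brackets and that relative ones start with a slash, as in the sample above.

```python
import re

# Prefix taken from the data shown above.
BASE = "http://api.conceptnet.io"
# Matches only IRIs that begin with "/", so absolute IRIs are left alone.
RELATIVE_IRI = re.compile(r"<(/[^<>\s]*)>")


def absolutize_line(line, base=BASE):
    """Return the N-Triples line with each relative IRI prefixed by base."""
    return RELATIVE_IRI.sub(lambda m: "<" + base + m.group(1) + ">", line)


def absolutize_file(src_path, dst_path, base=BASE):
    """Stream src_path to dst_path line by line, so a large file is never held in memory."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(absolutize_line(line, base))


print(absolutize_line("</c/af/a_ja_a/r> </r/Synonym> </c/af/seker> ."))
```

Running absolutize_file("assoc.nt", "assoc-absolute.nt") would write a converted copy (the output file name is just an example); lines that already use absolute URIs pass through unchanged.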

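I get some alarming output that seems to say each "record" takes anywhere from 1 to 10 seconds to load, but I think these warnings are misleading, because they only appear when loading slows down significantly: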

WARN : AbstractBTree.java:3758: wrote: name=kb.spo.OSP, 1 records (#nodes=1, #leaves=0) in 14582ms : addrRoot=22869767568228938
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 1 records (#nodes=1, #leaves=0) in 14582ms : addrRoot=22869765391385095
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.OSP, 9 records (#nodes=5, #leaves=4) in 10690ms : addrRoot=25508598331212042
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 1 records (#nodes=1, #leaves=0) in 9335ms : addrRoot=38702680415142364
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 9 records (#nodes=6, #leaves=3) in 6932ms : addrRoot=63331668311671368
WARN : AbstractBTree.java:3758: wrote: name=kb.spo.POS, 1 records (#nodes=1, #leaves=0) in 11326ms : addrRoot=80044185196954272

Despite the warnings, it loaded the 25 million edges in about 8 minutes, which isn't bad.
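As a rough sanity check on why the warnings look misleading, the average throughput implied by those approximate figures (about 25 million edges, about 8 minutes) works out as below. The function name load_rate is only for illustration.

```python
def load_rate(statements, seconds):
    """Average number of statements loaded per second."""
    return statements / seconds


# Approximate figures from the load described above.
rate = load_rate(25_000_000, 8 * 60)
print(f"about {rate:,.0f} statements per second")
```

That is on the order of 50,000 statements per second on average, so the multi-second timings in the warnings cannot be the typical cost of writing a record.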

Answered 2016-06-13T19:11:20.807