
I have installed Titan and Faunus, and each seems to work fine on its own (titan-0.4.4 and faunus-0.4.4).

However, after ingesting a fairly large graph into Titan and trying to import it into Faunus via

FaunusFactory.open(...)

I run into problems. More precisely, I do seem to get a Faunus graph back from the call to FaunusFactory.open(),

faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]

However, even asking for something as simple as

g.v(10)

I get this error:

Task Id : attempt_201407181049_0009_m_000000_0, Status : FAILED
com.thinkaurelius.titan.core.TitanException: Exception in Titan
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:275)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:228)
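
A quick check that rules out plain connectivity problems (these are standard HBase shell commands, run on a node of the Hadoop cluster; the table name titan matches the properties below):

$ hbase shell
hbase(main):001:0> list
hbase(main):002:0> describe 'titan'

If the titan table does not show up in list from a cluster node, the failure would point to a connectivity / ZooKeeper issue rather than anything Faunus-specific.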

My properties file is taken directly from the Faunus documentation page for Titan-HBase input, except of course for changing the URLs to my Hadoop cluster:

faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname= my IP
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseOutputFormat
faunus.graph.output.titan.storage.backend=hbase
faunus.graph.output.titan.storage.hostname= IP of my host
faunus.graph.output.titan.storage.port=2181
faunus.graph.output.titan.storage.tablename=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.output.location=output1
zookeeper.znode.parent=/hbase-unsecure
titan.graph.output.ids.block-size=100000
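
For reference, the file is loaded from the Faunus Gremlin shell (bin/gremlin.sh in the Faunus distribution); the filename faunus.properties below is just a stand-in for whatever the file above is saved as:

$ bin/gremlin.sh
gremlin> g = FaunusFactory.open('faunus.properties')
==>faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]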

Can anyone help?

Addendum

To address the comments below, here is some context: as I mentioned, I have a graph in Titan and can run basic Gremlin queries against it.

However, I need to run a global Gremlin query which, given the size of the graph, requires Faunus and its underlying MapReduce machinery; hence the need to import it. The error I am getting does not look to me like it points to an inconsistency in the graph itself.


1 Answer


I'm not sure that you have your "flow" of Faunus right. If your end result is to do a global query of the graph, then consider this approach:

  1. pull your graph to a sequence file
  2. issue your global query over the sequence file

More specifically, create hbase-seq.properties:

# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname=localhost
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
# hbase.mapreduce.scan.cachedrows=1000

# output data (graph or statistic) parameters
faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=snapshot
faunus.output.location.overwrite=true

In the Faunus Gremlin shell, do:

g = FaunusFactory.open('hbase-seq.properties')
g._()  // identity step: streams every vertex and edge through unchanged, so the whole graph lands in the sequence file

That will read the graph from HBase and write it to a sequence file in HDFS. Next, create seq-noop.properties with these contents:

# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=snapshot/job-0

# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=analysis
faunus.output.location.overwrite=true

The above configuration will read your sequence file from the previous step without re-writing the graph (that's what NoOpOutputFormat is for). Now, in the Faunus shell, do:

g = FaunusFactory.open('seq-noop.properties')
// store each vertex's degree as a property, then group-count the degree values
g.V.sideEffect('{it.degree=it.bothE.count()}').degree.groupCount()

This computes the degree distribution, writing the results in HDFS to the 'analysis' directory. Obviously you can run whatever Faunus-flavored Gremlin you want here - I just wanted to provide an example. I think this is a pretty standard "flow" or pattern for using Faunus from a graph analysis perspective.
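
To pull the results back out of HDFS you can use the standard Hadoop CLI. The exact part-file names under the output directory depend on the job, so the sideeffect* glob below is an assumption based on the default Faunus naming:

hadoop fs -ls analysis/job-0
hadoop fs -cat analysis/job-0/sideeffect*

Each emitted line should be a degree value paired with the number of vertices that have that degree.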

Answered 2014-07-23T11:53:04.087