
I have installed Titan and Faunus, and each seems to work fine on its own (titan-0.4.4 and faunus-0.4.4).

However, after ingesting a fairly large graph into Titan and trying to import it into Faunus via

FaunusFactory.open(...)

I run into problems. More precisely, I do seem to get a Faunus graph back from the call to FaunusFactory.open(),

faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]

However, even asking for something as simple as

g.v(10)

I get this error:

Task Id : attempt_201407181049_0009_m_000000_0, Status : FAILED
com.thinkaurelius.titan.core.TitanException: Exception in Titan
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:275)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:228)
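
A quick check that rules out plain connectivity problems (these are standard HBase shell commands, run on a node of the Hadoop cluster; the table name titan matches the properties below):

$ hbase shell
hbase(main):001:0> list
hbase(main):002:0> describe 'titan'

If the titan table does not show up in list from a cluster node, the failure would point to a connectivity / ZooKeeper issue rather than anything Faunus-specific.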

My properties file is taken directly from the Faunus documentation page for Titan-HBase input, except of course for changing the URLs to my Hadoop cluster:

faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname= my IP
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseOutputFormat
faunus.graph.output.titan.storage.backend=hbase
faunus.graph.output.titan.storage.hostname= IP of my host
faunus.graph.output.titan.storage.port=2181
faunus.graph.output.titan.storage.tablename=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.output.location=output1
zookeeper.znode.parent=/hbase-unsecure
titan.graph.output.ids.block-size=100000
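
For reference, the file is loaded from the Faunus Gremlin shell (bin/gremlin.sh in the Faunus distribution); the filename faunus.properties below is just a stand-in for whatever the file above is saved as:

$ bin/gremlin.sh
gremlin> g = FaunusFactory.open('faunus.properties')
==>faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]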

Can anyone help?

Addendum

To address the comments below, here is some context: as I mentioned, I have a graph in Titan and can run basic Gremlin queries against it.

However, I need to run a global Gremlin query which, given the size of the graph, requires Faunus and its underlying MapReduce machinery; hence the need to import it. The error I am getting does not look to me like it points to an inconsistency in the graph itself.


1 Answer


I'm not sure that you have your "flow" of Faunus right. If your end result is to do a global query of the graph, then consider this approach:

  1. pull your graph to a sequence file
  2. issue your global query over the sequence file

More specifically, create hbase-seq.properties:

# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname=localhost
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
# hbase.mapreduce.scan.cachedrows=1000

# output data (graph or statistic) parameters
faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=snapshot
faunus.output.location.overwrite=true

In the Faunus Gremlin shell, do:

g = FaunusFactory.open('hbase-seq.properties')
g._()  // identity step: streams every vertex and edge through unchanged, so the whole graph lands in the sequence file

That will read the graph from HBase and write it to a sequence file in HDFS. Next, create seq-noop.properties with these contents:

# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=snapshot/job-0

# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=analysis
faunus.output.location.overwrite=true

The above configuration will read your sequence file from the previous step without re-writing the graph (that's what NoOpOutputFormat is for). Now, in the Faunus shell, do:

g = FaunusFactory.open('seq-noop.properties')
// store each vertex's degree as a property, then group-count the degree values
g.V.sideEffect('{it.degree=it.bothE.count()}').degree.groupCount()

This computes the degree distribution, writing the results in HDFS to the 'analysis' directory. Obviously you can run whatever Faunus-flavored Gremlin you want here - I just wanted to provide an example. I think this is a pretty standard "flow" or pattern for using Faunus from a graph analysis perspective.
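
To pull the results back out of HDFS you can use the standard Hadoop CLI. The exact part-file names under the output directory depend on the job, so the sideeffect* glob below is an assumption based on the default Faunus naming:

hadoop fs -ls analysis/job-0
hadoop fs -cat analysis/job-0/sideeffect*

Each emitted line should be a degree value paired with the number of vertices that have that degree.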

Answered 2014-07-23T11:53:04.087