apache-spark - Edge attribute filter on GraphFrames motif search does not work

Question

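I have some sample data for a family graph that I want to query.

I want to use the find method on a GraphFrames object to query the motif A->B where the edge is of type "Mother".

Since GraphFrames uses a subset of Neo4j's Cypher language, I wonder whether the following is a correct query?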

graph.find("(A)-[edge:Mother]->(B)").show

Or what is the best way to achieve this in GraphFrames?

GraphFrame(vertex, graph.edges.filter("attr=='Mother'")).vertices.show

This does not work because I cannot filter on direction, and I only want to get the mothers :)

Any ideas?

score 2 · Accepted Answer

Suppose this is your test data:

import org.graphframes.GraphFrame

val edgesDf = spark.sqlContext.createDataFrame(Seq(
  ("a", "b", "Mother"),
  ("b", "c", "Father"),  
  ("d", "c", "Father"),
  ("e", "b", "Mother")    
)).toDF("src", "dst", "relationship")

val graph = GraphFrame.fromEdges(edgesDf)
graph.edges.show()

+---+---+------------+
|src|dst|relationship|
+---+---+------------+
|  a|  b|      Mother|
|  b|  c|      Father|
|  d|  c|      Father|
|  e|  b|      Mother|
+---+---+------------+

You can use a motif query and apply a filter to it:

graph.find("()-[e]->()").filter("e.relationship = 'Mother'").show()

+------------+
|           e|
+------------+
|[a,b,Mother]|
|[e,b,Mother]|
+------------+

Alternatively, since your case is relatively simple, you can apply the filter directly to the graph's edges:

graph.edges.filter("relationship = 'Mother'").show()

+---+---+------------+
|src|dst|relationship|
+---+---+------------+
|  a|  b|      Mother|
|  e|  b|      Mother|
+---+---+------------+
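
If you want a GraphFrame back rather than a DataFrame, as in the second attempt in the question, here is a minimal sketch that rebuilds the graph from the filtered edges. It assumes the same test data as above; the names motherEdges, motherGraph and the local SparkSession settings are only for illustration (in spark-shell, spark already exists).

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// Assumption: a local session for illustration; getOrCreate reuses an existing one.
val spark = SparkSession.builder().master("local[*]").appName("mother-subgraph").getOrCreate()

val edgesDf = spark.createDataFrame(Seq(
  ("a", "b", "Mother"),
  ("b", "c", "Father"),
  ("d", "c", "Father"),
  ("e", "b", "Mother")
)).toDF("src", "dst", "relationship")

val graph = GraphFrame.fromEdges(edgesDf)

// Keep only the "Mother" edges; the column is "relationship", not "attr".
val motherEdges = graph.edges.filter("relationship = 'Mother'")

// Rebuild a graph from the original vertices and the filtered edges.
val motherGraph = GraphFrame(graph.vertices, motherEdges)

motherGraph.edges.show()
```

Note that motherGraph.vertices still contains every vertex of the original graph, because only the edges were filtered. That is why calling .vertices on such a graph does not return just the mothers.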

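Here is some alternative syntax (each gives the same result as above; outside spark-shell, both forms need import spark.implicits._):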

graph.edges.filter($"relationship" === "Mother").show()
graph.edges.filter('relationship === "Mother").show()

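You mentioned filtering on direction, but the direction of each relationship is already encoded in the graph itself (i.e. from source to destination).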
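
To make the direction explicit, you can name the vertices in the motif and read the mother from the source side of the edge. The following is a sketch under the same test data; as far as I know the motif syntax has no inline edge-type form such as [edge:Mother], so the type is checked in a filter instead. The column aliases mother and child are my own naming.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.graphframes.GraphFrame

// Assumption: a local session for illustration; getOrCreate reuses an existing one.
val spark = SparkSession.builder().master("local[*]").appName("mother-motif").getOrCreate()

val edgesDf = spark.createDataFrame(Seq(
  ("a", "b", "Mother"),
  ("b", "c", "Father"),
  ("d", "c", "Father"),
  ("e", "b", "Mother")
)).toDF("src", "dst", "relationship")

val graph = GraphFrame.fromEdges(edgesDf)

// (a)-[e]->(b): a is the source vertex, b the destination vertex of edge e.
val mothers = graph
  .find("(a)-[e]->(b)")
  .filter("e.relationship = 'Mother'")
  // The source of a "Mother" edge is the mother, the destination is the child.
  .select(col("a.id").as("mother"), col("b.id").as("child"))

mothers.show()
```

If only the mothers themselves are needed, mothers.select("mother").distinct() reduces the result to one row per mother.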
