algorithm - How to compare (sub)networks?

Question

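For the sake of this question, assume there are a few thousand cities and a million travelers, each related to the cities they have visited. On average each traveler has visited 50 cities, but in extreme cases as many as 1000.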

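1) The first (simple) question: how can I find, for a given traveler, a list of the most similar travelers based on the similarity of their visited cities? For example, suppose we have 3 travelers who visited these cities: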

A: Seattle (WA), Baltimore (MD), Dallas (TX)
B: Portland (OR), DC, Austin (TX)
C: Seattle (WA), Portland (OR), DC, Baltimore (MD)

If we now compare the visited cities, the traveler most similar to A is traveler C.

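Since, for each traveler, the subnetwork of involved cities is essentially a list of directly connected nodes, the comparison is relatively easy even without Cypher (I'm not sure whether there is an elegant way to do the comparison in Cypher).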
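A minimal Python sketch of the simple case, assuming Jaccard similarity (shared cities divided by the union of both travelers' cities) as the measure; the question does not name a metric, and `build_index` and `most_similar` are hypothetical names. An inverted index from city to travelers means only travelers who share at least one city with the target are ever scored, which matters with a million travelers.

```python
from collections import Counter, defaultdict

# Toy data from the question: traveler -> set of visited cities.
VISITS = {
    "A": {"Seattle", "Baltimore", "Dallas"},
    "B": {"Portland", "DC", "Austin"},
    "C": {"Seattle", "Portland", "DC", "Baltimore"},
}


def build_index(visits):
    """Inverted index: city -> set of travelers who visited it."""
    index = defaultdict(set)
    for traveler, cities in visits.items():
        for city in cities:
            index[city].add(traveler)
    return index


def most_similar(target, visits, index, k=10):
    """Top-k travelers ranked by Jaccard similarity of visited-city sets.

    Only travelers sharing at least one city with `target` are scored,
    so travelers with zero overlap (similarity 0.0) are left out.
    """
    mine = visits[target]
    shared = Counter()
    for city in mine:
        for other in index[city]:
            if other != target:
                shared[other] += 1
    # |a & b| / |a | b|, using |a | b| = |a| + |b| - |a & b|
    scored = [
        (other, n / (len(mine) + len(visits[other]) - n))
        for other, n in shared.items()
    ]
    # Highest similarity first; ties broken by name for a stable result.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


print(most_similar("A", VISITS, build_index(VISITS)))  # [('C', 0.4)]
```

B never shows up for A because it shares no city with A, which matches the observation that C is the closest match at the city level.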

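2) A more complex comparison scheme works not on the cities directly but on their features (state, country, climate, population, type of attractions, etc.). In our example, each city is related to its state, which in turn is related to a territory (region). If we look for the traveler most similar to A by region, traveler B is the winner (despite zero matches at the city level).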
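A sketch of the feature-level comparison in Python: each city is lifted to its state and then to a region before comparing, again with Jaccard similarity. The city-to-state pairs come from the question; the state-to-region grouping in `STATE_REGION` is a hypothetical choice made only so the example reproduces the described outcome.

```python
# Toy data from the question: traveler -> set of visited cities.
VISITS = {
    "A": {"Seattle", "Baltimore", "Dallas"},
    "B": {"Portland", "DC", "Austin"},
    "C": {"Seattle", "Portland", "DC", "Baltimore"},
}

# City -> state, as given in the question.
CITY_STATE = {
    "Seattle": "WA", "Portland": "OR", "Baltimore": "MD",
    "DC": "DC", "Dallas": "TX", "Austin": "TX",
}

# Hypothetical state -> region grouping (not specified in the question).
STATE_REGION = {
    "WA": "Pacific Northwest", "OR": "Pacific Northwest",
    "MD": "Mid-Atlantic", "DC": "Mid-Atlantic",
    "TX": "South",
}


def regions(cities):
    """Set of regions a traveler touched (city -> state -> region)."""
    return {STATE_REGION[CITY_STATE[city]] for city in cities}


def jaccard(a, b):
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_region(target, visits):
    """Rank every other traveler by region-level Jaccard similarity."""
    mine = regions(visits[target])
    scored = [
        (other, jaccard(mine, regions(cities)))
        for other, cities in visits.items()
        if other != target
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


print(rank_by_region("A", VISITS))
```

Here B scores 1.0 (same three regions as A) and C scores 2/3. Using sets ignores how many cities fall into each region; counting regions with a `Counter` is one alternative if that matters.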

What are your thoughts on these two problems?

score 0 · Accepted Answer

If you really want a graph approach to this problem, I would recommend modeling the properties of the cities as nodes and linking the cities to those property nodes.

You would then have something along these lines:

(user)-[:visited]->(city)-[:has_property]->(property #i)

You could then easily find similar cities by counting how many property nodes they are linked to in common.
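To make the idea concrete, here is a Python sketch that stores the `(city)-[:has_property]->(property)` links as a plain dict and counts, for a given city, how many property nodes every other city shares with it. The property nodes (such as `state:TX` or `region:South`) and the region grouping are hypothetical illustration data.

```python
from collections import Counter, defaultdict

# Hypothetical (city)-[:has_property]->(property) links.
CITY_PROPS = {
    "Seattle": {"state:WA", "region:Pacific Northwest"},
    "Portland": {"state:OR", "region:Pacific Northwest"},
    "Baltimore": {"state:MD", "region:Mid-Atlantic"},
    "DC": {"state:DC", "region:Mid-Atlantic"},
    "Dallas": {"state:TX", "region:South"},
    "Austin": {"state:TX", "region:South"},
}


def invert(city_props):
    """Follow the links backwards: property -> set of cities linked to it."""
    prop_cities = defaultdict(set)
    for city, props in city_props.items():
        for prop in props:
            prop_cities[prop].add(city)
    return prop_cities


def similar_cities(city, city_props, prop_cities):
    """Other cities ranked by the number of property nodes shared with `city`."""
    shared = Counter()
    for prop in city_props[city]:
        for other in prop_cities[prop]:
            if other != city:
                shared[other] += 1
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))


PROP_CITIES = invert(CITY_PROPS)
print(similar_cities("Dallas", CITY_PROPS, PROP_CITIES))   # [('Austin', 2)]
print(similar_cities("Seattle", CITY_PROPS, PROP_CITIES))  # [('Portland', 1)]
```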

Your query would then amount to a basic recommendation, which can be implemented in Cypher along these lines (not tested, but you should get the idea):

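```cypher
// Look up user A via the legacy `users` index (query string left empty as in the original)
start A=node:users("")
// 1) Distinct properties of the cities A visited; keep A in scope for step 3
match A-[:visited]->()-[:has_property]->p
with distinct A, p
// 2) Cities linked to those properties, scored by how many of them they share
match p<-[r:has_property]-cities
with A, cities, count(r) as sim_score
order by sim_score desc limit 10
// 3) Other users who visited those cities, scored by how many they visited
match cities<-[r:visited]-similar
where similar <> A
return similar, count(r) as score
order by score desc limit 5
```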

It works in 3 steps:

1) Get the properties of the cities visited by user A (the "feature extraction" part).
2) Then, from those features, get the most similar cities.
3) Finally, retrieve the similar users who have the same "profile".
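The same three steps can be traced in plain Python on the toy data, which can help when checking what the Cypher query counts. This is an in-memory sketch, not a Neo4j API; `recommend`, `top_cities` and `top_users` are hypothetical names, with defaults mirroring `limit 10` and `limit 5` in the query, and the city properties are hypothetical illustration data. The sketch skips user A when scoring similar users.

```python
from collections import Counter

# Toy data: traveler -> visited cities (from the question).
VISITS = {
    "A": {"Seattle", "Baltimore", "Dallas"},
    "B": {"Portland", "DC", "Austin"},
    "C": {"Seattle", "Portland", "DC", "Baltimore"},
}
# Hypothetical city -> property nodes.
CITY_PROPS = {
    "Seattle": {"state:WA", "region:Pacific Northwest"},
    "Portland": {"state:OR", "region:Pacific Northwest"},
    "Baltimore": {"state:MD", "region:Mid-Atlantic"},
    "DC": {"state:DC", "region:Mid-Atlantic"},
    "Dallas": {"state:TX", "region:South"},
    "Austin": {"state:TX", "region:South"},
}


def top(scores, n):
    """Highest-scoring (key, score) pairs; ties broken by key for determinism."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def recommend(user, visits, city_props, top_cities=10, top_users=5):
    # Step 1 ("feature extraction"): distinct properties of the user's cities.
    props = {p for city in visits[user] for p in city_props[city]}

    # Step 2: score each city by how many of those properties it links to.
    city_score = Counter()
    for city, city_p in city_props.items():
        n = len(city_p & props)
        if n:
            city_score[city] = n
    best = {city for city, _ in top(city_score, top_cities)}

    # Step 3: score other users by how many of the best cities they visited.
    user_score = Counter()
    for other, cities in visits.items():
        n = len(cities & best)
        if other != user and n:
            user_score[other] = n
    return top(user_score, top_users)


print(recommend("A", VISITS, CITY_PROPS))  # [('C', 4), ('B', 3)]
```

On this toy data C still ranks above B, because step 3 counts raw visits and C visited more cities; dividing by the number of cities each user visited is one possible normalization if that bias matters.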

For performance reasons, you can compute the city similarities offline, since they shouldn't change very often, and keep only the user-similarity computation in "real time", since it is less predictable.
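A sketch of that offline/online split in Python, assuming city similarity is the shared-property count: `precompute_neighbors` would run offline and store each city's top-k similar cities, while `similar_users` runs per request and only reads that table plus a city-to-visitors index. All names and the city properties are hypothetical.

```python
from collections import Counter, defaultdict

VISITS = {
    "A": {"Seattle", "Baltimore", "Dallas"},
    "B": {"Portland", "DC", "Austin"},
    "C": {"Seattle", "Portland", "DC", "Baltimore"},
}
# Hypothetical city -> property nodes.
CITY_PROPS = {
    "Seattle": {"state:WA", "region:Pacific Northwest"},
    "Portland": {"state:OR", "region:Pacific Northwest"},
    "Baltimore": {"state:MD", "region:Mid-Atlantic"},
    "DC": {"state:DC", "region:Mid-Atlantic"},
    "Dallas": {"state:TX", "region:South"},
    "Austin": {"state:TX", "region:South"},
}


def precompute_neighbors(city_props, k=10):
    """Offline: for each city, the k cities sharing the most property nodes."""
    prop_cities = defaultdict(set)
    for city, props in city_props.items():
        for prop in props:
            prop_cities[prop].add(city)
    neighbors = {}
    for city, props in city_props.items():
        shared = Counter(
            other for prop in props for other in prop_cities[prop] if other != city
        )
        ranked = sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))
        neighbors[city] = [other for other, _ in ranked[:k]]
    return neighbors


def similar_users(user, visits, visitors, neighbors, top_users=5):
    """Online: expand the user's cities with their stored neighbors, then count
    how many of the expanded cities each other traveler visited."""
    expanded = set(visits[user])
    for city in visits[user]:
        expanded.update(neighbors.get(city, []))
    scores = Counter(
        other for city in expanded for other in visitors.get(city, ()) if other != user
    )
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_users]


NEIGHBORS = precompute_neighbors(CITY_PROPS)  # offline, refreshed rarely
VISITORS = defaultdict(set)  # city -> travelers who visited it
for traveler, cities in VISITS.items():
    for city in cities:
        VISITORS[city].add(traveler)
print(similar_users("A", VISITS, VISITORS, NEIGHBORS))  # [('C', 4), ('B', 3)]
```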

Also, don't forget to give your server enough RAM ;)

score 0 · Accepted Answer

Well, I think you would write your own algorithms in Java, or any other JVM language, against the neo4j Java API and expose them to the world, e.g. using a neo4j server plugin; see http://docs.neo4j.org/chunked/milestone/server-plugins.html

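There are plenty of resources to help you get started writing these algorithms; for instance, you can look at the implementation of the neo4j graph algorithms: https://github.com/neo4j/neo4j/tree/master/community/graph-algo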
