
I'm playing with the idea of using SPARQL to identify conceptual overlap between things.

Take movies, for example (LinkedMDB data). If I have a movie, "The Matrix", and my goal is to list movies that are similar to it, I would probably start by doing the following:

  • The Matrix
    • get genre
    • get actors
    • get director
    • get location
    • etc

Then, using the things I identified for The Matrix, I would query for things with those properties (pseudo-query):

SELECT movie, genre, director, location, actors
WHERE {
  genre is action or sci-fi .

  director are the Wachowski brothers .

  location is set in a big city .

  OPTIONAL( actors were in the matrix . )
}

Is there something in SPARQL that allows me to check for overlap of properties between different nodes? Or must this be done manually like I've proposed?


2 Answers


Matching some specific properties

It sounds like you're asking for something along the lines of

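select ?similarMovie ?genre ?director ?location ?actor where { 
  values ?movie { <http://.../TheMatrix> }
  # ^:p is the inverse of :p, so "?genre ^:hasGenre ?movie" means "?movie :hasGenre ?genre"
  ?genre    ^:hasGenre    ?movie, ?similarMovie .
  ?director ^:hasDirector ?movie, ?similarMovie .
  ?location ^:hasLocation ?movie, ?similarMovie .
  # shared actors are reported when present but are not required
  optional { ?actor ^:hasActor ?movie, ?similarMovie . }
}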

That uses the reverse path notation ^ and object lists to make it much shorter than:

select ?similarMovie ?genre ?director ?location ?actor where { 
  values ?movie { <http://.../TheMatrix> }
  ?movie        :hasGenre    ?genre .
  ?movie        :hasDirector ?director .
  ?movie        :hasLocation ?location .
  ?similarMovie :hasGenre    ?genre .
  ?similarMovie :hasDirector ?director .
  ?similarMovie :hasLocation ?location .
  optional { 
    ?movie        :hasActor ?actor .
    ?similarMovie :hasActor ?actor .
  }
}
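To make the join in the longer query concrete, here is a minimal Python sketch of the same idea over a small in-memory set of triples. Everything in it is an assumption for illustration: the toy data, the names MovieA and MovieB, and the helper functions are hypothetical and not part of any SPARQL library. A candidate counts as similar when it shares at least one value with the given movie for every required property.

```python
# Toy triples (subject, predicate, object). All names are made up for illustration.
TRIPLES = {
    ("TheMatrix", "hasGenre", "SciFi"),
    ("TheMatrix", "hasDirector", "Wachowskis"),
    ("TheMatrix", "hasLocation", "BigCity"),
    ("MovieA", "hasGenre", "SciFi"),
    ("MovieA", "hasDirector", "Wachowskis"),
    ("MovieA", "hasLocation", "BigCity"),
    ("MovieB", "hasGenre", "SciFi"),
    ("MovieB", "hasDirector", "SomeoneElse"),
    ("MovieB", "hasLocation", "BigCity"),
}

REQUIRED = ("hasGenre", "hasDirector", "hasLocation")


def values_of(subject, predicate):
    """Return every object of the given predicate for the given subject."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def similar_movies(movie, predicates):
    """Subjects sharing at least one value with `movie` for every predicate."""
    subjects = {s for s, _, _ in TRIPLES}
    return sorted(
        s for s in subjects
        if all(values_of(movie, p) & values_of(s, p) for p in predicates)
    )


print(similar_movies("TheMatrix", REQUIRED))
```

As with the SPARQL query, the movie itself appears in the result, because it trivially shares every property with itself. MovieB is dropped because its director differs.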

For instance, using DBpedia, we can get other films that have the same distributor and cinematographer as The Matrix:

select ?similar ?cinematographer ?distributor where {
  values ?movie { dbpedia:The_Matrix }
  ?cinematographer ^dbpprop:cinematography ?movie, ?similar .
  ?distributor ^dbpprop:distributor ?movie, ?similar .
}
limit 10

SPARQL results

The results are all within the same franchise; you get: The Matrix, The Matrix Reloaded, The Matrix Revolutions, The Matrix (franchise), and The Ultimate Matrix Collection.

Matching at least some number of properties

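It's also possible to ask for things that have at least some number of properties in common. How many properties two things need to share before they count as similar is obviously subjective, will depend on the particular data, and will need some experimentation. For instance, we can ask DBpedia for Films that share more than 35 property-value pairs with The Matrix (the query counts every ?p ?o pair the two resources have in common) with a query like this: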

select ?similar where { 
  values ?movie { dbpedia:The_Matrix }
  ?similar ?p ?o ; a dbpedia-owl:Film .
  ?movie   ?p ?o .
}
group by ?similar ?movie
having (count(?p) > 35)

SPARQL results

This gives 13 movies (including The Matrix and the other movies in the franchise):

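  • V for Vendetta (film)
  • The Matrix
  • The Postman (film)
  • Executive Decision
  • The Invasion (film)
  • Demolition Man (film)
  • The Matrix (franchise)
  • The Matrix Reloaded
  • Freejack
  • Exit Wounds
  • The Matrix Revolutions
  • Outbreak (film)
  • Speed Racer (film)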

With this approach, you can even use the number of common properties as a measure of similarity. For example:

select ?similar (count(?p) as ?similarity) where { 
  values ?movie { dbpedia:The_Matrix }
  ?similar ?p ?o ; a dbpedia-owl:Film .
  ?movie   ?p ?o .
}
group by ?similar ?movie
having (count(?p) > 35)
order by desc(?similarity)

SPARQL results

The Matrix             206
The Matrix Revolutions  63
The Matrix Reloaded     60
The Matrix (franchise)  55
Demolition Man (film)   41
Speed Racer (film)      40
V for Vendetta (film)   38
The Invasion (film)     38
The Postman (film)      36
Executive Decision      36
Freejack                36
Exit Wounds             36
Outbreak (film)         36
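
The grouping and counting in that query can be mimicked in a few lines of Python, which may help when deciding on a threshold. This is only a sketch under stated assumptions: the triples are invented toy data, and rank_similar is a hypothetical helper, not a DBpedia or SPARQL API. It counts, for each film, how many (property, value) pairs it shares with the given movie, keeps those above the threshold, and sorts by descending count.

```python
from collections import Counter

# Toy graph as a set of (subject, predicate, object) triples; all data is invented.
TRIPLES = {
    ("TheMatrix", "type", "Film"),
    ("TheMatrix", "director", "Wachowskis"),
    ("TheMatrix", "starring", "KeanuReeves"),
    ("TheMatrix", "genre", "SciFi"),
    ("Sequel", "type", "Film"),
    ("Sequel", "director", "Wachowskis"),
    ("Sequel", "starring", "KeanuReeves"),
    ("Other", "type", "Film"),
    ("Other", "genre", "SciFi"),
    ("Book", "type", "Novel"),
    ("Book", "genre", "SciFi"),
}


def rank_similar(movie, min_shared):
    """Films sharing more than `min_shared` (property, value) pairs with `movie`."""
    # Pairs of the reference movie, like "?movie ?p ?o" in the query.
    movie_pairs = {(p, o) for s, p, o in TRIPLES if s == movie}
    # Restrict candidates to films, like "?similar a dbpedia-owl:Film".
    films = {s for s, p, o in TRIPLES if (p, o) == ("type", "Film")}
    # Count shared pairs per candidate, like "group by ?similar" with count(?p).
    shared = Counter(
        s for s, p, o in TRIPLES if s in films and (p, o) in movie_pairs
    )
    # Apply the "having" filter, then sort by descending count.
    # Ties are broken by name here; SPARQL leaves the order of ties unspecified.
    ranked = [(s, n) for s, n in shared.items() if n > min_shared]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


for name, similarity in rank_similar("TheMatrix", 1):
    print(name, similarity)
```

Note that the type triple itself counts as a shared pair, just as rdf:type does in the DBpedia query, so every film starts with a similarity of at least 1.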
answered 2014-01-22T17:58:57.890

With the new prefixes in DBpedia, Joshua Taylor's answer becomes:

select ?similar (count(?p) as ?similarity) where { 
  values ?movie { dbr:The_Matrix }
  ?similar ?p ?o ; a dbo:Film .
  ?movie   ?p ?o .
}
group by ?similar ?movie
having (count(?p) > 35)
order by desc(?similarity)

SPARQL results

answered 2019-10-27T13:24:23.250