1

我正在尝试学习使用 RDF,并尝试从 dbpedia 中提取一组事实作为我的学习练习。下面的代码示例有点工作,但对于配偶等主题,它总是会拉出他们自己的人。

问题:

  1. get_name_from_uri() 提取 URI 的最后一部分并删除下划线 - 必须有更好的方法来获取人名
  2. 配偶的结果拉回配偶但也拉回数据主体 - 不确定那里发生了什么
  3. 一些结果以 URI 格式和文本项的形式提取数据 -

这是代码块的输出,显示了我得到的一些奇怪的结果(请参阅属性中的混合输出,他与自己结婚的事实以及 Josephine 的错误名字?

Accessing facts for Napoleon  held at  http://dbpedia.org/resource/Napoleon

There are  800  facts about Napoleon stored at the URI
http://dbpedia.org/resource/Napoleon

Here are a few:-
Ontology:deathdate

Napoleon died on 1821-05-05

Ontology:birthdate
Napoleon was born on 1769-08-15

Property:spouse retruns the person themslves twice !
Napoleon was married to  Marie Louise, Duchess of Parma
Napoleon was married to  Napoleon
Napoleon was married to  Jos%C3%A9phine de Beauharnais
Napoleon was married to  Napoleon

Property:title retruns text and uri's
Napoleon  Held the title:  "The Death of Napoleon"
Napoleon  Held the title: http://dbpedia.org/resource/Emperor_of_the_French
Napoleon  Held the title: http://dbpedia.org/resource/King_of_Italy
Napoleon  Held the title:  First Consul of France
Napoleon  Held the title:  Provisional Consul of France
Napoleon  Held the title:  http://dbpedia.org/resource/Napoleon
Napoleon  Held the title:  Emperor of the French
Napoleon  Held the title: http://dbpedia.org/resource/Co-Princes_of_Andorra
Napoleon  Held the title:  from the Memoirs of Bourrienne, 1831
Napoleon  Held the title:  Protector of the Confederation of the Rhine

Ontology birth place returns three records
Napoleon was born in  Ajaccio
Napoleon was born in  Corsica
Napoleon was born in  Early modern France

这是产生上述输出的 python,它需要 rdflib,并且正在进行中。

import rdflib
from rdflib import Graph, URIRef, RDF

######################################
#  A quick test of a python library reflib to get data from an rdf graph
# D Moore 15/3/2014
# needs rdflib > version 3.0

# CHANGE THE URI BELOW TO A DIFFERENT PERSON AND SEE WHAT HAPPENS
# COULD DO WITH A WEB FORM 
# NOTES:
#
#URI_ref = 'http://dbpedia.org/resource/Richard_Nixon'
#URI_ref = 'http://dbpedia.org/resource/Margaret_Thatcher'
#URI_ref = 'http://dbpedia.org/resource/Isaac_Newton'
#URI_ref = 'http://dbpedia.org/resource/Richard_Nixon'
URI_ref = 'http://dbpedia.org/resource/Napoleon'
#URI_ref = 'http://dbpedia.org/resource/apple'
##########################################################


def get_name_from_uri(dbpedia_uri):  
    # pulls the last part of a uri out and removes underscores
    # got to be an easier way but it works
    output_string = ""
    s = dbpedia_uri
    # chop the url into bits devided by the /
    tokens = s.split("/")
    # because the name of our person is in the last section itterate through each token 
    # and replace the underscore with a space
    for i in tokens :
        str = ''.join([i])
        output_string = str.replace('_',' ')
    # returns the name of the person without underscores 
    return(output_string)

def is_person(uri):
#####  SPARQL way to do this
    uri = URIRef(uri)
    person = URIRef('http://dbpedia.org/ontology/Person')
    g= Graph()
    g.parse(uri)
    resp = g.query(
        "ASK {?uri a ?person}",
        initBindings={'uri': uri, 'person': person}
    )
    print uri, "is a person?", resp.askAnswer
    return resp.askAnswer

URI_NAME = get_name_from_uri(URI_ref)
NAME_LABEL = ''

if is_person(URI_ref):
    print "Accessing facts for", URI_NAME, " held at ", URI_ref

    g = Graph()
    g.parse(URI_ref)
    print "Person Extract for", URI_NAME
    print "There are ",len(g)," facts about", URI_NAME, "stored at the URI ",URI_ref
    print "Here are a few:-"


    # Ok so lets get some facts for our person
    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthName")):
        print URI_NAME, "was born " + str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/deathDate")):
        print URI_NAME, "died on", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthDate")):
        print URI_NAME, "was born on", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/eyeColor")):
        print URI_NAME, "had eyes coloured", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/spouse")):
        print URI_NAME, "was married to ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/reigned")):
        print URI_NAME, "reigned ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/children")):
        print URI_NAME, "had a child called ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/profession")):
        print URI_NAME, "(PROPERTY profession) was trained as a  ", get_name_fro    m_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/child")):
        print URI_NAME, "PROPERTY child ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/deathplace")):
        print URI_NAME, "(PROPERTY death place) died at: ", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/title")):
        print URI_NAME, "(PROPERTY title) Held the title: ", str(stmt[1])


    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/sex")):
        print URI_NAME, "was a ", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/knownfor")):
        print URI_NAME, "was known for ", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthPlace")):
        print URI_NAME, "was born in ", get_name_from_uri(str(stmt[1]))

else:
    print "ERROR - "
    print "Resource", URI_ref, 'does not look to be a person or there is no record in dbpedia'
4

1 回答 1

2

获取名称

*get_name_from_uri* 正在处理 URI。由于 DBpedia 数据rdfs:labels几乎涵盖了所有内容,因此请求rdfs:label并将其用作值可能是一个更好的主意。例如,查看运行DBpedia SPARQL 端点的 SPARQL 查询的结果:

select ?spouse ?spouseName where {
  dbpedia:Napoleon dbpedia-owl:spouse ?spouse .
  ?spouse rdfs:label ?spouseName .
  filter( langMatches(lang(?spouseName),"en") )
}
spouse                                                      spouseName
http://dbpedia.org/resource/Jos%C3%A9phine_de_Beauharnais   "Joséphine de Beauharnais"@en
http://dbpedia.org/resource/Marie_Louise,_Duchess_of_Parma  "Marie Louise, Duchess of Parma"@en

意外的配偶

subject_objects的文档说

subject_objects(自我,谓词=无)

给定谓词的(主语,宾语)元组生成器

您正确地看到,DBpedia 中有四个三元组具有谓词dbpprop:spouse(顺便说一句,您是否有理由不使用dbpedia-owl:spouse?)并且具有Napoleon主语或宾语:

Napoleon                       spouse Marie Louise, Duchess of Parma
Marie Louise, Duchess of Parma spouse Napoleon 
Napoleon                       spouse Jos%C3%A9phine de Beauharnais
Jos%C3%A9phine de Beauharnais  spouse Napoleon

对于其中的每一个,您都在打印

"Napoleon was married to X"

其中 X 是三元组的对象。也许您应该objects改用:

对象(自我,主体=无,谓词=无)

具有给定主语和谓语的对象生成器

URI 与文本(文字)结果

DBpedia 本体属性描述的数据(URI 以 开头http://dbpedia.org/ontology/,通常缩写为)比 DBpedia 原始数据属性(URI 以 开头,通常缩写为)dbpedia-owl:描述的数据“干净”得多。例如,当您查看标题时,您使用的是 property ,并且有 URI 和文字作为值。不过,它看起来不像有 a ,所以在这种情况下,你只需要处理它。不过,过滤掉其中一个很容易:http://dbpedia.org/property/dbpprop:dbpprop:titledbpedia-owl:title

select ?title where {
  dbpedia:Napoleon dbpprop:title ?title
  filter isLiteral(?title)
}
title
================================================
"Emperor of the French"@en
"Protector of the Confederation of the Rhine"@en
"First Consul of France"@en
"Provisional Consul of France"@en
""The Death of Napoleon""@en
"from the Memoirs of Bourrienne, 1831"@en
select ?title where {
  dbpedia:Napoleon dbpprop:title ?title
  filter isURI(?title)
}
title
=================================================
http://dbpedia.org/resource/Co-Princes_of_Andorra
http://dbpedia.org/resource/Emperor_of_the_French
http://dbpedia.org/resource/King_of_Italy
于 2014-03-20T14:40:08.780 回答