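I am trying to extract labels for certain people from DBpedia. I have partly succeeded, but I am stuck on the following problem. The code below works.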

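import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

public class DbPediaQueryExtractor {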
    public static void main(String [] args) {
        String entity = "Aharon_Barak";
        String queryString ="PREFIX dbres: <http://dbpedia.org/resource/> SELECT * WHERE {dbres:"+ entity+ "<http://www.w3.org/2000/01/rdf-schema#label> ?o FILTER (langMatches(lang(?o),\"en\"))}";
        //String queryString="select *     where { ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>;  <http://www.w3.org/2000/01/rdf-schema#label>  ?o FILTER (langMatches(lang(?o),\"en\"))  } LIMIT 5000000";
        QueryExecution qexec = getResult(queryString);
        try {
            ResultSet results = qexec.execSelect();
            for ( ; results.hasNext(); )
            {
                QuerySolution soln = results.nextSolution();
                System.out.print(soln.get("?o") + "\n");
            }
        }
        finally {
            qexec.close();
        }
    }

    public static QueryExecution getResult(String queryString){
        Query query = QueryFactory.create(queryString);
        //VirtuosoQueryExecution vqe = VirtuosoQueryExecutionFactory.create (sparql, graph);
        QueryExecution qexec = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", query);
        return qexec;
    }
}

However, it does not work when the entity contains parentheses. For example,

String entity = "William_H._Miller_(writer)";

results in this exception:

Exception in thread "main" com.hp.hpl.jena.query.QueryParseException: Encountered " "(" "( "" at line 1, column 86.

What is the problem?

1 Answer

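It took some copying and pasting to see exactly what is going on. I would suggest adding newlines to your query to make it easier to read. The query you are using is: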

PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
  dbres:??? <http://www.w3.org/2000/01/rdf-schema#label> ?o 
  FILTER (langMatches(lang(?o),"en"))
}

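where ??? is replaced by the contents of the string entity. You are doing no input validation at all here to ensure that the value of entity is legal to paste in. Based on your question, it sounds like entity contains William_H._Miller_(writer), so you end up with the query: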

PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
  dbres:William_H._Miller_(writer) <http://www.w3.org/2000/01/rdf-schema#label> ?o 
  FILTER (langMatches(lang(?o),"en"))
}
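
As a rough illustration of the kind of input validation that is missing, here is a minimal sketch that checks whether a string is safe to paste after dbres: as the local part of a prefixed name. LocalNameCheck and isSafeLocalName are hypothetical names, not part of Jena, and the pattern is a deliberately conservative ASCII subset of the SPARQL 1.1 PN_LOCAL production rather than the full grammar, so it will reject some names that are actually legal.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Jena.
final class LocalNameCheck {
    // Conservative ASCII subset of the SPARQL 1.1 PN_LOCAL production:
    // starts with a letter, digit, or underscore; may contain '.' and '-'
    // in the middle; must not end with '.'. Parentheses are not allowed.
    private static final Pattern SAFE_LOCAL_NAME =
            Pattern.compile("[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?");

    // Returns true only if s can be pasted unescaped after a prefix such as dbres:.
    static boolean isSafeLocalName(String s) {
        return s != null && SAFE_LOCAL_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] entities = { "Aharon_Barak", "William_H._Miller_(writer)" };
        for (String entity : entities) {
            System.out.println(entity + " -> " + isSafeLocalName(entity));
        }
    }
}
```

A check like this only tells you when plain concatenation is unsafe; it does not fix the query. For names that fail the check you still need a different way to write the resource, such as a full IRI in angle brackets.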

You can paste that query into the public DBpedia endpoint, and you will get a similar parse error message:

Virtuoso 37000 Error SP030: SPARQL compiler, line 6: syntax error at 'writer' before ')'

SPARQL query:
define sql:big-data-const 0 
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri <http://dbpedia.org> PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
  dbres:William_H._Miller_(writer) <http://www.w3.org/2000/01/rdf-schema#label> ?o 
  FILTER (langMatches(lang(?o),"en"))
}

Rather than hitting DBpedia's endpoint with bad queries, you can also use the SPARQL query validator, which reports this for the query:

Syntax error: Lexical error at line 4, column 34. Encountered: ")" (41), after : "writer"
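
Line and column numbers like these are only useful when the query string contains line breaks. The following sketch builds the same query with newlines and converts the offset of the first ")" into a line and column. QueryPosition, labelQuery, and lineAndColumn are hypothetical names for illustration, and the 1-based, one-column-per-character counting is an assumption about how the validator reports positions.

```java
// Hypothetical helper for illustration only; not part of Jena.
final class QueryPosition {
    // Builds the label query by plain concatenation, one clause per line.
    static String labelQuery(String entity) {
        return String.join("\n",
                "PREFIX dbres: <http://dbpedia.org/resource/>",
                "SELECT * WHERE",
                "{",
                "  dbres:" + entity + " <http://www.w3.org/2000/01/rdf-schema#label> ?o",
                "  FILTER (langMatches(lang(?o),\"en\"))",
                "}");
    }

    // Converts a 0-based character offset into a 1-based {line, column} pair.
    static int[] lineAndColumn(String text, int offset) {
        int line = 1;
        int column = 1;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return new int[] { line, column };
    }

    public static void main(String[] args) {
        String query = labelQuery("William_H._Miller_(writer)");
        int[] pos = lineAndColumn(query, query.indexOf(')'));
        System.out.println("first ')' at line " + pos[0] + ", column " + pos[1]);
    }
}
```

With the entity from the question, the first ")" lands on line 4, column 34, which matches the position in the validator's message. In the original single-line query, the parser could only report line 1, column 86.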

In Jena, you can use ParameterizedSparqlString to avoid this kind of problem. Here is your example, reworked to use a parameterized string:

import com.hp.hpl.jena.query.ParameterizedSparqlString;

public class PSSExample {
    public static void main( String[] args ) {
        // Create a parameterized SPARQL string for the particular query, and add the 
        // dbres prefix to it, for later use.
        final ParameterizedSparqlString queryString = new ParameterizedSparqlString(
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
                "SELECT * WHERE\n" +
                "{\n" +
                "  ?entity rdfs:label ?o\n" +
                "  FILTER (langMatches(lang(?o),\"en\"))\n" +
                "}\n"
                ) {{
            setNsPrefix( "dbres", "http://dbpedia.org/resource/" );
        }};

        // Entity is the same. 
        final String entity = "William_H._Miller_(writer)";

        // Now retrieve the URI for dbres, concatenate it with entity, and use
        // it as the value of ?entity in the query.
        queryString.setIri( "?entity", queryString.getNsPrefixURI( "dbres" )+entity );

        // Show the query.
        System.out.println( queryString.toString() );
    }
}

The output is:

PREFIX dbres: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE
{
  <http://dbpedia.org/resource/William_H._Miller_(writer)> rdfs:label ?o
  FILTER (langMatches(lang(?o),"en"))
}

You can run this query at the public endpoint and get the expected results. Note that if you use an entity that does not need special escaping, e.g.,

final String entity = "George_Washington";

then the query output will use the prefixed form:

PREFIX dbres: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE
{
  dbres:George_Washington rdfs:label ?o
  FILTER (langMatches(lang(?o),"en"))
}

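This is very convenient, because you do not have to check whether the suffix (that is, entity) contains any characters that need to be escaped; Jena takes care of that for you.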
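
If you cannot use ParameterizedSparqlString for some reason, you can approximate the same effect by hand. The sketch below always writes the resource as a full IRI in angle brackets, because parentheses are legal there. It is a hand-rolled illustration, not how Jena works internally. IriQueryBuilder, isLegalInIriRef, and labelQuery are hypothetical names, and the character check follows the SPARQL 1.1 IRIREF production as I understand it (no control characters or spaces, and none of < > " { } | ^ ` \).

```java
// Hypothetical helper for illustration only; not part of Jena.
final class IriQueryBuilder {
    static final String DBPEDIA_RESOURCE = "http://dbpedia.org/resource/";

    // True if every character may appear between < and > in a SPARQL IRIREF.
    static boolean isLegalInIriRef(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0x20 || "<>\"{}|^`\\".indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    // Builds the label query with the resource written as <full IRI>.
    static String labelQuery(String entity) {
        String iri = DBPEDIA_RESOURCE + entity;
        if (!isLegalInIriRef(iri)) {
            throw new IllegalArgumentException("Not usable as an IRI: " + iri);
        }
        return String.join("\n",
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
                "SELECT * WHERE",
                "{",
                "  <" + iri + "> rdfs:label ?o",
                "  FILTER (langMatches(lang(?o),\"en\"))",
                "}");
    }

    public static void main(String[] args) {
        System.out.println(labelQuery("William_H._Miller_(writer)"));
    }
}
```

The parameterized string is still the better choice when Jena is available, since it also handles literals and other node types; this sketch covers only the single IRI case from the question.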

Answered 2013-08-14T13:17:23.870