
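My work is about a recommender system for library books. As input, I need an ontology that classifies the books. My ontology classifies library books into 14 categories, alongside the sibling classes Author, book, and Isbn. The individuals of the book class are the book subjects (about 600 subjects), the individuals of the Author class are author names, and likewise for the Isbn class. I designed this ontology with Protege 4.1.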

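I have also manually collected the categories that some of the books belong to. An object property named "hasSubject" relates individuals of the book class to categories. For example, book "A" has the subject categories "S" and "F", and so on. As a result, I want to obtain a matrix showing which books belong to which categories: an entry is 1 if the book belongs to the category and 0 otherwise. Like this: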

     cat1   cat2   cat3   
book1   1      0      0   
book2   1      0      1   
book3   1      1      0  

In this example, book1 belongs to category 1 and does not belong to categories 2 and 3. How can I do this using SPARQL in Protege?


1 Answer


Handling a fixed number of categories

Given data such as

@prefix : <http://example.org/books/> .

:book1 a :Book, :Cat1 .
:book2 a :Book, :Cat1, :Cat3 .
:book3 a :Book, :Cat1, :Cat2 .

you can use a query like this:

prefix : <http://example.org/books/>

select ?individual
       (if(bound(?cat1),1,0) as ?Cat1)
       (if(bound(?cat2),1,0) as ?Cat2)
       (if(bound(?cat3),1,0) as ?Cat3)
where {
  ?individual a :Book .
  OPTIONAL { ?individual a :Cat1 . bind( ?individual as ?cat1 ) } 
  OPTIONAL { ?individual a :Cat2 . bind( ?individual as ?cat2 ) }
  OPTIONAL { ?individual a :Cat3 . bind( ?individual as ?cat3 ) }
}
order by ?individual

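In this query, each of ?cat1, ?cat2, and ?cat3 is bound only if the corresponding triple is present (the particular value it is bound to does not matter), which gives results like this: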

$ arq --data data.n3 --query matrix.sparql
-----------------------------------
| individual | Cat1 | Cat2 | Cat3 |
===================================
| :book1     | 1    | 0    | 0    |
| :book2     | 1    | 0    | 1    |
| :book3     | 1    | 1    | 0    |
-----------------------------------
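
The question describes books linked to categories through a "hasSubject" property rather than through rdf:type. As a minimal sketch of the same fixed-category idea for that shape of data, the query below tests each category with EXISTS directly in the projection. The property IRI :hasSubject in the example namespace and the data layout shown in the comments are assumptions for illustration, not taken from the asker's ontology.

```sparql
# Assumed data shape (hypothetical, for illustration only):
#   :book1 a :Book ; :hasSubject :Cat1 .
#   :book2 a :Book ; :hasSubject :Cat1, :Cat3 .
#   :book3 a :Book ; :hasSubject :Cat1, :Cat2 .
prefix : <http://example.org/books/>

select ?book
       # EXISTS is an ordinary boolean expression, so it can feed IF directly.
       (if(exists { ?book :hasSubject :Cat1 }, 1, 0) as ?Cat1)
       (if(exists { ?book :hasSubject :Cat2 }, 1, 0) as ?Cat2)
       (if(exists { ?book :hasSubject :Cat3 }, 1, 0) as ?Cat3)
where {
  ?book a :Book .
}
order by ?book
```

This form avoids the helper variables and the OPTIONAL blocks, but it still needs one projected expression per category.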

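Handling an arbitrary number of categories

Here is a solution that seems to work in Jena, though I am not sure that the specific results are guaranteed. (Update: based on this answers.semanticweb.com question and answer, it seems that the SPARQL specification does not guarantee this behavior.) If we have a bit more data, e.g., about which things are categories and which are books, such as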

@prefix : <http://example.org/books/> .

:book1 a :Book, :Cat1 .
:book2 a :Book, :Cat1, :Cat3 .
:book3 a :Book, :Cat1, :Cat2 .

:Cat1 a :Category .
:Cat2 a :Category .
:Cat3 a :Category .

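then we can run a subquery that selects all the categories in order and then, for each book, compute a string indicating whether or not the book is in each category.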

prefix : <http://example.org/books/>

select ?book (group_concat(?isCat) as ?matrix) where { 
  { 
    select ?category where { 
      ?category a :Category 
    }
    order by ?category 
  }
  ?book a :Book .
  OPTIONAL { bind( 1 as ?isCat )                     ?book a ?category . }
  OPTIONAL { bind( 0 as ?isCat ) FILTER NOT EXISTS { ?book a ?category } }
}
group by ?book
order by ?book

This has the output:

$ arq --data data.n3 --query matrix2.query
--------------------
| book   | matrix  |
====================
| :book1 | "1 0 0" |
| :book2 | "1 0 1" |
| :book3 | "1 1 0" |
--------------------

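This is much closer to the output in the question, and it handles an arbitrary number of categories. However, it depends on the values of ?category being processed in the same order for each ?book, and I am not sure whether that is guaranteed.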
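
If the ordering caveat is a concern, one alternative is to return the matrix in "long" form, one row per (book, category) pair, and leave the pivoting to whatever consumes the results. The following is only a sketch against the same example data. It does not rely on group_concat at all, and the row order comes from an explicit ORDER BY on the outer query.

```sparql
prefix : <http://example.org/books/>

# One row per (book, category) pair; ?isCat is 1 when the book has that type.
select ?book ?category
       (if(exists { ?book a ?category }, 1, 0) as ?isCat)
where {
  ?book a :Book .
  ?category a :Category .
}
order by ?book ?category
```

With the three books and three categories above, this yields nine rows. The trade-off is that the result is no longer shaped like the table in the question, so a small client-side step is needed to lay it out as a matrix.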

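We can even use this approach to generate a header row for the table. Again, this depends on the values of ?category being processed in the same order for each ?book, which might not be guaranteed, but seems to work in Jena. To get the category header, all we need to do is create a row where ?book is unbound and the value of ?isCat indicates the particular category: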

prefix : <http://example.org/books/>

select ?book (group_concat(?isCat) as ?matrix) where { 
  { 
    select ?category where { 
      ?category a :Category 
    }
    order by ?category 
  }

  # This generates the header row where ?isCat is just
  # the category, so the group_concat gives headers.
  { 
    bind(?category as ?isCat) 
  }
  UNION 
  # This is the table as before
  {
    ?book a :Book .
    OPTIONAL { bind( 1 as ?isCat )                     ?book a ?category . }
    OPTIONAL { bind( 0 as ?isCat ) FILTER NOT EXISTS { ?book a ?category } }
  }
}
group by ?book
order by ?book

We get this output:

--------------------------------------------------------------------------------------------------------
| book   | matrix                                                                                      |
========================================================================================================
|        | "http://example.org/books/Cat1 http://example.org/books/Cat2 http://example.org/books/Cat3" |
| :book1 | "1 0 0"                                                                                     |
| :book2 | "1 0 1"                                                                                     |
| :book3 | "1 1 0"                                                                                     |
--------------------------------------------------------------------------------------------------------

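Using some string manipulation, you can shorten the URIs used for the categories, or widen the array entries to get correct alignment. One possibility is: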

prefix : <http://example.org/books/>

select ?book (group_concat(?isCat) as ?categories) where { 
  { 
    select ?category
           (strafter(str(?category),"http://example.org/books/") as ?name)
     where { 
      ?category a :Category 
    }
    order by ?category 
  }

  { 
    bind(?name as ?isCat)
  }
  UNION 
  {
    ?book a :Book .
    # The string manipulation here takes the name of the category (which should
    # be at least two characters long), drops its first character (string
    # indexing in XPath functions starts at 1), and replaces each remaining
    # character with " ". The resulting spaces are concatenated with "1" or "0",
    # depending on whether the book is a member of the category. The resulting
    # string has the same width as the category name, which makes for a nice table.
    OPTIONAL { bind( concat(replace(substr(?name,2),"."," "),"1") as ?isCat )                     ?book a ?category . }
    OPTIONAL { bind( concat(replace(substr(?name,2),"."," "),"0") as ?isCat ) FILTER NOT EXISTS { ?book a ?category } }
  }
}
group by ?book
order by ?book

This produces the output:

$ arq --data data.n3 --query matrix3.query
-----------------------------
| book   | categories       |
=============================
|        | "Cat1 Cat2 Cat3" |
| :book1 | "   1    0    0" |
| :book2 | "   1    0    1" |
| :book3 | "   1    1    0" |
-----------------------------

This is almost exactly what you have in the question.
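
If the rows are meant to be read by a program rather than by eye, it may be easier to skip the padding and give group_concat an explicit separator. The sketch below is a variation on the query from earlier in this answer, using the same example data. It is subject to the same caveat: the order of the values inside each concatenated string depends on how the engine processes ?category, which does not appear to be guaranteed.

```sparql
prefix : <http://example.org/books/>

select ?book (group_concat(?isCat ; separator=",") as ?row) where {
  {
    select ?category where {
      ?category a :Category
    }
    order by ?category
  }
  ?book a :Book .
  # Produce "1" or "0" as a string so that group_concat joins plain text.
  bind( if(exists { ?book a ?category }, "1", "0") as ?isCat )
}
group by ?book
order by ?book
```

Each ?row value is then a comma-separated list of flags that can be split on "," by the consumer.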

Answered 2013-07-29T16:06:26.450