full-text-search - XQuery full-text search with Word1 and NOT Word2

Question

Here is the XML structure:

<Docs>
  <Doc>
    <Name>Doc 1</Name>
    <Notes>
        <specialNote>
          This is a special note section. 
           <B>This B Tag is used for highlighting any text and is optional</B>        
           <U>This U Tag will underline any text and is optional</U>        
           <I>This I Tag is used for highlighting any text and is optional</I>        
        </specialNote>      
        <generalNote>
           <P>
            This will store the general notes and might have number of paragraphs. This is para no 1. NO Child Tags here         
           </P>
           <P>
            This is para no 2            
           </P>  
        </generalNote>      
    </Notes>  
    <Desc>
        <P>
          This is used for Description and might have number of paragraphs. Here too, there will be B, U and I Tags for highlighting the description text and are optional
          <B>Bold</B>
          <I>Italic</I>
          <U>Underline</U>
        </P>
        <P>
          This is description para no 2 with I and U Tags
          <I>Italic</I>
          <U>Underline</U>
        </P>      
    </Desc>
  </Doc>
</Docs>

There will be 1000 Doc tags. I want to give the user a search condition where he can search for WORD1 and NOT WORD2. Here is the query:

(: intended: Doc elements mentioning 'Tom' but not 'jerry' in their notes or description :)
for $x in doc('Documents')/Docs/Doc[
  Notes/specialNote/text() contains text 'Tom' ftand ftnot 'jerry' or
  Notes/specialNote/B/text() contains text 'Tom' ftand ftnot 'jerry' or
  Notes/specialNote/I/text() contains text 'Tom' ftand ftnot 'jerry' or
  Notes/specialNote/U/text() contains text 'Tom' ftand ftnot 'jerry' or
  Notes/generalNote/P/text() contains text 'Tom' ftand ftnot 'jerry' or
  Desc/P/text() contains text 'Tom' ftand ftnot 'jerry' or
  Desc/P/B/text() contains text 'Tom' ftand ftnot 'jerry' or
  Desc/P/I/text() contains text 'Tom' ftand ftnot 'jerry' or
  Desc/P/U/text() contains text 'Tom' ftand ftnot 'jerry']
return $x/Name

The result of this query is wrong. I mean, the result contains some documents that have both Tom and jerry. So I changed the query to:

for $x in doc('Documents')/Docs/Doc[. contains text 'Tom' ftand ftnot 'jerry'] 
return $x/Name

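This query gives me the exact result, i.e. only those documents that contain Tom and NOT jerry, but it takes a lot of time... approx. 45 seconds, whereas the previous one took 10 seconds!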

I am using the BaseX 7.5 XML database.

Expert comments on this would be appreciated :)

score 4 · Accepted Answer

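The first query tests each text node in the document separately, so text like "Tom and Jerry" will match if Tom and Jerry end up in separate text nodes (for example, when Tom is wrapped in one of the B, U or I tags), because the first text node contains Tom but not Jerry.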
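To make the per-text-node behaviour concrete, here is a small Python simulation (not BaseX code). It assumes a Doc can be modelled as a list of its text nodes and that full-text matching is roughly "lower-cased word is present"; the helper names tokens, node_matches and first_query are hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lower-cased word tokens: a rough stand-in for full-text tokenization."""
    return set(re.findall(r"\w+", text.lower()))


def node_matches(text: str, word: str, not_word: str) -> bool:
    """Models `text() contains text word ftand ftnot not_word` on ONE text node."""
    toks = tokens(text)
    return word in toks and not_word not in toks


def first_query(text_nodes: list) -> bool:
    """Models the chain of `or` predicates: the Doc qualifies as soon as
    any single text node satisfies the condition on its own."""
    return any(node_matches(t, "tom", "jerry") for t in text_nodes)


# Hypothetical <P><B>Tom</B> and Jerry</P>: two text nodes, "Tom" and " and Jerry".
print(first_query(["Tom", " and Jerry"]))  # True (false positive)
print(first_query(["Tom and Jerry"]))      # False: both words in one node
```

The node "Tom" passes on its own, so the whole Doc is returned even though Jerry appears right next to it.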

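In the second query, the full-text search is performed on the entire text content of the Doc element, as if it were concatenated into a single string. BaseX's full-text index cannot (currently) answer such a query, because it indexes each text node separately.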
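For contrast, a Python sketch of the whole-Doc semantics, under the same simplifying assumptions (a Doc is a list of text nodes; matching is lower-cased word presence; second_query is a hypothetical name):

```python
import re


def tokens(text: str) -> set:
    """Lower-cased word tokens: a rough stand-in for full-text tokenization."""
    return set(re.findall(r"\w+", text.lower()))


def second_query(text_nodes: list) -> bool:
    """Models `. contains text 'Tom' ftand ftnot 'jerry'` on a whole Doc:
    all text nodes are concatenated first, then the condition is checked once."""
    whole = "".join(text_nodes)  # the Doc's text as one string
    toks = tokens(whole)
    return "tom" in toks and "jerry" not in toks


print(second_query(["Tom", " and Jerry"]))    # False: Jerry is in the joined text
print(second_query(["Tom", " and friends"]))  # True
```

The joined string only exists per Doc at query time, so a per-text-node index cannot supply the answer and every Doc has to be checked, which is consistent with the slower timing reported in the question.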

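One solution is to run the full-text search for each term separately and combine the results at the end. Each of these searches can be evaluated on individual text nodes, so the index can be used: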

for $x in (doc('Documents')/Docs/Doc[.//text() contains text 'Tom']
            except doc('Documents')/Docs/Doc[.//text() contains text 'Jerry'])
return $x/Name
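The same set logic as the except query, simulated in Python (a sketch of the semantics, not of BaseX internals; docs_containing and tom_except_jerry are hypothetical names, and the documents are made-up sample data):

```python
import re


def tokens(text: str) -> set:
    """Lower-cased word tokens: a rough stand-in for full-text tokenization."""
    return set(re.findall(r"\w+", text.lower()))


def docs_containing(docs: dict, word: str) -> set:
    """Models Doc[.//text() contains text word]: names of Docs with at least
    one text node containing `word` (each node is checked on its own)."""
    return {name for name, nodes in docs.items()
            if any(word in tokens(t) for t in nodes)}


def tom_except_jerry(docs: dict) -> set:
    """Models the `except` query: Docs with Tom minus Docs with Jerry.
    XQuery returns nodes in document order; a Python set is unordered."""
    return docs_containing(docs, "tom") - docs_containing(docs, "jerry")


# Hypothetical documents: Doc name -> list of its text nodes.
docs = {
    "Doc 1": ["Tom", " and Jerry"],  # Jerry sits in a different text node
    "Doc 2": ["Tom", " and friends"],
    "Doc 3": ["Jerry alone"],
}
print(sorted(tom_except_jerry(docs)))  # ['Doc 2']
```

Doc 1 is now excluded correctly, because Jerry is looked up in any of its text nodes rather than in the same node as Tom.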

The except-based query is rewritten by the query optimizer into this equivalent query, which uses two index accesses:

for $x in (db:fulltext("Documents", "Tom")/ancestor::*:Doc
            except db:fulltext("Documents", "Jerry")/ancestor::*:Doc)
return $x/Name
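A rough Python model of why the rewritten form is cheap, assuming (as the answer says) that the index maps each token to the individual text nodes containing it. build_index, fulltext and ancestor_docs are hypothetical stand-ins, not BaseX functions.

```python
import re
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Hypothetical per-text-node inverted index:
    token -> set of (Doc name, position of the text node inside that Doc)."""
    index = defaultdict(set)
    for name, nodes in docs.items():
        for pos, text in enumerate(nodes):
            for tok in set(re.findall(r"\w+", text.lower())):
                index[tok].add((name, pos))
    return index


def fulltext(index: dict, word: str) -> set:
    """Rough analogue of db:fulltext(db, word): text nodes containing `word`."""
    return index.get(word.lower(), set())


def ancestor_docs(hits: set) -> set:
    """Rough analogue of /ancestor::*:Doc: map each text-node hit to its Doc."""
    return {name for name, _pos in hits}


docs = {
    "Doc 1": ["Tom", " and Jerry"],
    "Doc 2": ["Tom", " and friends"],
    "Doc 3": ["Jerry alone"],
}
index = build_index(docs)  # built once, reused by every query

# Two index lookups and one set difference; no Doc text is rescanned.
result = ancestor_docs(fulltext(index, "Tom")) - ancestor_docs(fulltext(index, "Jerry"))
print(sorted(result))  # ['Doc 2']
```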

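If necessary, you can even adjust the order in which the partial results are combined, so that the intermediate results stay small.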
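The answer leaves the merge order open. One common heuristic, assumed here rather than taken from BaseX, is to start from the required term with the fewest hits and stop as soon as the candidate set is empty. The sketch below generalises to several required and excluded words; build_index, docs_for and search are hypothetical names.

```python
import re
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Hypothetical per-text-node index: token -> {(Doc name, node position)}."""
    index = defaultdict(set)
    for name, nodes in docs.items():
        for pos, text in enumerate(nodes):
            for tok in set(re.findall(r"\w+", text.lower())):
                index[tok].add((name, pos))
    return index


def docs_for(index: dict, word: str) -> set:
    """Docs owning at least one text node that contains `word`."""
    return {name for name, _pos in index.get(word.lower(), set())}


def search(index: dict, include: list, exclude: list) -> set:
    """Docs containing every word in `include` and none in `exclude`.

    Assumed heuristic: process the included word with the fewest hits first
    so the candidate set starts (and stays) small; stop early when it is empty.
    """
    if not include:
        return set()  # this sketch needs at least one positive term
    ordered = sorted(include, key=lambda w: len(index.get(w.lower(), set())))
    candidates = docs_for(index, ordered[0])
    for word in ordered[1:]:
        if not candidates:
            return candidates
        candidates &= docs_for(index, word)
    for word in exclude:
        if not candidates:
            break
        candidates -= docs_for(index, word)
    return candidates


docs = {
    "Doc 1": ["Tom", " and Jerry"],
    "Doc 2": ["Tom", " and Spike"],
    "Doc 3": ["Jerry and Spike"],
}
index = build_index(docs)
print(sorted(search(index, ["Tom"], ["Jerry"])))  # ['Doc 2']
```

The order only changes how much work is done, not the result: intersections and differences give the same set whichever term is processed first.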
