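I am writing code that looks through XML files and takes a target word. It then finds each successor word and calculates the probability that the two words occur together across all documents. Although I use normalize-space(), the output for $successor still shows a space after the word. Below are my code and the output I get.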

Code:

<html>
<body>
<table border='1'>
<tr><td>Target</td><td>Successor</td><td>Probability</td></tr>
{
let $targetword := "has"
let $t_word_occ := collection("./?select=*xml")//s//w[lower-case(normalize-space()) = $targetword] (::)
let $totalwords := collection("./?select=*xml")//s//w[lower-case(normalize-space())]
for $successor in distinct-values($t_word_occ/following-sibling::w[1])
    let $freq := count($t_word_occ/following-sibling::w[1][. = $successor])
        let $dwtw := count($totalwords[. = $successor])
let $prob := $freq div $dwtw
order by ($prob) descending
return <tr><td>{$targetword}</td><td>{$successor}</td><td>{$prob}</td>
       </tr>
}
</table>
</body>
</html>

Sample output:

 <tr>
            <td>Target</td>
            <td>Successor</td>
            <td>Probability</td>
         </tr>
         <tr>
            <td>has</td>
            <td>intentions </td>
            <td>1</td>
         </tr>
         <tr>
            <td>has</td>
            <td>drifted </td>
            <td>1</td>
         </tr>
         <tr>
            <td>has</td>
            <td>eluded </td>
            <td>1</td>
         </tr>
         <tr>
            <td>has</td>
            <td>won</td>
            <td>1</td>
         </tr>

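In the output you can see that some words, such as "drifted" and "eluded", are followed by a space, while others, such as "won", come out fine (no space).

How can I fix this?

I am also using XQuery 1.0.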

1 Answer

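The normalize-space() call in your predicates only affects the comparison; the value that distinct-values() returns is still the raw text of the w element. You can try the following techniques, which cast the value to xs:token so that leading and trailing whitespace is removed: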

<td>{$successor cast as xs:token?}</td>

for $successor in distinct-values($t_word_occ/(following-sibling::w[1] cast as xs:token?))

or even the following:

for $successor in distinct-values($t_word_occ/xs:token(following-sibling::w[1]))
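Note that once $successor is normalized, the two `. = $successor` comparisons in the question would compare a trimmed value against untrimmed element text, so words such as "drifted " would no longer match and both counts could drop to zero. A minimal sketch of the whole query with the comparisons normalized as well, assuming the rest of the query stays as in the question; the `$words` variable is introduced here only for illustration, and normalize-space() is used in place of the xs:token cast:

```xquery
xquery version "1.0";

<html>
<body>
<table border='1'>
<tr><td>Target</td><td>Successor</td><td>Probability</td></tr>
{
let $targetword := "has"
(: all w elements inside s elements, across the collection (hypothetical helper variable) :)
let $words := collection("./?select=*xml")//s//w
let $t_word_occ := $words[lower-case(normalize-space(.)) = $targetword]
(: normalize each successor before removing duplicates :)
for $successor in distinct-values($t_word_occ/following-sibling::w[1]/normalize-space(.))
(: compare normalized text on both sides, so "drifted " still matches "drifted" :)
let $freq := count($t_word_occ/following-sibling::w[1][normalize-space(.) = $successor])
let $dwtw := count($words[normalize-space(.) = $successor])
let $prob := $freq div $dwtw
order by $prob descending
return <tr><td>{$targetword}</td><td>{$successor}</td><td>{$prob}</td></tr>
}
</table>
</body>
</html>
```

As in the question, only the target word is matched case-insensitively; successors are still compared case-sensitively.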

Complete repro:

xquery version "1.0";

(: sample data; some isok values carry leading or trailing whitespace :)
(: a variable is used here because "declare context item" requires XQuery 3.0 :)
declare variable $doc := document {
<root>
     <column id="1" isok="true">OK</column>
     <column id="2" isok="false">NOT OK</column>
     <column id="3" isok="   TRUE   ">OK</column>
     <column id="4" isok=" false">NOT OK</column>
 </root>
};

<root>
{
  (: two versions, both working :)
  (: for $x in distinct-values($doc/root/column/(@isok cast as xs:token?)) :)
  for $x in distinct-values($doc/root/column/xs:token(@isok))
  return <r>{$x}</r>
}
</root>
answered 2020-12-10T16:13:59.690