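I'm trying to find out whether there is a complete list of Solr syntax errors. My goal is to write a function that "cleans" user queries coming from the front end so that they cannot cause a syntax error.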

So far I have found two errors:

EOF

Raised if the query ends with an uppercase AND, OR, NOT, etc.
Fix: lowercase the query (queries are set up to be case-insensitive anyway).

Undefined field

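Raised if the query contains a colon, as in "Start of a long academic title: witty subtitle here".
Fix: replace every instance of : with a space.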
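
For reference, the two fixes above can be sketched roughly like this in Python. The function name `sanitize_query` is just a placeholder of mine, and lowercasing the whole query is only safe on the assumption stated above, namely that the searched fields are analyzed case-insensitively.

```python
import re


def sanitize_query(raw: str) -> str:
    """Apply the two fixes described above to a raw user query.

    Hypothetical helper: assumes the target fields are case-insensitive,
    so lowercasing does not change which documents match.
    """
    # Lowercase everything so a trailing AND / OR / NOT is parsed as a
    # plain term instead of a dangling boolean operator (the EOF error).
    query = raw.lower()
    # Replace every colon with a space so the text before it is not
    # interpreted as a field name (the undefined field error).
    query = query.replace(":", " ")
    # Collapse the runs of whitespace the replacement may have created.
    return re.sub(r"\s+", " ", query).strip()
```

Called on a query such as "solr AND", this returns "solr and", which the parser treats as two ordinary terms.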

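I'm hoping these are all the problems I need to handle, but if there are other Solr syntax errors I should be aware of and guard against, it would be very helpful to know!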

1 Answer

I'm not sure there is a complete list of syntax errors anywhere, but here are some that we have dealt with:

1) encoding issues: special characters like %, & etc. should not be
passed as-is, since they may break the whole query

2) two asterisks together: ** may cause infinite loops or bring the
system to its knees if leading and trailing wildcards are accepted.
A search term that is just a single asterisk isn't allowed in our
system either

3) (optionally) for boolean queries ensure that opening and closing
brackets match

4) strip the punctuation, but do it with care, e.g. if U.S. turns
into US at query time, then to ensure findability (recall matters to
us) we make sure the same happens during tokenization at index time.
We also identify URLs and don't remove punctuation from them

5) some errors may relate to malformed proximity operators (like
near, ~), e.g. we don't allow them to be nested or to contain boolean
operators
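
As an illustration of points 2 and 3, here is a minimal Python sketch. It is not our production code; the helper names are made up for this example, and the bracket check deliberately ignores quoted phrases and backslash-escaped parentheses, which a real implementation would probably need to handle.

```python
import re


def collapse_wildcards(query: str) -> str:
    """Point 2: turn any run of two or more asterisks into a single one."""
    return re.sub(r"\*{2,}", "*", query)


def is_lone_wildcard(query: str) -> bool:
    """Point 2: detect a query that consists of nothing but one asterisk."""
    return query.strip() == "*"


def brackets_balanced(query: str) -> bool:
    """Point 3: check that round brackets open and close in matching order.

    Simplification: brackets inside quotes or escaped with a backslash
    are counted like any other bracket.
    """
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing bracket with nothing open
                return False
    return depth == 0
```

A front end could run these checks before sending anything to Solr and either repair the query or show the user a message instead of an error page.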

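I would also say that some syntax errors can be controlled through the syntax you define for your users. In other words, don't let them do what you don't want them to do. This also establishes a kind of search contract between your users and your application. It is also good to provide some tooltip-style hints that tell users which typical syntax can be used for which purpose.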

answered 2012-08-31T19:22:41.003