I want to preserve special characters inside tokens while still splitting on those special characters. Say I have the term

"H&R Blocks"

I want it tokenized as

"H", "R", "H&R", "Blocks"

I read this article: http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html. It explains how to preserve special characters.

2 Answers

Try using the word_delimiter token filter.

According to the documentation on its usage, you can set the parameter preserve_original: true to do exactly what you want (i.e. "H&R" => H&R H R).

I would set it up like this:

"settings" : {
    "analysis" : {
        "filter" : {
            "special_character_spliter" : {
                "type" : "word_delimiter",
                "preserve_original": "true"
            }   
        },
        "analyzer" : {
            "your_analyzer" : {
                "type" : "custom",
                "tokenizer" : "whitespace",
                "filter" : ["lowercase", "special_character_spliter"]
            }
        }
    }
}
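
To see what this filter chain would roughly emit for "H&R Blocks", here is a minimal Python sketch. It is not Elasticsearch code: the function names are hypothetical, and it only approximates word_delimiter by splitting on non-alphanumeric characters (the real filter also splits on case changes and letter/digit transitions by default). Note that because lowercase runs before the delimiter filter in the analyzer above, the emitted tokens are lowercased.

```python
import re


def whitespace_tokenize(text):
    """Approximate the whitespace tokenizer: split on runs of whitespace."""
    return text.split()


def split_preserving_original(token):
    """Approximate word_delimiter with preserve_original enabled.

    Assumption: only non-alphanumeric characters act as delimiters here.
    """
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if parts == [token]:
        # Nothing to split, so emit the token once.
        return [token]
    # Emit the untouched original first, then its sub-parts.
    return [token] + parts


def analyze(text):
    """Chain: whitespace tokenizer -> lowercase -> delimiter split."""
    tokens = []
    for tok in whitespace_tokenize(text):
        tokens.extend(split_preserving_original(tok.lower()))
    return tokens


print(analyze("H&R Blocks"))  # ['h&r', 'h', 'r', 'blocks']
```

Since the same analyzer is normally applied at query time, a search for "H&R", "H", or "R" should match the lowercased tokens; to verify the real output, run the text through the index's analyze endpoint.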

Good luck!

Answered 2013-08-15T10:27:32.823
"settings" : { 
   "analysis" : {
       "filter" : {
           "blocks_filter" : {
               "type" : "word_delimiter",
               "preserve_original": "true"
           },
          "shingle":{
              "type":"shingle",
              "max_shingle_size":5,
              "min_shingle_size":2,
              "output_unigrams":"true"
           },
           "filter_stop":{
              "type":"stop",
              "enable_position_increments":"false"
           }
       },
       "analyzer" : {
           "blocks_analyzer" : {
               "type" : "custom",
               "tokenizer" : "whitespace",
               "filter" : ["lowercase", "blocks_filter", "shingle"]
           }
       }
   }
},
"mappings" : {
   "type" : {
       "properties" : {
           "company" : {
               "type" : "string",
               "analyzer" : "blocks_analyzer"
           }
       }
   }
}
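
The settings above add a shingle filter on top of word_delimiter. As a rough illustration of what shingling does with min_shingle_size 2, max_shingle_size 5, and output_unigrams enabled, here is a small Python sketch. It is a simplification under stated assumptions: the function name and the sample token list are hypothetical, and it treats tokens as a plain sequence, ignoring the stacked positions that word_delimiter produces when it keeps the original token alongside its parts.

```python
def shingles(tokens, min_size=2, max_size=5, output_unigrams=True):
    """Build word n-grams (shingles) from a flat list of tokens."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            # Keep the single token as well as the shingles that start here.
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                break
            out.append(" ".join(tokens[start:end]))
    return out


# Hypothetical token stream, used only for illustration.
print(shingles(["h&r", "blocks", "inc"]))
# ['h&r', 'h&r blocks', 'h&r blocks inc', 'blocks', 'blocks inc', 'inc']
```

The point of combining the two filters is that multi-word company names remain searchable as phrases-as-tokens while the individual pieces stay searchable on their own.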
Answered 2013-08-14T17:54:19.093