
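I installed Sphinx on my Vagrant box running CentOS 6, and I am trying to install the Dutch libstemmer from Snowball. The installation completed successfully, but my tests fail.

I created two indexes with exactly the same data. My index definitions are: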

index shop_products1 {
  type = rt
  dict = keywords
  min_prefix_len = 3
  rt_mem_limit = 2046M

  path = /var/lib/sphinxsearch/data/shop_products2

  morphology = libstemmer_nl, stem_en
  
  html_strip = 1
  html_index_attrs = img=alt,title; a=title;

  preopen = 1
  inplace_enable = 1
  index_exact_words = 1

  
  rt_field = name
  rt_field = brand
  rt_field = description
  rt_field = specifications
  rt_field = tags
  rt_field = ourtags
  rt_field = searchfield
  rt_field = shop
  rt_field = category
  
  rt_field = color
  rt_field = ourcolor
  rt_field = gender
  rt_field = material

  rt_field = ean
  rt_field = sku

  rt_attr_string = ean
  rt_attr_string = sku
  rt_attr_float = price
  rt_attr_float = discount
  rt_attr_uint = shopid
  rt_attr_uint = itemid
  rt_attr_uint = deleted
  rt_attr_uint = duplicate
  rt_attr_uint = brandid
  rt_attr_uint = duplicates
  rt_attr_timestamp = updated_at
}

index shop_products2 {
  type = rt
  dict = keywords
  min_prefix_len = 3
  rt_mem_limit = 2046M

  path = /var/lib/sphinxsearch/data/shop_products20

  html_strip = 1
  html_index_attrs = img=alt,title; a=title;

  preopen = 1
  inplace_enable = 1
  index_exact_words = 1

  
  rt_field = name
  rt_field = brand
  rt_field = description
  rt_field = specifications
  rt_field = tags
  rt_field = ourtags
  rt_field = searchfield
  rt_field = shop
  rt_field = category
  
  rt_field = color
  rt_field = ourcolor
  rt_field = gender
  rt_field = material

  rt_field = ean
  rt_field = sku

  rt_attr_string = ean
  rt_attr_string = sku
  rt_attr_float = price
  rt_attr_float = discount
  rt_attr_uint = shopid
  rt_attr_uint = itemid
  rt_attr_uint = deleted
  rt_attr_uint = duplicate
  rt_attr_uint = brandid
  rt_attr_uint = duplicates
  rt_attr_timestamp = updated_at
}




searchd {
  listen = 127.0.0.1:9306:mysql41
  log = /var/log/sphinxsearch/searchd.log
  workers = threads
  binlog_path = /var/lib/sphinxsearch/rt-binlog

  read_timeout = 5
  client_timeout = 200
  max_children = 0
  	
  # 2 hours
  rt_flush_period = 7200
  pid_file = /var/run/searchd.pid
  
}

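When I search for the Dutch word "afzuigkappen", for example, it should return exactly the same results as "afzuigkap".

Can anyone give me some pointers on how to get this working? P.S. Sorry for my poor English.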


2 Answers


The Dutch stemmer in Snowball produces different stems for afzuigkappen and afzuigkap:

afzuigkappen  -> afzuigkapp
afzuigkap -> afzuigkap

So you would need to change the stemming algorithm to reach your goal; the documentation for the algorithm is here
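
To illustrate why the two stems diverge, here is a minimal Python sketch of only the `en`-removal step, as I understand the published description of the Dutch algorithm: the suffix is dropped after a consonant, and the ending is then "undoubled" only for kk, dd and tt, so pp is left alone. The function names `undouble` and `strip_en` are invented for this example, and the sketch ignores the R1 region and every other step, so it is not a replacement for libstemmer.

```python
# Simplified illustration of one step of the Dutch Snowball stemmer.
# Assumption: only the "en" suffix rule and the undoubling rule are modelled.
VOWELS = set("aeiouyè")


def undouble(word: str) -> str:
    """Drop the last letter only when the word ends in kk, dd or tt."""
    if word.endswith(("kk", "dd", "tt")):
        return word[:-1]
    return word


def strip_en(word: str) -> str:
    """Remove a trailing 'en' that follows a consonant, then undouble."""
    if word.endswith("en") and len(word) > 3:
        base = word[:-2]
        # A valid en-ending is a non-vowel and the base must not end in "gem".
        if base[-1] not in VOWELS and not base.endswith("gem"):
            return undouble(base)
    return word


if __name__ == "__main__":
    for w in ("afzuigkappen", "afzuigkap", "bakken"):
        print(w, "->", strip_en(w))
```

In this sketch "bakken" loses its doubled k, while "afzuigkappen" keeps the doubled p, which is the mismatch shown above.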

answered 2015-09-04T14:42:16.780

OK, I have created some specific tests. The indexes I created:

index test1 {
  type = rt
  dict = keywords
  min_prefix_len = 3
  rt_mem_limit = 2046M

  morphology = libstemmer_nl, stem_en

  path = /var/lib/sphinxsearch/data/test1

  preopen = 1
  inplace_enable = 1
  index_exact_words = 1

  rt_field = name
  rt_attr_uint = shopid
  rt_attr_uint = itemid
    
}

index test2 {
  type = rt
  dict = keywords
  min_prefix_len = 3
  rt_mem_limit = 2046M

  path = /var/lib/sphinxsearch/data/test2

  preopen = 1
  inplace_enable = 1
  index_exact_words = 1

  rt_field = name
  rt_attr_uint = shopid
  rt_attr_uint = itemid
    
}

I indexed a smaller database of football products and ran a search with Sphinx; the results: http://imgur.com/n95Ue8v

As you can see, both indexes return the same output, with 53 records. If I search directly in MySQL instead, with select * from tests1 WHERE name LIKE '%keeper%', I get 360 results.
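
One possible reason for the gap, which has not been confirmed against this data: LIKE '%keeper%' matches the substring anywhere inside a word, while a keyword search matches whole indexed words (or word prefixes when a prefix query is used). A minimal Python sketch of that difference follows; the product names and the helper functions are made up for illustration, and a case-insensitive collation is assumed on the MySQL side.

```python
import re

# Hypothetical product names, not taken from the real database.
names = [
    "Keeper handschoenen",
    "Keepershandschoenen junior",
    "Doelkeepershirt",
    "Voetbal maat 5",
]


def tokens(name: str) -> list:
    """Split a name into lowercase word tokens."""
    return re.findall(r"\w+", name.lower())


def like_contains(name: str, needle: str) -> bool:
    """Mimic SQL LIKE '%needle%': substring match anywhere in the text."""
    return needle.lower() in name.lower()


def whole_word(name: str, needle: str) -> bool:
    """Mimic a plain keyword match: the needle must equal a whole token."""
    return needle.lower() in tokens(name)


def prefix_word(name: str, needle: str) -> bool:
    """Mimic a prefix query: some token starts with the needle."""
    return any(t.startswith(needle.lower()) for t in tokens(name))


substring_hits = [n for n in names if like_contains(n, "keeper")]
word_hits = [n for n in names if whole_word(n, "keeper")]
prefix_hits = [n for n in names if prefix_word(n, "keeper")]

if __name__ == "__main__":
    print(len(substring_hits), len(word_hits), len(prefix_hits))
```

With compound Dutch words, the substring count is the largest of the three, so a lower number from the search index than from LIKE is not by itself a sign that the stemmer is broken.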

answered 2015-09-07T08:15:50.740