I have a movie database that contains information about a film called Yes, We're Open.

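When searching the database, I'm running into a problem: searching for "yes we're open" returns another title whose description contains "we're" and "open" but not the word "yes", even though I require all words in boolean mode (i.e. "yes we're open" is translated to '+yes +we\'re +open' before being sent as a query).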

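I assume this is because "yes" is in the built-in stopword list. However, when I set ft_stopword_file = "", restart mysql, and then run repair table [tablename] quick on the table I'm searching, I get no results at all when searching for "yes we're open". I've included my my.cnf below. This is MySQL version 5.0.22. Any ideas?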

[mysqld]
query-cache-type = 1
query-cache-size = 8M
max_allowed_packet=500M
ft_min_word_len=2
ft_stopword_file = ""

[myisamchk]
ft_min_word_len=2

set-variable=local-infile=0
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
# old_passwords=1

skip-bdb

set-variable = innodb_buffer_pool_size=2M
set-variable = innodb_additional_mem_pool_size=500K
set-variable = innodb_log_buffer_size=500K
set-variable = innodb_thread_concurrency=2
[mysql.server]
user=mysql
basedir=/var/lib

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
skip-bdb

set-variable = innodb_buffer_pool_size=2M
set-variable = innodb_additional_mem_pool_size=500K
set-variable = innodb_log_buffer_size=500K
set-variable = innodb_thread_concurrency=2

Edit: here are some sample queries:

#1 - built-in stopword file

mysql> SHOW VARIABLES LIKE 'ft_%';
+--------------------------+----------------+
| Variable_name            | Value          |
+--------------------------+----------------+
| ft_boolean_syntax        | + -><()~*:""&| | 
| ft_max_word_len          | 84             | 
| ft_min_word_len          | 2              | 
| ft_query_expansion_limit | 20             | 
| ft_stopword_file         | (built-in)     | 
+--------------------------+----------------+
5 rows in set (0.00 sec)

mysql> SELECT title, MATCH(title,description,genre,country) AGAINST (' +yes +we\'re +open' IN BOOLEAN MODE) as title_description_genre_country_score FROM `films` WHERE MATCH(title,description,genre,country) AGAINST (' +yes +we\'re +open' IN BOOLEAN MODE) AND `hidden` <> '1' ORDER BY `title_description_genre_country_score` DESC ;
+-----------------+---------------------------------------+
| title           | title_description_genre_country_score |
+-----------------+---------------------------------------+
| Yes, We?re Open |                                     1 | 
| Present/Future  |                                     1 | 
+-----------------+---------------------------------------+
2 rows in set (0.00 sec)

....then edit my.cnf, adding ft_stopword_file="".....

#2 - no stopword file

mysql> SHOW VARIABLES LIKE 'ft_%';
+--------------------------+----------------+
| Variable_name            | Value          |
+--------------------------+----------------+
| ft_boolean_syntax        | + -><()~*:""&| | 
| ft_max_word_len          | 84             | 
| ft_min_word_len          | 2              | 
| ft_query_expansion_limit | 20             | 
| ft_stopword_file         |                | 
+--------------------------+----------------+
5 rows in set (0.00 sec)

mysql> REPAIR TABLE `films` QUICK;
+-------------------------+--------+----------+----------+
| Table                   | Op     | Msg_type | Msg_text |
+-------------------------+--------+----------+----------+
| db.films                | repair | status   | OK       | 
+-------------------------+--------+----------+----------+
1 row in set (0.14 sec)

mysql> SELECT title, MATCH(title,description,genre,country) AGAINST (' +yes +we\'re +open' IN BOOLEAN MODE) as title_description_genre_country_score FROM `films` WHERE MATCH(title,description,genre,country) AGAINST (' +yes +we\'re +open' IN BOOLEAN MODE) AND `hidden` <> '1' ORDER BY `title_description_genre_country_score` DESC ;
Empty set (0.00 sec)

Edit #2: the CREATE TABLE statement:

mysql> SHOW CREATE TABLE db.films\G;
*************************** 1. row ***************************
Table: films
Create Table: CREATE TABLE `films` (
  `id` varchar(8) NOT NULL default '',
  `title` varchar(255) default NULL,
  `hidden` tinyint(1) default '0',
  `featured` tinyint(1) default NULL,
  `type` varchar(255) default NULL,
  `subtype` varchar(255) default NULL,
  `summary` text,
  `description` text,
  `image_url` varchar(255) default NULL,
  `trailer_url` varchar(255) default NULL,
  `slug` varchar(255) default NULL,
  `category` varchar(255) default NULL,
  `parent` varchar(255) default NULL,
  `related` varchar(255) default NULL,
  `sponsor` varchar(255) default NULL,
  `genre` varchar(255) default NULL,
  `country` varchar(255) default NULL,
  `copresenters` varchar(255) default NULL,
  `original_title` varchar(255) default NULL,
  `director` varchar(255) default NULL,
  `executive_producer` varchar(255) default NULL,
  `producer` varchar(255) default NULL,
  `cinematographer` varchar(255) default NULL,
  `writer` varchar(255) default NULL,
  `editor` varchar(255) default NULL,
  `sound` varchar(255) default NULL,
  `cast` varchar(255) default NULL,
  `language` varchar(255) default NULL,
  `trt` varchar(255) default NULL,
  `year` varchar(255) default NULL,
  `subtitles` varchar(255) default NULL,
  `format` varchar(255) default NULL,
  `color` varchar(255) default NULL,
  `premiere_status` varchar(255) default NULL,
  PRIMARY KEY  (`id`),
  KEY `id` (`id`),
  KEY `type` (`type`),
  KEY `subtype` (`subtype`),
  KEY `slug` (`slug`),
  KEY `category` (`category`),
  KEY `parent` (`parent`),
  KEY `hidden` (`hidden`),
  KEY `featured` (`featured`),
  KEY `copresenters` (`copresenters`),
  KEY `original_title` (`original_title`),
  KEY `director` (`director`),
  KEY `executive_producer` (`executive_producer`),
  KEY `producer` (`producer`),
  KEY `cinematographer` (`cinematographer`),
  KEY `writer` (`writer`),
  KEY `editor` (`editor`),
  KEY `sound` (`sound`),
  KEY `cast` (`cast`),
  KEY `language` (`language`),
  KEY `trt` (`trt`),
  KEY `year` (`year`),
  KEY `subtitles` (`subtitles`),
  KEY `format` (`format`),
  KEY `color` (`color`),
  KEY `premiere_status` (`premiere_status`),
  FULLTEXT KEY `title` (`title`),
  FULLTEXT KEY `summary` (`summary`),
  FULLTEXT KEY `description` (`description`),
  FULLTEXT KEY `genre` (`genre`),
  FULLTEXT KEY `country` (`country`),
  FULLTEXT KEY `title,description` (`title`,`description`),
  FULLTEXT KEY `title,description,genre,country` (`title`,`description`,`genre`,`country`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
2 Answers

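Setting the ft_stopword_file variable to the empty string ('') disables all stopword filtering. (Suggestion: remove only the words you don't want filtered instead.)

The full-text indexes you already created need to be dropped and created anew.

FULLTEXT indexes must be rebuilt after changing this variable or the contents of the stopword file.

To do that, use REPAIR TABLE tbl_name QUICK.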
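
As a rough sketch of that rebuild step, here is what it might look like for the `films` table from the question, assuming the table is MyISAM and the server has already been restarted with the new settings. The index name is copied from the question's SHOW CREATE TABLE output; the drop-and-recreate variant is only an alternative illustration, not something the manual requires.

```sql
-- Option 1: rebuild the table's indexes (including FULLTEXT) in place.
REPAIR TABLE films QUICK;

-- Option 2 (alternative sketch): drop and recreate one FULLTEXT index explicitly.
-- The index name contains commas, so it has to be quoted with backticks.
ALTER TABLE films DROP INDEX `title,description,genre,country`;
ALTER TABLE films
  ADD FULLTEXT INDEX `title,description,genre,country`
  (title, description, genre, country);
```

Option 1 covers every FULLTEXT index on the table in one statement, while option 2 has to be repeated for each index.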

answered 2012-02-23T06:30:29.690
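InnoDB does not support table repair (you can see the note when you run REPAIR TABLE tbl_name QUICK).

My only solution was to change the engine to MyISAM, although this may reduce READ or WRITE performance.

How to disable full-text stopwords in MySQL:

In the my.ini text file (MySQL):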

ft_stopword_file = ""   # or point it at an empty file, e.g. "empty_stopwords.txt"
ft_min_word_len = 2 

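// Sets the minimum word length, but be aware that shorter words (3, 2) significantly increase query time, especially when the full-text indexed columns are large.

Save the file and restart the server.

The next step should be to repair the index with this query:

REPAIR TABLE tbl_name QUICK;

However, this will not work if your table uses the InnoDB storage engine. You have to change it to MyISAM: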

ALTER TABLE t1 ENGINE = MyISAM;
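
Before converting anything, it may be worth checking which engine the table actually uses. A minimal sketch, assuming `tbl_name` is a placeholder for your own table and that you are connected to the database that contains it:

```sql
-- Show the storage engine of one table in the current database.
-- 'tbl_name' is a placeholder, not a real table name.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'tbl_name';

-- Equivalent check; the Engine column holds the same information.
SHOW TABLE STATUS LIKE 'tbl_name';
```

If the engine is already MyISAM, as it is for the table in the question, the ALTER TABLE step can be skipped.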

So, once again:

1. Edit my.ini file and save
2. Restart your server (this cannot be done dynamically)
3. Change the table engine (if needed)  ALTER TABLE tbl_name ENGINE = MyISAM;
4. Perform repair                       REPAIR TABLE tbl_name QUICK.
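
To check that the steps above took effect, something along these lines could be run afterwards. This is only a sketch: `tbl_name`, `col` and the probe word 'yes' are placeholders chosen for illustration, and `col` is assumed to be covered by a FULLTEXT index.

```sql
-- Confirm the server picked up the new settings after the restart.
SHOW VARIABLES LIKE 'ft_stopword_file';
SHOW VARIABLES LIKE 'ft_min_word_len';

-- After the repair, probe a word that used to be filtered as a stopword.
-- tbl_name, col and 'yes' are placeholders.
SELECT COUNT(*)
FROM tbl_name
WHERE MATCH(col) AGAINST ('+yes' IN BOOLEAN MODE);
```

If the count is still zero for a word you know is in the data, testing each search term on its own in the same way can help narrow down which one fails to match.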

Note that InnoDB and MyISAM differ in speed. One is faster for reads, the other for writes (you can read more about this on the internet).

answered 2013-08-02T14:36:43.440