I am using mysql Ver 8.0.3-rc for Linux on x86_64 (MySQL Community Server (GPL)).

I created a table with a full-text index on the name column:

CREATE TABLE `title` (
  `id` smallint(4) unsigned NOT NULL PRIMARY KEY,
  `name` text COLLATE utf8_unicode_ci,
  FULLTEXT idx (name) WITH PARSER ngram
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

I inserted some data:

insert into `title` values(14,"I'm flying in for the game (one night in Niagara Falls, NY and one night in Buffalo then back home).");
insert into `title` values(23,"I've never been to the area.");
insert into `title` values(43,"Where and what must I eat (Canadian side of Niagara, American side and Buffalo)?");
insert into `title` values(125,"Don't really have much planned other than the Falls and the game.");

When I run:

select
    id,
    round(MATCH (name) AGAINST ('other than the'),2) scope
from title;

the result is as expected:

id  | scope
----------
14  | 0.43
23  | 0.23
43  | 0.12
125 | 1.15

With classic GROUP BY-style aggregation, everything also works:

select
    max(scope),
    min(scope),
    sum(scope)
from
(
    select id, round(MATCH (name) AGAINST ('other than the'),2) scope
    from title
) a;

The result is correct:

max  |  min | sum
----------------
1.15 | 0.12 | 1.96

But when I try to use window functions, I don't understand the result:

select
    id,
    max(scope) over(),
    min(scope) over(),
    sum(scope) over()
from
(
    select id, round(MATCH (name) AGAINST ('other than the'),2) scope
    from title
) a;

I get a strange result (why?):

id | max  |  min | sum
------------------------
14 | 1.15 | 1.15 |  4.60
23 | 1.15 | 1.15 |  4.60
43 | 1.15 | 1.15 |  4.60
125| 1.15 | 1.15 |  4.60

I expected a result similar to the classic aggregation, such as:

id | max  |  min | sum
------------------------
14 | 1.15 | 0.12 |  1.96
23 | 1.15 | 0.12 |  1.96
43 | 1.15 | 0.12 |  1.96
125| 1.15 | 0.12 |  1.96

Is this a bug in mysql Ver 8.0.3-rc, or is my query incorrect? Thanks!

2 Answers

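It looks like you have found a bug in MySQL. Please report it at bugs.mysql.com.

I ran the following script in both MySQL and MariaDB (without WITH PARSER ngram, because MariaDB does not currently support it; see Add "ngram" support to MariaDB). Results: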

MySQL:

mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 8.0.3-rc-log |
+--------------+
1 row in set (0.00 sec)

mysql> DROP TABLE IF EXISTS `title`;
Query OK, 0 rows affected (0.02 sec)

mysql> CREATE TABLE `title` (
    ->   `id` SMALLINT UNSIGNED NOT NULL PRIMARY KEY,
    ->   `name` TEXT COLLATE utf8_unicode_ci,
    ->   FULLTEXT idx (`name`) -- WITH PARSER ngram
    -> ) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO `title`
    -> VALUES
    ->   (14, "I'm flying in for the game (one night in Niagara Falls, NY and one night in Buffalo then back home)."),
    ->   (23, "I've never been to the area."),
    ->   (43, "Where and what must I eat (Canadian side of Niagara, American side and Buffalo)?"),
    ->   (125, "Don't really have much planned other than the Falls and the game.");
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT
    ->   MAX(`scope`),
    ->   MIN(`scope`),
    ->   SUM(`scope`)
    -> FROM
    -> (
    ->   SELECT
    ->     `id`,
    ->     ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
    ->   FROM `title`
    -> ) `a`;
+--------------+--------------+--------------+
| MAX(`scope`) | MIN(`scope`) | SUM(`scope`) |
+--------------+--------------+--------------+
|         0.72 |         0.00 |         0.72 |
+--------------+--------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT
    ->   `id`,
    ->   MAX(`scope`) OVER(),
    ->   MIN(`scope`) OVER(),
    ->   SUM(`scope`) OVER()
    -> FROM
    -> (
    ->   SELECT
    ->     `id`,
    ->     ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
    ->   FROM `title`
    -> ) `a`;
+-----+---------------------+---------------------+---------------------+
| id  | MAX(`scope`) OVER() | MIN(`scope`) OVER() | SUM(`scope`) OVER() |
+-----+---------------------+---------------------+---------------------+
|  14 |                0.72 |                0.72 |                2.88 |
|  23 |                0.72 |                0.72 |                2.88 |
|  43 |                0.72 |                0.72 |                2.88 |
| 125 |                0.72 |                0.72 |                2.88 |
+-----+---------------------+---------------------+---------------------+
4 rows in set (0.00 sec)

MariaDB:

MariaDB[_]> SELECT VERSION();
+----------------------------------------+
| VERSION()                              |
+----------------------------------------+
| 10.2.6-MariaDB-10.2.6+maria~jessie-log |
+----------------------------------------+
1 row in set (0.00 sec)

MariaDB[_]> DROP TABLE IF EXISTS `title`;
Query OK, 0 rows affected (0.02 sec)

MariaDB[_]> CREATE TABLE `title` (
         ->   `id` SMALLINT UNSIGNED NOT NULL PRIMARY KEY,
         ->   `name` TEXT COLLATE utf8_unicode_ci,
         ->   FULLTEXT idx (`name`) -- WITH PARSER ngram
         -> ) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query OK, 0 rows affected (0.01 sec)

MariaDB[_]> INSERT INTO `title`
         -> VALUES
         ->   (14, "I'm flying in for the game (one night in Niagara Falls, NY and one night in Buffalo then back home)."),
         ->   (23, "I've never been to the area."),
         ->   (43, "Where and what must I eat (Canadian side of Niagara, American side and Buffalo)?"),
         ->   (125, "Don't really have much planned other than the Falls and the game.");
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

MariaDB[_]> SELECT
         ->   MAX(`scope`),
         ->   MIN(`scope`),
         ->   SUM(`scope`)
         -> FROM
         -> (
         ->   SELECT
         ->     `id`,
         ->     ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
         ->   FROM `title`
         -> ) `a`;
+--------------+--------------+--------------+
| MAX(`scope`) | MIN(`scope`) | SUM(`scope`) |
+--------------+--------------+--------------+
|         0.72 |         0.00 |         0.72 |
+--------------+--------------+--------------+
1 row in set (0.00 sec)

MariaDB[_]> SELECT
         ->   `id`,
         ->   MAX(`scope`) OVER(),
         ->   MIN(`scope`) OVER(),
         ->   SUM(`scope`) OVER()
         -> FROM
         -> (
         ->   SELECT
         ->     `id`,
         ->     ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
         ->   FROM `title`
         -> ) `a`;
+-----+--------------+--------------+--------------+
| id  | MAX(`scope`) | MIN(`scope`) | SUM(`scope`) |
+-----+--------------+--------------+--------------+
|  14 |         0.72 |         0.00 |         0.72 |
|  23 |         0.72 |         0.00 |         0.72 |
|  43 |         0.72 |         0.00 |         0.72 |
| 125 |         0.72 |         0.00 |         0.72 |
+-----+--------------+--------------+--------------+
4 rows in set (0.00 sec)

answered 2017-10-20T11:16:57.907

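Regarding wchiquito's answer: you are right, there was a bug. It has been fixed since that release. With the fix, MySQL returns this result for the window query: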

mysql> SELECT
    ->        `id`,
    ->        MAX(`scope`) OVER() `max`,
    ->        MIN(`scope`) OVER() `min`,
    ->        SUM(`scope`) OVER() `sum`
    ->      FROM
    ->      (
    ->        SELECT
    ->          `id`,
    ->          ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
    ->        FROM `title`
    ->      ) `a`;
+-----+------+------+------+
| id  | max  | min  | sum  |
+-----+------+------+------+
|  14 | 0.72 | 0.00 | 0.72 |
|  23 | 0.72 | 0.00 | 0.72 |
|  43 | 0.72 | 0.00 | 0.72 |
| 125 | 0.72 | 0.00 | 0.72 |
+-----+------+------+------+
4 rows in set (0,01 sec)

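This still differs from what you quoted for Maria, but I believe the MySQL result above is correct: since the window specification is empty, the window functions should operate on all rows of the result set for every row, so each window function call should yield the same value for every row of the result set.

If you partition the result set, similar to the grouping you would do in a GROUP BY query (see PARTITION BY a.id below), you will see this result: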
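
To make the semantics of an empty window specification concrete, here is a minimal Python sketch. It is only a model of what OVER() means, not MySQL itself. The helper name over_empty is made up for illustration, and the per-row scores (0.00, 0.00, 0.00, 0.72) are an assumption chosen to be consistent with the aggregate values in the output above.

```python
from decimal import Decimal

# Assumed per-row (id, scope) values, consistent with max 0.72, min 0.00, sum 0.72.
rows = [
    (14, Decimal("0.00")),
    (23, Decimal("0.00")),
    (43, Decimal("0.00")),
    (125, Decimal("0.72")),
]


def over_empty(rows):
    """Model MAX/MIN/SUM(scope) OVER (): a single window spanning all rows."""
    scopes = [scope for _, scope in rows]
    # The aggregates are computed once over the whole result set ...
    aggregates = (max(scopes), min(scopes), sum(scopes))
    # ... and the same values are attached to every row; no rows are collapsed.
    return [(row_id, *aggregates) for row_id, _ in rows]


for row in over_empty(rows):
    print(row)
```

Every output row carries the same max, min and sum, and the row count stays at four. That is the difference from a plain aggregate query, which collapses the result to one row.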

mysql> SELECT
    ->        `id`,
    ->        MAX(`scope`) OVER(PARTITION BY a.id) `max`,
    ->        MIN(`scope`) OVER(PARTITION BY a.id) `min`,
    ->        SUM(`scope`) OVER(PARTITION BY a.id) `sum`
    ->      FROM
    ->      (
    ->        SELECT
    ->          `id`,
    ->          ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
    ->        FROM `title`
    ->      ) `a`;
+-----+------+------+------+
| id  | max  | min  | sum  |
+-----+------+------+------+
|  14 | 0.00 | 0.00 | 0.00 |
|  23 | 0.00 | 0.00 | 0.00 |
|  43 | 0.00 | 0.00 | 0.00 |
| 125 | 0.72 | 0.72 | 0.72 |
+-----+------+------+------+
4 rows in set (0,00 sec)

because each row is its own partition here. This is the same as what you quoted for Maria without PARTITION BY.

answered 2017-10-20T19:35:56.953
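
How the partition key changes the outcome can be modelled in a few lines of Python. This is a sketch of the semantics only, not of how MySQL implements it. The function over_partition and its key parameter are hypothetical names, and the rows reuse the per-row scores from the PARTITION BY output above.

```python
from collections import defaultdict
from decimal import Decimal

# (id, scope) pairs taken from the PARTITION BY a.id output above.
rows = [
    (14, Decimal("0.00")),
    (23, Decimal("0.00")),
    (43, Decimal("0.00")),
    (125, Decimal("0.72")),
]


def over_partition(rows, key):
    """Model MAX/MIN/SUM(scope) OVER (PARTITION BY key(row))."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key(row)].append(row[1])
    # Each row sees only the scopes that share its partition key.
    return [
        (row[0],
         max(partitions[key(row)]),
         min(partitions[key(row)]),
         sum(partitions[key(row)]))
        for row in rows
    ]


# PARTITION BY id: ids are unique, so every partition holds exactly one row.
by_id = over_partition(rows, key=lambda row: row[0])
# A constant key puts all rows in one partition, which models an empty OVER().
whole = over_partition(rows, key=lambda row: None)

print(by_id)
print(whole)
```

With a unique key, max, min and sum all equal the row's own scope. With a constant key, every row gets the aggregates of the full result set.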