sql - How to sort in SQL, ignoring articles ("the", "a", "an", etc.)

Question

This comes up a lot, and I can see it has been asked on StackOverflow for XSLT, Ruby and Drupal, but I don't see it asked specifically for SQL.

So the question is: how do you sort titles correctly when they begin with "The", "A" or "An"?

One way is to simply TRIM() those strings:

ORDER BY TRIM( 
  LEADING 'a ' FROM 
  TRIM( 
    LEADING 'an ' FROM 
    TRIM( 
      LEADING 'the ' FROM LOWER( title ) 
      ) 
    ) 
  )

This was suggested on AskMeFi a while back (does it need that LOWER() function?).
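
One caveat with the nested TRIM() version: dialects disagree on what the trim argument means. As far as I know, MySQL treats it as a whole string to remove, while PostgreSQL treats it as a set of characters, so a title such as "The Three Musketeers" can lose more than its article. The Python sketch below only mimics the two behaviours (it is not SQL, and both helper names are made up for illustration):

```python
def trim_leading_charset(text: str, chars: str) -> str:
    """Mimic character-set semantics: drop leading characters found in `chars`."""
    return text.lstrip(chars)


def trim_leading_string(text: str, prefix: str) -> str:
    """Mimic whole-string semantics: drop repeated leading copies of `prefix`."""
    while prefix and text.startswith(prefix):
        text = text[len(prefix):]
    return text


title = "the three musketeers"
charset_result = trim_leading_charset(title, "the ")  # eats "the " and then "th"
string_result = trim_leading_string(title, "the ")    # removes only the article

print(charset_result)
print(string_result)
```

So it is worth checking how your own database defines TRIM() before relying on it for this.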

I know I've also seen some kind of CASE/switch implementation of this, but it's a little hard to Google for.
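
For what it's worth, a CASE-based version might look roughly like the sketch below. It runs against an in-memory SQLite database through Python's `sqlite3` module so that it can be tried anywhere; the `books` table and its rows are invented for the example, and other databases may spell `substr()` or `lower()` differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT NOT NULL)")  # hypothetical table
conn.executemany(
    "INSERT INTO books (title) VALUES (?)",
    [
        ("The Beginning Of The End",),
        ("A Very Long Story",),
        ("All About Everything",),
        ("An Interesting Story",),
    ],
)

# Strip at most one leading article, then compare in lower case.
# substr() is 1-based, so substr(title, 5) skips the four characters of 'the '.
query = """
    SELECT title
    FROM books
    ORDER BY lower(
        CASE
            WHEN lower(title) LIKE 'the %' THEN substr(title, 5)
            WHEN lower(title) LIKE 'an %'  THEN substr(title, 4)
            WHEN lower(title) LIKE 'a %'   THEN substr(title, 3)
            ELSE title
        END
    )
"""
ordered_titles = [row[0] for row in conn.execute(query)]
print(ordered_titles)
```

Because each pattern ends in a space, "All About Everything" is not mistaken for a title that starts with the article "a".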

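Obviously there are a number of possible solutions. What would be good is for SQL gurus to weigh in on which of them have performance implications.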

score 7 · Accepted Answer

One approach I've seen is to have two columns: one for display and another for sorting:

description  |  sort_desc
----------------------------
The the      |  the, The
A test       |  test, A
I, Robot     |  i, Robot

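I haven't done any real-world testing, but this has the benefit of being able to use an index, and it doesn't require string manipulation every time you want to order by description. Unless your database supports materialized views (MySQL doesn't), implementing the logic as a computed column in a view won't provide any benefit, because you can't index the computed column.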
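
As a rough sketch of the two-column idea, the sort key can be computed once at insert time and indexed. The example uses Python with an in-memory SQLite database purely for illustration; `make_sort_key`, the `items` table and the index name are assumptions of this sketch, and case is handled with SQLite's NOCASE collation rather than by lower-casing the stored key.

```python
import re
import sqlite3

# One leading article followed by whitespace and the rest of the title.
ARTICLE = re.compile(r"^(an?|the)\s+(.+)$", re.IGNORECASE)


def make_sort_key(title: str) -> str:
    """Move one leading article to the end: 'A test' -> 'test, A'."""
    match = ARTICLE.match(title)
    if match is None:
        return title
    article, rest = match.groups()
    return f"{rest}, {article}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (description TEXT NOT NULL, sort_desc TEXT NOT NULL)")
# Index the precomputed key so ordering needs no per-row string work.
conn.execute("CREATE INDEX idx_items_sort_desc ON items (sort_desc COLLATE NOCASE)")

for description in ["The the", "A test", "I, Robot"]:
    conn.execute(
        "INSERT INTO items (description, sort_desc) VALUES (?, ?)",
        (description, make_sort_key(description)),
    )

rows = conn.execute(
    "SELECT description, sort_desc FROM items ORDER BY sort_desc COLLATE NOCASE"
).fetchall()
print(rows)
```

The trade-off is that the application (or a trigger) has to keep `sort_desc` in step with `description` whenever a title changes.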

score 4 · Accepted Answer

I've been using this for years, but can't remember where I found it:

SELECT 
CASE
    WHEN SUBSTRING_INDEX(Title, ' ', 1) IN ('a', 'an', 'the') 
    THEN CONCAT( SUBSTRING( Title, INSTR(Title, ' ') + 1 ), ', ', SUBSTRING_INDEX(Title, ' ', 1) ) 
    ELSE Title 
    END AS TitleSort,
Title AS OriginalTitle 
FROM yourtable 
ORDER BY TitleSort

Which yields:

TitleSort                  | OriginalTitle
------------------------------------------------------
All About Everything       | All About Everything
Beginning Of The End, The  | The Beginning Of The End
Interesting Story, An      | An Interesting Story
Very Long Story, A         | A Very Long Story

score 2 · Accepted Answer

For Postgres specifically, you can use regexp_replace to do the work for you:

BEGIN;
CREATE TEMPORARY TABLE book (name VARCHAR NOT NULL) ON COMMIT DROP;
INSERT INTO book (name) VALUES ('The Hitchhiker’s Guide to the Galaxy');
INSERT INTO book (name) VALUES ('The Restaurant at the End of the Universe');
INSERT INTO book (name) VALUES ('Life, the Universe and Everything');
INSERT INTO book (name) VALUES ('So Long, and Thanks for All the Fish');
INSERT INTO book (name) VALUES ('Mostly Harmless');
INSERT INTO book (name) VALUES ('A book by Douglas Adams');
INSERT INTO book (name) VALUES ('Another book by Douglas Adams');
INSERT INTO book (name) VALUES ('An omnibus of books by Douglas Adams');

SELECT name FROM book ORDER BY name;
SELECT name, regexp_replace(lower(name), '^(an?|the) (.*)$', '\2, \1') FROM book ORDER BY 2;
SELECT name FROM book ORDER BY regexp_replace(lower(name), '^(an?|the) (.*)$', '\2, \1');
COMMIT;
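
To see what that pattern does to each title without a Postgres instance at hand, here is a small Python approximation using `re.sub` with the same regular expression. It is only a sketch: Python sorts by code point, so the final order could differ from a Postgres collation on other data, and `sort_key` is a name chosen for this example.

```python
import re

names = [
    "The Hitchhiker’s Guide to the Galaxy",
    "The Restaurant at the End of the Universe",
    "Life, the Universe and Everything",
    "So Long, and Thanks for All the Fish",
    "Mostly Harmless",
    "A book by Douglas Adams",
    "Another book by Douglas Adams",
    "An omnibus of books by Douglas Adams",
]


def sort_key(name: str) -> str:
    """Lower-case the name and move a leading 'a', 'an' or 'the' to the end."""
    return re.sub(r"^(an?|the) (.*)$", r"\2, \1", name.lower())


sorted_names = sorted(names, key=sort_key)
for name in sorted_names:
    print(sort_key(name), "<-", name)
```

Note that "Another book by Douglas Adams" is left alone, because the pattern requires a space directly after the article.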

score 0 · Accepted Answer

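I can only speak for SQL Server: you use LTRIM within CASE statements. No LOWER function is needed, because selects are case-insensitive by default. However, if you want to ignore articles, then I'd suggest you use a noise-word dictionary and set up a full-text indexing catalog. I'm not sure whether other implementations of SQL support this.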

score -1 · Accepted Answer

LOWER is needed. While SELECT is not case-sensitive, ORDER BY is.

answered 2010-12-11T03:12:35.813
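
Whether LOWER() matters really depends on the collation in effect, which is probably why the answers here disagree. As an illustration only, SQLite's default BINARY collation sorts upper-case letters before lower-case ones, so the two orderings below differ; a database with a case-insensitive default collation would return the same order for both. The table `t` and its rows are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (title TEXT)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [("apple",), ("Banana",), ("cherry",)])

# Default (BINARY) collation: 'B' sorts before 'a'.
binary_order = [r[0] for r in conn.execute("SELECT title FROM t ORDER BY title")]
# Folding case first gives the order most people expect.
folded_order = [r[0] for r in conn.execute("SELECT title FROM t ORDER BY lower(title)")]

print(binary_order)
print(folded_order)
```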

score -3 · Accepted Answer

Try the following:

ORDER BY replace(replace(replace(YOURCOLUMN,'THE',''),'a\'',''),'an','')

Untested!
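
One thing to watch with nested REPLACE calls is that they remove the search string wherever it occurs, not just at the start of the title. The Python sketch below is a simplified imitation of that behaviour (it does not reproduce the exact arguments above, and both function names are made up), set against a version that strips only a leading article:

```python
def strip_everywhere(title: str) -> str:
    """Imitate chained REPLACE: remove the words wherever they appear."""
    out = title.lower()
    for word in ("the", "an", "a"):
        out = out.replace(word, "")
    return out


def strip_leading_article(title: str) -> str:
    """Remove at most one article, and only from the start of the title."""
    lowered = title.lower()
    for article in ("the ", "an ", "a "):
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


mangled = strip_everywhere("Panther Tales")
kept = strip_leading_article("Panther Tales")

print(mangled)
print(kept)
```

The first function chews through "Panther" because it contains both "the" and "an", which would scramble the sort order.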
