
Say I have this:

page_url                      | canvas_url
---------------------------------------------------------------
http://www.google.com/        | http://www.google.com/barfoobaz
http://www.google.com/foo/bar | http://www.google.com/foo

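I want to find the rows whose URLs are a prefix of a given string, ordered by the longest match. The problem I'm running into is getting the longest *matching* string, not just a row that matches and also happens to contain a long string. I.e.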

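http://www.google.com/foo matches page_url on row 1 and canvas_url on row 2. But if I sort by the length of the two columns instead of the length of the match, row 1 looks like the better match because its canvas_url is longer, even though that canvas_url doesn't match.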
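To make the pitfall concrete, here is a small Python sketch (not from the original post) that scores the two example rows both ways. ROWS, naive_score and match_length are hypothetical names used only for illustration. str.startswith stands in for the LIKE prefix test.

```python
# Hypothetical sample rows mirroring the table above: (page_url, canvas_url)
ROWS = [
    ("http://www.google.com/", "http://www.google.com/barfoobaz"),
    ("http://www.google.com/foo/bar", "http://www.google.com/foo"),
]

def naive_score(row):
    """Length of the longest column in the row, whether or not it matches (the wrong metric)."""
    return max(len(url) for url in row)

def match_length(row, target):
    """Length of the longest column that is a prefix of target, or 0 if none is."""
    return max((len(url) for url in row if target.startswith(url)), default=0)

target = "http://www.google.com/foo"
matching = [row for row in ROWS if match_length(row, target) > 0]

naive_best = max(matching, key=naive_score)                     # row 1: longest column overall
true_best = max(matching, key=lambda r: match_length(r, target))  # row 2: longest actual match

print(naive_best)
print(true_best)
```

Both rows match, but only the second metric picks row 2 (match length 25 versus 22). The first metric is fooled by row 1's non-matching 31-character canvas_url.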

I could grab all the matches and then filter on length in code, like this:

SELECT *, LENGTH(canvas_url), LENGTH(page_url)
FROM app 
WHERE
    'http://www.google.com/foo' LIKE CONCAT(canvas_url, '%') OR
    'http://www.google.com/foo' LIKE CONCAT(page_url, '%')

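Or I could run 2 subselects to get the top match for canvas_url and for page_url respectively, then filter those down to 1 in code. But unless there are absurd performance issues, I'd rather have the database return only what I need.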
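Here is a minimal sketch of the "fetch every match, then pick the longest in code" approach. It assumes SQLite through Python's standard sqlite3 module and an in-memory copy of the example table. The id column and the helper names build_demo_db and best_match_in_code are hypothetical. Older SQLite versions have no CONCAT(), so the WHERE clause uses the standard || operator instead.

```python
import sqlite3

def build_demo_db():
    """In-memory SQLite copy of the example `app` table (id column added for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app (id INTEGER PRIMARY KEY, page_url TEXT, canvas_url TEXT)")
    conn.executemany(
        "INSERT INTO app (page_url, canvas_url) VALUES (?, ?)",
        [
            ("http://www.google.com/", "http://www.google.com/barfoobaz"),
            ("http://www.google.com/foo/bar", "http://www.google.com/foo"),
        ],
    )
    return conn

def best_match_in_code(conn, target):
    # Same WHERE clause as the query above, with SQLite's || in place of MySQL's CONCAT()
    rows = conn.execute(
        """
        SELECT id, page_url, canvas_url
        FROM app
        WHERE ? LIKE (canvas_url || '%') OR ? LIKE (page_url || '%')
        """,
        (target, target),
    ).fetchall()

    # Filtering in application code: score each row by its longest column
    # that is actually a prefix of the target string
    def score(row):
        _, page_url, canvas_url = row
        return max((len(u) for u in (page_url, canvas_url) if target.startswith(u)), default=0)

    return max(rows, key=score) if rows else None

conn = build_demo_db()
print(best_match_in_code(conn, "http://www.google.com/foo"))
```

The scoring step is the part you would ideally push into the database. It only considers columns that really are prefixes of the target, which is why row 2 wins here.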

I care most about MySQL, but I also need to target SQLite and Postgres, so I'm happy with an answer for any of them.

Suggestions?


3 Answers


This should get you the longest actual match length (not just the longest URL in the record):

-- Get page_url matches
SELECT *, LENGTH(page_url) AS MatchLen
FROM app 
WHERE 'http://www.google.com/foo' LIKE CONCAT(page_url, '%') -- can't tell from question if this should be reversed
UNION ALL
-- Get canvas_url matches
SELECT *, LENGTH(canvas_url) AS MatchLen
FROM app 
WHERE 'http://www.google.com/foo' LIKE CONCAT(canvas_url, '%')
-- Bring the longest matches to the top
ORDER BY MatchLen DESC -- May need to add a tie-breaker here
LIMIT 1

Here is a working example on SqlFiddle.
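The SqlFiddle link did not survive, so here is a rough local equivalent. It is an assumption, not the original fiddle: the same UNION ALL query run against SQLite from Python's sqlite3 module, with || in place of CONCAT(). It adds a hypothetical id column, aliased as row_id, as the tie-breaker the comment above suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app (id INTEGER PRIMARY KEY, page_url TEXT, canvas_url TEXT)")
conn.executemany(
    "INSERT INTO app (page_url, canvas_url) VALUES (?, ?)",
    [
        ("http://www.google.com/", "http://www.google.com/barfoobaz"),
        ("http://www.google.com/foo/bar", "http://www.google.com/foo"),
    ],
)

# Each branch reports the length of the column that actually matched as MatchLen.
# ORDER BY in a compound SELECT refers to the leftmost branch's column names/aliases.
QUERY = """
SELECT id AS row_id, page_url, canvas_url, LENGTH(page_url) AS MatchLen
FROM app
WHERE :target LIKE (page_url || '%')
UNION ALL
SELECT id, page_url, canvas_url, LENGTH(canvas_url) AS MatchLen
FROM app
WHERE :target LIKE (canvas_url || '%')
ORDER BY MatchLen DESC, row_id  -- hypothetical id as a deterministic tie-breaker
LIMIT 1
"""

best = conn.execute(QUERY, {"target": "http://www.google.com/foo"}).fetchone()
print(best)
```

MySQL treats || as logical OR unless the PIPES_AS_CONCAT SQL mode is enabled. On MySQL, keep the CONCAT() spelling used in the answer.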

answered 2012-11-01T15:14:06.910

Maybe you just need something like this?

SELECT page_url as url, LENGTH(page_url) as len
FROM pages WHERE 'http://www.google.com/foo' LIKE CONCAT(page_url, '%')
UNION
SELECT canvas_url as url, LENGTH(canvas_url) as len
FROM pages WHERE 'http://www.google.com/foo' LIKE CONCAT(canvas_url, '%')
ORDER BY len DESC
LIMIT 1
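Here is the same query run against the example rows. It is a sketch that assumes SQLite via Python's sqlite3, with || in place of CONCAT(). Note that it returns only the matched URL and its length, not the row it came from. If you need the row, you would have to select its key in both branches. UNION (rather than UNION ALL) also removes duplicate (url, len) pairs before sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# This answer names the table `pages`; same two example rows as in the question
conn.execute("CREATE TABLE pages (page_url TEXT, canvas_url TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("http://www.google.com/", "http://www.google.com/barfoobaz"),
        ("http://www.google.com/foo/bar", "http://www.google.com/foo"),
    ],
)

QUERY = """
SELECT page_url AS url, LENGTH(page_url) AS len
FROM pages WHERE :target LIKE (page_url || '%')
UNION
SELECT canvas_url AS url, LENGTH(canvas_url) AS len
FROM pages WHERE :target LIKE (canvas_url || '%')
ORDER BY len DESC
LIMIT 1
"""

result = conn.execute(QUERY, {"target": "http://www.google.com/foo"}).fetchone()
print(result)  # ('http://www.google.com/foo', 25)
```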
answered 2012-11-01T15:11:58.497

If you only need the first row, you need an ORDER BY and a LIMIT. You have to be a little clever about how you arrange it:

SELECT *, LENGTH(canvas_url), LENGTH(page_url)
FROM app 
-- prefix test written the same way as in the question: the column must be a prefix of the URL
WHERE 'http://www.google.com/foo' like concat(canvas_url, '%') OR
      'http://www.google.com/foo' like concat(page_url, '%')
-- sort key: length of the matching column (the longer one if both match)
order by (case when 'http://www.google.com/foo' like concat(canvas_url, '%') and
                    'http://www.google.com/foo' like concat(page_url, '%') and
                    LENGTH(canvas_url) < LENGTH(page_url)
               then LENGTH(page_url)
               when 'http://www.google.com/foo' like concat(canvas_url, '%') and
                    'http://www.google.com/foo' like concat(page_url, '%') and
                    LENGTH(canvas_url) >= LENGTH(page_url)
               then LENGTH(canvas_url)
               when 'http://www.google.com/foo' like concat(canvas_url, '%')
               then LENGTH(canvas_url)
               else LENGTH(page_url)
          end) desc  -- longest match first
limit 1

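This orders by the longer matching string and then returns exactly one row. Note that LIMIT is not standard SQL, so different databases have different mechanisms for returning a single row. MySQL, PostgreSQL and SQLite all accept LIMIT; the SQL:2008 standard spelling is FETCH FIRST 1 ROW ONLY.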
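One caveat applies to all three answers. LIKE treats any % or _ stored in the URLs as wildcards, and SQLite and MySQL's default collations also compare case-insensitively. As a result, a stored URL containing _ can produce a false prefix match. A possible alternative, swapped in here rather than taken from any of the answers, is to compare substr(target, 1, length(column)) with the column directly. The sketch below assumes SQLite via Python's sqlite3. It adds a hypothetical third row whose page_url contains _ and a hypothetical helper best_match. It uses SQLite's multi-argument max(), which MySQL and PostgreSQL spell GREATEST().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app (id INTEGER PRIMARY KEY, page_url TEXT, canvas_url TEXT)")
conn.executemany(
    "INSERT INTO app (page_url, canvas_url) VALUES (?, ?)",
    [
        ("http://www.google.com/", "http://www.google.com/barfoobaz"),
        ("http://www.google.com/foo/bar", "http://www.google.com/foo"),
        ("http://www.google.com/f_o", "http://example.com/"),  # '_' would be a LIKE wildcard
    ],
)

# Per-row match length: each column counts only if it is an exact, case-sensitive
# prefix of :target (otherwise 0); SQLite's two-argument max() keeps the larger one.
QUERY = """
SELECT * FROM (
    SELECT id, page_url, canvas_url,
           max(CASE WHEN substr(:target, 1, length(page_url)) = page_url
                    THEN length(page_url) ELSE 0 END,
               CASE WHEN substr(:target, 1, length(canvas_url)) = canvas_url
                    THEN length(canvas_url) ELSE 0 END) AS match_len
    FROM app
) AS m
WHERE match_len > 0
ORDER BY match_len DESC, id
LIMIT 1
"""

def best_match(conn, target):
    """Hypothetical helper: the single row with the longest real prefix match, or None."""
    return conn.execute(QUERY, {"target": target}).fetchone()

print(best_match(conn, "http://www.google.com/foo"))
```

With a target such as http://www.google.com/fxo/, a LIKE-based query would rank the f_o row first (25 characters, with _ matching x). The substr comparison rejects that row and falls back to row 1's genuine 22-character match.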

answered 2012-11-01T14:55:28.243