Mockup code for my problem:

SELECT Id FROM Tags WHERE TagName IN '<osx><keyboard><security><screen-lock>'

The problem in detail

I am trying to get the tags used in 2011 from the apple.stackexchange data (this query).

As you can see, the tags in tag changes are stored as plain text in the Text field. Example output with Stack Exchange tags:

<tag1><tag2><tag3>
<osx><keyboard><security><screen-lock>

How can I create a unique list of the tags, to look them up in the Tags table, instead of this hardcoded version:

SELECT * FROM Tags
  WHERE TagName = 'osx' 
     OR TagName = 'keyboard' 
     OR TagName = 'security'

Here is an interactive example.

Stack Exchange uses T-SQL; my local copy runs on PostgreSQL, using the Postgres app version 9.4.5.0.

2 Answers

I have reduced the data to just the relevant column and called it tags to show the example.

Sample data

create table posthistory(tags text);
insert into posthistory values
  ('<lion><backup><time-machine>'),
  ('<spotlight><alfred><photo-booth>'),
  ('<lion><pdf><preview>'),
  ('<pdf>'),
  ('<asd>');

Query to get a unique list of tags

SELECT DISTINCT
  unnest(
    regexp_split_to_array(
      trim('><' from tags), '><'
    )
  )
FROM
  posthistory

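First we remove all leading and trailing occurrences of the > and < signs from each row, then use the regexp_split_to_array() function to put the values into an array, and then expand the array into a set of rows with unnest(). Finally, DISTINCT eliminates duplicate values.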
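To make the individual steps easier to follow, here is a minimal sketch that runs each one on a single literal value instead of the table. The literal is taken from the question; the intermediate results are shown as comments.

```sql
-- Step 1: strip every leading and trailing '>' or '<' character.
SELECT trim('><' from '<osx><keyboard><security>');
-- osx><keyboard><security

-- Step 2: split the remainder on the inner '><' separator.
SELECT regexp_split_to_array('osx><keyboard><security', '><');
-- {osx,keyboard,security}

-- Step 3: expand the array into one row per element.
SELECT unnest(ARRAY['osx', 'keyboard', 'security']);
```

Note that trim() treats '><' as a set of characters to remove from both ends, not as a fixed substring, which is why the single '<' at the start and the single '>' at the end are both stripped.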

SQLFiddle demo to preview how it works.

answered 2015-12-28T17:21:02.043
Assuming this table definition:

CREATE TABLE posthistory(post_id int PRIMARY KEY, tags text);

Depending on what you want exactly:

To convert the string to an array, trim the leading and trailing '<' and '>', then treat '><' as the separator:

SELECT *, string_to_array(trim(tags, '><'), '><') AS tag_arr
FROM   posthistory;

To get a list of unique tags for the whole table (I guess you want this):

SELECT DISTINCT tag
FROM   posthistory, unnest(string_to_array(trim(tags, '><'), '><')) tag;

The implicit LATERAL join requires Postgres 9.3 or later.
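To replace the hardcoded OR list from the question, the unique tag list can feed a lookup in the Tags table directly. This is only a sketch: it assumes the local copy has a table tags with columns id and tagname (lower-cased, unquoted identifiers), which may not match how your import named them.

```sql
-- Assumption: local tables tags(id, tagname) and posthistory(post_id, tags).
SELECT t.id, t.tagname
FROM   tags t
WHERE  t.tagname IN (
   SELECT u.tag
   FROM   posthistory p
        , unnest(string_to_array(trim(p.tags, '><'), '><')) AS u(tag)
   )
ORDER  BY t.tagname;
```

DISTINCT is not needed inside the IN subquery, since duplicates do not change which rows match.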

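This should be substantially faster than using regular expressions. If you want to try regular expressions anyway, use regexp_split_to_table() instead of regexp_split_to_array() followed by unnest(), as suggested in the other answer: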

SELECT DISTINCT tag
FROM   posthistory, regexp_split_to_table(trim(tags, '><'), '><') tag;

Again with an implicit LATERAL join.

To search for particular tags:

SELECT *
FROM   posthistory
WHERE  tags LIKE '%<security>%'
AND    tags LIKE '%<osx>%';
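As a possible alternative to the LIKE patterns, the same search can be sketched with the array form shown earlier and the array "contains" operator @>. LIKE treats _ and % in a pattern as wildcards, so this variant may be preferable if a tag name could ever contain those characters.

```sql
-- Rows whose tag list contains both 'security' and 'osx'.
SELECT *
FROM   posthistory
WHERE  string_to_array(trim(tags, '><'), '><') @> ARRAY['security', 'osx'];
```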

SQL Fiddle.

Applied to your search in T-SQL in our data explorer:

SELECT TOP 100
       PostId, UserId, Text AS Tags FROM PostHistory
WHERE  year(CreationDate) = 2011
AND    PostHistoryTypeId IN (3  -- initial tags
                           , 6  -- edit tags
                           , 9) -- rollback tags
AND    Text LIKE ('%<' + ##TagName:String?postgresql## + '>%');

(T-SQL syntax uses the non-standard + instead of || for string concatenation.)
https://data.stackexchange.com/apple/query/edit/417055
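For the local Postgres copy, a rough equivalent of the T-SQL query might look like the sketch below. The column names (postid, userid, text, creationdate, posthistorytypeid) are assumed to mirror the data explorer schema in lower case, and creationdate is assumed to be a timestamp or date column; adjust them to whatever your import produced. The tag name 'postgresql' stands in for the data explorer parameter.

```sql
-- Assumption: local posthistory mirrors the data explorer columns.
SELECT postid, userid, text AS tags
FROM   posthistory
WHERE  creationdate >= '2011-01-01'
AND    creationdate <  '2012-01-01'
AND    posthistorytypeid IN (3  -- initial tags
                           , 6  -- edit tags
                           , 9) -- rollback tags
AND    text LIKE ('%<' || 'postgresql' || '>%')
LIMIT  100;
```

Postgres has no year() function, so the sketch filters on a date range instead. LIMIT takes the place of TOP, and as in the original, without ORDER BY the 100 rows returned are arbitrary.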

answered 2015-12-29T06:11:30.513