
I've got a large database of projects and issue trackers, some of which have urls.

I'd like to query it to figure out a list of urls for each project, but many have extra data I'd like to avoid.

I'd like to do something like this:

substring(tracker_extra_field_data.field_data FROM 'http://([^/]*).*')

Except some URLs are https, and I'd like to capture the scheme as well as the first subdirectory.

For example, given the url:

https://dev.foo.com/bar/action/?param=val

I'd like the select to return:

https://dev.foo.com/bar/

Is there a semi-simple way to do this with substring/regex in pgsql?


2 Answers


Try this:

select substring('https://dev.foo.com/bar/action/?param=val' from '(https?://([^/]*/){1,2})');

template1=# select substring('https://dev.foo.com/bar/action/?param=val' from '(https?://([^/]*/){1,2})');
        substring
-------------------------
 https://dev.foo.com/bar/
(1 row)

template1=# select substring('http://dev.foo.com/bar/action/?param=val' from '(https?://([^/]*/){1,2})');
       substring
------------------------
 http://dev.foo.com/bar/
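
Applied to a whole column rather than a literal, the same pattern could be used roughly as in the sketch below. The `sample` CTE is a hypothetical stand-in for `tracker_extra_field_data.field_data` (the table and column named in the question), so the query can be run on its own. Rows where the pattern does not match come back as NULL from `substring` and are filtered out.

```sql
-- Hypothetical stand-in rows; in the real database these would be read from
-- tracker_extra_field_data.field_data instead.
WITH sample(field_data) AS (
    VALUES
        ('https://dev.foo.com/bar/action/?param=val'),
        ('http://dev.foo.com/bar/action/?param=val'),
        ('not a url')
)
SELECT DISTINCT url_prefix
FROM (
    -- The outer parentheses form the first group, which substring() returns.
    SELECT substring(field_data FROM '(https?://([^/]*/){1,2})') AS url_prefix
    FROM sample
) AS s
WHERE url_prefix IS NOT NULL;  -- drop rows where no URL was found
```

Note that this pattern is not anchored, so it also picks up a URL that appears in the middle of the field data, and it requires at least one slash after the host name.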
answered 2013-07-19T16:41:28.720

Updated after I did not read the question properly at first.

Use the pattern

^https?://[^/]+(?:/[^/]+)?/?

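^ .. start of string
? .. zero or one of the preceding atom
(?:) .. non-capturing parentheses
[^/]+ .. one or more of any character except /

This only accepts URLs beginning with http:// or https:// (the protocol header is required).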
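
As a minimal sketch of how this pattern might be used, the query below runs it through `substring` against a few literal values. The `sample` CTE and its rows are made up for illustration; only the first URL comes from the question. Because the pattern contains no capturing parentheses, `substring` returns the whole matched text.

```sql
-- Hypothetical test values; only the first one is taken from the question.
WITH sample(url) AS (
    VALUES
        ('https://dev.foo.com/bar/action/?param=val'),
        ('http://dev.foo.com'),
        ('ftp://dev.foo.com/bar/')
)
SELECT url,
       -- Anchored at the start of the string: scheme, host, and optionally
       -- one more path segment plus a trailing slash.
       substring(url FROM '^https?://[^/]+(?:/[^/]+)?/?') AS url_prefix
FROM sample;
```

For the question's example URL this should yield https://dev.foo.com/bar/. A host without a trailing slash still matches, since both the path segment and the final slash are optional, while the ftp:// row does not match and comes back as NULL.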

-> SQLfiddle with a bigger test case.

answered 2013-07-19T17:01:33.393