postgresql - Join 2 sets based on default order

Question

How do I join 2 sets of records based solely on their default order?

So if I have a table x(col(1,2,3,4,5,6,7)) and another table z(col(a,b,c,d,e,f,g))

it would return

c1 c2
-- --    
1   a
2   b
3   c
4   d
5   e
6   f
7   g

Actually, I want to join a pair of one-dimensional arrays passed in as parameters and treat them like columns in a table.

Sample code:

CREATE OR REPLACE FUNCTION "Test"(timestamp without time zone[],
                                  timestamp without time zone[])
  RETURNS refcursor AS
$BODY$
DECLARE
curr refcursor;
BEGIN
    OPEN curr FOR 
        SELECT DISTINCT "Start" AS x, "End" AS y, COUNT("A"."id") 
        FROM UNNEST($1) "Start" 
        INNER JOIN 
        (
            SELECT "End", ROW_NUMBER() OVER(ORDER BY ("End")) rn
                FROM UNNEST($2) "End" ORDER BY ("End") 
        ) "End" ON ROW_NUMBER() OVER(ORDER BY ("Start")) = "End".rn 
        LEFT JOIN "A" ON ("A"."date" BETWEEN x AND y) 
        GROUP BY 1,2 
        ORDER BY "Start";
    return curr;
END

$BODY$

score 3 · Accepted Answer

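Now, to answer the real question that was revealed in the comments, which appears to be:

Given two arrays 'a' and 'b', how do I pair up their elements so that I can get the element pairs as column aliases in a query?

There are a few ways to tackle this: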

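- If, and only if, the arrays are of equal length, use multiple unnest functions in the SELECT clause (a deprecated approach that should only be used for backward compatibility);
- Use generate_subscripts to loop over the arrays;
- Use generate_series over subqueries against array_lower and array_upper to emulate generate_subscripts, if you need to support versions too old to have generate_subscripts;
- Rely on the order in which unnest returns tuples and hope, as in my other answer and as shown below. It will work, but it is not guaranteed to work in future versions;
- Use the WITH ORDINALITY functionality added in PostgreSQL 9.4 (see also its first posting) to get a row number for unnest;
- Use multiple-array UNNEST, which is SQL-standard but was not supported by PostgreSQL when this answer was written (it was added in 9.4, for use in the FROM clause).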

So, say we have a function arraypair with array parameters a and b:

CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[]) 
RETURNS TABLE (col_a integer, col_b text) AS $$
  -- blah code here blah
$$ LANGUAGE whatever IMMUTABLE;

It is invoked as:

SELECT * FROM arraypair( ARRAY[1,2,3,4,5,6,7], ARRAY['a','b','c','d','e','f','g'] );

Possible function definitions are:

SRF-in-`SELECT` (deprecated)

CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
    SELECT unnest(a), unnest(b);
$$ LANGUAGE sql IMMUTABLE;

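This will produce weird and unexpected results if the arrays are not equal in length; see the documentation on set-returning functions and their non-standard use in the SELECT list to learn why, and what exactly happens.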
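
To see what "weird and unexpected" means here, a small experiment along the following lines may help. The literal arrays are illustrative only, and the behaviour described in the comments depends on the server version.

```sql
-- Two set-returning functions in the SELECT list, with arrays of
-- unequal length (3 elements vs 2 elements).
SELECT unnest(ARRAY[1,2,3])   AS col_a,
       unnest(ARRAY['a','b']) AS col_b;

-- Before PostgreSQL 10: both functions are cycled until they finish
-- together, so the row count is the least common multiple of the
-- lengths (6 rows here).
-- PostgreSQL 10 and later: the functions run in lockstep and the
-- shorter one is padded with NULLs (3 rows here).
```

Either way, the result is easy to misread, which is why the other approaches are preferable.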

`generate_subscripts`

This is likely the safest option:

CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
    SELECT
       a[i], b[i]
    FROM generate_subscripts(CASE WHEN array_length(a,1) >= array_length(b,1) THEN a::text[] ELSE b::text[] END, 1) i;
$$ LANGUAGE sql IMMUTABLE;

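If the arrays are of unequal length then, as written, it returns null elements for the shorter one, so it works like a full outer join. Reverse the sense of the CASE to get an inner-join-like effect. The function assumes that the arrays are one-dimensional and that they start at index 1. If an entire array argument is NULL then the function returns NULL.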
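
As a rough sketch of the inner-join-like variant mentioned above, the comparison can be flipped so that the shorter array drives the subscripts. The function name arraypair_inner is made up for this illustration; the same one-dimensional, index-1 assumptions apply.

```sql
-- Hypothetical variant: iterate over the subscripts of the SHORTER
-- array, so unmatched trailing elements of the longer one are dropped.
CREATE OR REPLACE FUNCTION arraypair_inner (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
    SELECT
       a[i], b[i]
    FROM generate_subscripts(
        CASE WHEN array_length(a,1) <= array_length(b,1)
             THEN a::text[]
             ELSE b::text[]
        END, 1) i;
$$ LANGUAGE sql IMMUTABLE;

-- Example call: only the first two pairs are returned.
SELECT * FROM arraypair_inner(ARRAY[1,2,3], ARRAY['a','b']);
```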

A more general version would be written in PL/PgSQL and would check array_ndims(a) = 1, check array_lower(a, 1) = 1, test for null arrays, etc. I'll leave that to you.
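
One possible shape for such a checked version is sketched below. It is not the only reasonable design: the name arraypair_checked, the choice to return no rows for NULL or empty input, and the error messages are all assumptions made for this example.

```sql
-- Hypothetical PL/pgSQL version with the checks described above.
CREATE OR REPLACE FUNCTION arraypair_checked (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
BEGIN
    -- Assumption: NULL or empty arrays simply produce no rows.
    -- array_length() returns NULL for an empty array.
    IF a IS NULL OR b IS NULL
       OR array_length(a, 1) IS NULL
       OR array_length(b, 1) IS NULL THEN
        RETURN;
    END IF;

    IF array_ndims(a) <> 1 OR array_ndims(b) <> 1 THEN
        RAISE EXCEPTION 'arraypair_checked: arrays must be one-dimensional';
    END IF;

    IF array_lower(a, 1) <> 1 OR array_lower(b, 1) <> 1 THEN
        RAISE EXCEPTION 'arraypair_checked: arrays must start at index 1';
    END IF;

    -- Full-outer-join-like pairing: the shorter array is padded with NULLs.
    RETURN QUERY
        SELECT a[i], b[i]
        FROM generate_series(
                 1,
                 greatest(array_length(a, 1), array_length(b, 1))
             ) AS i;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
```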

Hoping for pair-wise returns:

This is not guaranteed to work, but it does with PostgreSQL's current query executor:

CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
 WITH
    rn_c1(rn, col) AS (
      SELECT row_number() OVER (), c1.col
      FROM unnest(a) c1(col) 
    ),
    rn_c2(rn, col) AS (
      SELECT row_number() OVER (), c2.col
      FROM unnest(b) c2(col)
    )
    SELECT
      rn_c1.col AS c1, 
      rn_c2.col AS c2
    FROM rn_c1 
    INNER JOIN rn_c2 ON (rn_c1.rn = rn_c2.rn);
$$ LANGUAGE sql IMMUTABLE;

I would consider using generate_subscripts much safer.
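
On PostgreSQL 9.4 or later, the WITH ORDINALITY option listed earlier removes the guesswork, because the row number comes from unnest itself rather than from an unordered row_number(). A minimal sketch, with the function name arraypair_ord invented for this example:

```sql
-- Hypothetical variant for PostgreSQL 9.4+: WITH ORDINALITY appends a
-- column holding each element's position in the function's output.
CREATE OR REPLACE FUNCTION arraypair_ord (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
    SELECT x.val, y.val
    FROM unnest(a) WITH ORDINALITY AS x(val, rn)
    INNER JOIN unnest(b) WITH ORDINALITY AS y(val, rn)
            ON x.rn = y.rn
    ORDER BY x.rn;
$$ LANGUAGE sql IMMUTABLE;
```

Swapping INNER JOIN for FULL JOIN would keep the unmatched tail of the longer array, padded with NULLs.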

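Multi-argument `unnest`:

This should work, but it did not when this answer was written, because PostgreSQL's unnest did not accept multiple input arrays until version 9.4: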

SELECT * FROM unnest(a,b);
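
For readers on PostgreSQL 9.4 or later, where the multi-argument form is accepted in the FROM clause, a standalone query might look like the sketch below. The column aliases col_a and col_b are chosen here only to match the earlier function definitions.

```sql
-- PostgreSQL 9.4+: multi-argument unnest in the FROM clause pairs the
-- arrays element by element; a shorter array is padded with NULLs.
SELECT *
FROM unnest(ARRAY[1,2,3,4,5,6,7],
            ARRAY['a','b','c','d','e','f','g']) AS t(col_a, col_b);
```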

score 1 · Accepted Answer

select x.c1, z.c2
from
    x
    inner join
    (
        select
            c2,
            row_number() over(order by c2) rn
        from z
        order by c2
    ) z on x.c1 = z.rn
order by x.c1

If x.c1 is not 1,2,3..., you can do the same to x as was done with z.
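
A minimal sketch of that case, assuming the same x(c1) and z(c2) tables: number both sides and join on the row numbers rather than on the values.

```sql
-- Number the rows of both tables, then pair them by position.
SELECT x.c1, z.c2
FROM (
    SELECT c1, row_number() OVER (ORDER BY c1) AS rn
    FROM x
) x
INNER JOIN (
    SELECT c2, row_number() OVER (ORDER BY c2) AS rn
    FROM z
) z ON x.rn = z.rn
ORDER BY x.rn;
```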

As Erwin pointed out, the middle order by is not necessary. I tested it like this:

create table t (i integer);
insert into t
select ceil(random() * 100000)
from generate_series(1, 100000);

select
    i,
    row_number() over(order by i) rn
from t
;

i comes out ordered. Before this simple test, which I had never run before, I thought the rows could be numbered in any order.

score 0 · Accepted Answer

By "default order" it sounds like you probably mean the order in which the rows are returned by select * from tablename without an ORDER BY.

If so, this ordering is undefined. The database can return rows in any order that it feels like. You'll find that if you UPDATE a row, it probably moves to a different position in the table.

If you're stuck in a situation where you assumed tables had an order and they don't, you can as a recovery option add a row number based on the on-disk ordering of the tuples within the table:

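-- number the rows explicitly by physical location rather than
-- relying on an empty window clause
select row_number() OVER (ORDER BY ctid), *
from the_table
order by ctid;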

If the output looks right, I recommend that you CREATE TABLE a new table with an extra field, then do an INSERT INTO ... SELECT to insert the data ordered by ctid, then ALTER TABLE ... RENAME the tables and finally fix any foreign key references so they point to the new table.
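
The steps above could be sketched roughly as follows. The table and column names (the_table_new, the_table_old, id, col) are placeholders, and foreign keys still need to be repointed by hand afterwards.

```sql
BEGIN;

-- New table with an explicit ordering key (column list is hypothetical).
CREATE TABLE the_table_new (
    id  bigint PRIMARY KEY,
    col text
);

-- Copy the data, numbering rows by their current on-disk position.
INSERT INTO the_table_new (id, col)
SELECT row_number() OVER (ORDER BY ctid), col
FROM the_table;

-- Swap the tables; keep the old one around until the result is verified.
ALTER TABLE the_table RENAME TO the_table_old;
ALTER TABLE the_table_new RENAME TO the_table;

COMMIT;
```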

ctid can be changed by autovacuum, UPDATE, CLUSTER, etc, so it is not something you should ever be using in applications. I'm using it here only because it sounds like you don't have any real ordering or identifier key.

If you need to pair up rows based on their on-disk ordering (an unreliable and unsafe thing to do, as noted above), you could try, per this SQLFiddle:

WITH
rn_c1(rn, col) AS (
  SELECT row_number() OVER (ORDER BY ctid), c1.col
  FROM c1 
),
rn_c2(rn, col) AS (
  SELECT row_number() OVER (ORDER BY ctid), c2.col
  FROM c2
)
SELECT
  rn_c1.col AS c1, 
  rn_c2.col AS c2
FROM rn_c1 
INNER JOIN rn_c2 ON (rn_c1.rn = rn_c2.rn);

but never rely on this in a production app. If you're really stuck you can use this with CREATE TABLE AS to construct a new table that you can start with when you're working on recovering data from a DB that lacks a required key, but that's about it.

The same approach given above might work with an empty window clause () instead of (ORDER BY ctid) when using sets that lack a ctid, like interim results from functions. It's even less safe then though, and should be a matter of last resort only.

(See also this newer related answer: https://stackoverflow.com/a/17762282/398670)
