
I have two PostgreSQL tables, houses and people, containing the following data:

houses

-# select * from houses;
 id |    address
----+----------------
  1 | 123 Main Ave.
  2 | 456 Elm St.
  3 | 789 County Rd.
(3 rows)

-# select * from people;
 id | name  | house_id
----+-------+----------
  1 | Fred  |        1
  2 | Jane  |        1
  3 | Bob   |        1
  4 | Mary  |        2
  5 | John  |        2
  6 | Susan |        2
  7 | Bill  |        3
  8 | Nancy |        3
  9 | Adam  |        3
(9 rows)

In Spoon I have two Table Input steps. The first, named House Input, uses this SQL:

SELECT
  id
, address
FROM houses
ORDER BY id;

The second Table Input step is named People Input and uses this SQL:

SELECT
  "name"
, house_id
FROM people
ORDER BY house_id;

Both Table Input steps feed a Merge Join step, which uses House Input as the first step with key id and People Input as the second step with key house_id.
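For readers unfamiliar with a merge join, here is a minimal Python sketch of what joining two inputs sorted on their keys produces. It is not Kettle code; the function name `merge_join` and the tuple layout are assumptions made for illustration, and it assumes house ids are unique.

```python
def merge_join(houses, people):
    """Inner merge join of two inputs that are already sorted on their keys.

    houses: list of (id, address), sorted by id, ids assumed unique
    people: list of (name, house_id), sorted by house_id
    """
    rows = []
    i = 0
    for house_id, address in houses:
        # Skip people whose key is smaller than the current house id.
        while i < len(people) and people[i][1] < house_id:
            i += 1
        # Emit one flat row per matching person.
        while i < len(people) and people[i][1] == house_id:
            rows.append({"address": address, "name": people[i][0]})
            i += 1
    return rows


houses = [(1, "123 Main Ave."), (2, "456 Elm St."), (3, "789 County Rd.")]
people = [
    ("Fred", 1), ("Jane", 1), ("Bob", 1),
    ("Mary", 2), ("John", 2), ("Susan", 2),
    ("Bill", 3), ("Nancy", 3), ("Adam", 3),
]
rows = merge_join(houses, people)
```

The result is nine flat rows, one per person, each repeating the house address. Nothing in the join itself groups people under a house.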

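I then feed that into a MongoDB Output step with the database demo, the collection houses, and the Mongo document fields address and name (I am expecting MongoDB to assign the _id).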

When I run the transformation and then enter db.houses.find(); in the Mongo shell, I get:

{ "_id" : ObjectId("52083706b251cc4be9813153"), "address" : "123 Main Ave.", "name" : "Fred" }
{ "_id" : ObjectId("52083706b251cc4be9813154"), "address" : "123 Main Ave.", "name" : "Jane" }
{ "_id" : ObjectId("52083706b251cc4be9813155"), "address" : "123 Main Ave.", "name" : "Bob" }
{ "_id" : ObjectId("52083706b251cc4be9813156"), "address" : "456 Elm St.", "name" : "Mary" }
{ "_id" : ObjectId("52083706b251cc4be9813157"), "address" : "456 Elm St.", "name" : "John" }
{ "_id" : ObjectId("52083706b251cc4be9813158"), "address" : "456 Elm St.", "name" : "Susan" }
{ "_id" : ObjectId("52083706b251cc4be9813159"), "address" : "789 County Rd.", "name" : "Bill" }
{ "_id" : ObjectId("52083706b251cc4be981315a"), "address" : "789 County Rd.", "name" : "Nancy" }
{ "_id" : ObjectId("52083706b251cc4be981315b"), "address" : "789 County Rd.", "name" : "Adam" }

What I want to get is something like this:

{ "_id" : ObjectId("52083706b251cc4be9813153"), "address" : "123 Main Ave.", "people" : [
        { "_id" : ObjectId("52083706b251cc4be9813154"), "name" : "Fred"} ,
        { "_id" : ObjectId("52083706b251cc4be9813155"), "name" : "Jane" } ,
        { "_id" : ObjectId("52083706b251cc4be9813155"), "name" : "Bob" }
    ]  
},
{ "_id" : ObjectId("52083706b251cc4be9813156"), "address" : "456 Elm St.", "people" : [
        { "_id" : ObjectId("52083706b251cc4be9813157"), "name" : "Mary"} ,
        { "_id" : ObjectId("52083706b251cc4be9813158"), "name" : "John" } ,
        { "_id" : ObjectId("52083706b251cc4be9813159"), "name" : "Susan" }
    ]  
},
{ "_id" : ObjectId("52083706b251cc4be981315a"), "address" : "789 County Rd.", "people" : [
        { "_id" : ObjectId("52083706b251cc4be981315b"), "name" : "Bill"} ,
        { "_id" : ObjectId("52083706b251cc4be981315c"), "name" : "Nancy" } ,
        { "_id" : ObjectId("52083706b251cc4be981315d"), "name" : "Adam" }
     ]
 }


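I understand why I am getting the output I get, but I can't find anything online or in the samples that gets me to where I want to be.

I am hoping someone can nudge me in the right direction, point me to an example that is closer to what I am trying to accomplish, or tell me that this is outside the scope of what Kettle is supposed to do (hopefully not the latter).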


1 Answer


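It turns out that creating the subdocuments is all done in the MongoDB Output step.

First, make sure that Upsert and Modifier update are checked on the Configure connection tab.

Then, on the Mongo document fields tab, enter the following (the first row is the column names):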

Name    | Mongo document Path | Use field name | Match field for upsert | Modifier operation | Modifier policy
--------+---------------------+----------------+------------------------+--------------------+----------------
address |                     | Y              | N                      | N/A                | Insert
address |                     | Y              | Y                      | N/A                | Insert
name    | people[0]           | Y              | N                      | $set               | Insert
name    | people[1]           | Y              | N                      | $push              | Update
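As a rough illustration of what these settings appear to do, here is a plain Python sketch that mimics an upsert matched on address: the first row for a house inserts the document and sets the first array element, and later rows push onto the existing array. This is not Kettle or MongoDB driver code; `upsert_person` and the in-memory `collection` list are hypothetical stand-ins, and the mapping to `$set` and `$push` is my reading of the table above.

```python
def upsert_person(collection, address, name):
    """Mimic an upsert that matches on address.

    Insert case: no document matches, so create one with a one-element array
    (comparable to the $set row with the Insert policy).
    Update case: a document matches, so append to its array
    (comparable to the $push row with the Update policy).
    """
    for doc in collection:
        if doc["address"] == address:
            doc["people"].append({"name": name})
            return
    collection.append({"address": address, "people": [{"name": name}]})


# The nine flat rows that come out of the join, in order.
flat_rows = [
    ("123 Main Ave.", "Fred"), ("123 Main Ave.", "Jane"), ("123 Main Ave.", "Bob"),
    ("456 Elm St.", "Mary"), ("456 Elm St.", "John"), ("456 Elm St.", "Susan"),
    ("789 County Rd.", "Bill"), ("789 County Rd.", "Nancy"), ("789 County Rd.", "Adam"),
]

collection = []
for address, name in flat_rows:
    upsert_person(collection, address, name)
```

After the loop, `collection` holds three documents, one per address, each with a `people` array of three names.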

Now, when I run db.houses.find(); I get:

{ "_id" : ObjectId("520ccb8978d96b204daa029d"), "address" : "123 Main Ave.", "people" : [ { "name" : "Fred" }, { "name" : "Jane" }, { "name" : "Bob" } ] }
{ "_id" : ObjectId("520ccb8978d96b204daa029e"), "address" : "456 Elm St.", "people" : [ { "name" : "Mary" }, { "name" : "John" }, { "name" : "Susan" } ] }
{ "_id" : ObjectId("520ccb8a78d96b204daa029f"), "address" : "789 County Rd.", "people" : [ { "name" : "Bill" }, { "name" : "Nancy" }, { "name" : "Adam" } ] }

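I want to note two things:

  1. This assumes that my addresses are unique and that names are unique within a house. If that is not the case, I would need to store the id from my OLTP tables in an id (not _id) field in MongoDB and match on the house id field for the upsert.
  2. As @G Gordon Worley III pointed out above, if the two tables are in the same database, I could do the join in the Table Input step, which would make this a two-step transformation (and faster).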
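To make both notes concrete, here is a small sketch of a single joined query that also carries the house id, so it could serve as the upsert match field. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for PostgreSQL; the `house_id` output alias is an assumption, not something from the transformation above.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE houses (id INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, house_id INTEGER);
INSERT INTO houses VALUES
  (1, '123 Main Ave.'), (2, '456 Elm St.'), (3, '789 County Rd.');
INSERT INTO people VALUES
  (1, 'Fred', 1), (2, 'Jane', 1), (3, 'Bob', 1),
  (4, 'Mary', 2), (5, 'John', 2), (6, 'Susan', 2),
  (7, 'Bill', 3), (8, 'Nancy', 3), (9, 'Adam', 3);
""")

# One query replaces the two Table Input steps and the Merge Join.
joined = conn.execute("""
SELECT h.id AS house_id, h.address, p.name
FROM houses h
JOIN people p ON p.house_id = h.id
ORDER BY h.id, p.id
""").fetchall()
conn.close()
```

Each row in `joined` is a `(house_id, address, name)` tuple, which is the same flat shape the Merge Join produced, plus a key that stays unique even if two houses share an address.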
answered 2013-08-15T12:45:19.997