sql - How can I optimize my code to select the rows I want to extract in Hadoop Hue and concatenate text from a column?

Translated from: https://stackoverflow.com/questions/62898426 2020-07-14T15:17:38.347

Viewed 32 times

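I am using Hadoop through Hue, where the number of rows that can be downloaded is limited to 100,000. I want to select which rows to download so that I can pull the whole table in chunks. Example: rows 1 to 100000, 100001 to 200000, and so on.

Question 1: The code I am using takes too long to return results, and the connection to their server eventually drops. I would like to know how to optimize this code. I am new to SQL.

Question 2: One of the columns in the table is a text field whose text is split across rows. Example: row 1 - id 1 - word 1, row 2 - id 1 - word 2, row 3 - id 1 - word 3. To reduce the number of rows, I tried to concatenate the words by id: row 1 - id 1 - word 1 + word 2 + word 3. But the code I am using does not work: it reports that I do not have permission to access the table. If I remove the function that concatenates the text, I can access the table.

For question 1, I am using the following code: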

select *
from (select *, row_number() over (partition by ID order by ID) as row_num from tab) user_table
where row_num between 1 and 100000
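
One detail in the query above that may be worth checking: `partition by ID` restarts the numbering for every ID, so `row_num between 1 and 100000` does not pick the first 100,000 rows of the table. A minimal sketch of table-wide numbering, assuming HiveQL and that ordering by `ID` is stable between runs (the alias `t` is only illustrative):

```sql
-- Sketch only: number rows across the whole table instead of per ID.
-- Rows that share the same ID are numbered in arbitrary order, so add a
-- tiebreaker column to the ORDER BY if the table has one.
select *
from (
  select t.*, row_number() over (order by ID) as row_num
  from tab t
) user_table
where row_num between 100001 and 200000  -- second chunk of 100,000 rows
```

A window with a single global `order by` is typically processed by one reducer in Hive, so this can still be slow on a large table. Selecting only the columns that are actually needed instead of `*` may reduce the work.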

For question 2, I am using this:

select *, concat_ws ('', collect_list (WORD)) as words
from tab
where ORG = 'card'
group by ID
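
Independent of the permission message, a likely problem in the query above is `select *` combined with `group by ID`: every selected column has to be either grouped or aggregated. A minimal sketch, assuming Hive's `collect_list` and `concat_ws` and that `WORD` is a string column. The space separator is an assumption, since the original uses an empty string:

```sql
-- Sketch only: one output row per ID, with its words joined by a space.
-- collect_list does not guarantee the order of the collected words.
select ID, concat_ws(' ', collect_list(WORD)) as words
from tab
where ORG = 'card'
group by ID
```

Whether this also clears the permission message cannot be determined from the question alone.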

I would like to combine the two, but neither works properly:

select *, concat_ws ('', collect_list (WORD)) as words
from (select *, row_number() over (partition by ID order by ID) as row_num from tab) user_table
where row_num between 1 and 100000 and ORG = 'card'
group by ID
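
In the combined query the row filter runs before the grouping, so it counts word rows rather than IDs. One possible arrangement, sketched under the assumptions of HiveQL and a string `WORD` column, is to aggregate first and then number the aggregated rows. The aliases `grouped` and `numbered` and the space separator are illustrative choices, not part of the original:

```sql
-- Sketch only: build one row per ID first, then page over those rows.
select ID, words
from (
  select ID, words, row_number() over (order by ID) as row_num
  from (
    -- inner step: concatenate the words of each ID
    select ID, concat_ws(' ', collect_list(WORD)) as words
    from tab
    where ORG = 'card'
    group by ID
  ) grouped
) numbered
where row_num between 1 and 100000  -- first chunk; shift the range for later chunks
```

Because each ID appears once after grouping, the numbering is deterministic here, and the number of rows to page through is the number of distinct IDs rather than the number of words.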
