sql - Reading a flat file with an uneven number of columns

Question

I have a CSV file like this:

0, xyz, 20130301121212
1, 6997, 01234
2, 012345, 5678999, Y, 11, 20130301
2, 012345, 5678988, Y, 11, 20130301
1, 6647, 01234
2, 012345, 5678999, Y, 11, 20130301
2, 012345, 5678988, Y, 11, 20130301
9, 8

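0 is the header, 1 marks a store, and 2 is a product detail record.

Rows with 2 in the first column hold the details for the store on the nearest preceding row with 1 in the first column.

Can someone tell me how to group the type-2 rows with their corresponding type-1 row?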

score 0 · Accepted Answer

I don't use SQL Server, so I can only offer general guidance:

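1) Load the file into the database with each whole line in a single column, and add a row number. The result will look something like this (rid is the row number):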

rid rline
1   1, 6997, 01234
2   2, 012345, 5678999, Y, 11, 20130301
3   2, 012345, 5678988, Y, 11, 20130301
4   1, 6647, 01234
5   2, 012345, 5678999, Y, 11, 20130301
6   2, 012345, 5678988, Y, 11, 20130301
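
As a minimal sketch of this loading step, the snippet below uses Python's built-in sqlite3 module as a stand-in database. The table name csvdata and the columns rid and rline follow the answer; the helper name load_lines, the SAMPLE text, and the choice to drop the lines starting with 0 and 9 before numbering (so that the numbering matches the table above) are assumptions made for illustration.

```python
import sqlite3

SAMPLE = """\
0, xyz, 20130301121212
1, 6997, 01234
2, 012345, 5678999, Y, 11, 20130301
2, 012345, 5678988, Y, 11, 20130301
1, 6647, 01234
2, 012345, 5678999, Y, 11, 20130301
2, 012345, 5678988, Y, 11, 20130301
9, 8
"""

def load_lines(conn, text):
    """Store each store/detail line whole in csvdata(rid, rline)."""
    conn.execute("CREATE TABLE csvdata (rid INTEGER PRIMARY KEY, rline TEXT)")
    # Assumption: only type-1 and type-2 lines are kept, so the lines
    # starting with 0 and 9 are dropped before the row numbers are assigned.
    wanted = [ln for ln in text.splitlines() if ln.startswith(("1,", "2,"))]
    conn.executemany(
        "INSERT INTO csvdata (rid, rline) VALUES (?, ?)",
        enumerate(wanted, start=1),  # rid follows file order, starting at 1
    )
    return len(wanted)

conn = sqlite3.connect(":memory:")
count = load_lines(conn, SAMPLE)
rows = conn.execute("SELECT rid, rline FROM csvdata ORDER BY rid").fetchall()
```

The important property is that rid preserves the original file order, because the grouping depends entirely on which store line precedes each detail line.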

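2) Use some SQL to get the data into the required shape. This means that for each type-2 row you have to find the closest preceding type-1 row. Untested: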

select 
    csvdata.rline,
    csvdata.rid,
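    -- closest preceding type-1 line; "limit 1" is PostgreSQL/MySQL syntax (SQL Server would use "top 1")
    (select x.rline from csvdata x where x.rline like '1,%' and x.rid < csvdata.rid order by x.rid desc limit 1) as TopRline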
from 
    csvdata
where
    rline like '2,%' -- this will limit lines to only those with the detail

Hopefully this will produce the following result with three columns:

rid rline                                 TopRline      
2   2, 012345, 5678999, Y, 11, 20130301   1, 6997, 01234
3   2, 012345, 5678988, Y, 11, 20130301   1, 6997, 01234
5   2, 012345, 5678999, Y, 11, 20130301   1, 6647, 01234
6   2, 012345, 5678988, Y, 11, 20130301   1, 6647, 01234
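
One way to try the pairing query without a database server is to run it against an in-memory SQLite table from Python, sketched below. This is only a check under assumptions: SQLite happens to accept the same LIKE and LIMIT syntax, the LINES list simply repeats the sample rows, and the alias d and the names QUERY and paired are introduced for illustration. Other engines may need small syntax changes.

```python
import sqlite3

LINES = [
    "1, 6997, 01234",
    "2, 012345, 5678999, Y, 11, 20130301",
    "2, 012345, 5678988, Y, 11, 20130301",
    "1, 6647, 01234",
    "2, 012345, 5678999, Y, 11, 20130301",
    "2, 012345, 5678988, Y, 11, 20130301",
]

# Same idea as the answer's query: for every detail line, a correlated
# subquery picks the store line with the highest rid below the detail's rid.
QUERY = """
SELECT d.rid,
       d.rline,
       (SELECT x.rline
          FROM csvdata x
         WHERE x.rline LIKE '1,%' AND x.rid < d.rid
         ORDER BY x.rid DESC
         LIMIT 1) AS TopRline
  FROM csvdata d
 WHERE d.rline LIKE '2,%'
 ORDER BY d.rid
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE csvdata (rid INTEGER PRIMARY KEY, rline TEXT)")
conn.executemany("INSERT INTO csvdata VALUES (?, ?)", enumerate(LINES, start=1))

paired = conn.execute(QUERY).fetchall()
for rid, rline, top in paired:
    print(rid, rline, "|", top)
```

On the sample rows, rids 2 and 3 pair with the store line at rid 1, and rids 5 and 6 pair with the store line at rid 4.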

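3) Use a SQL function to split the data into columns (in PostgreSQL, for example, string_to_array() does this). Assuming the result of step 2 is stored in a table named temp, it would look something like: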

select
  (string_to_array(rline,','))[1] as column1,
  (string_to_array(rline,','))[2] as column2,
  (string_to_array(rline,','))[3] as column3,
  (string_to_array(rline,','))[4] as column4,
  (string_to_array(rline,','))[5] as column5,
  (string_to_array(rline,','))[6] as column6,
  (string_to_array(TopRline,','))[1] as column1top,
  (string_to_array(TopRline,','))[2] as column2top,
  (string_to_array(TopRline,','))[3] as column3top
from
  temp
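
If the splitting is done outside the database instead, a rough Python equivalent could look like the sketch below. The function names split_fields and flatten are hypothetical; the column names mirror the aliases in the query above. Because the sample separates fields with a comma followed by a space, each field is trimmed here, which a plain split on ',' would not do.

```python
def split_fields(line):
    """Split one comma-separated line and trim the space after each comma."""
    return [field.strip() for field in line.split(",")]

def flatten(rline, top_rline):
    """Combine a detail line and its store line into one dict of columns."""
    detail = split_fields(rline)
    store = split_fields(top_rline)
    # column1..columnN come from the detail line, column1top.. from the store line
    row = {f"column{i}": value for i, value in enumerate(detail, start=1)}
    row.update({f"column{i}top": value for i, value in enumerate(store, start=1)})
    return row

example = flatten("2, 012345, 5678999, Y, 11, 20130301", "1, 6997, 01234")
```

This assumes fields never contain quoted commas; if they can, Python's csv module would be a safer parser than str.split.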

4) Store the data in whatever table you want.

score 0 · Accepted Answer

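Given SSIS's difficulty with jagged CSVs, I would do some pre-processing in a CMD step run from SSIS so that you can use the standard CSV functionality. Load the data into two separate tables, then join the tables on a shared key column.

First, split the rows of the jagged file into separate non-jagged files. On Windows, something like the following should do the trick.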

findstr /b "1," InputFile.txt > InputFileRow_1.txt
findstr /b "2," InputFile.txt > InputFileRow_2.txt
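
Where findstr is not available, the same split can be sketched in Python. This is only an illustration: the function name split_by_record_type and the output naming pattern (input file stem plus Row_1 or Row_2) are assumptions chosen to match the file names used here, and the demo writes a tiny sample to a temporary directory.

```python
from pathlib import Path
import tempfile

def split_by_record_type(input_path, prefixes=("1,", "2,")):
    """Write the lines starting with each prefix to their own file."""
    input_path = Path(input_path)
    lines = input_path.read_text().splitlines(keepends=True)
    outputs = {}
    for prefix in prefixes:
        # Assumed naming: InputFile.txt -> InputFileRow_1.txt, InputFileRow_2.txt
        out_name = f"{input_path.stem}Row_{prefix.rstrip(',')}{input_path.suffix}"
        out = input_path.with_name(out_name)
        out.write_text("".join(ln for ln in lines if ln.startswith(prefix)))
        outputs[prefix] = out
    return outputs

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "InputFile.txt"
    src.write_text("0, xyz\n1, 6997, 01234\n2, 012345, 5678999\n9, 8\n")
    paths = split_by_record_type(src)
    output_names = {prefix: path.name for prefix, path in paths.items()}
    store_lines = paths["1,"].read_text()
    detail_lines = paths["2,"].read_text()
```

Like the findstr commands, this keeps each record type in file order but discards the information about which store line preceded which detail line.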

Then use the standard SSIS CSV functionality to load InputFileRow_1.txt and InputFileRow_2.txt into the tables InputFileRow_1 and InputFileRow_2.

Finally, group them with a join like the one below.

 SELECT *
 FROM InputFileRow_1 ifr1
 INNER JOIN InputFileRow_2 ifr2
 ON ifr1.RowType2_ID = ifr2.RowType2_ID

*Depending on the distribution of row types in the file and the size of the file, this approach may be wasteful from an IO perspective.
