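I have a CSV file from an SQL dump that I am working with in BaseX 8.4. The CSV header contains a flattened representation of the SQL table structure.
The CSV, with the header and the first row: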
country_id,country_code,country_name,publisher_id,publisher_name,country_id,year_began,year_ended,series_id,series_name,sort_name,publisher_id
2,us,United States,78,Harvard University Press,2,1950,NULL,15,A New Series,New Series,78
The BaseX CSV parser produces the following XML representation:
<csv>
<record>
<country_id>2</country_id>
<country_code>us</country_code>
<country_name>United States</country_name>
<publisher_id>78</publisher_id>
<publisher_name>Harvard University Press</publisher_name>
<country_id>2</country_id>
<year_began>1950</year_began>
<year_ended>NULL</year_ended>
<series_id>15</series_id>
<series_name>A New Series</series_name>
<sort_name>New Series</sort_name>
<publisher_id>78</publisher_id>
</record>
</csv>
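About the original data, I know that each table begins with its unique ID, but those ID names also recur in other tables as foreign keys.
I want to create windows/groups that reconstruct the original table structure by matching the first occurrence of a table's unique ID while ignoring every subsequent occurrence. What I have so far does not work, because it matches every occurrence of an ID, not just the first: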
<tables>{
  for tumbling window $w in /csv/record/*
    start $s at $p when name($s) = ("country_id",
                                    "publisher_id",
                                    "series_id",
                                    "issue_id",
                                    "id_activity_fact",
                                    "id_person_dim",
                                    "id_location_dim",
                                    "id_phys_loc_dim",
                                    "id_letter_dim")
  return <table id_name="{name($s)}">{$w}</table>
}</tables>
Output:
<tables>
<table id_name="country_id">
<country_id>2</country_id>
<country_code>us</country_code>
<country_name>United States</country_name>
</table>
<table id_name="publisher_id">
<publisher_id>78</publisher_id>
<publisher_name>Harvard University Press</publisher_name>
</table>
<table id_name="country_id">
<country_id>2</country_id>
<year_began>1950</year_began>
<year_ended>NULL</year_ended>
</table>
<table id_name="series_id">
<series_id>15</series_id>
<series_name>A New Series</series_name>
<sort_name>New Series</sort_name>
</table>
<table id_name="publisher_id">
<publisher_id>78</publisher_id>
</table>
</tables>
Desired output:
<tables>
<table id_name="country_id">
<country_id>2</country_id>
<country_code>us</country_code>
<country_name>United States</country_name>
</table>
<table id_name="publisher_id">
<publisher_id>78</publisher_id>
<publisher_name>Harvard University Press</publisher_name>
<country_id>2</country_id>
<year_began>1950</year_began>
<year_ended>NULL</year_ended>
</table>
<table id_name="series_id">
<series_id>15</series_id>
<series_name>A New Series</series_name>
<sort_name>New Series</sort_name>
<publisher_id>78</publisher_id>
</table>
</tables>
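One direction that might give this grouping, as a minimal sketch rather than a definitive answer: add a second condition to the `start` clause so that a window only opens when no earlier sibling in the same record has the same name. The sample record is inlined in a variable `$csv` so the query can run on its own, and the ID list is held in `$id-names`; both names are introduced here purely for illustration.

```xquery
xquery version "3.0";

(: Assumption: the known unique-ID column names, copied from the query above. :)
declare variable $id-names := (
  "country_id", "publisher_id", "series_id", "issue_id",
  "id_activity_fact", "id_person_dim", "id_location_dim",
  "id_phys_loc_dim", "id_letter_dim"
);

(: Assumption: the sample record is inlined for illustration;
   in practice it would come from the BaseX CSV parser. :)
declare variable $csv :=
  <csv>
    <record>
      <country_id>2</country_id>
      <country_code>us</country_code>
      <country_name>United States</country_name>
      <publisher_id>78</publisher_id>
      <publisher_name>Harvard University Press</publisher_name>
      <country_id>2</country_id>
      <year_began>1950</year_began>
      <year_ended>NULL</year_ended>
      <series_id>15</series_id>
      <series_name>A New Series</series_name>
      <sort_name>New Series</sort_name>
      <publisher_id>78</publisher_id>
    </record>
  </csv>;

<tables>{
  for tumbling window $w in $csv/record/*
    (: Open a window only when the element is a known ID name AND
       no preceding sibling in the same record shares that name,
       i.e. at the first occurrence of that ID. :)
    start $s when name($s) = $id-names
      and empty($s/preceding-sibling::*[name() = name($s)])
  return <table id_name="{name($s)}">{$w}</table>
}</tables>
```

Because `preceding-sibling` only looks inside the current `record`, the first-occurrence check restarts for each row. On the sample row this should open windows at the first `country_id`, the first `publisher_id`, and `series_id`, leaving the repeated `country_id` and `publisher_id` inside the publisher and series tables respectively.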