Take the following simplified XML:
(An excerpt from a very expensive (and large) XML product feed from an 3rd party industry wholesaler data provider - i.e. I have no control over its schema / format / content)
<Company>
<Code>7786</Code>
<Brand>
<!-- /../ -->
<Groups>
<Group>
<!-- /../ -->
<Group>
<!-- /../ -->
<Group>
<!-- /../ -->
<Product>
<Pip_code>3623949</Pip_code>
<!-- /../ -->
<Other_Codes>
<Other_Code>
<Code_Description>EAN</Code_Description>
<Code_Value>5013158781351</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>shipper EAN</Code_Description>
<Code_Value>503158781443</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP</Code_Description>
<Code_Value>19192411000001107</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP Manf</Code_Description>
<Code_Value>2061801000001104</Code_Value>
</Other_Code>
</Other_Codes>
</Product>
</Group>
</Group>
</Group>
</Groups>
</Brand>
<Brand>
<!-- /../ -->
<Groups>
<Group>
<!-- /../ -->
<Product>
<Pip_code>3265725</Pip_code>
<!-- /../ -->
<Other_Codes>
<Other_Code>
<Code_Description>Outer EAN</Code_Description>
<Code_Value>5013158776531</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP</Code_Description>
<Code_Value>11521811000001106</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>AMPP Manf</Code_Description>
<Code_Value>2061801000001104</Code_Value>
</Other_Code>
<Other_Code>
<Code_Description>EAN</Code_Description>
<Code_Value>5013158776500</Code_Value>
</Other_Code>
</Other_Codes>
</Product>
</Group>
</Groups>
</Brand>
</Company>
I have been tasked with producing the following output:
This is proving to be extremely difficult for two reasons:
You will notice that
<product>
3623949 exists withinGroups/Group/Group/Group/
(3 levels of Group) whereas<product>
3265725 exists withinGroups/Group/
(1 level of Group) - I need to select all<product>
nodes regardless of their nesting level (ideally whenever parent node =<Group>
)Note the
<Other_Codes>
node within<product>
- I need to select the repeating child nodes (Code_Description & Code_Value) as two seperate columns, but note also that there could be any number of<Other_Code>
nodes within<Other_Codes>
and I need to select them all (rendering [i] notation useless)
I've tried my best to get this data using TSQL OPENXML
, but the only way I can think of doing it requires me to duplicate code for the various Group/Group/ levels and I also have to hard code a set number of occurrences of Other_Code using [0] notation - which given their variable nature; is wrong.
SELECT *
FROM OPENXML (@idoc, 'Company/Brand/Groups/Group/Product',2)
WITH (
PIP_code CHAR (20) 'Pip_code',
OtherCodeType CHAR (20) 'Other_Codes/Other_Code[1]/Code_Description',
OtherCodeValue CHAR (30) 'Other_Codes/Other_Code[1]/Code_Value',
OtherCodeType2 CHAR (20) 'Other_Codes/Other_Code[2]/Code_Description',
OtherCodeValue2 CHAR (30) 'Other_Codes/Other_Code[2]/Code_Value',
OtherCodeType3 CHAR (20) 'Other_Codes/Other_Code[3]/Code_Description',
OtherCodeValue3 CHAR (30) 'Other_Codes/Other_Code[3]/Code_Value'
)
We would prefer to do this in TSQL (as this XML file is already being processed there for other nodes & paths not relevant here), we have SQL Server 2008, 2008 R2 & SQL Server 2014 at our disposal, but non-SQL solutions would also be helpful - we really have hit a brick wall on this one.