I, um, seem to have gotten lost.
I believe my problem is in using PHP's DOMDocument class correctly to parse the file.
I have an XML spreadsheet coming from Excel which has headers for different columns. (It also has multiple worksheets, to help the end user in organizing the data.)
My end goal is to place markers on a map using JavaScript.
Here is a simplified example of the XML file. Note that some of the data is strings, some is numeric, and some is HTML:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook>
<Worksheet ss:Name="data">
<Table>
<Row>
<Cell><Data ss:Type="String">lat</Data></Cell>
<Cell><Data ss:Type="String">lng</Data></Cell>
<Cell><Data ss:Type="String">boolean_1</Data></Cell>
<Cell><Data ss:Type="String">boolean_2</Data></Cell>
<Cell><Data ss:Type="String">Source_documents</Data></Cell>
<Cell><Data ss:Type="String">description</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">35.032139998</Data></Cell>
<Cell><Data ss:Type="Number">-117.346952</Data></Cell>
<Cell><Data ss:Type="Number">1</Data></Cell>
<Cell><Data ss:Type="Number">0</Data></Cell>
<Cell><ss:Data ss:Type="String" xmlns="http://www.w3.org/TR/REC-html40"><Font html:Color="#000000">Copy here inside HTML </Font><I><Font html:Color="#000000">with more copy</Font></I></ss:Data></Cell>
<Cell><Data ss:Type="String">Copy here without HTML</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">43.444</Data></Cell>
<Cell><Data ss:Type="Number">-112.005</Data></Cell>
<Cell><Data ss:Type="Number">1</Data></Cell>
<Cell><Data ss:Type="Number">1</Data></Cell>
<Cell><Data ss:Type="String">Diff Marker Src</Data></Cell>
<Cell><Data ss:Type="String">Diff Marker Desc</Data></Cell>
</Row>
</Table>
</Worksheet>
<Worksheet ss:Name="tags">
<Table>
<Row>
<Cell><Data ss:Type="String">tag_label</Data></Cell>
<Cell><Data ss:Type="String">tag_category</Data></Cell>
<Cell><Data ss:Type="String">tag_description</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">boolean_1</Data></Cell>
<Cell><Data ss:Type="String">tag_cat_A</Data></Cell>
<Cell><Data ss:Type="String">bool_1 desc</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">boolean_2</Data></Cell>
<Cell><Data ss:Type="String">tag_cat_B</Data></Cell>
<Cell><Data ss:Type="String">bool_2 desc</Data></Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
I've been assuming that I need to convert the spreadsheet into either a JSON array or a better-structured XML doc that I can parse to create markers for a map. (JSON seems preferable, to reduce the amount of data being transferred.)
If that assumption is correct, I'd like to have a structure which looks kinda like this:
array => {
    data => {
        [0] => {
            lat => '35.032139998',
            lng => '-117.346952',
            booleans => {
                boolean_1 => true
            },
            Source_documents => '<Font html:Color="#000000">Copy here inside HTML </Font><I><Font html:Color="#000000">with more copy</Font></I>',
            'description' => 'Copy here without HTML'
        },
        [1] => {
            lat => '43.444',
            lng => '-112.005',
            booleans => {
                boolean_1 => true,
                boolean_2 => true
            },
            Source_documents => 'Diff Marker Src',
            'description' => 'Diff Marker Desc'
        }
    },
    tags => {
        'boolean_1' => {
            tag_category => 'tag_cat_A',
            'tag_description' => 'bool_1 desc'
        },
        'boolean_2' => {
            tag_category => 'tag_cat_B',
            'tag_description' => 'bool_2 desc'
        }
    }
}
I'm working in PHP, and attempting to transform the XML into JSON using the DOMDocument class. SimpleXML worked fine for me until a new Excel doc was loaded which included the occasional HTML.
I have this PHP code so far:
function get_worksheet_table($file, $worksheet_name) {
    $dom = new DOMDocument;
    $dom->load($file);
    // returns a new instance of class DOMNodeList
    $worksheets = $dom->getElementsByTagName( 'Worksheet' );
    foreach($worksheets as $worksheet) {
        // check if right sheet
        if( $worksheet->getAttribute('ss:Name') == $worksheet_name) {
            // trying to get entire node, or childNodeList, or ... ?
            // About here I am getting lost.
            $nodes = $worksheet->getElementsByTagName('Table')->item(0);
            $table = new DOMDocument;
            $table->preserveWhiteSpace = false;
            $table->formatOutput = true;
            $table->createElement('Table');
            /*
            ITERATE THROUGH $nodes, ADD EACH CELL NODE'S CONTENTS
            TO $table -- UNLESS IT HAS HTML, THEN USE DOMinnerHTML(node)
            (DOMinnerHTML function @ http://php.net/manual/en/book.dom.php#89718)
            */
            return $table;
        }
    }
    return false;
}
$data = get_worksheet_table($file, 'data');
$tags = get_worksheet_table($file, 'tags');
From there, I'm trying to create associative arrays from $data and $tags, then output a big JSON statement to pass to my application.
But it is really a mess, and I'm, well like I said, I'm lost.
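To make the array-building step concrete, here is a rough sketch of what I think it could look like on its own. It assumes each worksheet has already been reduced to a plain list of rows (each row a list of strings) with the header row first. The helper names rows_to_records and index_tags are made up for illustration, and treating every column whose header starts with "boolean_" as a flag is an assumption based on the sample above.

```php
<?php
// Hypothetical helper: pairs each data row with the header row, and groups
// columns whose header starts with "boolean_" under a "booleans" key,
// keeping only the flags that are set to '1'.
function rows_to_records(array $rows): array {
    $headers = array_shift($rows);
    $records = [];
    foreach ($rows as $row) {
        $record = [];
        foreach ($headers as $i => $header) {
            $value = $row[$i] ?? null;
            if (strpos($header, 'boolean_') === 0) {
                if ($value === '1') {
                    $record['booleans'][$header] = true;
                }
            } else {
                $record[$header] = $value;
            }
        }
        $records[] = $record;
    }
    return $records;
}

// Hypothetical helper: keys each tag record by its tag_label column.
function index_tags(array $records): array {
    $tags = [];
    foreach ($records as $record) {
        $label = $record['tag_label'];
        unset($record['tag_label']);
        $tags[$label] = $record;
    }
    return $tags;
}

// Sample rows copied from the spreadsheet above (HTML cell left out).
$dataRows = [
    ['lat', 'lng', 'boolean_1', 'boolean_2', 'Source_documents', 'description'],
    ['35.032139998', '-117.346952', '1', '0', 'Some source', 'Copy here without HTML'],
    ['43.444', '-112.005', '1', '1', 'Diff Marker Src', 'Diff Marker Desc'],
];
$tagRows = [
    ['tag_label', 'tag_category', 'tag_description'],
    ['boolean_1', 'tag_cat_A', 'bool_1 desc'],
    ['boolean_2', 'tag_cat_B', 'bool_2 desc'],
];

$payload = [
    'data' => rows_to_records($dataRows),
    'tags' => index_tags(rows_to_records($tagRows)),
];
echo json_encode($payload, JSON_PRETTY_PRINT), PHP_EOL;
```

If something like this is reasonable, then the only part I'm really missing is getting from the DOM to those plain row arrays.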
Questions:
- Does this look like I'm at least on the right track?
- How do I access the nodes properly? I seem to be getting all subnodes as one big text value.
- How do I iterate through the DOM to access the cells' text content where appropriate, while getting any children of the <Data> nodes as a string rather than as child nodes?
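To show the kind of traversal these questions are about, here is a minimal sketch, not a verified solution. It assumes the real file declares the SpreadsheetML namespace (urn:schemas-microsoft-com:office:spreadsheet) for both the default and the ss prefix, which my simplified sample above leaves out. The helper names cell_value and worksheet_rows are hypothetical. A cell whose Data element has element children is serialized child by child with saveXML(); every other cell just uses textContent.

```php
<?php
// SpreadsheetML namespace; assumed to be declared in the real Excel file.
const SS_NS = 'urn:schemas-microsoft-com:office:spreadsheet';

// Hypothetical helper: returns a cell's text, or the serialized markup of
// its children when the Data element contains nested (HTML) elements.
function cell_value(DOMElement $data): string {
    $hasMarkup = false;
    foreach ($data->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $hasMarkup = true;
            break;
        }
    }
    if (!$hasMarkup) {
        return $data->textContent;
    }
    $out = '';
    foreach ($data->childNodes as $child) {
        // saveXML() with a node argument serializes just that node.
        $out .= $data->ownerDocument->saveXML($child);
    }
    return $out;
}

// Hypothetical helper: returns the named worksheet as a list of rows,
// each row a list of cell strings, or null if no worksheet matches.
function worksheet_rows(DOMDocument $dom, string $name): ?array {
    foreach ($dom->getElementsByTagNameNS(SS_NS, 'Worksheet') as $sheet) {
        if ($sheet->getAttributeNS(SS_NS, 'Name') !== $name) {
            continue;
        }
        $rows = [];
        foreach ($sheet->getElementsByTagNameNS(SS_NS, 'Row') as $row) {
            $cells = [];
            foreach ($row->getElementsByTagNameNS(SS_NS, 'Data') as $data) {
                $cells[] = cell_value($data);
            }
            $rows[] = $cells;
        }
        return $rows;
    }
    return null;
}

// Cut-down sample with the namespace declarations added.
$xml = <<<'XML'
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
<Worksheet ss:Name="data"><Table>
<Row><Cell><Data ss:Type="String">lat</Data></Cell><Cell><Data ss:Type="String">Source_documents</Data></Cell></Row>
<Row><Cell><Data ss:Type="Number">43.444</Data></Cell><Cell><ss:Data ss:Type="String" xmlns="http://www.w3.org/TR/REC-html40"><Font html:Color="#000000">Copy </Font><I>more</I></ss:Data></Cell></Row>
</Table></Worksheet>
</Workbook>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$rows = worksheet_rows($dom, 'data');
print_r($rows);
```

One thing this sketch ignores: it assumes every Cell has a Data element, so an empty cell would shift the later columns in that row.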
Any pointers you might have toward better understanding how to parse XML with the DOMDocument class would be appreciated. I keep reading through the documentation, but it's eluding me.
Thank you so much for your time.