I need to parse a huge XML file of photo albums. I'm using PHP's SimpleXML, but it fails on some entries because stray angle brackets can appear inside the text content; see the 'description' and 'CameraModel' tags below.

How do I clean up the XML before loading it with SimpleXML? If possible, I'd like to replace the extra brackets with an underscore ('_').

Here is my XML:

<values>
<photos>
<photo><photoID>4521</photoID>
<name></name>
<description>Seattle<3</description>
<fileName>S5001497.jpg</fileName>
<fileSize>177513</fileSize>
<fileSizeOriginal>2359669</fileSizeOriginal>
<width>1200</width>
<height>900</height>
<exif><CameraModel><Digimax S500 / Kenox S500</CameraModel>
<CameraMake>Samsung Techwin</CameraMake>
<DateTime>2008-07-12 17:37:24</DateTime>
<Version>220</Version>
<SourceWidth>2592</SourceWidth>
<SourceHeight>1944</SourceHeight>
<Orientation>1</Orientation>
<FlashUsed>89</FlashUsed>
<FocalLength>5.8</FocalLength>
<ExposureTime>0.033333</ExposureTime>
<Brightness></Brightness>
<ApertureFNumber>2.8</ApertureFNumber>
<ISO>177</ISO>
<ExposureProgram>0</ExposureProgram>
</exif>
<type>photo</type>
<GPS></GPS>
</photo>
</photos>
</values>

1 Answer

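Use a regex that finds a stray '<' between an element's opening and closing tags on the same line and replaces it with an underscore: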

print preg_replace("/(<([\w]+)[^>]*>.*)(<)(.*<\/\\2>)/", "$1_$4", $xml);
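A single pass of that pattern replaces at most one stray '<' per element, so a value like `a<b<c` would need another pass. Here is a minimal sketch that re-applies the same pattern until nothing changes and then loads the result with SimpleXML. The helper name `cleanStrayBrackets` and the shortened sample are my own, not part of the original answer. It also assumes each text-bearing element opens and closes on one line with no child elements on that line, as in the question's XML; with nested elements on a single line the pattern can damage valid closing tags.

```php
<?php
// Hypothetical helper (name is illustrative): replaces stray '<' characters
// inside single-line elements with '_' by re-applying the answer's pattern
// until the text stops changing.
// Assumption: no element has child elements on the same line.
function cleanStrayBrackets(string $xml): string
{
    $pattern = '/(<(\w+)[^>]*>.*)(<)(.*<\/\2>)/';
    do {
        $before = $xml;
        // ${1} keeps everything up to the stray '<', ${4} the rest of the element.
        $xml = preg_replace($pattern, '${1}_${4}', $before);
    } while ($xml !== null && $xml !== $before);

    // preg_replace returns null on failure; fall back to the last good text.
    return $xml ?? $before;
}

// Shortened version of the XML from the question.
$sample = "<photo>\n"
    . "<description>Seattle<3</description>\n"
    . "<exif><CameraModel><Digimax S500 / Kenox S500</CameraModel>\n"
    . "</exif>\n"
    . "</photo>";

$doc = simplexml_load_string(cleanStrayBrackets($sample));
echo $doc->description, "\n";        // Seattle_3
echo $doc->exif->CameraModel, "\n";  // _Digimax S500 / Kenox S500
```

Only '<' is handled here because a bare '>' inside text is normally accepted by XML parsers. If the source of the file can be fixed, escaping the values as `&lt;` when the XML is generated would be more robust than patching it afterwards.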
Answered 2013-09-25T23:01:34.550