I've seen several great command-line XML manipulation tools mentioned in this discussion, and I'm exploring new ways to extract data from XML files through scripting instead of compiled programs. I'm currently trying out xmlstarlet, but I'm not tied to that particular tool.
I have an XML data file containing tens of thousands of elements. I'd like to extract a subset of those elements based on a list of search terms, and then pipe or otherwise route those elements into some downstream scripts and transforms. The search terms are simple strings--there's no need for regular expressions. If I were doing this with grep on a regular text file, I would probably do something simple like:
grep -Ff StringsToSearchFor.txt MassiveFile.txt | [chain of additional commands]
I've been looking through the documentation for tools like xmlstarlet for ways to achieve this, and the closest thing I've come up with is this ugly attempt that uses a temporary file (note: I'm on Windows):
REM Create tempOutput.xml with an opening root element
REM (the root element name here is just a placeholder)
echo ^<searchResults^> > tempOutput.xml
REM %1 is the file containing the list of search strings
REM %2 is the target XML file
for /F %%A in (%1) do (
    REM Copy any nodes matching this term and append them to tempOutput.xml
    xml sel -I -t -c "path/to/search[targetElement='%%A']" %2 >> tempOutput.xml
)
REM Close the root element
echo ^</searchResults^> >> tempOutput.xml
REM After this stage, pass tempOutput.xml as the input to downstream XML transforms and tools
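For reference, I call this batch file along the lines of the following (ExtractMatches.bat is just a placeholder name for the script above):

ExtractMatches.bat StringsToSearchFor.txt MassiveFile.xml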
Needless to say, this is really ugly.
I suppose one possibility is to modify the for loop to build up a giant chain of -c XPath queries and pass them to xmlstarlet all in one shot, as in the sketch below, but that also seems unnecessarily messy, and I think I would still be stuck with the tempOutput.xml file.
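To make that concrete, I'm picturing something roughly like this (untested; it collects the whole chain of -c clauses in a batch variable, and I suspect a long enough list of terms would blow past cmd.exe's command-line length limit):

setlocal EnableDelayedExpansion
set QUERY=
REM Build one -c clause per search term in %1
for /F %%A in (%1) do (
    set QUERY=!QUERY! -c "path/to/search[targetElement='%%A']"
)
REM Run xmlstarlet once with the whole chain, still dumping into the temp file
REM (and I'd presumably still need to wrap the output in a root element, as above)
xml sel -I -t !QUERY! %2 > tempOutput.xml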
Is there a more elegant way to do this? Or is a temporary file really my best approach?