I'm extracting data from a TMX file, an XML-based translation memory format. The file looks like this (there are multiple <tu>
entries, one for each translated string):
<?xml version="1.0" encoding="utf-8"?>
<tmx version="1.4">
<header creationtool="Multilizer" creationtoolversion="6.2.19" datatype="PlainText" segtype="sentence" adminlang="en" srclang="*all*" o-tmf="MLProject">
</header>
<body>
<tu>
<prop type="context">..\..\BuildProcess\Support_Files\CommonFiles\PSRIP\AlambicEdit.dll.Strings.126.2000</prop>
<tuv xml:lang="en-CA">
<seg>Error initializing library: %s.</seg>
</tuv>
<tuv xml:lang="en">
<prop type="status">tsQAed</prop>
<seg>Error initializing library: %s.</seg>
</tuv>
<tuv xml:lang="fr">
<prop type="status">tsQAed</prop>
<seg>Erreur lors de l'initialisation de la librairie %s.</seg>
</tuv>
<tuv xml:lang="de">
<prop type="status">tsQAed</prop>
<seg>Fehler bei der Initialisierung der Bibliothek: %s.</seg>
</tuv>
<tuv xml:lang="es">
<prop type="status">tsQAed</prop>
<seg>Error inicializando biblioteca: %s.</seg>
</tuv>
<tuv xml:lang="it">
<prop type="status">tsQAed</prop>
<seg>Errore di inizializzazione libreria: %s.</seg>
</tuv>
<tuv xml:lang="ja">
<prop type="status">tsQAed</prop>
<seg>ライブラリ初期化時のエラー: %s</seg>
</tuv>
<tuv xml:lang="zh-CN">
<prop type="status">tsQAed</prop>
<seg>初始化库时出错:%s。</seg>
</tuv>
<tuv xml:lang="pt">
<prop type="status">tsQAed</prop>
<seg>Erro ao inicializar biblioteca: %s.</seg>
</tuv>
</tu>
</body>
</tmx>
I need to extract specific languages in a specific order, and that order is not always respected in the TMX; for instance, the DE and ES entries are sometimes swapped.
Unfortunately, I haven't found a way to get a child element by the value of one of its attributes, so I can't do something like segment = x.getElementsByPropertyValue("xml:lang", "en"), which would be really awesome.
The only alternative I've found is to loop through all of the <tuv> elements in each <tu> and check their languages against a properly sorted language array (which I expect would be horribly slow on 600k+ entries in 10 different files).
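To make that loop-based fallback concrete, here is a minimal sketch of it, not the actual code from my project. It assumes `tu` is an MSXML DOM element for one <tu> entry; `WANTED` and `extractSegments` are illustrative names, and the language list is only an example.

```javascript
// Sketch of the loop-based fallback: one pass over the <tuv> children of a
// <tu>, then a reorder against the wanted language list.
// Assumption: `tu` is an MSXML DOM element (getElementsByTagName, getAttribute,
// item(), length and the MSXML-specific .text property are used).
// WANTED and extractSegments are hypothetical names for illustration.
var WANTED = ["en", "fr", "de", "es"];

function extractSegments(tu, wanted) {
    var byLang = {};
    var tuvs = tu.getElementsByTagName("tuv");

    // Index every <tuv> by its xml:lang attribute, whatever order it appears in.
    for (var i = 0; i < tuvs.length; i++) {
        var tuv = tuvs.item(i);
        var lang = tuv.getAttribute("xml:lang");
        var segs = tuv.getElementsByTagName("seg");
        if (lang !== null && segs.length > 0) {
            byLang[lang] = segs.item(0).text;
        }
    }

    // Emit the segments in the requested order; null marks a missing language.
    var result = [];
    for (var j = 0; j < wanted.length; j++) {
        var key = wanted[j];
        var found = Object.prototype.hasOwnProperty.call(byLang, key);
        result.push(found ? byLang[key] : null);
    }
    return result;
}
```

Under WSH, `tu` would typically come from a document loaded with something like `new ActiveXObject("MSXML2.DOMDocument.6.0")`. The sketch visits each <tuv> once per <tu>, so the cost I'm worried about is the per-entry loop repeated over every entry in every file.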
Is there something obvious I'm missing? Is there such a method?
Note: I'm using JavaScript under WSH, so I have access to any ActiveXObject available in WSH.