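I am trying to use the XSL option with the XPathEntityProcessor to select data from a transformed XML document. The configuration runs without errors, but it returns no values. At this point, just getting it to index a single word from the IUPAC entity below would be a home run. All the other entities work as expected. I want to use the XSL option so that I have full access to XPath for specific selections. The data is messy and needs further processing with JavaScript or regular expressions.
My entities look like this: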
<document>
  <!-- this outer processor generates a list of files satisfying the conditions
       specified in the attributes -->
  <entity
      name="f"
      processor="FileListEntityProcessor"
      fileName=".*xml"
      newerThan="'NOW-30YEARS'"
      recursive="true"
      rootEntity="false"
      dataSource="null"
      baseDir="C:\DrugLabels\Prescription\Test"
      transformer="RegexTransformer,TemplateTransformer">
    <!-- this strips the file extension and sets the id = the file name -->
    <field column="file" regex="^(.*)/|.xml" replaceWith="$1" name="id"/>
    <!-- this processor extracts content using Xpath from each file found -->
    <entity
        name="DrugLabel"
        processor="XPathEntityProcessor"
        forEach="/document"
        url="${f.fileAbsolutePath}">
      <entity
          name="document_title"
          processor="XPathEntityProcessor"
          transformer="script:lineToTitleCase"
          forEach="/document"
          url="${f.fileAbsolutePath}">
        <field column="title" xpath="/document/title"/>
      </entity>
      <entity
          name="ingredients"
          processor="XPathEntityProcessor"
          transformer="script:listToTitleCase"
          forEach="/document"
          url="${f.fileAbsolutePath}">
        <field column="generic_medicine" xpath="/document/component/structuredBody/component/section/subject/manufacturedProduct/manufacturedProduct/asEntityWithGeneric/genericMedicine/name"/>
      </entity>
      <entity
          name="IUPAC"
          processor="XPathEntityProcessor"
          transformer="RegexTransformer, script:debug"
          forEach="/add"
          url="${f.fileAbsolutePath}"
          xsl="C:\solr-4.3.1\example\solr\DrugLabels\conf\section-description-transform.xsl">
        <field column="chemical_name" xpath="/add/doc/field/@chemical_name" flatten="true"/>
      </entity>
    </entity>
  </entity>
</document>
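The comment on the `file` field says the regex strips the file extension. As a rough illustration of what the pattern `^(.*)/|.xml` actually matches, here is a minimal sketch in plain JavaScript. It assumes that JavaScript's regex engine behaves like the Java regex used by RegexTransformer for this particular pattern. The helper names `stripToId` and `stripToIdEscaped` and the sample file names are hypothetical, not part of my config.

```javascript
// Hypothetical helper mirroring regex="^(.*)/|.xml" replaceWith="$1".
// Assumption: JavaScript regex semantics stand in for Java's here.
function stripToId(file) {
  return file.replace(/^(.*)\/|.xml/g, '$1');
}

// Hypothetical variant with the dot escaped and the extension anchored
// to the end of the name.
function stripToIdEscaped(file) {
  return file.replace(/^(.*)\/|\.xml$/g, '$1');
}

// Sample file names (made up for illustration).
var plain = stripToId('label.xml');                 // 'label'
var tricky = stripToId('myxml.xml');                // 'm'
var trickyEscaped = stripToIdEscaped('myxml.xml');  // 'myxml'

console.log(plain, tricky, trickyEscaped);
```

Because the `.` is unescaped, it matches any character, so a file name that contains "xml" anywhere else can lose more than its extension. This is probably unrelated to the empty IUPAC result, but it may affect the generated ids.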
The transform file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <add>
      <doc>
        <xsl:for-each select="/document">
          <field name="chemical_name"><xsl:value-of select="/document/component/structuredBody/component/section/text/paragraph"/></field>
        </xsl:for-each>
      </doc>
    </add>
  </xsl:template>
</xsl:stylesheet>
The debug script:
function debug(row) {
  var r = row.get('chemical_name').toString;
  r = 'value: ' + r;
  row.put('chemical_name', r);
  return row;
}
And the output:
"chemical_name": [
"value: function toString() {/*\njava.lang.String toString()\n*/}\n"
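A note on why the output contains function source instead of a field value: the debug script reads `toString` without calling it, so the string concatenation stringifies the method itself. The sketch below reproduces this in plain JavaScript. The sample value `'aspirin'` is a hypothetical stand-in for whatever `row.get('chemical_name')` returns, and the exact function text differs between engines (the output above appears to be how the script engine renders a Java method).

```javascript
// Hypothetical stand-in for the value returned by row.get('chemical_name').
var value = 'aspirin';

// Without parentheses the method itself is concatenated, not its result.
var uncalled = 'value: ' + value.toString;

// With parentheses the method is called and its string result is used.
var called = 'value: ' + value.toString();

console.log(uncalled); // starts with "value: function toString()"
console.log(called);   // "value: aspirin"
```

So the debug output does not yet show what the field really contains. Calling `toString()`, after checking that the value is not null, should show the actual content.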