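I have configured the extractor plugin for Apache Nutch and Solr to filter HTML content. How can I access the content of an inner div using the css engine or the xpath engine? Thanks in advance.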
Just use the "text" function. For instance, if your HTML looks like this:
<div class="target"> Hello <span>World!</span> </div>
Then your extract-to rule would look similar to this:
<extract-to field="my-field"> <text> <expr value=".target"/> </text> </extract-to>
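To see what a rule like this would be expected to put into my-field, here is a minimal Python sketch that collects the text of the matched div, including the nested span. It is not the extractor plugin's code and does not reflect its internals. It uses only the standard library, the function name extract_text is hypothetical, and the body wrapper around the snippet is added just to give the search something to descend into.

```python
import xml.etree.ElementTree as ET

# Sample markup from the answer, wrapped in <body> for illustration only.
HTML = '<body><div class="target"> Hello <span>World!</span> </div></body>'


def extract_text(markup: str, css_class: str) -> str:
    """Return the whitespace-normalized text of the first div with the given class.

    Hypothetical helper: it approximates what a "text" extraction over the
    selector ".target" is expected to yield, not how the plugin does it.
    """
    root = ET.fromstring(markup)
    # ElementTree's limited XPath: any descendant div whose class attribute
    # is exactly css_class (a real CSS engine also matches multi-class values).
    node = root.find(f".//div[@class='{css_class}']")
    if node is None:
        return ""
    # itertext() walks the element and all of its descendants in document order.
    raw = "".join(node.itertext())
    # Collapse runs of whitespace and trim the ends.
    return " ".join(raw.split())


if __name__ == "__main__":
    print(extract_text(HTML, "target"))  # prints "Hello World!"
```

Note that ElementTree needs well-formed markup, so this only works on tidy snippets like the one above. Real pages usually need an HTML-tolerant parser.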