javascript - Do you know an open-source JavaScript extraction/regex engine?

Question

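We need a DOM parser that can run a bunch of patterns over a page and store the results. We are looking for an open-source library to start from. It should:

be able to select elements by regular expression (for example, get all elements that contain "price" in their class, id, or other attributes such as meta attributes),
have plenty of helpers, for example: remove comments, iframes, etc.,
be very fast,
be runnable from a browser extension.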

score 3 · Accepted Answer

OK, I'd say:
you could use jQuery.

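Pros:

It is a very good DOM parser
It is very good at manipulating the DOM (removing/adding/editing elements)
It has a great, intuitive API
It has a large and helpful community => lots of answers to any jQuery-related question
It works in browser extensions (I tested it myself in Chrome, and it apparently works in Firefox extensions too: see "How to use jQuery in Firefox Extension")
It is lightweight (about 31KB, minified and gzipped)
It is cross-browser
It is definitely open source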

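Cons:

It doesn't rely on regular expressions (which is actually a very good thing, as dda already mentioned), but regular expressions can be used to filter elements
I don't know whether it can access/manipulate comments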
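
On the open point about comments: HTML comments are ordinary DOM nodes with `nodeType` 8, and jQuery's `.contents()` does return text and comment nodes, so they are reachable. A minimal sketch using only plain DOM properties is below. `removeComments` is a hypothetical helper name of mine, not a jQuery API, and I have not measured its speed on large pages.

```javascript
// Sketch: strip comment nodes from a DOM subtree.
// `removeComments` is a hypothetical helper, not part of jQuery.
var COMMENT_NODE = 8; // numeric value of Node.COMMENT_NODE

function removeComments(root) {
    var removed = 0;
    // Copy childNodes first: in a real DOM it is a live list
    // that shrinks while nodes are being removed.
    var children = Array.prototype.slice.call(root.childNodes || []);
    children.forEach(function (child) {
        if (child.nodeType === COMMENT_NODE) {
            root.removeChild(child);
            removed += 1;
        } else {
            // Recurse into elements; text nodes have no children.
            removed += removeComments(child);
        }
    });
    return removed; // number of comment nodes deleted
}

// In a page or an extension content script:
// removeComments(document.body);
```

The same walk could remove iframes instead by checking `child.nodeName === 'IFRAME'`, although for elements `$('iframe').remove()` is shorter.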

Here is an example of some jQuery operations:

// select all the iframe elements with the class advertisement 
// that have the word "porn" in their src attribute
$('iframe.advertisement[src*=porn]')
    // filter the ones that contains the word "poney" in their title 
    // with the help of a regex
    .filter(function(){
        // contentDocument is null for cross-origin frames, hence the guards
        return /poney/i.test(this.title || (this.contentDocument && this.contentDocument.title) || '');
    }) 
        // and remove them
        .remove()
        // return to the whole match
        .end()
    // filter them again, this time 
    // affect only the big ones
    .filter(function(){
        return $(this).width() > 100 && $(this).height() > 100;
    })
        // replace them with some html markup
        .replaceWith('<img src="harmless_bunnies_and_kitties.jpg" />');
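
The `[src*=porn]` selector above only does a substring match on one named attribute. For the case in the question (a regex tested against every attribute of an element), something like the following could be combined with `.filter()`. This is a sketch under my own assumptions: `hasAttributeMatching` is a hypothetical helper, not a jQuery function, and it only looks at attribute values, not attribute names.

```javascript
// Sketch: true if ANY attribute value of `el` matches `pattern`.
// `hasAttributeMatching` is a hypothetical helper, not part of jQuery.
function hasAttributeMatching(el, pattern) {
    var attrs = el.attributes || []; // array-like list of { name, value }
    for (var i = 0; i < attrs.length; i++) {
        // Reset lastIndex so a regex carrying the g flag
        // gives the same answer on every call.
        pattern.lastIndex = 0;
        if (pattern.test(attrs[i].value)) {
            return true;
        }
    }
    return false;
}

// With jQuery loaded in the page or content script:
// var $priced = $('*').filter(function () {
//     return hasAttributeMatching(this, /price/i);
// });
```

Note that `$('*')` visits every element, so on big pages it is probably worth narrowing the selector first (for example `$('div, span, meta')`) before running the regex filter.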

score 0 · Accepted Answer

node-htmlparser can parse HTML, provides a DOM with a number of utils (also supports filtering by functions) and can be run in any context (even in WebWorkers).

I forked it a while back, improved it for better speed and got some insane results (read: even faster than native libexpat bindings).

Nevertheless, I would advise you to use the original version, as it supports browsers out of the box (my fork can be run in browsers using browserify, which adds some overhead).
