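You can use jqgram, an implementation of the PQ-Gram tree edit distance approximation, to solve exactly this problem, but you will need to run Node.js unless you want to port it to C#. The port should be fairly easy, though: the algorithm is not that complicated. Beauty in simplicity.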
https://github.com/hoonto/jqgram
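The examples include a DOM vs. Cheerio comparison, which shows how to handle children and labels in order to generate an approximate tree edit distance. The result is a number between 0 and 1 that you can turn into a percentage; the lower the value, the more similar the trees. Note, however, that a value of zero does not necessarily indicate identical trees; it only means they are very similar. You can just as easily do DOM vs. DOM or Cheerio vs. Cheerio comparisons, or use the HTML parser that Cheerio uses without pulling in the whole library (out of the box, Cheerio is a rather fast server-side jQuery and DOM-like implementation).
Obviously this solution is specific to Node.js and browser JavaScript, but I think those challenges may be easier to deal with than porting to C#/.NET.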
// This could probably be optimized significantly, but is a real-world
// example of how to use tree edit distance in the browser.

// For cheerio, you'll have to browserify, which requires some fiddling
// around due to cheerio's dynamically generated require's (good grief)
// that browserify does not see due to the static nature of its code
// analysis (dynamic off-line analysis is hard, but doable).
//
// Ultimately, the goal is to end up with something like this in the browser:
var cheerio = require('./lib/cheerio');

// The easy part, jqgram:
var jq = require("../jqgram").jqgram;

// Make a cheerio DOM:
var html = '<body><div id="a"><div class="c d"><span>Irrelevant text</span></div></div></body>';
var cheeriodom = cheerio.load(html, {
    ignoreWhitespace: false,
    lowerCaseTags: true
});

// For ease, let's assume you have jQuery loaded.
// [0] unwraps the jQuery object to the underlying <body> element,
// because the lfn below reads node.nodeName:
var realdom = $('body')[0];

// The lfn and cfn functions allow you to specify
// how labels and children should be defined:
jq.distance({
    // cheerio.load() returns a query function, so select the <body> node:
    root: cheeriodom('body')[0],
    lfn: function(node){
        // Class names and ids are added as plain strings by cfn below:
        if(typeof node === 'string'){ return node; }
        // We don't have to lowercase this because we already
        // asked cheerio to do that for us above (lowerCaseTags).
        return node.name;
    },
    cfn: function(node){
        // The string children added below are leaves:
        if(typeof node === 'string'){ return []; }
        // Cheerio maintains attributes in the attribs object.
        // We're going to put ids and classes in as children
        // of nodes in our cheerio tree:
        var retarr = [];
        if(!! node.attribs && !! node.attribs.class){
            retarr = retarr.concat(node.attribs.class.split(' '));
        }
        if(!! node.attribs && !! node.attribs.id){
            retarr.push(node.attribs.id);
        }
        // Keep element children only (text nodes have no name), which
        // mirrors what node.children holds on the real DOM side:
        retarr = retarr.concat((node.children || []).filter(function(child){
            return !! child.name;
        }));
        return retarr;
    }
},{
    root: realdom,
    lfn: function(node){
        // Class names and ids are added as plain strings by cfn below:
        if(typeof node === 'string'){ return node; }
        return node.nodeName.toLowerCase();
    },
    cfn: function(node){
        // The string children added below are leaves:
        if(typeof node === 'string'){ return []; }
        var retarr = [];
        if(!! node.attributes && !! node.attributes.class && !! node.attributes.class.nodeValue){
            retarr = retarr.concat(node.attributes.class.nodeValue.split(' '));
        }
        if(!! node.attributes && !! node.attributes.id && !! node.attributes.id.nodeValue){
            retarr.push(node.attributes.id.nodeValue);
        }
        for(var i=0; i<node.children.length; ++i){
            retarr.push(node.children[i]);
        }
        return retarr;
    }
},{ p:2, q:3, depth:10 },
function(result) {
    // Between 0 and 1; lower means more similar.
    console.log(result.distance);
});
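
To make the "between 0 and 1" result and the "zero does not mean identical" caveat more concrete, here is a minimal, self-contained sketch of the pq-gram idea as it is usually described. It is not jqgram's actual code: the names `profile` and `pqgramDistance`, and the `{ label, children }` tree shape, are assumptions made up for this illustration. Each tree is turned into a bag of small label windows (p ancestors plus q consecutive children), and the distance is one minus the shared fraction of those windows.

```javascript
// Illustrative pq-gram sketch; NOT jqgram's internals.
// Assumed tree shape: { label: 'a', children: [ ... ] }.

// Build the pq-gram profile: one entry per (stem, base) window, where the
// stem is the last p labels on the path down to the current node and the
// base is q consecutive child labels. '*' pads positions with no node.
function profile(root, p, q) {
  var grams = [];
  function pad(n) {
    var out = [];
    for (var i = 0; i < n; i++) { out.push('*'); }
    return out;
  }
  function visit(node, ancestors) {
    var stem = ancestors.slice(1).concat(node.label);
    var base = pad(q);
    var kids = node.children || [];
    if (kids.length === 0) {
      grams.push(JSON.stringify(stem.concat(base)));
      return;
    }
    kids.forEach(function (kid) {
      base = base.slice(1).concat(kid.label);
      grams.push(JSON.stringify(stem.concat(base)));
      visit(kid, stem);
    });
    // Slide the window past the last child.
    for (var k = 1; k < q; k++) {
      base = base.slice(1).concat('*');
      grams.push(JSON.stringify(stem.concat(base)));
    }
  }
  visit(root, pad(p));
  return grams;
}

// Distance = 1 - 2 * |shared grams| / (|profile A| + |profile B|),
// counting duplicates (bag semantics).
function pqgramDistance(a, b, p, q) {
  var pa = profile(a, p, q);
  var pb = profile(b, p, q);
  var counts = Object.create(null);
  pa.forEach(function (g) { counts[g] = (counts[g] || 0) + 1; });
  var shared = 0;
  pb.forEach(function (g) {
    if (counts[g] > 0) { counts[g]--; shared++; }
  });
  return 1 - (2 * shared) / (pa.length + pb.length);
}

// Two different trees: the children of the two "b" nodes are swapped.
var treeA = { label: 'a', children: [
  { label: 'b', children: [{ label: 'c' }] },
  { label: 'b', children: [{ label: 'd' }] }
] };
var treeB = { label: 'a', children: [
  { label: 'b', children: [{ label: 'd' }] },
  { label: 'b', children: [{ label: 'c' }] }
] };
var treeC = { label: 'a', children: [
  { label: 'b', children: [{ label: 'c' }] }
] };

console.log(pqgramDistance(treeA, treeA, 2, 3)); // 0: identical trees
console.log(pqgramDistance(treeA, treeB, 2, 3)); // 0: yet the trees differ
console.log(pqgramDistance(treeA, treeC, 2, 3)); // strictly between 0 and 1
```

In this sketch `treeA` and `treeB` produce the same bag of windows because the two `b` nodes carry the same label, so no window records which `b` owns which child. That is the kind of case the zero-value caveat is about.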