
I'm using http://caja.appspot.com/html-css-sanitizer-minified.js to sanitize user HTML, but in some cases I want to restrict the allowed tags to a whitelist.

I found https://code.google.com/p/google-caja/wiki/CajaWhitelists, which describes how to define a whitelist, but I can't work out how to pass it to the html_sanitize method provided by html-css-sanitizer-minified.js.

I tried calling html.sanitizeWithPolicy(the_html, white_list); but I get an error:

TypeError: a is not a function

Because of the minification this is hard to debug, but it seems that html-css-sanitizer-minified.js does not contain everything that is in the html-sanitizer.js file.

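I tried using html-sanitizer.js together with cssparser.js instead of the minified version, but I get errors before I even call it, presumably because I'm missing other dependencies.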

How can I make this work?

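Edit: sanitizeWithPolicy does exist in the minified file, but something further along in the process is missing. This suggests the file cannot be used with a custom whitelist. I'm now investigating whether I can work out which unminified files I need to include to build my own version.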

Edit 2: I was missing two files:

https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html4-defs.js?spec=svn1950&r=1950

https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/uri.js?r=5170

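However, I now get an error because sanitizeWithPolicy expects a function rather than a whitelist object. In addition, the html4-defs.js file is very old, so I would have to build the Caja project to get a newer one.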
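
Since sanitizeWithPolicy wants a function, one option is to wrap the whitelist in a policy function. The following is only a minimal sketch: makeWhitelistPolicy is a hypothetical helper, not part of Caja, and it assumes the policy is called as policy(tagName, attribs) with attribs being a flat array of alternating names and values, returning null to drop an element or an object with an attribs array to keep it. Older builds may expect a different return shape, so check the html-sanitizer.js you actually load.

```javascript
// Hypothetical helper (not part of Caja): turns tag and attribute whitelists
// into a policy function. The calling convention is an assumption, see above.
function makeWhitelistPolicy(allowedTags, allowedAttribs, basePolicy) {
  var tags = Object.create(null);
  allowedTags.forEach(function (t) { tags[t.toLowerCase()] = true; });
  var attrs = Object.create(null);
  allowedAttribs.forEach(function (a) { attrs[a.toLowerCase()] = true; });

  return function (tagName, attribs) {
    if (tags[String(tagName).toLowerCase()] !== true) {
      return null; // tag is not on the whitelist: drop the element
    }
    // If the library's own policy is supplied, let it reject or clean first.
    if (basePolicy) {
      var decision = basePolicy(tagName, attribs);
      if (!decision) { return null; }
      attribs = decision.attribs || attribs;
    }
    // attribs is assumed to be [name1, value1, name2, value2, ...].
    var kept = [];
    for (var i = 0; i + 1 < attribs.length; i += 2) {
      var name = String(attribs[i]).toLowerCase();
      if (attrs[name] === true && attribs[i + 1] !== null) {
        kept.push(attribs[i], attribs[i + 1]);
      }
    }
    return { tagName: tagName, attribs: kept };
  };
}
```

It would then be passed as html.sanitizeWithPolicy(the_html, makeWhitelistPolicy(['p'], ['title'])). Note that without a basePolicy this sketch filters by name only and does no URI or style checking of its own.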


2 Answers


I've decided on another approach. I've left the other answer in case I manage to find the bit values for the css definitions as it would be preferable to this one if I could get it to work.

This time I've taken the html-css-sanitizer-minified file and injected a bit of code into it so that the element and attribute white lists can be modified.

Search for:

ka=/^(?:https?|mailto)$/i,m={};

And after it insert the following:

var unmodified_elements = {};
for(var property_name in $.ELEMENTS) {
    unmodified_elements[property_name] = $.ELEMENTS[property_name];
}
var unmodified_attributes = {};
for(var property_name in $.ATTRIBS) {
    unmodified_attributes[property_name] = $.ATTRIBS[property_name];
}

var resetElements = function () {
    $.ELEMENTS = {};
    for(var property_name in unmodified_elements) {
        $.ELEMENTS[property_name] = unmodified_elements[property_name];
    }
    $.f = $.ELEMENTS;
};

var resetAttributes = function () {
    $.ATTRIBS = {};
    for(var property_name in unmodified_attributes) {
        $.ATTRIBS[property_name] = unmodified_attributes[property_name];
    }
    $.m = $.ATTRIBS;
};

var resetWhiteLists = function () {
    resetElements();
    resetAttributes();
};

/**
 * Trims down the element white list to just those passed in whilst still not allowing unsafe elements.
 * @param {array} custom_elements An array of elements to include.
 */
var applyElementsWhiteList = function(custom_elements) {
    resetElements();
    var length = custom_elements.length;
    var new_elements = {};
    for (var i = 0; i < length; i++) {
        var key = custom_elements[i].toLowerCase();
        if (typeof $.ELEMENTS[key] !== 'undefined') {
            new_elements[key] = $.ELEMENTS[key];
        }
    }
    $.f = new_elements;
    $.ELEMENTS = new_elements;
};

/**
 * Trims down the attribute white list to just those passed in whilst still not allowing unsafe attributes.
 * @param {array} custom_attributes An array of attributes to include.
 */
var applyAttributesWhiteList = function(custom_attributes) {
    resetAttributes();
    var length = custom_attributes.length;
    var new_attributes = {};
    for (var i = 0; i < length; i++) {
        var key = custom_attributes[i].toLowerCase();
        if (typeof $.ATTRIBS[key] !== 'undefined') {
            new_attributes[key] = $.ATTRIBS[key];
        }
    }
    $.m = new_attributes;
    $.ATTRIBS = new_attributes;
};

m.applyElementsWhiteList = applyElementsWhiteList;
m.applyAttributesWhiteList = applyAttributesWhiteList;
m.resetWhiteLists = resetWhiteLists;

You can now apply a white list with:

var raw = "<a>element tags removed</a><p class='class-removed' style='color:black'>the p tag is kept</p>";
var tag_white_list = [
    'p'
];
var attribute_white_list = [
    '*::style'
];
html.applyElementsWhiteList(tag_white_list);
html.applyAttributesWhiteList(attribute_white_list);
var san = html.sanitize(raw);
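
Because the applied white lists stay in effect until resetWhiteLists is called, it may help to wrap the three calls so the defaults are always restored. This is a small sketch only: sanitizeWithWhiteLists is a hypothetical wrapper, and it assumes the object passed in exposes the functions injected above plus sanitize.

```javascript
// Hypothetical wrapper: applies the white lists, sanitizes, and always
// restores the default lists afterwards, even if sanitize() throws.
function sanitizeWithWhiteLists(sanitizer, rawHtml, tags, attributes) {
  sanitizer.applyElementsWhiteList(tags);
  sanitizer.applyAttributesWhiteList(attributes);
  try {
    return sanitizer.sanitize(rawHtml);
  } finally {
    sanitizer.resetWhiteLists();
  }
}
```

With the patched file loaded, the call would look like sanitizeWithWhiteLists(html, raw, ['p'], ['*::style']).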

This approach also sanitizes the styles, which I needed. Another white list could be injected for those, but I don't need that, so I haven't written one.

answered 2014-10-14T17:01:59.890

I solved this by downloading the unminified files:

https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html-sanitizer.js

https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/uri.js

https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html4-defs.js?spec=svn1950&r=1950 (This last one is from an old revision. The file is built from Java files, so it would be great if a newer one were available.)

I then added a new function to html-sanitizer.js:

/**
* Trims down the element white list to just those passed in whilst still not allowing unsafe elements.
* @param {array} custom_elements An array of elements to include.
*/
function useCustomElements(custom_elements) {
  var length = custom_elements.length;
  var new_elements = {};
  for (var i = 0; i < length; i++) {
      var key = custom_elements[i].toLowerCase();
      if (typeof elements.ELEMENTS[key] !== 'undefined') {
          new_elements[key] = elements.ELEMENTS[key];
      }
  }
  elements.ELEMENTS = new_elements;
}

I then exposed this function alongside the other public function statements near the end of the file:

html.useCustomElements = html['useCustomElements'] = useCustomElements;

Now I can call it like this:

var raw = '<p>This element is kept</p><div>this element is not</div>';
var white_list = ['p', 'b'];
html.useCustomElements(white_list);
var sanitized = html.sanitize(raw);
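
The reason copying the existing entries keeps unsafe elements blocked can be illustrated with a standalone sketch. Everything here is mock data for illustration: the UNSAFE flag value, the ELEMENTS table and the helper names are assumptions and are not copied from html4-defs.js.

```javascript
// Mock definitions for illustration only (assumed values, not Caja's).
var UNSAFE = 16;
var ELEMENTS = { p: 0, b: 0, div: 0, script: UNSAFE };

// Same idea as useCustomElements: keep only requested names that already
// exist in the table, and copy their original flag values unchanged.
function pickElements(allElements, wanted) {
  var picked = {};
  wanted.forEach(function (name) {
    var key = name.toLowerCase();
    if (Object.prototype.hasOwnProperty.call(allElements, key)) {
      picked[key] = allElements[key];
    }
  });
  return picked;
}

// A flag check like this can keep rejecting unsafe entries after trimming.
function isAllowed(table, name) {
  return Object.prototype.hasOwnProperty.call(table, name) &&
         !(table[name] & UNSAFE);
}

var trimmed = pickElements(ELEMENTS, ['P', 'script', 'blink']);
```

In this mock, 'blink' is dropped because it was never defined, and 'script' is copied across with its flag intact, so a check against the flag still refuses it even though it was requested.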

I then manually added a few HTML5 elements to the html4-defs.js file (only those that just define block elements).

Attribute sanitization was still broken. This is because the html4-defs.js file is out of date relative to html-sanitizer.js. I changed this in html-sanitizer.js:

if ((attribKey = tagName + '::' + attribName,
     elements.ATTRIBS.hasOwnProperty(attribKey)) ||
    (attribKey = '*::' + attribName,
     elements.ATTRIBS.hasOwnProperty(attribKey))) {
  atype = elements.ATTRIBS[attribKey];
}

to this:

if (elements.ATTRIBS.hasOwnProperty(attribName)) {
  atype = elements.ATTRIBS[attribName];
}

This is far from ideal, but without compiling Caja and generating an up-to-date html-defs.js file I can't see a way around it.
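
A less invasive variant might be a lookup that tries both key styles, so the same code works with either generation of the defs file. This is a sketch under assumptions: lookupAttribType is a hypothetical helper, and the sample tables in the comments are made up rather than taken from any real defs file.

```javascript
// Hypothetical helper: resolves an attribute type from a defs table that may
// use "tag::attr" / "*::attr" keys (newer defs) or bare "attr" keys (old defs).
function lookupAttribType(attribTable, tagName, attribName) {
  var has = Object.prototype.hasOwnProperty;
  var candidates = [
    tagName + '::' + attribName, // element-specific key, e.g. "a::href"
    '*::' + attribName,          // global key, e.g. "*::class"
    attribName                   // bare key, as in the old html4-defs.js
  ];
  for (var i = 0; i < candidates.length; i++) {
    if (has.call(attribTable, candidates[i])) {
      return attribTable[candidates[i]];
    }
  }
  return null; // attribute is not defined in the table
}

// Made-up sample tables:
//   lookupAttribType({ 'a::href': 1, '*::class': 2 }, 'p', 'class')
//   lookupAttribType({ href: 1 }, 'a', 'href')
```

The order matters: the element-specific key is tried first so that a per-element definition wins over a global or bare one.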

That still leaves CSS sanitization. I want that too, but I'm missing the CSS defs file and couldn't find a working one by searching, so I've turned it off for now.

Edit: I've managed to extract the html-defs from html-css-sanitizer-minified.js. I've uploaded a copy here. It includes elements such as "nav", so it has been updated for HTML5.

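I tried to do the same for the CSS parsing. I managed to extract the defs, but they depend on bit values, and I couldn't find any way to work out which bits are used for which defaults.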

answered 2014-10-14T13:04:22.353