
2 Answers

1

If you handle user input with JavaScript in the browser, it will never be bulletproof, no matter what you do.

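Assuming you are writing a server-side backend, you should use real, proven BBCode; there is almost certainly a library for it on your platform.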
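To illustrate why the BBCode approach is easier to get right, here is a minimal sketch of the idea: escape every HTML special character first, then generate markup only for a fixed set of BBCode tags. The names `BBCODE_TAGS`, `escapeHtml`, and `renderBBCode` are hypothetical and not taken from any particular library. The sketch does not handle nested tags, which is one reason to prefer an established library.

```javascript
// Hypothetical whitelist: BBCode tag name -> HTML element name.
var BBCODE_TAGS = { b: 'strong', i: 'em', u: 'u' };

// Escape the characters that are special in HTML text and attribute values.
function escapeHtml(text) {
  var entities = {
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
  };
  return String(text).replace(/[&<>"']/g, function (c) {
    return entities[c];
  });
}

// Illustrative only: converts non-nested [b], [i] and [u] pairs.
function renderBBCode(input) {
  // 1. Escape everything, so no user-typed HTML survives.
  var html = escapeHtml(input);
  // 2. Emit HTML only for matched pairs of whitelisted BBCode tags.
  return html.replace(/\[(b|i|u)\]([\s\S]*?)\[\/\1\]/gi,
    function (_, tag, body) {
      var element = BBCODE_TAGS[tag.toLowerCase()];
      return '<' + element + '>' + body + '</' + element + '>';
    });
}
```

Because the only angle brackets in the output are ones the function itself writes, user input cannot introduce tags or attributes of its own.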

answered 2012-08-22T23:12:43.900
1

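If you have a simple tag whitelist and you don't need to worry about attacks at or below the encoding level (as is the case in browser-side JavaScript), you can do the following: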

function sanitize(tagWhitelist, html) {
  // Get rid of all uses of '[' so that user input cannot forge the [N]
  // markers introduced below.
  html = String(html).replace(/\[/g, '&#91;');

  // Consider all uses of '<' and replace whitelisted tags with markers like
  // [1] which are indices into a list of approved tag names.
  // Replace all other uses of < and > with entities.
  var tags = [];
  html = html.replace(
    /<!--[\s\S]*?-->|<(\/?)([a-z]\w*)(?:[^"'>]|"[^"]*"|'[^']*')*>/gi,
    function (_, close, tagName) {
      if (tagName) {
        tagName = tagName.toLowerCase();
        if (tagWhitelist.hasOwnProperty(tagName) && tagWhitelist[tagName]) {
          var index = tags.length;
          tags.push('<' + (close || '') + tagName + '>');
          return '[' + index + ']';
        }
      }
      return '';
    });

  // Escape HTML special characters.  Leave entities alone.
  html = html.replace(/[<>"'@\`\u0000]/g,
    function (c) {
      switch (c) {
        case '<': return '&lt;';
        case '>': return '&gt;';
        case '"': return '&quot;';
        case '\'': return '&#39;';
        case '@': return '&#64;';
      }
      return '&#' + c.charCodeAt(0) + ';';
    });
  if (html.indexOf('<') >= 0) { throw new Error(); }  // Sanity check.

  // Throw out any close tags that don't correspond to start tags.
  // If <table> is used for formatting, embedded HTML shouldn't be able
  // to use a mismatched </table> to break page layout.
  var open = [];
  for (var i = 0, n = tags.length; i < n; ++i) {
    var tag = tags[i];
    if (tag.charAt(1) === '/') {
      var idx = open.lastIndexOf(tag);
      if (idx < 0) { tags[i] = ""; }  // Drop close tag.
      else {
        tags[i] = open.slice(idx).reverse().join('');
        open.length = idx;
      }
    } else if (!HTML5_VOID_ELEMENTS.test(tag)) {
      open.push('</' + tag.substring(1));
    }
  }
  // Now html contains no tags or less-than characters that could become
  // part of a tag via a replacement operation and tags only contains
  // approved tags.
  // Reinsert the white-listed tags.
  html = html.replace(
       /\[(\d+)\]/g, function (_, index) { return tags[index]; });

  // Close any still open tags.
  // This prevents unclosed formatting elements like <ol> and <table> from
  // breaking the layout of containing HTML.
  return html + open.reverse().join('');
}

var HTML5_VOID_ELEMENTS = new RegExp(
     '^<(?:area|base|br|col|command|embed|hr|img|input'
     + '|keygen|link|meta|param|source|track|wbr)\\b');

which can be used like this:

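sanitize({ p: true, b: true, i: true, br: true },
         "Hello, <b>World</b>!<script>alert(1337)<\/script>");
// Returns "Hello, <b>World</b>!alert(1337)": the <script> tags are
// dropped, but their text content is kept as plain text.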

If you need more configurability, such as allowing attributes on tags, see the Caja HTML sanitizer.

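As others have pointed out, your server should not trust the result that comes from the client, so you should sanitize again on the server before embedding the result in server-generated markup.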
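As a rough sketch of that last point, assuming a Node.js-style server written in JavaScript: the server treats whatever arrives as untrusted and runs its own sanitizer before building markup. `renderComment` and `escapeHtml` are hypothetical names for illustration; the default used here simply escapes everything, and a whitelist sanitizer can be passed in instead.

```javascript
// Strictest possible default: escape every HTML special character.
function escapeHtml(text) {
  var entities = {
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
  };
  return String(text).replace(/[&<>"']/g, function (c) {
    return entities[c];
  });
}

// Hypothetical server-side helper. The submitted body is sanitized here
// regardless of any cleaning the client claims to have done, because a
// client can send arbitrary bytes without ever running the page's script.
function renderComment(untrustedBody, sanitizeFn) {
  var clean = (sanitizeFn || escapeHtml)(untrustedBody);
  return '<div class="comment">' + clean + '</div>';
}
```

Passing the sanitizer as a parameter keeps the sketch independent of any one implementation; the important part is that the call happens on the server.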

answered 2012-08-22T23:30:52.837