
I am currently trying to make the output of a wysihtml5 editor secure.
Basically, users must not be able to enter script, form, or similar tags.

I cannot remove all tags, as some of them are needed to display the content as intended.
(e.g. <h1> to display a title)

The problem is that users can still add DOM event handlers bound to unwanted code.
(e.g. <h1 onclick="alert('Houston, got a problem');"></h1>)

I would like to remove all event handlers inside a div (for all descendants of that div).
The approach I have tried so far is to treat the code as a string and find and replace unwanted content, which worked for the unwanted tags.

What I need now is a regex matching all event handler attributes inside all tags.
Something like "select all [on*] between < and >".
Examples:
<h1 onclick=""></h1> => Should match
<h1 onnewevent=""></h1> => Should match
<h1>onclick=""</h1> => Should NOT match

Thanks in advance for your help ;)


1 Answer


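You shouldn't parse HTML with regular expressions.
If you really want to do it anyway, this is a quick and dirty way
(by no means complete).

It only looks for an opening tag that carries an "on<event>" attribute, immediately followed by its closing tag.
If there is other content in between, just add a .*? between the tags.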

 #  <([^<>\s]+)\s[^<>]*on[^<>="]+=[^<>]*></\1\s*>
 # /<([^<>\s]+)\s[^<>]*on[^<>="]+=[^<>]*><\/\1\s*>/

 < 
 ( [^<>\s]+ )                    # (1), 'Tag'
 \s 
 [^<>]* on [^<>="]+ = [^<>]*     # On... = event
 >
 </ \1 \s* >                     # Backref to 'Tag'
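
Since wysihtml5 runs in the browser, here is a minimal JavaScript sketch of the same pattern, together with the "content in between" variant mentioned above. The names emptyTagWithHandler, tagWithHandler and sample are made up for illustration, and the variant uses [\s\S]*? rather than .*? so that content spanning several lines is also covered (an assumption about what is wanted, not part of the original pattern).

```javascript
// The pattern from above, as a JavaScript regex literal.
const emptyTagWithHandler = /<([^<>\s]+)\s[^<>]*on[^<>="]+=[^<>]*><\/\1\s*>/g;

// Illustrative variant: allow any content between the opening and closing tag.
const tagWithHandler = /<([^<>\s]+)\s[^<>]*on[^<>="]+=[^<>]*>[\s\S]*?<\/\1\s*>/g;

const sample = [
  '<h1 onclick=""></h1>',
  '<h1 onnewevent="">Title</h1>',
  '<h1>onclick=""</h1>',
].join('\n');

// String.prototype.match with a /g regex returns all matches, or null if there are none.
const emptyMatches = sample.match(emptyTagWithHandler) || [];
const contentMatches = sample.match(tagWithHandler) || [];

console.log(emptyMatches);   // only the empty <h1 onclick=""></h1>
console.log(contentMatches); // the first two <h1> elements, not the third
```

Like the original, this is still quick and dirty: any attribute that merely contains "on" followed by more characters and "=" inside a tag will also trigger a match.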

Perl test case

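use strict;
use warnings;

# Slurp mode: read the whole DATA section as a single string
$/ = undef;

my $str = <DATA>;

# Print every empty element whose opening tag carries an on...= attribute
while ( $str =~ /<([^<>\s]+)\s[^<>]*on[^<>="]+=[^<>]*><\/\1\s*>/g )
{
    print "'$&'\n";
}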


__DATA__
(eg : <h1 onclick="alert('Houston, got a problem');"></h1>) 

I would like to remove all event listeners inside a div
(for all descendants inside that div).
The solution I actually tried to use is to check the code as
a string to find and replace unwanted content,
which worked for the unwanted tags. 

What I actually need is a regex matching all event
listeners inside all tags.
Something like "select all [on*] between < and >".
Examples :
<h1 onclick=""></h1> => Should match
<h1 onnewevent=""></h1> => Should match
<h1>onclick=""</h1> => Should NOT match 

Output >>

'<h1 onclick="alert('Houston, got a problem');"></h1>'
'<h1 onclick=""></h1>'
'<h1 onnewevent=""></h1>'
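
The pattern above only finds the offending elements. Since the question asks to remove the handlers, here is a rough JavaScript sketch that strips on* attributes from opening tags in a string. stripEventAttributes is a hypothetical helper, not part of wysihtml5, and it is just as quick and dirty as the regex above: it assumes attribute values never contain < or >, and it will also alter a quoted value of another attribute if that value happens to contain something that looks like an on...= attribute.

```javascript
// Hypothetical helper: removes on* attributes from every opening tag in an HTML string.
function stripEventAttributes(html) {
  // Outer pass: visit each opening tag only, so text between tags is never touched.
  return html.replace(/<[a-zA-Z][^<>]*>/g, (tag) =>
    // Inner pass: drop "whitespace + on<name> = value", where the value is
    // double-quoted, single-quoted, or unquoted.
    tag.replace(/\s+on[a-z0-9_-]*\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+)/gi, '')
  );
}

console.log(stripEventAttributes('<h1 onclick="alert(\'Houston, got a problem\');"></h1>'));
console.log(stripEventAttributes('<h1 onnewevent="">Title</h1>'));
console.log(stripEventAttributes('<h1>onclick=""</h1>'));
```

The third call leaves its input unchanged, because the onclick text sits between the tags rather than inside one. For anything security-sensitive, the warning at the top of this answer still applies: a real HTML parser is the safer tool.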
Answered 2014-03-18T17:01:08.330