
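I want to search for and remove proprietary attributes in HTML image tags.

I want to remove the following attributes from every IMG tag: data-base-url, data-linked-resource-default-alias, data-linked-resource-container-id, data-image-, data-linked-resource-id, data-linked-resource-type.

So I am trying to write a regular expression for the Notepad++ search that finds this code and removes it.

Sample image code: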

<img data-base-url="http://doc.webdomain.com" data-image-="" data-linked-resource-container-id="5374312" data-linked-resource-default-alias="fo005-categories.png" data-linked-resource-id="11468806" data-linked-resource-type="attachment" src="http://doc.musicbox.com/download/attachments/5374312/fo005-categories.png?version=1&amp;modificationDate=1344416572000" title="Musicbox 1.9 &gt; Browsing the front-office &gt; fo005-categories.png" />


<img data-base-url="http://doc.webdomain.com" data-image-="" data-linked-resource-container-id="5374312" data-linked-resource-default-alias="fo008-suppliers.png" data-linked-resource-id="11468815" data-linked-resource-type="attachment" src="http://doc.musicbox.com/download/attachments/5374312/fo008-suppliers.png?version=1&amp;modificationDate=1344416588000" title="Musicbox 1.9 &gt; Browsing the front-office &gt; fo008-suppliers.png" />

This is the image code I want to end up with (with an added alt attribute and a truncated src attribute value):

<img src="http://doc.musicbox.com/download/attachments/5374312/fo008-suppliers.png" title="" alt="" />

How do I write this expression?


2 Answers


Find what:

<img.+src="(.+)" title="(.+)" />

Replace with:

<img src="\1" title="\2" alt="" />
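
To check this kind of substitution outside Notepad++, here is a minimal Python sketch. It is a variation on the pattern above, not the answerer's exact regex: it assumes src comes directly before title (as in the question's samples) and additionally cuts the src value at the first `?`, which the original pattern does not do. The names `IMG_PATTERN`, `REPLACEMENT`, and `simplify_img` are hypothetical.

```python
import re

# Assumed variation: capture src only up to the first "?" and keep the title text.
# Relies on src being immediately followed by title, as in the question's samples.
IMG_PATTERN = re.compile(r'<img\b.+?\ssrc="([^"?]*)[^"]*" title="([^"]*)" />')
REPLACEMENT = r'<img src="\1" title="\2" alt="" />'


def simplify_img(html: str) -> str:
    """Rewrite each matching <img> tag, keeping src, title, and an empty alt."""
    return IMG_PATTERN.sub(REPLACEMENT, html)


sample = (
    '<img data-base-url="http://doc.webdomain.com" data-image-="" '
    'data-linked-resource-container-id="5374312" '
    'data-linked-resource-default-alias="fo005-categories.png" '
    'data-linked-resource-id="11468806" '
    'data-linked-resource-type="attachment" '
    'src="http://doc.musicbox.com/download/attachments/5374312/'
    'fo005-categories.png?version=1&amp;modificationDate=1344416572000" '
    'title="Musicbox 1.9 &gt; Browsing the front-office &gt; fo005-categories.png" />'
)

print(simplify_img(sample))
```

Because `.` does not cross line breaks by default, this sketch expects one image tag per line, as in the samples.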
answered 2013-07-10T12:15:13.873

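Description

This regular expression will:

  • extract the src, alt, width, and title attributes from every image tag
  • skip over attribute values that could otherwise cause false matches
  • allow the attributes to appear in any order
  • for the src attribute, keep only the value up to, but not including, the first ?

Regular expression: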

<img\b(?=\s) # capture the open tag
(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(src=["][^"]*?)[?"])?)  # find the src attribute and truncate at the first `?`
(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(alt=["][^"]*["]))?)  # find the alt attribute
(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(title=["][^"]*["]))?)  # find the title attribute
(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(width=["][^"]*["]))?)  # find the width attribute
(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?> # get the entire  tag

Replace with: <img $1" $2 $3 $4 />

The " after $1 is required because the src capture stops at the first ? (or at the closing quote), so the captured text never includes the attribute's closing quote.
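
The missing closing quote is easy to see in isolation. The sketch below uses only the src part of the expression, without the lookahead wrapper, as a simplified stand-in; `SRC_PART` and `capture_src` are hypothetical names used for illustration.

```python
import re

# Simplified stand-in for the src portion of the full expression:
# expand lazily and stop at the first "?" or at the closing quote.
SRC_PART = re.compile(r'\s(src="[^"]*?)[?"]')


def capture_src(tag: str) -> str:
    """Return the text captured by the src group, which lacks the closing quote."""
    match = SRC_PART.search(tag)
    return match.group(1) if match else ""


with_query = '<img src="http://doc.prestashop.com/a.png?version=1" title="x" />'
without_query = '<img src="http://doc.prestashop.com/a.png" title="x" />'

# In both cases the terminator ("?" or the quote) sits outside the group,
# so the replacement has to append the quote itself.
print(capture_src(with_query) + '"')
print(capture_src(without_query) + '"')
```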

In Notepad++

Sample Text

Note in the second image tag I added a potentially problematic attribute.

<img data-base-url="http://doc.webdomain.com" data-image-="" data-linked-resource-container-id="5374312" data-linked-resource-default-alias="fo005-categories.png" data-linked-resource-id="11468806" data-linked-resource-type="attachment" src="http://doc.prestashop.com/download/attachments/5374312/fo005-categories.png?version=1&amp;modificationDate=1344416572000" title="Musicbox 1.9 &gt; Browsing the front-office &gt; fo005-categories.png" />


<img onmouseover=' src="BAD.IMAGE.PNG" ; funImageSwap(src) ; ' data-base-url="http://doc.webdomain.com" data-image-="" data-linked-resource-container-id="5374312" data-linked-resource-default-alias="fo008-suppliers.png" data-linked-resource-id="11468815" data-linked-resource-type="attachment" src="http://doc.prestashop.com/download/attachments/5374312/fo008-suppliers.png?version=1&amp;modificationDate=1344416588000" title="Musicbox 1.9 &gt; Browsing the front-office &gt; fo008-suppliers.png" />

Find What: <img\b(?=\s)(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(src=["][^"]*?)[?"])?)(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(alt=["][^"]*["]))?)(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(title=["][^"]*["]))?)(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(width=["][^"]*["]))?)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?>

Replace with: <img $1" $2 $3 $4 />
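
For readers who want to try the expression outside Notepad++, the following is a sketch of the same pattern in Python's `re` module with `re.VERBOSE`. It assumes Python 3.5 or later, where unmatched groups in a replacement expand to an empty string, and it writes the replacement with `\1` instead of `$1`. The names `IMG_TAG` and `clean_img` are hypothetical, and the sample is a shortened version of the second tag in the sample text.

```python
import re

IMG_TAG = re.compile(r'''
<img\b(?=\s)
(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(src=["][^"]*?)[?"])?)    # group 1: src, cut at the first ? or closing quote
(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(alt=["][^"]*["]))?)      # group 2: alt
(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(title=["][^"]*["]))?)    # group 3: title
(?=(?:(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s(width=["][^"]*["]))?)    # group 4: width
(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?>                             # consume the whole tag
''', re.VERBOSE)


def clean_img(html: str) -> str:
    """Rebuild each <img> tag from the captured src, alt, title, and width."""
    return IMG_TAG.sub(r'<img \1" \2 \3 \4 />', html)


sample = (
    "<img onmouseover=' src=\"BAD.IMAGE.PNG\" ; funImageSwap(src) ; ' "
    'data-base-url="http://doc.webdomain.com" data-image-="" '
    'data-linked-resource-type="attachment" '
    'src="http://doc.prestashop.com/download/attachments/5374312/'
    'fo008-suppliers.png?version=1&amp;modificationDate=1344416588000" '
    'title="Musicbox 1.9 &gt; Browsing the front-office &gt; fo008-suppliers.png" />'
)

print(clean_img(sample))
```

Attributes that are absent (here alt and width) leave extra spaces in the rebuilt tag, which is harmless in HTML.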

There were problems with Notepad++ regular expressions in earlier versions. This works in 6.3.3 and 6.4.2. In the later versions, however, the popup dialog box reporting the number of replacements has been replaced by a line of text just under the Replace window (next to the arrow in the image).

[Image: screenshot of the Notepad++ Replace window]

answered 2013-07-10T12:34:10.027