Input:

<td>
<span>
<span>spanaaa</span>
<span class="1">spanbbb</span>
<span class="" style="">spanccc</span>
<span style="display:none">spanddd</span>

<div>divaaa</div>
<div class="1">divbbb</div>
<div class="" style="">divccc</div>
<div style="display:none">divddd</div>
</span>
</td>

I need a regular expression, or some other method, to get the text of the elements that do not have the attribute style="display:none".

Output:

spanaaa
spanbbb
spanccc

divaaa
divbbb
divccc

4 Answers

Pattern [.NET flavor]

(?<=<\w+ [^<>]*?\w+=")(?!display:none)(?<mt>[^"<>]+)(?=")

Options: ^ and $ match at line breaks

Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=<\w+ [^<>]*?\w+=")»
   Match the character “<” literally «<»
   Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the character “ ” literally « »
   Match a single character NOT present in the list “<>” «[^<>]*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the characters “="” literally «="»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!display:none)»
   Match the characters “display:none” literally «display:none»
Match the regular expression below and capture its match into backreference with name “mt” «(?<mt>[^"<>]+)»
   Match a single character NOT present in the list “"<>” «[^"<>]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=")»
   Match the character “"” literally «"»

Pattern [PCRE]

(<\w+ [^<>]*?\w+=")(?!display:none)([^"<>]+)(?=")

Options: ^ and $ match at line breaks

Match the regular expression below and capture its match into backreference number 1 «(<\w+ [^<>]*?\w+=")»
   Match the character “<” literally «<»
   Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the character “ ” literally « »
   Match a single character NOT present in the list “<>” «[^<>]*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the characters “="” literally «="»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!display:none)»
   Match the characters “display:none” literally «display:none»
Match the regular expression below and capture its match into backreference number 2 «([^"<>]+)»
   Match a single character NOT present in the list “"<>” «[^"<>]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=")»
   Match the character “"” literally «"»
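
As a quick sanity check, here is a minimal Python sketch that runs the PCRE-style pattern above against the question's input. It assumes Python's `re` module, which rejects the variable-length lookbehind of the .NET variant, so only the capturing-group form is used. The names `HTML`, `PATTERN`, and `values` are illustrative and not part of the answer. On this input, group 2 holds attribute values other than `display:none`, not the text between the tags.

```python
import re

# The question's sample input, copied as-is.
HTML = """<td>
<span>
<span>spanaaa</span>
<span class="1">spanbbb</span>
<span class="" style="">spanccc</span>
<span style="display:none">spanddd</span>

<div>divaaa</div>
<div class="1">divbbb</div>
<div class="" style="">divccc</div>
<div style="display:none">divddd</div>
</span>
</td>"""

# PCRE-style pattern from the answer: group 1 is the tag up to the opening
# quote of an attribute, group 2 is a non-empty attribute value that does
# not start with "display:none".
PATTERN = re.compile(r'(<\w+ [^<>]*?\w+=")(?!display:none)([^"<>]+)(?=")')

values = [m.group(2) for m in PATTERN.finditer(HTML)]
print(values)  # ['1', '1'] (the two class="1" values)
```

So the pattern is useful for inspecting attribute values, but extracting the element text needs a pattern that captures what sits between the opening and closing tags.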
Answered 2012-04-18T05:44:30.633

A regular expression is a poor choice here (because of the vagaries of HTML), but you could try this:

<div(?!\s*style="display:none")[^>]*>(.*?)</div>
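
A minimal Python sketch of how this pattern behaves on the question's input. `DIV_PATTERN` is the pattern above unchanged. `ANY_PATTERN` is a hypothetical generalization, not from the answer, that also covers `<span>` and checks for the hidden style anywhere in the opening tag. Both assume every element sits on a single line, because `.` does not match a newline by default.

```python
import re

# The question's sample input, copied as-is.
HTML = """<td>
<span>
<span>spanaaa</span>
<span class="1">spanbbb</span>
<span class="" style="">spanccc</span>
<span style="display:none">spanddd</span>

<div>divaaa</div>
<div class="1">divbbb</div>
<div class="" style="">divccc</div>
<div style="display:none">divddd</div>
</span>
</td>"""

# Pattern from the answer: <div> elements whose first attribute is not
# style="display:none"; group 1 is the element text.
DIV_PATTERN = re.compile(r'<div(?!\s*style="display:none")[^>]*>(.*?)</div>')

# Hypothetical generalization (an assumption, not the answer's pattern):
# span or div, hidden style rejected anywhere in the opening tag, and the
# closing tag must match the opening one via the \1 backreference.
ANY_PATTERN = re.compile(
    r'<(span|div)(?![^>]*style="display:none")[^>]*>(.*?)</\1>'
)

div_texts = DIV_PATTERN.findall(HTML)
all_texts = [m.group(2) for m in ANY_PATTERN.finditer(HTML)]

print(div_texts)  # ['divaaa', 'divbbb', 'divccc']
print(all_texts)  # ['spanaaa', 'spanbbb', 'spanccc', 'divaaa', 'divbbb', 'divccc']
```

The outer wrapping `<span>` is skipped only because its content starts with a line break. Markup laid out differently, or with nested elements on one line, would likely need a real HTML parser.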
Answered 2012-04-18T05:41:53.517

This is a C# version; it is 8 times faster than parsing with a regex. You can easily convert it to any language you want.

public static string StripTagsCharArray(string source)
{
    // Output buffer; the result can never be longer than the input.
    char[] array = new char[source.Length];
    int arrayIndex = 0;
    bool inside = false; // true while scanning between '<' and '>'

    for (int i = 0; i < source.Length; i++)
    {
        char let = source[i];
        if (let == '<')
        {
            inside = true;
            continue;
        }
        if (let == '>')
        {
            inside = false;
            continue;
        }
        if (!inside)
        {
            // Keep only characters that are outside of any tag.
            array[arrayIndex] = let;
            arrayIndex++;
        }
    }
    return new string(array, 0, arrayIndex);
}
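
Since the answer invites converting the method to other languages, here is a rough Python port. The function name `strip_tags` is illustrative. It follows the same single-pass state machine, and like the C# version it drops every tag without looking at attributes.

```python
def strip_tags(source: str) -> str:
    """Drop everything between '<' and '>' and keep the remaining text."""
    kept = []
    inside = False  # True while scanning between '<' and '>'
    for ch in source:
        if ch == "<":
            inside = True
        elif ch == ">":
            inside = False
        elif not inside:
            kept.append(ch)
    return "".join(kept)


print(strip_tags('<div class="1">divbbb</div>'))                # divbbb
print(strip_tags('<div style="display:none">divddd</div>'))     # divddd
```

As the second call shows, the text of hidden elements is kept too, so this approach would have to be combined with a step that removes the `display:none` elements first.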
Answered 2013-08-13T14:04:01.137

input = Regex.Replace(input, @"<div style=""display:none"">.*?</div>", string.Empty, RegexOptions.Singleline);

Here, input is the string containing the HTML. Try this regex; it should work.
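
For comparison, a minimal Python sketch of the same replacement, assuming the standard `re` module. The helper name `remove_hidden_divs` is illustrative. `re.DOTALL` plays the role of `RegexOptions.Singleline`, letting `.` cross line breaks.

```python
import re


def remove_hidden_divs(html: str) -> str:
    """Delete <div style="display:none">...</div> blocks, tags included."""
    return re.sub(
        r'<div style="display:none">.*?</div>',
        "",
        html,
        flags=re.DOTALL,  # let '.' match newlines inside the hidden block
    )


sample = '<div>divaaa</div>\n<div style="display:none">divddd</div>'
print(repr(remove_hidden_divs(sample)))  # '<div>divaaa</div>\n'
```

This only matches a `<div>` whose opening tag is spelled exactly as shown. Other tags, extra attributes, different spacing, or nested `<div>` elements are not handled.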

Answered 2012-12-06T07:07:17.373