
Please help me understand whether the following is possible:

// Verbatim string (@"...") so that \S and \d are passed to the regex engine as written.
var regexMatch = Regex.Match(inputString, @"(\S*\d+\S*|\d)+");

if (regexMatch.Value == String.Empty)
{
    return null;
}
else
{
    var trimmedString = regexMatch.Value.Trim();

    if (trimmedString != regexMatch.Value)
    {
        //Is there any value for inputString that makes this reachable?
    }
}

3 Answers

7

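Starting with .NET 4.0, Trim uses the Char.IsWhiteSpace method to decide what to trim; the documentation lists every character that will be trimmed. Since the documentation for \S does not say that it uses the same character list, asking whether the two can disagree is a fair question.

One way to find out is an exhaustive search: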

var ws = new Regex("\\S");
for (char c = '\0'; c != 0xffff; c++) {
    if (char.IsWhiteSpace(c)) {
        var m = ws.Match("" + c);
        if (m.Value.Length != 0) {
            Console.Error.WriteLine("Found a mismatch: {0}", (int)c);
        }
    }
}

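Running this code produces no output: none of the 26 characters that char.IsWhiteSpace treats as whitespace is matched by the regular expression \S. I therefore have to conclude that the code guarded by the trimmedString != regexMatch.Value condition is unreachable.

As a side note, regexMatch.Value can never be null. According to the documentation:

If a call to the Regex.Match or Match.NextMatch method fails to find a match, the value of the returned Match.Value property is String.Empty.

You can remove the first if, or replace it with a comparison to String.Empty.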
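As a minimal sketch of that simplification, assuming the goal is only to tell "no match" apart from a real match: the Match.Success property reports whether a match was found, so Value does not need to be compared against anything. The MatchCheck class and the ExtractToken helper are hypothetical names used for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchCheck
{
    // Hypothetical helper: returns the matched text, or null when nothing matches.
    public static string ExtractToken(string inputString)
    {
        // Verbatim string so that \S and \d reach the regex engine unescaped.
        Match regexMatch = Regex.Match(inputString, @"(\S*\d+\S*|\d)+");

        // Success is false when no match is found; Value is then String.Empty, never null.
        return regexMatch.Success ? regexMatch.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractToken("  abc123  ") ?? "(no match)");
        Console.WriteLine(ExtractToken("no digits") ?? "(no match)");
    }
}
```

The first call should print the token without its surrounding spaces, and the second should print the "(no match)" placeholder because the input contains no digit.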

answered 2013-09-05T20:14:13.497
4

Your code is a bit questionable, but I will answer the question of whether String.Trim() is equivalent to using \s to remove leading and trailing whitespace.

They are equivalent from .NET Framework 4.0 onward

  • From .NET 4.0, String.Trim() removes leading and trailing characters for which Char.IsWhiteSpace() returns true.

    Char.IsWhiteSpace() returns true for characters in the categories Zl, Zp and Zs, as described in the documentation, and also for \t, \n, \v, \f, \r and \x85.

    Note that there seem to be some discrepancies. According to fileformat.info, U+00A0 NO-BREAK SPACE belongs to the Zs category, but MSDN does not include it in the list of space separators in the Char.IsWhiteSpace() documentation. Testing reveals that \s matches U+00A0, which means U+00A0 is one of the characters in the \p{Z} category.

  • According to the page Character Classes in Regular Expressions, \s is equivalent to [\f\n\r\t\v\x85\p{Z}]. The Z category currently consists of three sub-categories: Zs, Zl and Zp.
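The equivalence described in the two points above can be checked empirically. The following is a rough sketch, not a proof: it compares \s (with default regex options) against Char.IsWhiteSpace() for every UTF-16 code unit on whatever runtime executes it. The WhitespaceComparison class and the FindDisagreements helper are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WhitespaceComparison
{
    // Hypothetical helper: code units on which \s and Char.IsWhiteSpace disagree.
    public static List<int> FindDisagreements()
    {
        var whitespace = new Regex(@"\s");
        var disagreements = new List<int>();

        // Loop over an int so that char.MaxValue (U+FFFF) is included
        // and the counter cannot wrap around.
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (whitespace.IsMatch(c.ToString()) != char.IsWhiteSpace(c))
            {
                disagreements.Add(i);
            }
        }
        return disagreements;
    }

    public static void Main()
    {
        foreach (int codeUnit in FindDisagreements())
        {
            Console.WriteLine("Disagreement at U+{0:X4}", codeUnit);
        }
        Console.WriteLine("done");
    }
}
```

If the two definitions really do coincide on the runtime in use, the program should list no disagreements before printing "done". RegexOptions.ECMAScript changes the meaning of \s, so this sketch says nothing about that mode.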

They are not equivalent prior to .NET 4.0

According to String.Trim() documentation:

Because of this change, the Trim method in the .NET Framework 3.5 SP1 and earlier versions removes two characters, ZERO WIDTH SPACE (U+200B) and ZERO WIDTH NO-BREAK SPACE (U+FEFF), that the Trim method in the .NET Framework 4 and later versions does not remove.

In addition, the Trim method in the .NET Framework 3.5 SP1 and earlier versions does not trim three Unicode white-space characters: MONGOLIAN VOWEL SEPARATOR (U+180E), NARROW NO-BREAK SPACE (U+202F), and MEDIUM MATHEMATICAL SPACE (U+205F).

To put it simply, String.Trim() considers a different set of characters for removal in .NET versions prior to 4.0.

The specification for \s in regular expressions has stayed the same since .NET 1.1.
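To see how a particular runtime treats the five characters named in the documentation quote, a small probe such as the one below may help. It is only a sketch: the TrimProbe class and its two helpers are hypothetical names, and the results depend on the framework version and on the Unicode data it ships with, so no expected output is given.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimProbe
{
    // True when String.Trim() removes the character on the current runtime.
    public static bool TrimRemoves(char c)
    {
        return new string(c, 1).Trim().Length == 0;
    }

    // True when the regex class \s matches the character on the current runtime.
    public static bool RegexMatchesAsSpace(char c)
    {
        return Regex.IsMatch(c.ToString(), @"\s");
    }

    public static void Main()
    {
        // The five characters named in the String.Trim() documentation quoted above.
        char[] probes = { '\u200B', '\uFEFF', '\u180E', '\u202F', '\u205F' };
        foreach (char c in probes)
        {
            Console.WriteLine("U+{0:X4}: Trim removes = {1}, \\s matches = {2}",
                (int)c, TrimRemoves(c), RegexMatchesAsSpace(c));
        }
    }
}
```

Any row where the two columns differ marks a character that could make Trim() change a string that \S considers free of whitespace, or the reverse.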

answered 2013-09-05T20:19:46.220
3

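dasblinkenlight's answer is wrong: the behavior changed between .NET 3.5 and .NET 4.0; see the "Notes to Callers" section here. Changing his code slightly so that it actually uses Trim(), the test finds no mismatches on .NET 4.0 but finds two on .NET 3.5: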

using System;
using System.Text.RegularExpressions;

public class Program
{

    private static void Main(string[] args)
    {
        var ws = new Regex("\\S");
        for (char c = '\0'; c != 0xffff; c++)
        {
            if (new String(c, 1).Trim().Length == 0)
            {
                var m = ws.Match("" + c);
                if (m.Value.Length != 0)
                {
                    Console.Error.WriteLine("Found a mismatch: {0}", (int)c);
                }
            }
        }

        Console.WriteLine("done");
        Console.ReadLine();
    }

}

//Output running in .NET 3.5:
//Found a mismatch: 8203
//Found a mismatch: 65279
//done


//Output running in .NET 4.0:
//done
answered 2013-09-05T20:28:05.087