Your code is a bit questionable, but I will answer the question of whether String.Trim() is equivalent to using \s to remove leading and trailing whitespace.
They are equivalent from .NET Framework 4.0
From .NET 4.0, String.Trim() removes leading and trailing characters for which Char.IsWhiteSpace() returns true.
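As a rough illustration of what "equivalent" means here, the sketch below trims the same string both ways and compares the results. The names TrimComparison and RegexTrim, the sample string, and the pattern ^\s+|\s+$ are assumptions made for this example; they are not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class TrimComparison
{
    // Hypothetical helper: removes leading and trailing runs of \s,
    // i.e. the regex counterpart of String.Trim().
    public static string RegexTrim(string input)
    {
        return Regex.Replace(input, @"^\s+|\s+$", "");
    }

    static void Main()
    {
        // Mix of ASCII whitespace, NO-BREAK SPACE and LINE SEPARATOR around the text.
        string sample = "\t\u00A0 hello world \u2028\r\n";

        // Prints whether both approaches produce the same string on this runtime.
        Console.WriteLine(sample.Trim() == RegexTrim(sample));
    }
}
```

A single sample string proves nothing in general, but it is a convenient smoke test when moving code between runtime versions.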
Char.IsWhiteSpace() returns true for characters in the Unicode categories Zl, Zp, and Zs, as described in the documentation, and also for \t, \n, \v, \f, \r, and \x85.
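The list above can be checked directly by calling Char.IsWhiteSpace() on each character. This is only a minimal sketch: WhiteSpaceProbe and Describe are made-up names, and the three non-control samples (SPACE, LINE SEPARATOR, PARAGRAPH SEPARATOR) are my own picks, one from each of Zs, Zl, and Zp.

```csharp
using System;

static class WhiteSpaceProbe
{
    // Hypothetical helper: formats a code point next to the result of Char.IsWhiteSpace().
    public static string Describe(char c)
    {
        return string.Format("U+{0:X4} -> {1}", (int)c, char.IsWhiteSpace(c));
    }

    static void Main()
    {
        // The explicitly listed control characters, then one sample each from Zs, Zl, Zp.
        char[] samples = { '\t', '\n', '\v', '\f', '\r', '\x85', ' ', '\u2028', '\u2029' };
        foreach (char c in samples)
        {
            Console.WriteLine(Describe(c));
        }
    }
}
```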
Note that there seem to be some discrepancies. According to fileformat.info, U+00A0 NO-BREAK SPACE belongs to the Zs category, but MSDN doesn't include it in the list of Space Separator characters in the Char.IsWhiteSpace() documentation. Testing reveals that \s matches U+00A0, which means U+00A0 is one of the characters in the \p{Z} category.
According to the page Character Classes in Regular Expressions, \s is equivalent to [\f\n\r\t\v\x85\p{Z}]. The Z category currently consists of 3 subcategories: Zs, Zl, and Zp.
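Since both definitions boil down to a per-character test, one way to compare them is to walk every UTF-16 code unit and report any character on which \s and Char.IsWhiteSpace() disagree. The sketch below does that with default regex options; WhiteSpaceComparison and CountDisagreements are hypothetical names. If the two definitions really match on the runtime you test, it should report no disagreements.

```csharp
using System;
using System.Text.RegularExpressions;

static class WhiteSpaceComparison
{
    // Counts UTF-16 code units for which \s and Char.IsWhiteSpace() give different answers.
    public static int CountDisagreements()
    {
        var whitespace = new Regex(@"\s"); // default options
        int disagreements = 0;

        for (int i = 0; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            bool regexSays = whitespace.IsMatch(c.ToString());
            bool charSays = char.IsWhiteSpace(c);

            if (regexSays != charSays)
            {
                Console.WriteLine("U+{0:X4}: \\s={1}, IsWhiteSpace={2}", i, regexSays, charSays);
                disagreements++;
            }
        }

        return disagreements;
    }

    static void Main()
    {
        Console.WriteLine("Disagreements: " + CountDisagreements());
    }
}
```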
They are not equivalent prior to .NET 4.0
According to the String.Trim() documentation:
Because of this change, the Trim method in the .NET Framework 3.5 SP1 and earlier versions removes two characters, ZERO WIDTH SPACE (U+200B) and ZERO WIDTH NO-BREAK SPACE (U+FEFF), that the Trim method in the .NET Framework 4 and later versions does not remove.
In addition, the Trim method in the .NET Framework 3.5 SP1 and earlier versions does not trim three Unicode white-space characters: MONGOLIAN VOWEL SEPARATOR (U+180E), NARROW NO-BREAK SPACE (U+202F), and MEDIUM MATHEMATICAL SPACE (U+205F).
To put it simply, String.Trim() considers a different set of characters for removal in .NET versions prior to 4.0.
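If you need to know which behavior a particular runtime has, the two zero-width characters from the quoted documentation make a simple probe. This is a sketch under that assumption only; TrimVersionProbe and TrimRemovesZeroWidthCharacters are hypothetical names.

```csharp
using System;

static class TrimVersionProbe
{
    // Returns true when Trim() strips ZERO WIDTH SPACE (U+200B) and
    // ZERO WIDTH NO-BREAK SPACE (U+FEFF), which the quoted documentation
    // attributes to .NET Framework 3.5 SP1 and earlier.
    public static bool TrimRemovesZeroWidthCharacters()
    {
        string padded = "\u200Bvalue\uFEFF";
        return padded.Trim().Length == "value".Length;
    }

    static void Main()
    {
        Console.WriteLine(TrimRemovesZeroWidthCharacters()
            ? "Trim() removed U+200B and U+FEFF (pre-4.0 behavior)"
            : "Trim() kept U+200B and U+FEFF (4.0 and later behavior)");
    }
}
```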
The specification for \s in regular expressions has stayed the same since .NET 1.1.