90

Is there a better way than this to convert a MatchCollection to a string array?

MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");
string[] strArray = new string[mc.Count];
for (int i = 0; i < mc.Count;i++ )
{
    strArray[i] = mc[i].Groups[0].Value;
}

P.S. mc.CopyTo(strArray, 0) throws an exception:

At least one element in the source array could not be cast down to the destination array type.

6 Answers

185

Try:

var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
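
As for why the mc.CopyTo(strArray, 0) call in the question throws: the collection holds Match objects, not strings, so its elements cannot be copied into a string[]. A minimal non-LINQ sketch, assuming a made-up sample string that is not from the question: copy into a Match[] first, then convert each element.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); on older compilers, move this body into Main.
// The sample text is an assumption for illustration only.
string strText = "don't stop well-known words";

MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");

// CopyTo succeeds when the destination element type is Match (or object).
Match[] matches = new Match[mc.Count];
mc.CopyTo(matches, 0);

// Convert each Match to its matched text.
string[] strArray = Array.ConvertAll(matches, m => m.Value);

Console.WriteLine(string.Join("|", strArray));
```

This allocates an intermediate Match[] array, so it is not necessarily better than the LINQ chain above; it mainly shows what CopyTo expects.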
answered 2012-07-10T15:02:42.170
33

Dave Bish's answer is good and works properly.

It's worth noting that replacing Cast<Match>() with OfType<Match>() will speed things up.

The code would become:

var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .OfType<Match>()
    .Select(m => m.Groups[0].Value)
    .ToArray();

The result is exactly the same (and it addresses the OP's issue in exactly the same way), but for huge strings it is faster.

Test code:

// put it in a console application
static void Test()
{
    Stopwatch sw = new Stopwatch();
    StringBuilder sb = new StringBuilder();
    string strText = "this will become a very long string after my code has done appending it to the stringbuilder ";

    Enumerable.Range(1, 100000).ToList().ForEach(i => sb.Append(strText));
    strText = sb.ToString();

    sw.Start();
    var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
              .OfType<Match>()
              .Select(m => m.Groups[0].Value)
              .ToArray();
    sw.Stop();

    Console.WriteLine("OfType: " + sw.ElapsedMilliseconds.ToString());
    sw.Reset();

    sw.Start();
    var arr2 = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
              .Cast<Match>()
              .Select(m => m.Groups[0].Value)
              .ToArray();
    sw.Stop();
    Console.WriteLine("Cast: " + sw.ElapsedMilliseconds.ToString());
}

The output is as follows:

OfType: 6540
Cast: 8743

So for very long strings, Cast() is slower.

answered 2012-07-10T15:28:18.980
6

I ran the exact same benchmark that Alex posted and found that sometimes Cast was faster and sometimes OfType was faster, but the difference between the two was negligible. However, while ugly, the for loop is consistently faster than either of them.

Stopwatch sw = new Stopwatch();
StringBuilder sb = new StringBuilder();
string strText = "this will become a very long string after my code has done appending it to the stringbuilder ";
Enumerable.Range(1, 100000).ToList().ForEach(i => sb.Append(strText));
strText = sb.ToString();

//First two benchmarks

sw.Start();
MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");
var matches = new string[mc.Count];
for (int i = 0; i < matches.Length; i++)
{
    matches[i] = mc[i].ToString();
}
sw.Stop();

Results:

OfType: 3462
Cast: 3499
For: 2650
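
Single-shot Stopwatch timings like these are sensitive to JIT warm-up and to the order in which the variants run, which may be part of why the numbers differ between runs and machines. A rough sketch of a slightly fairer harness follows. TimeIt and the Via* helpers are hypothetical names introduced here, and the input is smaller than in the benchmarks above. Timings depend on the machine, so none are shown.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); on older compilers, move the body into Main.
// TimeIt and the Via* helpers are hypothetical names used for illustration.
const string Pattern = @"\b[A-Za-z-']+\b";

var sb = new StringBuilder();
for (int i = 0; i < 10000; i++)
    sb.Append("this will become a very long string ");
string strText = sb.ToString();

string[] ViaCast() =>
    Regex.Matches(strText, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

string[] ViaOfType() =>
    Regex.Matches(strText, Pattern).OfType<Match>().Select(m => m.Value).ToArray();

string[] ViaFor()
{
    MatchCollection mc = Regex.Matches(strText, Pattern);
    var result = new string[mc.Count];
    for (int i = 0; i < result.Length; i++)
        result[i] = mc[i].Value;
    return result;
}

// Runs the action once untimed (JIT warm-up), then averages several timed runs.
double TimeIt(Func<string[]> action, int runs)
{
    action();
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < runs; i++)
        action();
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds / runs;
}

Console.WriteLine($"Cast:   {TimeIt(ViaCast, 5):F1} ms");
Console.WriteLine($"OfType: {TimeIt(ViaOfType, 5):F1} ms");
Console.WriteLine($"For:    {TimeIt(ViaFor, 5):F1} ms");
```

For anything beyond a quick comparison, a dedicated benchmarking library would likely give more trustworthy numbers than a hand-rolled loop.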
answered 2014-05-14T13:55:45.497
2

One could also make use of this extension method to deal with the annoyance of MatchCollection not being generic. Not that it's a big deal, but this is almost certainly more performant than OfType or Cast, because it's just enumerating, which both of those also have to do.

(Side note: I wonder if it would be possible for the .NET team to make MatchCollection inherit generic versions of ICollection and IEnumerable in the future? Then we wouldn't need this extra step to immediately have LINQ transforms available).

public static IEnumerable<Match> ToEnumerable(this MatchCollection mc)
{
    if (mc != null) {
        foreach (Match m in mc)
            yield return m;
    }
}
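
On the side note above: in .NET Core 2.0 and later (including .NET 5+), MatchCollection does implement IEnumerable<Match>, so LINQ operators can be applied to it directly. On .NET Framework the Cast, OfType, or extension-method step is still needed. A minimal sketch, assuming a modern .NET target and a made-up sample string:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+). Assumes .NET Core 2.0+ / .NET 5+, where
// MatchCollection implements IEnumerable<Match>. The sample text is made up.
string strText = "it's a well-known trick";

string[] words = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .Select(m => m.Value)   // no Cast<Match>() or OfType<Match>() required here
    .ToArray();

Console.WriteLine(string.Join("|", words));
```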
answered 2018-02-14T18:23:12.980
0

Consider the following code...

var emailAddress = "joe@sad.com; joe@happy.com; joe@elated.com";
// Assign the result directly; a separate empty list would be discarded immediately.
List<string> emails = Regex.Matches(emailAddress, @"([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})")
                .Cast<Match>()
                .Select(m => m.Groups[0].Value)
                .ToList();
answered 2013-11-22T02:01:04.227
0

If you need a recursive capture, e.g. tokenizing math equations:

// INPUT (I need this tokenized to do math)
string sTests = "(1234+5678)/ (56.78-   1234   )";

Regex splitter = new Regex(@"([\d,\.]+|\D)+");
Match match = splitter.Match(sTests.Replace(" ", ""));
string[] captures = (from capture in match.Groups.Cast<Group>().Last().Captures.Cast<Capture>()
                     select capture.Value).ToArray();

...because you need to go after the last captures in the last group.
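
To make that concrete: when a group is repeated, Group.Value holds only the last iteration, while Group.Captures holds every iteration. A small sketch using the same pattern and input as above. It indexes the group with Groups[1], which is a substitution for Groups.Cast<Group>().Last() and should be equivalent here only because the pattern has a single capturing group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); on older compilers, move this body into Main.
string input = "(1234+5678)/ (56.78-   1234   )".Replace(" ", "");
Match match = Regex.Match(input, @"([\d,\.]+|\D)+");

// Groups[1] is the only capturing group; it is repeated by the trailing '+'.
Group group = match.Groups[1];

// Every iteration of the repeated group is kept in Captures.
string[] tokens = group.Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine(group.Value);               // the last iteration only
Console.WriteLine(string.Join(" ", tokens));  // all iterations, in order
```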

answered 2021-08-15T15:16:04.347