c# - Most efficient way to parse a dynamic List of int from string

Question

Out of curiosity, is there a faster/more efficient way to parse a dynamic list of ints from a string?

Currently I have this, and it works absolutely fine; I was just thinking there might be a better way as this seems a little overly complex for something so simple.

public static void Send(string providerIDList)
{
    String[] providerIDArray = providerIDList.Split('|');
    var providerIDs = new List<int>();
    for (int counter = 0; counter < providerIDArray.Count(); counter++)
    {
        providerIDs.Add(int.Parse(providerIDArray[counter].ToString()));
    }
    //do some stuff with the parsed list of int
}

Edit: Perhaps I should have asked for a simpler way to parse my list out of the string. But since the original question did say faster and more efficient, the chosen answer will reflect that.

score 14 · Accepted Answer

There's definitely a better way. Use LINQ:

var providerIDs = providerIDList.Split('|')
                                .Select(x => int.Parse(x))
                                .ToList();

Or use a method group conversion instead of a lambda expression:

var providerIDs = providerIDList.Split('|')
                                .Select(int.Parse)
                                .ToList();

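This isn't the most efficient way to do it, but it's quite possibly the simplest. It's about as efficient as your approach, although your approach could fairly easily be made slightly more efficient, e.g. by giving the List an initial capacity.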

The difference in performance is likely to be irrelevant, so I'd stick with this simple code until you have evidence that it's a bottleneck.
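To illustrate the "initial capacity" remark, here is a minimal sketch of the loop-based approach with the list presized from the split array. The class and method names (PresizedParseSketch, Parse) are made up for this example and do not come from the question.

```csharp
using System.Collections.Generic;

public static class PresizedParseSketch
{
    // Hypothetical helper: same split-then-parse idea as the question,
    // but the list is allocated once with room for every element.
    public static List<int> Parse(string providerIDList)
    {
        string[] parts = providerIDList.Split('|');

        // Passing the element count up front avoids repeated internal
        // array growth while the list is being filled.
        var providerIDs = new List<int>(parts.Length);

        foreach (string part in parts)
        {
            providerIDs.Add(int.Parse(part));
        }
        return providerIDs;
    }
}
```

Whether this is measurably faster than the LINQ version depends on the input size, so it is worth profiling before adopting it.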

Note that if you don't need a List<int>, and only need something you can iterate over once, you can drop the ToList call and use providerIDs as an IEnumerable<int>.
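A small sketch of that variant, assuming the caller only needs a single pass over the values. LazyParseSketch, ParseLazily and SumOnce are illustrative names, not part of the original code.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyParseSketch
{
    // Split runs immediately, but each int.Parse call is deferred
    // until the caller actually enumerates the sequence.
    public static IEnumerable<int> ParseLazily(string providerIDList)
    {
        return providerIDList.Split('|').Select(x => int.Parse(x));
    }

    // Example consumer: walks the sequence exactly once.
    public static int SumOnce(string providerIDList)
    {
        int total = 0;
        foreach (int id in ParseLazily(providerIDList))
        {
            total += id;
        }
        return total;
    }
}
```

Two consequences of the deferred execution: a malformed entry only throws once enumeration reaches it, and enumerating the result a second time parses everything again.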

EDIT: If we're in the efficiency business, then here's an adaptation of the ForEachChar method from another answer, changed to avoid using int.Parse:

public static List<int> ForEachCharManualParse(string s, char delim)
{
    List<int> result = new List<int>();
    int tmp = 0;
    foreach(char x in s)
    {
        if(x == delim)
        {
            result.Add(tmp);
            tmp = 0;
        } 
        else if (x >= '0' && x <= '9')
        {
            tmp = tmp * 10 + x - '0';
        }
        else
        {
            throw new ArgumentException("Invalid input: " + s);
        }
    }
    result.Add(tmp);
    return result;
}

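Notes:

This will add a zero for any consecutive delimiters, or for a delimiter at the start or end
It doesn't handle negative numbers
It doesn't check for overflow
As noted in the comments, using a switch statement instead of x >= '0' && x <= '9' can improve the performance further (by about 10-15%)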

If none of those are a problem for you, it's about 7x faster than ForEachChar on my machine:

ListSize 1000 : StringLen 10434
ForEachChar1000 Time : 00:00:02.1536651
ForEachCharManualParse1000 Time : 00:00:00.2760543

ListSize 100000 : StringLen 1048421
ForEachChar100000 Time : 00:00:02.2169482
ForEachCharManualParse100000 Time : 00:00:00.3087568

ListSize 10000000 : StringLen 104829611
ForEachChar10000000 Time : 00:00:22.0803706
ForEachCharManualParse10000000 Time : 00:00:03.1206769

These limitations can be worked around, but I haven't bothered... let me know if they're significant concerns for you.
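For what it's worth, one possible way to work around the first three limitations is sketched below. This is not the answerer's code: the names (StrictParseSketch, ParseStrict) and the choice to throw FormatException on empty fields are assumptions made for the example, and the extra checks will cost some of the speed advantage.

```csharp
using System;
using System.Collections.Generic;

public static class StrictParseSketch
{
    public static List<int> ParseStrict(string s, char delim)
    {
        var result = new List<int>();
        int value = 0;          // accumulated as a non-positive number so int.MinValue fits
        bool negative = false;
        bool hasDigits = false;

        // Run one step past the end so the last field is flushed like the others.
        for (int i = 0; i <= s.Length; i++)
        {
            if (i == s.Length || s[i] == delim)
            {
                // Reject empty fields instead of silently adding zero.
                if (!hasDigits)
                    throw new FormatException("Empty field ending at index " + i);

                // checked(-value) throws OverflowException for values above int.MaxValue.
                result.Add(negative ? value : checked(-value));
                value = 0;
                negative = false;
                hasDigits = false;
                continue;
            }

            char c = s[i];
            if (c == '-' && !negative && !hasDigits)
            {
                negative = true;   // a single leading minus sign is allowed
            }
            else if (c >= '0' && c <= '9')
            {
                // checked arithmetic turns silent wrap-around into an exception.
                value = checked(value * 10 - (c - '0'));
                hasDigits = true;
            }
            else
            {
                throw new FormatException("Invalid character at index " + i);
            }
        }
        return result;
    }
}
```

Accumulating the value as a negative number is a deliberate choice here: it lets "-2147483648" parse without overflowing, which a positive accumulator cannot do.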

score 2 · Accepted Answer

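I don't like any of the answers so far. So, to actually answer the question the OP asked about the "fastest/most efficient" String.Split with int.Parse, I wrote and tested some code.

Using Mono on an Intel 3770k.

I found that String.Split + Enumerable.Select is not the fastest solution (though it may be the prettiest). In fact, it is the slowest of the three sequential approaches in every run below.

Here are some benchmark results: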

ListSize 1000 : StringLen 10468 
SplitForEach1000 Time : 00:00:02.8704048 
SplitSelect1000 Time : 00:00:02.9134658 
ForEachChar1000 Time : 00:00:01.8254438 
SplitParallelSelectr1000 Time : 00:00:07.5421146 
ForParallelForEachChar1000 Time : 00:00:05.3534218

ListSize 100000 : StringLen 1048233 
SplitForEach100000 Time : 00:00:01.9500846 
SplitSelect100000 Time : 00:00:02.2662606 
ForEachChar100000 Time : 00:00:01.2554577 
SplitParallelSelectr100000 Time : 00:00:02.6509969 
ForParallelForEachChar100000 Time : 00:00:01.5842131

ListSize 10000000 : StringLen 104824707 
SplitForEach10000000 Time : 00:00:18.2658261 
SplitSelect10000000 Time : 00:00:20.6043874 
ForEachChar10000000 Time : 00:00:10.0555613 
SplitParallelSelectr10000000 Time : 00:00:18.1908017 
ForParallelForEachChar10000000 Time : 00:00:08.6756213

Here is the code used to get the benchmark results:

using System;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Diagnostics;

namespace FastStringSplit
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            Random rnd = new Random();
            char delim = ':';
            int[] sizes = new int[]{1000, 100000, 10000000 };
            int[] iters = new int[]{10000, 100, 10};
            Stopwatch sw;

            List<int> list, result = new List<int>();
            string str;
            for(int s=0; s<sizes.Length; s++) {
                list = new List<int>(sizes[s]);
                for(int i=0; i<sizes[s]; i++)
                    list.Add (rnd.Next());
                str = string.Join(":", list);
                Console.WriteLine(string.Format("\nListSize {0} : StringLen {1}", sizes[s], str.Length));
                ////
                sw = new Stopwatch();
                for(int i=0; i<iters[s]; i++) {
                    sw.Start();
                    result = SplitForEach(str, delim);
                    sw.Stop();
                }
                Console.WriteLine("SplitForEach" + result.Count + " Time : " + sw.Elapsed.ToString());
                ////
                sw = new Stopwatch();
                for(int i=0; i<iters[s]; i++) {
                    sw.Start();
                    result = SplitSelect(str, delim);
                    sw.Stop();
                }
                Console.WriteLine("SplitSelect" + result.Count + " Time : " + sw.Elapsed.ToString());
                ////
                sw = new Stopwatch();
                for(int i=0; i<iters[s]; i++) {
                    sw.Start();
                    result = ForEachChar(str, delim);
                    sw.Stop();
                }
                Console.WriteLine("ForEachChar" + result.Count + " Time : " + sw.Elapsed.ToString());
                ////
                sw = new Stopwatch();
                for(int i=0; i<iters[s]; i++) {
                    sw.Start();
                    result = SplitParallelSelect(str, delim);
                    sw.Stop();
                }
                Console.WriteLine("SplitParallelSelectr" + result.Count + " Time : " + sw.Elapsed.ToString());
                ////
                sw = new Stopwatch();
                for(int i=0; i<iters[s]; i++) {
                    sw.Start();
                    result = ForParallelForEachChar(str, delim);
                    sw.Stop();
                }
                Console.WriteLine("ForParallelForEachChar" + result.Count + " Time : " + sw.Elapsed.ToString());
            }
        }
        public static List<int> SplitForEach(string s, char delim) {
            List<int> result = new List<int>();
            foreach(string x in s.Split(delim))
                result.Add(int.Parse (x));
            return result;
        }
        public static List<int> SplitSelect(string s, char delim) {
            return s.Split(delim)
                .Select(int.Parse)
                .ToList();
        }
        public static List<int> ForEachChar(string s, char delim) {
            List<int> result = new List<int>();
            int start = 0;
            int end = 0;
            foreach(char x in s) {
                if(x == delim || end == s.Length - 1) {
                    if(end == s.Length - 1)
                        end++;
                    result.Add(int.Parse (s.Substring(start, end-start)));
                    start = end + 1;
                }
                end++;
            }
            return result;
        }
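        public static List<int> SplitParallelSelect(string s, char delim) {
            // Note: without AsOrdered(), PLINQ does not guarantee that the
            // output preserves the input order.
            return s.Split(delim)
                .AsParallel()
                .Select(int.Parse)
                .ToList();
        }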
        public static int NumOfThreads = Environment.ProcessorCount > 2 ? Environment.ProcessorCount : 2;
        public static List<int> ForParallelForEachChar(string s, char delim) {
            int chunkSize = (s.Length / NumOfThreads) + 1;
            // ConcurrentBag is unordered, so the parsed values may not come back in input order.
            ConcurrentBag<int> result = new ConcurrentBag<int>();
            int[] chunks = new int[NumOfThreads+1];
            Task[] tasks = new Task[NumOfThreads];
            for(int x=0; x<NumOfThreads; x++) {
                int next = chunks[x] + chunkSize;
                while(next < s.Length) {
                    if(s[next] == delim)
                        break;
                    next++;
                }
                //Console.WriteLine(next);
                chunks[x+1] = Math.Min(next, s.Length);
                tasks[x] = Task.Factory.StartNew((o) => {
                    int chunkId = (int)o;
                    int start = chunks[chunkId];
                    int end = chunks[chunkId + 1];
                    if(start >= s.Length)
                        return;
                    if(s[start] == delim)
                        start++;
                    //Console.WriteLine(string.Format("{0} {1}", start, end));
                    for(int i = start; i<end; i++) {
                        if(s[i] == delim || i == end-1) {
                            if(i == end-1) 
                                i++;
                            result.Add(int.Parse (s.Substring(start, i-start)));
                            start = i + 1;
                        }
                    }
                }, x);
            }
            Task.WaitAll(tasks);
            return result.ToList();
        }
    }
}

Here is the function I recommend:

        public static List<int> ForEachChar(string s, char delim) {
            List<int> result = new List<int>();
            int start = 0;
            int end = 0;
            foreach(char x in s) {
                if(x == delim || end == s.Length - 1) {
                    if(end == s.Length - 1)
                        end++;
                    result.Add(int.Parse (s.Substring(start, end-start)));
                    start = end + 1;
                }
                end++;
            }
            return result;
        }

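Why is it faster?

It doesn't split the string into an array first. It does the splitting and parsing at the same time, so there is no added overhead of iterating over the string to split it and then iterating over the array to parse it.

I also threw in a parallelized version using tasks, but it is only faster when the string is very large.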
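One caveat about the parallel variants in the benchmark: PLINQ without AsOrdered() and ConcurrentBag both make no promise about element order, so the returned list may not match the order of the input. If order matters, a sketch along these lines should preserve it, at some additional cost. The class name OrderedParallelParseSketch is made up for this example, and it was not part of the benchmark above.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OrderedParallelParseSketch
{
    public static List<int> Parse(string s, char delim)
    {
        return s.Split(delim)
                .AsParallel()
                .AsOrdered()                 // ask PLINQ to keep the source order
                .Select(x => int.Parse(x))
                .ToList();
    }
}
```

Order preservation adds buffering overhead, so this is likely to be even less attractive for small inputs than the unordered parallel version.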

score 0 · Accepted Answer

This looks cleaner:

var providerIDs = providerIDList.Split('|').Select(x => int.Parse(x)).ToList();

score -1 · Accepted Answer

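If you really want to know the most efficient way, then use unsafe code: define a char pointer from the string, iterate over all the characters by incrementing the pointer, buffer the characters read until the next '|', and convert the buffered characters to an int32. If you want to be really fast, do the conversion manually: start at the last character, subtract the value of the '0' character, multiply the result by 1, 10, 100, 1000... according to the iteration variable, then add it to a sum variable. I don't have time to write the code, but I hope you get the idea.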
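Since no code was given, here is a rough sketch of what that description might look like. It is only one reading of the idea: the names (PointerParseSketch, Parse) are invented, it needs to be compiled with unsafe code enabled (AllowUnsafeBlocks), and it has not been benchmarked against the other answers. Like the manual parser in the first answer, it adds zero for empty fields and does not handle negative numbers or overflow.

```csharp
using System;
using System.Collections.Generic;

public static class PointerParseSketch
{
    public static unsafe List<int> Parse(string s, char delim)
    {
        var result = new List<int>();
        fixed (char* start = s)   // pin the string so the pointer stays valid
        {
            char* end = start + s.Length;
            char* p = start;
            while (true)
            {
                // Scan forward to the next delimiter (or the end of the string).
                char* fieldStart = p;
                while (p < end && *p != delim)
                    p++;

                // Convert the field right to left: ones, tens, hundreds, ...
                int value = 0;
                int scale = 1;
                for (char* q = p; q > fieldStart; )
                {
                    q--;
                    int digit = *q - '0';
                    if (digit < 0 || digit > 9)
                        throw new FormatException("Invalid input: " + s);
                    value += digit * scale;
                    scale *= 10;
                }
                result.Add(value);

                if (p >= end)
                    break;
                p++;   // skip the delimiter
            }
        }
        return result;
    }
}
```

Whether the pointer walk actually beats a plain foreach over the string is something to measure rather than assume.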
