c# - I need to iterate a parallel for loop over StreamReader.ReadLine(), but I'm stuck on multithreaded access to the object

Question

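I tried creating a stream reader object (sr) and using it inside a parallel for loop. It works, but it takes 1.3 minutes to fetch data that should take about 4 seconds. I suspect the problem is related to this StreamReader object. With the code below I get an error. I have tried many ways to fix it, but I'm completely stuck. I even tried ConcurrentBag, volatile, ThreadStatic and lock.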

static void Main(string[] args)
{
    Task.Run(() =>
    {
        Thread th0 = new Thread(() => ReadAllLinesAsync(
            @"C:\Users\Administrator\Desktop\Fnale mail\LineDataBackHigh.csv"));
        th0.Start();
        th0.Join();
        watch.Stop();
        Debug.Log("time=" + watch.Elapsed);
        Debug.Log("Finished Task + ");
    });

    Debug.Log("Free Executed, Task Independent");
}

public static string[] ReadAllLinesAsync(string path)
{
    ConcurrentBag<string> lines = new ConcurrentBag<string>();

    // Open the FileStream with the same FileMode, FileAccess
    // and FileShare as a call to File.OpenText would've done.
    using (StreamReader sr = File.OpenText(path))
    {
        string line = String.Empty;
        int k = 0;
        sr8 = sr;

        Thread th0 = new Thread(Fetch);

        th0.Start();
        th0.Join();
        Debug.Log("Finished Reading2" + lines.Count);
        int item = 1;

        void Fetch()
        {
            Parallel.For(k, File.ReadLines(path).Count(), z =>
            {
                sr8 = sr;
                Debug.Log("Executing");
                lines.Add(sr8.ReadLine());
                // sr.Dispose();
            });
        }
    }
    return lines.ToArray();
}

Error:

score 0 · Accepted Answer

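Actually, .NET offers a good solution for this: the producer-consumer pattern, here built on BufferBlock from TPL Dataflow (System.Threading.Tasks.Dataflow).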

First, create a producer like this:

static void Produce(ITargetBlock<string> target, Stream stream)
{
    using var reader = new StreamReader(stream);
    string? line = null;
    while ((line = reader.ReadLine()) is not null)
    {
        target.Post(line); // tells the consumer there is something to read
    }

    target.Complete(); // tells the consumer we are done filling the buffer
}

Next, create the consumer: a task that reads lines out of the buffer while the producer writes them in.

static async Task ConsumeAsync(ISourceBlock<string> source)
{
    //reads the data in a non-blocking way
    while (await source.OutputAvailableAsync())
    {
        string line = await source.ReceiveAsync();

        //do your magic here
    }
}

Now wire the two methods together to run the producer-consumer pattern:

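using System.Reflection;
using System.Resources;
using System.Threading.Tasks.Dataflow;

using var stream = Assembly.GetExecutingAssembly().GetManifestResourceStream("MyApp.Resources.LargeFile.csv");
if (stream is null)
    throw new MissingManifestResourceException("Embedded resource MyApp.Resources.LargeFile.csv was not found.");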


var buffer = new BufferBlock<string>();
var consumerTask = ConsumeAsync(buffer);
Produce(buffer, stream);
await consumerTask;
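The wiring above reads an embedded resource, while the question reads a CSV file from disk. Below is a minimal sketch of the same producer and consumer fed from a file path instead. Assumptions: a temporary file generated on the fly stands in for LineDataBackHigh.csv, and the hypothetical processed queue stands in for "do your magic here". BufferBlock lives in System.Threading.Tasks.Dataflow, which, depending on the target framework, may have to be added as a NuGet package.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical input: a small temporary CSV standing in for the asker's file.
string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 1000).Select(i => $"{i},value{i}"));

var buffer = new BufferBlock<string>();
var processed = new ConcurrentQueue<string>(); // placeholder for real per-line work

// Start the consumer first so it drains the buffer while the producer fills it.
Task consumerTask = ConsumeAsync(buffer, processed);
using (FileStream stream = File.OpenRead(path))
{
    Produce(buffer, stream); // only this thread ever calls ReadLine
}
await consumerTask;
File.Delete(path);
Console.WriteLine($"Consumed {processed.Count} lines");

static void Produce(ITargetBlock<string> target, Stream stream)
{
    using var reader = new StreamReader(stream);
    string? line;
    while ((line = reader.ReadLine()) is not null)
    {
        target.Post(line); // an unbounded BufferBlock accepts every item
    }
    target.Complete(); // signals "no more data" to the consumer
}

static async Task ConsumeAsync(ISourceBlock<string> source, ConcurrentQueue<string> sink)
{
    // OutputAvailableAsync returns false once the block is completed and empty.
    while (await source.OutputAvailableAsync())
    {
        string line = await source.ReceiveAsync();
        sink.Enqueue(line);
    }
}
```

With a single consumer the lines come out in file order, and only one thread ever calls ReadLine, which is exactly what the question's Parallel.For loop violated: StreamReader is not safe to use from several threads at once.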

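Imagine you need to process a file larger than the available memory. With this pattern the buffer fills up according to how fast you read the stream compared with how fast you process the lines you have read, so if reading outpaces processing you could still run out of memory. This approach is not as fast as reading the whole file and processing it in parallel, but reading everything into memory will (or may) eventually cause problems.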
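One way to address that memory concern is to bound the buffer. A minimal sketch, assuming a generated temporary file as input and a simple line counter as placeholder work: with BoundedCapacity set, the producer awaits SendAsync, which pauses it whenever the buffer is full, so the number of lines held in memory stays capped regardless of file size. The capacity of 100 is an arbitrary illustrative value.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// A temporary file stands in for a large input file.
string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 10_000).Select(i => $"line {i}"));

// At most 100 lines wait in the buffer at any time.
var buffer = new BufferBlock<string>(new DataflowBlockOptions { BoundedCapacity = 100 });
int count = 0;
long totalChars = 0;

Task consumer = Task.Run(async () =>
{
    while (await buffer.OutputAvailableAsync())
    {
        string line = await buffer.ReceiveAsync();
        count++;                   // placeholder for real per-line work
        totalChars += line.Length;
    }
});

await ProduceBoundedAsync(buffer, path);
await consumer;
File.Delete(path);
Console.WriteLine($"Consumed {count} lines, {totalChars} chars");

static async Task ProduceBoundedAsync(ITargetBlock<string> target, string filePath)
{
    using var reader = new StreamReader(filePath);
    string? line;
    while ((line = await reader.ReadLineAsync()) is not null)
    {
        // SendAsync waits asynchronously while the buffer is full.
        // Post would instead return false and the line would be dropped.
        await target.SendAsync(line);
    }
    target.Complete();
}
```

The design trade-off: a bounded buffer throttles the reader to the consumer's speed, trading a little throughput for predictable memory use.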

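As for timing: with this pattern I parse a CSV file into .NET classes in about 6 seconds, whereas before it took several minutes, and the gain came just from improving the locking. The less locking overhead, the better the performance.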
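If the per-line work itself is the bottleneck, one option for processing lines in parallel without writing any locks is an ActionBlock with MaxDegreeOfParallelism. This is a variation on the answer's hand-written consume loop, not what the answer shows. A sketch under these assumptions: the "a,b" line format and the parse-and-sum work are hypothetical, results go into a ConcurrentBag so no explicit lock is needed, and the file is still read by a single thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 5000).Select(i => $"{i},{i * 2}"));

// Hypothetical per-line work: parse "a,b" and keep a + b.
var results = new ConcurrentBag<int>();
var worker = new ActionBlock<string>(
    line =>
    {
        string[] parts = line.Split(',');
        results.Add(int.Parse(parts[0]) + int.Parse(parts[1]));
    },
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount, // lines processed concurrently
        BoundedCapacity = 1000                               // caps queued lines in memory
    });

// Reading stays sequential; only the processing is parallel.
foreach (string line in File.ReadLines(path))
{
    await worker.SendAsync(line);
}
worker.Complete();
await worker.Completion;
File.Delete(path);
Console.WriteLine($"Processed {results.Count} lines");
```

Because the results arrive from several threads, their order in the bag is not the file order; sort or index them afterwards if order matters.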

score 0 · Accepted Answer

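It seems you are trying to parallelize the File.ReadLines method, so that the lines of the file can be read in parallel by multiple threads. Unfortunately, this is not easy to do. Even if you found a way, for example by opening the file with multiple StreamReaders at different positions and splitting the lines manually after decoding the bytes, there is no guarantee you would gain any performance. On the contrary, reading the file would most likely become noticeably slower, because the hard disk's head would have to jump frequently from one sector to another.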

My advice is to use the File.ReadLines method as is, and look for other places in your code where performance can be optimized.
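A minimal sketch of that advice, assuming a generated temporary file as input: read the file sequentially with File.ReadLines, and if the per-line work is expensive, parallelize the work rather than the reading. Parallel.ForEach pulls items from the File.ReadLines enumerator under its own synchronization, so ReadLine is never called by two threads at once, which avoids the shared-StreamReader problem in the question. Recording line lengths is a hypothetical placeholder for real work.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 2000).Select(i => $"row {i}"));

// 1) Plain sequential read: one StreamReader, one thread.
//    This replaces the question's ReadAllLinesAsync entirely.
string[] lines = File.ReadLines(path).ToArray();

// 2) If per-line work is expensive, parallelize the work, not the reading.
//    Parallel.ForEach fetches items from the enumerator one caller at a time.
var lengths = new ConcurrentBag<int>();
Parallel.ForEach(File.ReadLines(path), line =>
{
    lengths.Add(line.Length); // placeholder for real per-line work
});

File.Delete(path);
Console.WriteLine($"{lines.Length} lines read, {lengths.Count} processed");
```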
