
I am writing a method to extract information from zipped files. Each zip file will contain just one text file, and the method should return the file's lines as a string array.

I am using DotNetZip, but I am experiencing horrible performance. I have tried to benchmark each step, and every step seems to be slow.

The C# code is:

    // Requires: using System; using System.IO; using Ionic.Zip; (DotNetZip)
    public string[] LoadZipFile(string FileName)
    {
        string[] lines = { };
        int start = System.Environment.TickCount;
        this.richTextBoxLOG.AppendText("Reading " + FileName + "... ");
        try
        {
            int nstart;

            nstart = System.Environment.TickCount;
            ZipFile zip = ZipFile.Read(FileName);
            this.richTextBoxLOG.AppendText(String.Format("ZipFile ({0}ms)\n", System.Environment.TickCount - nstart));

            nstart = System.Environment.TickCount;
            MemoryStream ms = new MemoryStream();
            this.richTextBoxLOG.AppendText(String.Format("Memorystream ({0}ms)\n", System.Environment.TickCount - nstart));

            nstart = System.Environment.TickCount;
            zip[0].Extract(ms);
            zip.Dispose(); // release the archive once the single entry is extracted
            this.richTextBoxLOG.AppendText(String.Format("Extract ({0}ms)\n", System.Environment.TickCount - nstart));

            nstart = System.Environment.TickCount;
            string filecontents = string.Empty;
            using (var reader = new StreamReader(ms))
            {
                reader.BaseStream.Seek(0, SeekOrigin.Begin); // rewind: Extract left the position at the end
                filecontents = reader.ReadToEnd();
            }
            this.richTextBoxLOG.AppendText(String.Format("Read ({0}ms)\n", System.Environment.TickCount - nstart));

            nstart = System.Environment.TickCount;
            lines = filecontents.Replace("\r\n", "\n").Split("\n".ToCharArray());
            this.richTextBoxLOG.AppendText(String.Format("SplitLines ({0}ms)\n", System.Environment.TickCount - nstart));
        }
        catch (IOException ex)
        {
            this.richTextBoxLOG.AppendText(ex.Message + "\n");
        }
        int slut = System.Environment.TickCount;
        this.richTextBoxLOG.AppendText(String.Format("Done ({0}ms)\n", slut - start));
        return lines;
    }

As an example I get this output:

    Reading xxxx.zip... ZipFile (0ms)
    Memorystream (0ms)
    Extract (234ms)
    Read (78ms)
    SplitLines (187ms)
    Done (514ms)
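The resolution of `Environment.TickCount` is limited to the system timer, which is typically 10 to 16 ms. A reading of `0ms` above therefore only means "less than one tick", and short steps cannot be measured reliably this way.

The sketch below is a per-step timer built on `System.Diagnostics.Stopwatch` instead. `StepTimer`, `Mark` and `Report` are hypothetical names. They are not part of DotNetZip or of the original code.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper, not part of DotNetZip or the original code.
// Stopwatch uses the high-resolution performance counter where available,
// unlike Environment.TickCount (typically 10-16 ms resolution).
public sealed class StepTimer
{
    private readonly Stopwatch _watch = Stopwatch.StartNew();
    private readonly List<KeyValuePair<string, double>> _steps =
        new List<KeyValuePair<string, double>>();

    // Records the time since the previous Mark (or since construction)
    // under the given step name, then restarts the stopwatch.
    public void Mark(string step)
    {
        _steps.Add(new KeyValuePair<string, double>(step, _watch.Elapsed.TotalMilliseconds));
        _watch.Restart();
    }

    public IReadOnlyList<KeyValuePair<string, double>> Steps
    {
        get { return _steps; }
    }

    // Formats all steps as one line, e.g. "Extract (12.3ms) Read (4.5ms)",
    // so the UI only needs to be updated once, after timing is finished.
    public string Report()
    {
        var parts = new List<string>();
        foreach (var s in _steps)
        {
            parts.Add(string.Format("{0} ({1:F1}ms)", s.Key, s.Value));
        }
        return string.Join(" ", parts);
    }
}
```

In `LoadZipFile`, each `nstart = ...` / `AppendText` pair could become one `timer.Mark("Extract")` call. A single `richTextBoxLOG.AppendText(timer.Report() + "\n")` at the end would then keep the RichTextBox out of the measured steps.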

That is 514 ms in total. When the same operation is performed in Python 2.6 using this code:

    import zipfile

    def ReadZip(File):
        z = zipfile.ZipFile(File, "r")
        name = z.namelist()[0]
        return z.read(name).split('\r\n')

it executes in just 89 ms. Any ideas on how to improve performance are very welcome.
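For a closer comparison with the Python function, here is a sketch of the same steps in C#: read the first entry into one string, then split it once on `\r\n`.

It assumes `System.IO.Compression.ZipArchive` (.NET Framework 4.5+ with a reference to System.IO.Compression, or .NET Core) in place of DotNetZip. Its timings are therefore not directly comparable with the code above. `ZipTextLoader` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical loader mirroring the Python ReadZip function.
// Uses System.IO.Compression, not DotNetZip.
public static class ZipTextLoader
{
    // Reads the first entry fully into a string, then splits once on "\r\n",
    // like z.read(name).split('\r\n') in the Python version.
    public static string[] ReadFirstEntryLines(Stream zipStream)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        // StreamReader defaults to UTF-8 with BOM detection; pass an Encoding
        // if the text files use something else.
        using (var reader = new StreamReader(archive.Entries[0].Open()))
        {
            string contents = reader.ReadToEnd();
            return contents.Split(new[] { "\r\n" }, StringSplitOptions.None);
        }
    }

    // Convenience overload that opens the zip file from disk.
    public static string[] ReadFirstEntryLines(string fileName)
    {
        using (FileStream fs = File.OpenRead(fileName))
        {
            return ReadFirstEntryLines(fs);
        }
    }
}
```

Like Python's `str.split('\r\n')`, `Split(..., StringSplitOptions.None)` keeps a trailing empty string when the file ends with a line break.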


2 Answers


Your code is not like-for-like, so the comparison is not a fair one. Some important points:

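  • Have you tried removing your logging code? Those AppendText calls will account for some of the extra time.
  • You do a file-wide Replace before calling Split, which will slow that part of the process down considerably. Just split on \r\n.
  • You convert each line to a char array instead of just returning strings. That will also slow things down.
  • You may want to compare different Zip libraries to see whether one of them extracts faster.
  • Repeatedly calling StreamReader.ReadLine may be faster than reading the whole stream and then splitting it manually.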
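The second and fifth suggestions can be sketched as two small helpers. The names `LineSplitting`, `SplitOnCrLf` and `ReadAllLines` are hypothetical and do not come from the original post.

The first helper splits the whole text once on `\r\n`. The second lets `TextReader.ReadLine` find the line breaks while reading.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helpers illustrating two ways to get the lines.
public static class LineSplitting
{
    // Split once on "\r\n" instead of Replace("\r\n", "\n") followed by
    // Split, which avoids building a second full-size copy of the text.
    public static string[] SplitOnCrLf(string contents)
    {
        return contents.Split(new[] { "\r\n" }, StringSplitOptions.None);
    }

    // Read line by line; ReadLine accepts "\r\n", "\n" or "\r" as a
    // terminator and returns null at the end of the input.
    public static List<string> ReadAllLines(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```

The two results differ slightly. `Split` returns a trailing empty string when the text ends with `\r\n`, whereas the `ReadLine` loop does not.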

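In short, you should profile some alternative approaches, and if you want a true like-for-like comparison, you should time your code without the intermediate logging to the RichTextBox.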
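Here is a minimal timing harness along those lines. It assumes the method under test no longer writes to the RichTextBox. `MiniBenchmark` is a hypothetical helper.

The harness makes one untimed warm-up call, so JIT compilation and OS file caching are not counted against the first measurement. It then averages several runs with `Stopwatch`.

```csharp
using System;
using System.Diagnostics;

// Hypothetical benchmarking helper; nothing is written to the UI while timing.
public static class MiniBenchmark
{
    // Calls `action` once untimed (warm-up), then returns the average
    // elapsed milliseconds over `iterations` timed calls.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up run, not measured

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

For example, call `MiniBenchmark.AverageMilliseconds(() => LoadZipFile("xxxx.zip"), 10)` with the logging lines removed from `LoadZipFile`. Measuring the Python version the same way (for example with `timeit`) keeps the comparison fair.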

answered 2012-08-03T11:37:26.193

Thanks for the suggestions. I ended up changing the code in several ways:

  • Using System.Collections.Generic (a List<string>) to return the lines
  • Using StreamReader.ReadLine

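Removing the logging and the exception handling did not change performance much. I looked at the sharplibs unzip library, but its implementation looks a bit complicated. From what I can see in other posts, though, there may be something to gain in the decompression step. It now runs in about 300 ms.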
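One possible change in the decompression step is to drop the intermediate `MemoryStream` and decompress and split the entry in a single pass. This avoids holding a full copy of the decompressed file in memory. Whether it is actually faster here would need to be measured.

The sketch again assumes `System.IO.Compression` rather than DotNetZip. DotNetZip's `ZipEntry.OpenReader()` should allow the same pattern, but check its documentation. `StreamingZipLoader` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical loader using System.IO.Compression, not DotNetZip.
public static class StreamingZipLoader
{
    // Reads the first entry line by line straight from the decompression
    // stream, so the file is never buffered whole in a MemoryStream.
    public static List<string> LoadFirstEntryLines(Stream zipStream)
    {
        var lines = new List<string>();
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            if (archive.Entries.Count == 0)
            {
                return lines; // empty archive: nothing to read
            }

            using (var reader = new StreamReader(archive.Entries[0].Open()))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    lines.Add(line);
                }
            }
        }
        return lines;
    }
}
```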

    // Requires: using System; using System.Collections.Generic; using System.IO; using Ionic.Zip; (DotNetZip)
    public List<string> LoadZipFile2(string FileName)
    {
        List<string> lines = new List<string>();
        int start = System.Environment.TickCount;
        this.richTextBoxLOG.AppendText("Reading " + FileName + "... ");

        try
        {
            // Per-step timing/logging removed; only the total time is logged below.
            MemoryStream ms = new MemoryStream();
            using (ZipFile zip = ZipFile.Read(FileName))
            {
                zip[0].Extract(ms);
            }

            using (var reader = new StreamReader(ms))
            {
                reader.BaseStream.Seek(0, SeekOrigin.Begin); // rewind to the start of the extracted data
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    lines.Add(line);
                }
            }
        }
        catch (IOException ex)
        {
            this.richTextBoxLOG.AppendText(ex.Message + "\n");
        }
        int slut = System.Environment.TickCount;
        this.richTextBoxLOG.AppendText(String.Format("Done ({0}ms)\n", slut - start));
        return lines;
    }
answered 2012-08-10T06:48:07.087