I am writing a method to extract information from zip files. Each zip file contains exactly one text file, and the method should return that file's contents as a string array with one element per line.
I am using DotNetZip, but the performance is horrible. I have benchmarked each step, and every step seems to be slow.
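To reproduce this kind of comparison, an archive matching the description above (a single text entry with CRLF line endings) can be generated with Python's standard `zipfile` module. This is a minimal sketch. The helper name `make_single_entry_zip`, the entry name `data.txt`, the sample line content and the line count are all hypothetical test values, not taken from the original archives.

```python
import zipfile


def make_single_entry_zip(target, name="data.txt", n_lines=100000):
    """Write a zip archive containing exactly one CRLF-delimited text file.

    `target` may be a file path or a writable binary file-like object.
    The entry name, sample content and line count are arbitrary test values.
    """
    text = "\r\n".join("line %d,some,sample,fields" % i for i in range(n_lines))
    # ZIP_DEFLATED is the usual compression method for real-world zip files.
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.writestr(name, text)
```

Calling `make_single_entry_zip("test.zip")` produces a file that both the C# method and the Python function can read, so their timings refer to the same input.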
The C# code is:

```csharp
// Requires: using System; using System.IO; using Ionic.Zip; (DotNetZip)
public string[] LoadZipFile(string FileName)
{
    string[] lines = { };
    int start = System.Environment.TickCount;
    this.richTextBoxLOG.AppendText("Reading " + FileName + "... ");
    try
    {
        int nstart;
        nstart = System.Environment.TickCount;
        ZipFile zip = ZipFile.Read(FileName);
        this.richTextBoxLOG.AppendText(String.Format("ZipFile ({0}ms)\n", System.Environment.TickCount - nstart));

        nstart = System.Environment.TickCount;
        MemoryStream ms = new MemoryStream();
        this.richTextBoxLOG.AppendText(String.Format("Memorystream ({0}ms)\n", System.Environment.TickCount - nstart));

        // Extract the single entry into memory.
        nstart = System.Environment.TickCount;
        zip[0].Extract(ms);
        this.richTextBoxLOG.AppendText(String.Format("Extract ({0}ms)\n", System.Environment.TickCount - nstart));

        // Rewind the stream and read the whole entry as text.
        nstart = System.Environment.TickCount;
        string filecontents = string.Empty;
        using (var reader = new StreamReader(ms))
        {
            reader.BaseStream.Seek(0, SeekOrigin.Begin);
            filecontents = reader.ReadToEnd();
        }
        this.richTextBoxLOG.AppendText(String.Format("Read ({0}ms)\n", System.Environment.TickCount - nstart));

        // Normalize line endings and split into lines.
        nstart = System.Environment.TickCount;
        lines = filecontents.Replace("\r\n", "\n").Split("\n".ToCharArray());
        this.richTextBoxLOG.AppendText(String.Format("SplitLines ({0}ms)\n", System.Environment.TickCount - nstart));
    }
    catch (IOException ex)
    {
        this.richTextBoxLOG.AppendText(ex.Message + "\n");
    }
    int slut = System.Environment.TickCount;
    this.richTextBoxLOG.AppendText(String.Format("Done ({0}ms)\n", slut - start));
    return lines;
}
```
As an example, I get this output:

```
Reading xxxx.zip... ZipFile (0ms)
Memorystream (0ms)
Extract (234ms)
Read (78ms)
SplitLines (187ms)
Done (514ms)
```
That is 514 ms in total. When the same operation is performed in Python 2.6 using this code:

```python
import zipfile

def ReadZip(File):
    z = zipfile.ZipFile(File, "r")
    name = z.namelist()[0]
    return z.read(name).split('\r\n')
```
it executes in just 89 ms. Any ideas on how to improve performance are very welcome.
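For a like-for-like comparison, the Python side can be instrumented per step in the same way as the C# method. Below is a minimal sketch in Python 3 (the question used Python 2.6) that times opening the archive, extracting the entry into memory, decoding it, and splitting it into lines. The function name `read_zip_lines_timed` and the UTF-8 encoding are assumptions; the question does not state the file's encoding.

```python
import time
import zipfile


def read_zip_lines_timed(path):
    """Return the lines of the first entry in a zip archive plus per-step timings.

    Mirrors the steps timed in the C# method: open the archive, extract the
    entry into memory, decode it to text, and split it into lines.
    Timings are in seconds. UTF-8 is an assumed encoding.
    """
    timings = {}

    t = time.perf_counter()
    with zipfile.ZipFile(path, "r") as z:
        timings["open"] = time.perf_counter() - t

        t = time.perf_counter()
        data = z.read(z.namelist()[0])  # first (and only) entry, as bytes
        timings["extract"] = time.perf_counter() - t

    t = time.perf_counter()
    text = data.decode("utf-8")
    timings["decode"] = time.perf_counter() - t

    t = time.perf_counter()
    lines = text.split("\r\n")  # same delimiter as the Python 2.6 version
    timings["split"] = time.perf_counter() - t

    return lines, timings
```

Printing the returned `timings` after a call on the same archive gives per-step durations that, multiplied by 1000, line up with the millisecond figures in the C# log above.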