3

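I am trying to use FileStream.Seek to quickly jump to a line and read it.

However, I am not getting the right results. I have been looking at this for a while and can't work out what I am doing wrong.

Environment:
OS: Windows 7
Framework: .NET 4.0
IDE: Visual C# Express 2010

Sample data in file location: C:\Temp\Temp.txt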

0001|100!2500
0002|100!2500
0003|100!2500
0004|100!2500
0005|100!2500
0006|100!2500
0007|100!2500
0008|100!2500
0009|100!2500
0010|100!2500

Code:

class PaddedFileSearch
{
    private int LineLength { get; set; }
    private string FileName { get; set; }

    public PaddedFileSearch()
    {
        FileName = @"C:\Temp\Temp.txt";     // This is a padded file.  All lines are of the same length.

        FindLineLength();
        Debug.Print("File Line length: {0}", LineLength);

        // TODO: This purely for testing.  Move this code out.
        SeekMethod(new int[] { 5, 3, 4 });
        /*  Expected Results:
         *  Line No     Position        Line
         *  -------     --------        -----------------
         *  3           30              0003|100!2500
         *  4           15              0004|100!2500
         *  5           15              0005|100!2500 -- This was updated after the initial request.
         */

        /* THIS DOES NOT GIVE THE EXPECTED RESULTS */
        SeekMethod(new int[] { 5, 3 });
        /*  Expected Results:
         *  Line No     Position        Line
         *  -------     --------        -----------------
         *  3           30              0003|100!2500
         *  5           30              0005|100!2500
         */
    }

    private void FindLineLength()
    {
        string line;

        // Add check for FileExists

        using (StreamReader reader = new StreamReader(FileName))
        {
            if ((line = reader.ReadLine()) != null)
            {
                LineLength = line.Length + 2;
                // The 2 is for NewLine(\r\n)
            }
        }

    }

    public void SeekMethod(int[] lineNos)
    {
        long position = 0;
        string line = null;

        Array.Sort(lineNos);

        Debug.Print("");
        Debug.Print("Line No\t\tPosition\t\tLine");
        Debug.Print("-------\t\t--------\t\t-----------------");

        using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            using (StreamReader reader = new StreamReader(fs))
            {
                foreach (int lineNo in lineNos)
                {
                    position = (lineNo - 1) * LineLength - position;
                    fs.Seek(position, SeekOrigin.Current);

                    if ((line = reader.ReadLine()) != null)
                    {
                        Debug.Print("{0}\t\t\t{1}\t\t\t\t{2}", lineNo, position, line);
                    }
                }
            }
        }
    }
}

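Output I get:

File Line length: 15

Line No     Position        Line
-------     --------        -----------------
3           30              0003|100!2500
4           15              0004|100!2500
5           45              0005|100!2500

Line No     Position        Line
-------     --------        -----------------
3           30              0003|100!2500
5           30              0004|100!2500

My problem is with the following output:

Line No     Position        Line
-------     --------        -----------------
5           30              0004|100!2500

The output for Line should be: 0005|100!2500

I don't understand why this is happening.

Am I doing something wrong? Is there a workaround? Also, is there a faster way to do this using something like seek?
(I am looking for code-based options, not Oracle or SQL Server. For the sake of argument, let's also say that the file size is 1 GB.)

Any help is greatly appreciated.

Thanks.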

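UPDATE:
I found 4 great answers here. Thanks a lot.

Sample timings:
Based on a few runs, the methods are listed below from best to good. Even the good one is very close to the best.
The test file contains 10K lines (2.28 MB). I searched for the same 5000 random lines using all of the options.

  1. Seek4: Time elapsed: 00:00:00.0398530 ms -- Ritch Melton
  2. Seek3: Time elapsed: 00:00:00.0446072 ms -- Valentin Kuzub
  3. Seek1: Time elapsed: 00:00:00.0538210 ms -- Jake
  4. Seek2: Time elapsed: 00:00:00.0889589 ms -- bitxwise

The code is shown below. After you have saved the code, you can call it by typing TestPaddedFileSeek.CallPaddedFileSeek();. You will also have to specify the namespace and the "using" references.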

`

/// <summary>
/// This class offers multiple options for reading a line by line number in a padded file (all lines are the same length).
/// The idea is to jump quickly to the requested line in the file.
/// Details about the discussions is available at: http://stackoverflow.com/questions/5201414/having-a-problem-while-using-filestream-seek-in-c-solved
/// </summary>
class PaddedFileSeek
{
    public FileInfo File {get; private set;}
    public int LineLength { get; private set; }

    #region Private methods
    private static int FindLineLength(FileInfo fileInfo)
    {
        using (StreamReader reader = new StreamReader(fileInfo.FullName))
        {
            string line;
            if ((line = reader.ReadLine()) != null)
            {
                int length = line.Length + 2;   // The 2 is for NewLine(\r\n)
                return length;
            }
        }
        return 0;
    }

    private static void PrintHeader()
    {
       /*
        Debug.Print("");
        Debug.Print("Line No\t\tLine");
        Debug.Print("-------\t\t--------------------------");
       */ 
    }

    private static void PrintLine(int lineNo, string line)
    {
        //Debug.Print("{0}\t\t\t{1}", lineNo, line);
    }

    private static void PrintElapsedTime(TimeSpan elapsed)
    {
        Debug.WriteLine("Time elapsed: {0} ms", elapsed);
    }
    #endregion

    public PaddedFileSeek(FileInfo fileInfo)
    {
        // Possibly might have to check for FileExists
        int length = FindLineLength(fileInfo);
        //if (length == 0) throw new PaddedProgramException();
        LineLength = length;
        File = fileInfo;
    }

    public void CallAll(int[] lineNoArray, List<int> lineNoList)
    {
        Stopwatch sw = new Stopwatch();

        #region Seek1
        // Create new stopwatch
        sw.Start();

        Debug.Write("Seek1: ");
        // Print Header
        PrintHeader();

        Seek1(lineNoArray);

        // Stop timing
        sw.Stop();

        // Print Elapsed Time
        PrintElapsedTime(sw.Elapsed);

        sw.Reset();
        #endregion

        #region Seek2
        // Create new stopwatch
        sw.Start();

        Debug.Write("Seek2: ");
        // Print Header
        PrintHeader();

        Seek2(lineNoArray);

        // Stop timing
        sw.Stop();

        // Print Elapsed Time
        PrintElapsedTime(sw.Elapsed);

        sw.Reset();
        #endregion

        #region Seek3
        // Create new stopwatch
        sw.Start();

        Debug.Write("Seek3: ");
        // Print Header
        PrintHeader();

        Seek3(lineNoArray);

        // Stop timing
        sw.Stop();

        // Print Elapsed Time
        PrintElapsedTime(sw.Elapsed);

        sw.Reset();
        #endregion

        #region Seek4
        // Create new stopwatch
        sw.Start();

        Debug.Write("Seek4: ");

        // Print Header
        PrintHeader();

        Seek4(lineNoList);

        // Stop timing
        sw.Stop();

        // Print Elapsed Time
        PrintElapsedTime(sw.Elapsed);

        sw.Reset();
        #endregion

    }

    /// <summary>
    /// Option by Jake
    /// </summary>
    /// <param name="lineNoArray"></param>
    public void Seek1(int[] lineNoArray)
    {
        long position = 0;
        string line = null;

        Array.Sort(lineNoArray);

        using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            using (StreamReader reader = new StreamReader(fs))
            {
                foreach (int lineNo in lineNoArray)
                {
                    position = (lineNo - 1) * LineLength;
                    fs.Seek(position, SeekOrigin.Begin);

                    if ((line = reader.ReadLine()) != null)
                    {
                        PrintLine(lineNo, line);
                    }

                    reader.DiscardBufferedData();
                }
            }
        }

    }

    /// <summary>
    /// option by bitxwise
    /// </summary>
    public void Seek2(int[] lineNoArray)
    {
        string line = null;
        long step = 0;

        Array.Sort(lineNoArray);

        using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            // using (StreamReader reader = new StreamReader(fs))
            // If you put "using" here you will get WRONG results.
            // I would like to understand why this is.
            {
                foreach (int lineNo in lineNoArray)
                {
                    StreamReader reader = new StreamReader(fs);
                    step = (lineNo - 1) * LineLength - fs.Position;
                    fs.Position += step;

                    if ((line = reader.ReadLine()) != null)
                    {
                        PrintLine(lineNo, line);
                    }
                }
            }
        }
    }

    /// <summary>
    /// Option by Valentin Kuzub
    /// </summary>
    /// <param name="lineNoArray"></param>
    #region Seek3
    public void Seek3(int[] lineNoArray)
    {
        long position = 0; // totalPosition = 0;
        string line = null;
        int oldLineNo = 0;

        Array.Sort(lineNoArray);

        using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            using (StreamReader reader = new StreamReader(fs))
            {
                foreach (int lineNo in lineNoArray)
                {
                    position = (lineNo - oldLineNo - 1) * LineLength;
                    fs.Seek(position, SeekOrigin.Current);
                    line = ReadLine(fs, LineLength);
                    PrintLine(lineNo, line);
                    oldLineNo = lineNo;

                }
            }
        }

    }

    #region Required Private methods
    /// <summary>
    /// Currently only used by Seek3
    /// </summary>
    /// <param name="stream"></param>
    /// <param name="length"></param>
    /// <returns></returns>
    private static string ReadLine(FileStream stream, int length)
    {
        byte[] bytes = new byte[length];
        stream.Read(bytes, 0, length);
        return new string(Encoding.UTF8.GetChars(bytes));
    }
    #endregion
    #endregion

    /// <summary>
    /// Option by Ritch Melton
    /// </summary>
    /// <param name="lineNoList"></param>
    #region Seek4
    public void Seek4(List<int> lineNoList)
    {
        lineNoList.Sort();

        using (var fs = new FileStream(File.FullName, FileMode.Open))
        {
            lineNoList.ForEach(ln => OutputData(fs, ln));
        }

    }

    #region Required Private methods
    private void OutputData(FileStream fs, int lineNumber)
    {
        var offset = (lineNumber - 1) * LineLength;

        fs.Seek(offset, SeekOrigin.Begin);

        var data = new byte[LineLength];
        fs.Read(data, 0, LineLength);

        var text = DecodeData(data);
        PrintLine(lineNumber, text);
    }

    private static string DecodeData(byte[] data)
    {
        var encoding = new UTF8Encoding();
        return encoding.GetString(data);
    }

    #endregion

    #endregion
}



static class TestPaddedFileSeek
{
    public static void CallPaddedFileSeek()
    {
        const int arrayLenght = 5000;
        int[] lineNoArray = new int[arrayLenght];
        List<int> lineNoList = new List<int>();
        Random random = new Random();
        int lineNo;
        string fileName;


        fileName = @"C:\Temp\Temp.txt";

        PaddedFileSeek seeker = new PaddedFileSeek(new FileInfo(fileName));

        for (int n = 0; n < 25; n++)
        {
            Debug.Print("Loop no: {0}", n + 1);

            for (int i = 0; i < arrayLenght; i++)
            {
                lineNo = random.Next(1, arrayLenght);

                lineNoArray[i] = lineNo;
                lineNoList.Add(lineNo);
            }

            seeker.CallAll(lineNoArray, lineNoList);

            lineNoList.Clear();

            Debug.Print("");
        }
    }
}

`

4

5 Answers

3

Put this inside the inner loop of SeekMethod(int[] lineNos):

position = (lineNo - 1) * LineLength;
fs.Seek(position, SeekOrigin.Begin);
reader.DiscardBufferedData();

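The problem is that your position variable changes based on its previous value, and StreamReader maintains a buffer, so you need to clear the buffered data whenever you change the stream position.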
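A minimal sketch of the buffering effect, assuming an ASCII file with ten 15-byte lines like the sample data. The names BufferedSeekDemo, CreateSampleFile, and ReadLines are made up for this illustration and do not come from the question. It reads lines 3 and 5 through one shared StreamReader, once without and once with DiscardBufferedData.

```csharp
using System;
using System.IO;
using System.Text;

static class BufferedSeekDemo
{
    // Writes ten fixed-width lines ("0001|100!2500\r\n", ...) to a temp file: 15 bytes per line.
    public static string CreateSampleFile()
    {
        string path = Path.GetTempFileName();
        var sb = new StringBuilder();
        for (int i = 1; i <= 10; i++)
            sb.Append(i.ToString("D4")).Append("|100!2500\r\n");
        File.WriteAllText(path, sb.ToString(), Encoding.ASCII);
        return path;
    }

    // Seeks to each requested line and reads it through one shared StreamReader.
    public static string[] ReadLines(string path, int[] lineNos, int lineLength, bool discardBuffer)
    {
        var result = new string[lineNos.Length];
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(fs, Encoding.ASCII))
        {
            for (int i = 0; i < lineNos.Length; i++)
            {
                fs.Seek((long)(lineNos[i] - 1) * lineLength, SeekOrigin.Begin);
                if (discardBuffer)
                    reader.DiscardBufferedData(); // drop bytes the reader buffered before the seek
                result[i] = reader.ReadLine();
            }
        }
        return result;
    }

    static void Main()
    {
        string path = CreateSampleFile();
        try
        {
            int[] wanted = { 3, 5 };
            // Without discarding, the second read is served from the reader's stale buffer.
            Console.WriteLine(string.Join(", ", ReadLines(path, wanted, 15, false)));
            // With discarding, the reader refills its buffer from the new stream position.
            Console.WriteLine(string.Join(", ", ReadLines(path, wanted, 15, true)));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With a file this small the reader pulls everything from offset 30 onward into its buffer on the first ReadLine, which is why the unfixed variant returns line 4 where line 5 was requested.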

answered 2011-03-05T03:45:16.053
3

I'm confused by your expected positions: line 5 at positions 30 and 45, line 4 at 15, and line 3 at 30?

Here is the core of the read logic:

    var offset = (lineNumber - 1) * LineLength;

    fs.Seek(offset, SeekOrigin.Begin);

    var data = new byte[LineLength];
    fs.Read(data, 0, LineLength);

    var text = DecodeData(data);
    Debug.Print("{0,-12}{1,-16}{2}", lineNumber, offset, text);

The full sample is here:

class PaddedFileSearch
{
    public int LineLength { get; private set; }
    public FileInfo File { get; private set; }

    public PaddedFileSearch(FileInfo fileInfo)
    {
        var length = FindLineLength(fileInfo);
        //if (length == 0) throw new PaddedProgramException();
        LineLength = length;
        File = fileInfo;
    }

    private static int FindLineLength(FileInfo fileInfo)
    {
        using (var reader = new StreamReader(fileInfo.FullName))
        {
            string line;
            if ((line = reader.ReadLine()) != null)
            {
                var length = line.Length + 2;
                return length;
            }
        }

        return 0;
    }

    public void SeekMethod(List<int> lineNumbers)
    {

        Debug.Print("");
        Debug.Print("Line No\t\tPosition\t\tLine");
        Debug.Print("-------\t\t--------\t\t-----------------");

        lineNumbers.Sort();

        using (var fs = new FileStream(File.FullName, FileMode.Open))
        {
            lineNumbers.ForEach(ln => OutputData(fs, ln));
        }
    }

    private void OutputData(FileStream fs, int lineNumber)
    {
        var offset = (lineNumber - 1) * LineLength;

        fs.Seek(offset, SeekOrigin.Begin);

        var data = new byte[LineLength];
        fs.Read(data, 0, LineLength);

        var text = DecodeData(data);
        Debug.Print("{0,-12}{1,-16}{2}", lineNumber, offset, text);
    }

    private static string DecodeData(byte[] data)
    {
        var encoding = new UTF8Encoding();
        return encoding.GetString(data);
    }
}

class Program
{
    static void Main(string[] args)
    {
        var seeker = new PaddedFileSearch(new FileInfo(@"D:\Desktop\Test.txt"));

        Debug.Print("File Line length: {0}", seeker.LineLength);

        seeker.SeekMethod(new List<int> { 5, 3, 4 });
        seeker.SeekMethod(new List<int> { 5, 3 });
    }
}
answered 2011-03-05T03:53:40.407
1

Your position is absolute for the first lineNo, but for the following lineNos it is relative.

Look closely here, and at the actual results you got:

position = (lineNo - 1) * LineLength - position;
fs.Seek(position, SeekOrigin.Current);

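For the values 3, 4, 5 you get the numbers 30, 15, 45, while it is obvious that with relative positions it should be 30, 15, 15, since the line length is 15, or 30, 0, 0 if your read method performs a seek as a side effect, like filestream.Read does. Your test output is correct only by accident (and only for the string values, not the positions). You should test with a non-sequential set of line numbers and look at the position values more carefully; you will see there is no connection between the displayed string and the position value.

In fact, your StreamReader ignores the further fs.Seek calls and simply reads line by line =)

Here is the result for the input 3 5 9 :)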

Line No         Position                Line
-------         --------                -----------------
3                       30                              0003|100!2500
5                       30                              0004|100!2500
9                       90                              0005|100!2500

I believe the following is closest to what you are trying to achieve. A new function:

private static string ReadLine(FileStream stream, int length)
{
    byte[] bytes = new byte[length];
    stream.Read(bytes, 0, length);
    return new string(Encoding.UTF8.GetChars(bytes));
}

and the new loop code:

int oldLine = 0;
using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.None))
{
    foreach (int lineNo in lineNos)
    {
        position = (lineNo - oldLine - 1) * LineLength;
        fs.Seek(position, SeekOrigin.Current);
        line = ReadLine(fs, LineLength);
        Console.WriteLine("{0}\t\t\t{1}\t\t\t\t{2}", lineNo, position, line);
        oldLine = lineNo;
    }
}

Note that the stream.Read call is now equivalent to an additional stream.Seek(Length).

The new, correct output, with position changes that make sense:

Line No         Position                Line
-------         --------                -----------------
3                       30                              0003|100!2500    
4                       0                               0004|100!2500    
5                       0                               0005|100!2500

Line No         Position                Line
-------         --------                -----------------
3                       30                              0003|100!2500  
5                       15                              0005|100!2500
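The point that stream.Read moves the stream forward by the number of bytes it read can be checked with a short sketch. This one uses a MemoryStream as a stand-in for the padded file, and ReadAdvancesDemo, ReadFully, and PositionAfterRead are hypothetical names introduced only for this example. The loop in ReadFully is there because Stream.Read is allowed to return fewer bytes than requested.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadAdvancesDemo
{
    // Reads up to 'count' bytes, looping because Stream.Read may return fewer than requested.
    public static int ReadFully(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }

    // Seeks to 'startOffset', reads one record of 'lineLength' bytes,
    // and returns the stream position afterwards.
    public static long PositionAfterRead(Stream stream, long startOffset, int lineLength)
    {
        stream.Seek(startOffset, SeekOrigin.Begin);
        var buffer = new byte[lineLength];
        ReadFully(stream, buffer, lineLength);
        return stream.Position;
    }

    static void Main()
    {
        // In-memory stand-in for the padded file: ten 15-byte records.
        var sb = new StringBuilder();
        for (int i = 1; i <= 10; i++)
            sb.Append(i.ToString("D4")).Append("|100!2500\r\n");

        using (var ms = new MemoryStream(Encoding.ASCII.GetBytes(sb.ToString())))
        {
            // Line 3 starts at offset 30; reading its 15 bytes leaves the stream at 45.
            Console.WriteLine(PositionAfterRead(ms, 30, 15)); // prints 45
        }
    }
}
```

Because the read already leaves the stream at the start of the next line, a relative seek of 0 reaches the following line, which matches the 30, 0, 0 positions in the output above.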

PS: It is odd that you treat the 001: line as the 1st line and not the 0th. The whole -1 could be removed if you used the programmer's way of counting =)

answered 2011-03-05T03:32:27.133
1

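I wouldn't say the problem is your attempt to manage the position value manually, but rather that StreamReader.ReadLine changes the stream's Position value. If you step through your code and watch your local values, you will see the stream's position change after each call to ReadLine (to 148 after the first call).

EDIT

It would be better to change the stream's position directly rather than use Seek: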

public void SeekMethod(int[] lineNos)
{
    string line = null;
    long step;

    Array.Sort(lineNos);

    Debug.Print("");
    Debug.Print("Line No\t\tPosition\t\tLine");
    Debug.Print("-------\t\t--------\t\t-----------------");

    using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.None))
    {
        foreach (int lineNo in lineNos)
        {
            StreamReader reader = new StreamReader(fs);
            step = (lineNo - 1) * LineLength - fs.Position;
            fs.Position += step;

            if ((line = reader.ReadLine()) != null) {
                Debug.Print("{0}\t\t\t{1}\t\t\t\t{2}", lineNo, step, line);
            }
        }
    }
}
answered 2011-03-05T03:54:22.123
0

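The problem is that you are tracking the position manually, but not accounting for the fact that the actual file position will be one line further along after you have read the line. So you need to subtract that extra read, but only if it actually happened.

If you really want to do it this way, then instead of keeping position, get the actual file position; or calculate the absolute file position directly from the given line number rather than from the current file offset.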
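The second option, computing the absolute position from the line number, might look like the sketch below. It assumes ASCII content and "\r\n" line endings as in the sample file; PaddedLineReader, OffsetOf, and ReadLineAt are illustrative names, not part of the question's code. The cast to long is there because the question mentions files around 1 GB: with int arithmetic, (lineNo - 1) * LineLength would overflow once the offset passes 2 GB.

```csharp
using System;
using System.IO;
using System.Text;

static class PaddedLineReader
{
    // Absolute byte offset of a 1-based line number.
    // long arithmetic avoids int overflow for offsets beyond 2 GB.
    public static long OffsetOf(int lineNo, int lineLength)
    {
        return (long)(lineNo - 1) * lineLength;
    }

    // Reads one fixed-width line by absolute offset and strips the trailing "\r\n".
    // Returns null when the line lies beyond the end of the stream.
    public static string ReadLineAt(Stream stream, int lineNo, int lineLength)
    {
        stream.Seek(OffsetOf(lineNo, lineLength), SeekOrigin.Begin);

        var buffer = new byte[lineLength];
        int total = 0;
        while (total < lineLength)
        {
            int n = stream.Read(buffer, total, lineLength - total);
            if (n == 0) break; // end of stream
            total += n;
        }

        if (total == 0) return null;
        return Encoding.ASCII.GetString(buffer, 0, total).TrimEnd('\r', '\n');
    }

    static void Main()
    {
        // In-memory stand-in for the padded file: ten 15-byte records.
        var sb = new StringBuilder();
        for (int i = 1; i <= 10; i++)
            sb.Append(i.ToString("D4")).Append("|100!2500\r\n");

        using (var ms = new MemoryStream(Encoding.ASCII.GetBytes(sb.ToString())))
        {
            foreach (int lineNo in new[] { 5, 3 })
                Console.WriteLine("{0}\t{1}\t{2}", lineNo, OffsetOf(lineNo, 15), ReadLineAt(ms, lineNo, 15));
        }
    }
}
```

Since every read starts from SeekOrigin.Begin and no StreamReader is involved, the line numbers do not need to be sorted and no running position has to be kept.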

answered 2011-03-05T03:30:22.177