
I'm trying to read a text file line by line and combine multiple lines into one until the line I read ends with \r\n. My data looks like this:

BusID|Comment1|Text\r\n
1010|"Cuautla, Inc. d/b/a 3 Margaritas VIII\n
State Lic. #40428210000   City Lic.#4042821P\n
9/26/14      9/14/14 - 9/13/15    $175.00\n
9/20/00    9/14/00 - 9/13/01    $575.00 New License"\r\n
1020|"7-Eleven Inc., dba 7-Eleven Store #20638\n
State Lic. #24111110126; City Lic. #2411111126P\n
SEND ISSUED LICENSES TO DALLAS, TX\r\n

I want the data to look like this:

BusID|Comment1|Text\r\n
1010|"Cuautla, Inc. d/b/a 3 Margaritas VIII State Lic. #40428210000   City Lic.#4042821P 9/26/14      9/14/14 - 9/13/15    $175.00 9/20/00    9/14/00 - 9/13/01    $575.00 New License"\r\n
1020|"7-Eleven Inc., dba 7-Eleven Store #20638 State Lic. #24111110126; City Lic. #2411111126P SEND ISSUED LICENSES TO DALLAS, TX\r\n

My code looks like this:

FileStream fsFileStream = new FileStream(strInputFileName, FileMode.Open, 
FileAccess.Read, FileShare.ReadWrite);

using (StreamReader srStreamRdr = new StreamReader(fsFileStream))
{
    while ((strDataLine = srStreamRdr.ReadLine()) != null && !blnEndOfFile)
    {
        //code evaluation here
    }
}

I tried:

if (strDataLine.EndsWith(Environment.NewLine))
{
    blnEndOfLine = true;
}

if (strDataLine.Contains(Environment.NewLine))
{
    blnEndOfLine = true;
}

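Neither check finds anything at the end of the string variable. Is there a way to detect the true end of a line so that I can combine these lines into one? Should I be reading the file differently?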


3 Answers


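If what you posted is exactly what the file contains, meaning that \r\n is literally written out as characters, you can use the following to unescape them: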

// Strings are immutable, so the result of Replace must be assigned back
strDataLine = strDataLine.Replace("\\r", "\r").Replace("\\n", "\n");

After that, you can compare against Environment.NewLine like this:

if (strDataLine.Replace("\\r", "\r").Replace("\\n", "\n").EndsWith(Environment.NewLine))
{
    blnEndOfLine = true;
}
answered 2017-03-11T18:05:36.987

You can call File.ReadAllText(path) and parse the whole text like this:

            string input =  File.ReadAllText(your_file_path);
            string output = string.Empty;
            input.Split(new[] { Environment.NewLine } , StringSplitOptions.RemoveEmptyEntries).
                Skip(1).ToList().
                ForEach(x =>
                {
                    output += x.EndsWith("\\r\\n") ? x + Environment.NewLine 
                                                   : x.Replace("\\n"," ");
                });
answered 2017-03-11T18:23:39.593

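You can't use the StreamReader's ReadLine method, because it treats every kind of line break as a line terminator. Both \r\n and \n are removed from the input before the reader returns the line, so you will never know whether the removed characters were \r\n or just \n.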
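To see the behavior described above, here is a minimal sketch. The class and method names (`ReadLineDemo`, `ReadLines`) are hypothetical, and it uses a `StringReader` rather than a `StreamReader` so that it needs no file on disk; both readers' `ReadLine` methods recognize \n, \r, and \r\n as terminators and strip them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Returns every line that ReadLine yields for the given text.
    // The terminators are stripped, so the caller cannot tell
    // whether a line ended in \n or in \r\n.
    public static List<string> ReadLines(string text)
    {
        var result = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                result.Add(line);
            }
        }
        return result;
    }
}
```

Calling `ReadLineDemo.ReadLines("a\nb\r\nc")` returns the three strings `a`, `b`, and `c`, with no trace of which terminator followed each one. That is why the `EndsWith(Environment.NewLine)` check in the question can never succeed.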

If the file is not very large, you can load everything into memory and split it into lines yourself:

// Load everything in memory
string fileData = File.ReadAllText(@"D:\temp\myData.txt");

// Split on the \r\n (I don't use Environment.NewLine because it
// follows the OS conventions, which could be wrong in this context)
string[] lines = fileData.Split(new string[] { "\r\n"}, StringSplitOptions.RemoveEmptyEntries);

// Now replace the remaining \n with a space 
lines = lines.Select(x => x.Replace("\n", " ")).ToArray();

foreach(string s in lines)
   Console.WriteLine(s);

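Edit
If your file is really large (3.5GB, as you said), you can't load everything into memory and need to process it in chunks instead. Fortunately, StreamReader provides a method called ReadBlock that lets us write code like this: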

// Where we store the lines loaded from file
List<string> lines = new List<string>();

// Read a block of 10MB
char[] buffer = new char[1024 * 1024 * 10];
bool lastBlock = false;
string leftOver = string.Empty;

// Start the streamreader
using (StreamReader reader = new StreamReader(@"D:\temp\localtext.txt"))
{
    // We exit when the last block is reached
    while (!lastBlock)
    {
        // Read 10MB
        int loaded = reader.ReadBlock(buffer, 0, buffer.Length);

        // Exit if we have no more blocks to read (EOF)
        if(loaded == 0) break;

        // If we get fewer characters than the block size then
        // we are on the last block
        lastBlock = (loaded != buffer.Length);

        // Create the string from the buffer
        string temp = new string(buffer, 0, loaded);

        // prepare the working string adding the remainder from the 
        // previous loop
        string current = leftOver + temp;

        // Search the last \r\n
        int lastNewLinePos = temp.LastIndexOf("\r\n");

        if (lastNewLinePos > -1)
        {
             // Prepare the working string
             current = leftOver + temp.Substring(0, lastNewLinePos + 2);

             // Save the incomplete parts for the next loop
             leftOver = temp.Substring(lastNewLinePos + 2);
        }
        // Process the lines
        AddLines(current, lines);
    }
}

void AddLines(string current, List<string> lines)
{
    var splitted = current.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
    lines.AddRange(splitted.Select(x => x.Replace("\n", " ")).ToList());
}

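This code assumes that your file always ends with \r\n and that every 10MB block of text contains at least one \r\n. It needs more testing against your actual data.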
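If those assumptions are a concern, one alternative (not part of the code above) is to stream the file one character at a time and write the result straight to an output file, so that nothing larger than the reader's internal buffer is held in memory. The following is a minimal sketch under that approach; `RecordJoiner`, `JoinRecords`, and `JoinFile` are hypothetical names. Unlike the Split-based code, it keeps empty records instead of removing them.

```csharp
using System;
using System.IO;

public static class RecordJoiner
{
    // Copies text from reader to writer, keeping every \r\n as a record
    // terminator and replacing every lone \n with a single space.
    public static void JoinRecords(TextReader reader, TextWriter writer)
    {
        bool pendingCr = false; // previous char was \r and is not written yet
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (pendingCr)
            {
                pendingCr = false;
                if (c == '\n')
                {
                    writer.Write("\r\n"); // real end of record
                    continue;
                }
                writer.Write('\r');       // lone \r: pass it through unchanged
            }

            if (c == '\r')
                pendingCr = true;         // decide once the next char is known
            else if (c == '\n')
                writer.Write(' ');        // line break inside a record
            else
                writer.Write(c);
        }

        if (pendingCr)
            writer.Write('\r');           // input ended on a lone \r
    }

    // Convenience wrapper for files; both paths are supplied by the caller.
    public static void JoinFile(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            JoinRecords(reader, writer);
        }
    }
}
```

Because the decision is made per character, it does not matter where a \r\n falls relative to any buffer boundary, and the file does not have to end with \r\n. StreamReader buffers its input internally, so reading one character at a time does not mean one disk access per character.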

answered 2017-03-11T18:20:52.697