I am reading a text file using a StreamReader.
using (StreamReader sr = new StreamReader(FileName, Encoding.Default))
{
    string line = sr.ReadLine();
}
I want to force the line delimiter to be \n, not \r. How can I do that?
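I would implement something like George's answer, but as an extension method that avoids loading the whole file at once (untested, but something like this):
static class ExtensionsForTextReader
{
    public static IEnumerable<string> ReadLines(this TextReader reader, char delimiter)
    {
        List<char> chars = new List<char>();
        while (reader.Peek() >= 0)
        {
            char c = (char)reader.Read();
            if (c == delimiter)
            {
                yield return new String(chars.ToArray());
                chars.Clear();
                continue;
            }
            chars.Add(c);
        }
        // Return any trailing text that is not followed by a delimiter.
        if (chars.Count > 0)
        {
            yield return new String(chars.ToArray());
        }
    }
}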
It can then be used like this:
using (StreamReader sr = new StreamReader(FileName, Encoding.Default))
{
    foreach (var line in sr.ReadLines('\n'))
        Console.WriteLine(line);
}
string text = sr.ReadToEnd();
string[] lines = text.Split('\r');
foreach(string s in lines)
{
    // Consume
}
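I like the answer @Pete gave. I would just like to submit a slight modification. This lets you pass a string delimiter instead of just a single character: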
using System;
using System.IO;
using System.Collections.Generic;
internal static class StreamReaderExtensions
{
    public static IEnumerable<string> ReadUntil(this StreamReader reader, string delimiter)
    {
        List<char> buffer = new List<char>();
        CircularBuffer<char> delim_buffer = new CircularBuffer<char>(delimiter.Length);
        while (reader.Peek() >= 0)
        {
            char c = (char)reader.Read();
            delim_buffer.Enqueue(c);
            if (delim_buffer.ToString() == delimiter || reader.EndOfStream)
            {
                if (buffer.Count > 0)
                {
                    if (!reader.EndOfStream)
                    {
                        yield return new String(buffer.ToArray()).Replace(delimiter.Substring(0, delimiter.Length - 1), string.Empty);
                    }
                    else
                    {
                        buffer.Add(c);
                        yield return new String(buffer.ToArray());
                    }
                    buffer.Clear();
                }
                continue;
            }
            buffer.Add(c);
        }
    }
    private class CircularBuffer<T> : Queue<T>
    {
        private int _capacity;
        public CircularBuffer(int capacity)
            : base(capacity)
        {
            _capacity = capacity;
        }
        new public void Enqueue(T item)
        {
            if (base.Count == _capacity)
            {
                base.Dequeue();
            }
            base.Enqueue(item);
        }
        public override string ToString()
        {
            List<String> items = new List<string>();
            foreach (var x in this)
            {
                items.Add(x.ToString());
            }
            return String.Join("", items);
        }
    }
}
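According to the documentation:
http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline.aspx
A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n").
By default, the StreamReader ReadLine method treats \n, \r, or \r\n as the end of a line.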
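The documented behavior can be checked with a small sketch. The class name `ReadLineDemo` and the method `SplitWithReadLine` are hypothetical names introduced only for this illustration, and the text is read from an in-memory stream rather than a file:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ReadLineDemo
{
    // Hypothetical helper: returns the lines that StreamReader.ReadLine
    // produces for the given text, read from an in-memory stream.
    public static List<string> SplitWithReadLine(string text)
    {
        var lines = new List<string>();
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string line;
            // ReadLine returns null once the end of the stream is reached.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Calling `ReadLineDemo.SplitWithReadLine("a\rb\nc\r\nd")` returns the four lines a, b, c and d, which shows why ReadLine alone cannot be restricted to a single delimiter.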
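This is an improvement on sovemp's answer. Sorry, I would have liked to post it as a comment, but my reputation does not allow that. This improvement addresses 2 problems:
When the last characters in the stream equal the delimiter, the function would incorrectly return a string that includes the delimiter.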
using System;
using System.IO;
using System.Collections.Generic;
internal static class StreamReaderExtensions
{
    public static IEnumerable<string> ReadUntil(this StreamReader reader, string delimiter)
    {
        List<char> buffer = new List<char>();
        CircularBuffer<char> delim_buffer = new CircularBuffer<char>(delimiter.Length);
        while (reader.Peek() >= 0)
        {
            char c = (char)reader.Read();
            delim_buffer.Enqueue(c);
            if (delim_buffer.ToString() == delimiter || reader.EndOfStream)
            {
                if (buffer.Count > 0)
                {
                    if (!reader.EndOfStream)
                    {
                        buffer.Add(c);
                        // Strip the delimiter from the end of the buffered text.
                        yield return new String(buffer.ToArray()).Substring(0, buffer.Count - delimiter.Length);
                    }
                    else
                    {
                        buffer.Add(c);
                        // At end of stream, strip the delimiter only if the text actually ends with it.
                        if (delim_buffer.ToString() != delimiter)
                            yield return new String(buffer.ToArray());
                        else
                            yield return new String(buffer.ToArray()).Substring(0, buffer.Count - delimiter.Length);
                    }
                    buffer.Clear();
                }
                continue;
            }
            buffer.Add(c);
        }
    }
    private class CircularBuffer<T> : Queue<T>
    {
        private int _capacity;
        public CircularBuffer(int capacity)
            : base(capacity)
        {
            _capacity = capacity;
        }
        new public void Enqueue(T item)
        {
            if (base.Count == _capacity)
            {
                base.Dequeue();
            }
            base.Enqueue(item);
        }
        public override string ToString()
        {
            List<String> items = new List<string>();
            foreach (var x in this)
            {
                items.Add(x.ToString());
            }
            return String.Join("", items);
        }
    }
}
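I needed a solution that reads up to "\r\n" and does not stop at "\n". jp1980's solution worked, but it was extremely slow on large files. So I converted Mike Sackton's solution to read until a specified string is found.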
// Requires using System.IO and System.Text; declare this inside a static class.
public static string ReadLine(this StreamReader sr, string lineDelimiter)
{
    StringBuilder line = new StringBuilder();
    var matchIndex = 0;
    // Peek() returns -1 at the end of the stream, so compare with >= 0.
    while (sr.Peek() >= 0)
    {
        var nextChar = (char)sr.Read();
        line.Append(nextChar);
        if (nextChar == lineDelimiter[matchIndex])
        {
            if (matchIndex == lineDelimiter.Length - 1)
            {
                return line.ToString().Substring(0, line.Length - lineDelimiter.Length);
            }
            matchIndex++;
        }
        else
        {
            matchIndex = 0;
            // Did we mistake one of the characters for the delimiter? If so, restart the search with this character.
            if (nextChar == lineDelimiter[matchIndex])
            {
                if (matchIndex == lineDelimiter.Length - 1)
                {
                    return line.ToString().Substring(0, line.Length - lineDelimiter.Length);
                }
                matchIndex++;
            }
        }
    }
    return line.Length == 0
        ? null
        : line.ToString();
}
It is called like this:
using (StreamReader reader = new StreamReader(file))
{
    string line;
    while ((line = reader.ReadLine("\r\n")) != null)
    {
        Console.WriteLine(line);
    }
}
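You either have to parse the stream character by character yourself and handle the splitting, or you have to use the default ReadLine behavior, which splits on \r, \n, or \r\n.
If you want to parse the stream character by character, I would use something like the following extension method: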
public static string ReadToChar(this StreamReader sr, char splitCharacter)
{
    char nextChar;
    StringBuilder line = new StringBuilder();
    // Peek() returns -1 at the end of the stream, so compare with >= 0.
    while (sr.Peek() >= 0)
    {
        nextChar = (char)sr.Read();
        if (nextChar == splitCharacter) return line.ToString();
        line.Append(nextChar);
    }
    return line.Length == 0 ? null : line.ToString();
}
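A hypothetical usage sketch for a method like this one: it reads from an in-memory stream instead of a file and splits only on the given character, so with '\n' as the delimiter any '\r' stays inside the line. The class name `CharDelimitedReader` and the helper `ReadAll` are assumptions made for illustration, and the method body is repeated so that the block compiles on its own:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class CharDelimitedReader
{
    // Reads up to (but not including) splitCharacter; returns null at end of stream.
    public static string ReadToChar(this StreamReader sr, char splitCharacter)
    {
        var line = new StringBuilder();
        while (sr.Peek() >= 0)
        {
            char nextChar = (char)sr.Read();
            if (nextChar == splitCharacter) return line.ToString();
            line.Append(nextChar);
        }
        return line.Length == 0 ? null : line.ToString();
    }

    // Hypothetical helper: collects every line of text, splitting on delimiter only.
    public static List<string> ReadAll(string text, char delimiter)
    {
        var lines = new List<string>();
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadToChar(delimiter)) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

With this sketch, `CharDelimitedReader.ReadAll("a\rb\nc", '\n')` returns two lines, "a\rb" and "c", whereas the built-in ReadLine would have split at the '\r' as well.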
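Even though you said "using StreamReader", since you also said "in my case, the file can have tons of records...", I would recommend trying SSIS. It is a good fit for what you are trying to do. You can process very large files and easily specify the row and column delimiters.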
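This code snippet reads a file one line at a time, treating only '\n' as the end of a line.
using (StreamReader sr = new StreamReader(path))
{
    string line = string.Empty;
    while (sr.Peek() >= 0)
    {
        char c = (char)sr.Read();
        if (c == '\n')
        {
            // End of line encountered
            Console.WriteLine(line);
            // Start a new line
            line = string.Empty;
        }
        else
        {
            // Append the character that was just read; calling sr.Read() again here would skip it
            line += c;
        }
    }
    // Print the last line if the file does not end with '\n'
    if (line.Length > 0)
    {
        Console.WriteLine(line);
    }
}
Because this code reads character by character and keeps only the current line in memory, it can handle files of any length without being limited by available memory.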