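My question is a follow-up to this one: (Loop for reading different data types and sizes off a very large byte array from a file)
I have a raw byte stream stored in a file (rawbytes.txt or bytes.data) that I need to parse and output to a CSV-style text file.
The raw byte input (when read as characters/longs/ints, etc.) looks something like this: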
A2401028475764B241102847576511001200C...
Once parsed, it should look like this:
outputA.txt
(Field1,Field2,Field3) - heading
A,240,1028475764
outputB.txt
(Field1,Field2,Field3,Field4,Field5) - heading
B,241,1028475765,1100,1200
outputC.txt
C,...//and so on
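Essentially, it is hex-dump-style byte input that is continuous, with no line terminators or gaps between the data that needs to be parsed. As shown above, the data consists of different data types, one after the other.
Here is a snippet of my code. Since there are no commas within any field and no need for "" (i.e. a CSV wrapper), I simply use a TextWriter to create the CSV-style text file, as follows: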
if (File.Exists(fileName))
{
    using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
    {
        while (reader.BaseStream.Position != reader.BaseStream.Length)
        {
            inputCharIdentifier = reader.ReadChar();
            switch (inputCharIdentifier)
            {
                case 'A':
                    field1 = reader.ReadUInt64();
                    field2 = reader.ReadUInt64();
                    field3 = reader.ReadChars(10);
                    string strtmp = new string(field3);
                    //and so on
                    using (TextWriter writer = File.AppendText("outputA.txt"))
                    {
                        writer.WriteLine(field1 + "," + field2 + "," + strtmp); // +
                    }
                    break;
                case 'B':
                    //code...
                    break;
            }
        }
    }
}
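My problem stems from the fact that some of the raw byte data contains null values, which are difficult to parse: there are blocks of an unknown number of null bytes (or non-null, out-of-place bytes) between consecutive data blocks (each of which starts with A, B, or C if the data block is not corrupted).
Question
So, how do I add a default case or some other mechanism to continue the loop despite errors that might arise from corrupt or bad data? Would the following code work?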
inputCharIdentifier = reader.ReadChar();
...
default:
    //I need to know what to add here, instead of default
    //(i.e. the case when the character could not be read)
    while (reader.PeekChar() != -1)
    {
        filling = reader.ReadByte();
        //filling is a single byte
        try
        {
            fillingChar = Convert.ToChar(filling);
        }
        catch (Exception ex) { break; }
        //stop skipping once a known block identifier is found
        if (fillingChar == 'A' || fillingChar == 'B')
            break;
    }
    break;
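For illustration, the skip-ahead idea above could be sketched as follows. This is only a minimal sketch under stated assumptions: the class name `Resync`, the method `SkipToNextTag`, and the set of valid identifiers ('A', 'B', 'C') are hypothetical names chosen for this example, and the scan deliberately uses `ReadByte` so that no character decoding is involved.

```csharp
using System;
using System.IO;

static class Resync
{
    // Hypothetical set of valid block identifiers (an assumption for this sketch).
    static readonly byte[] KnownTags = { (byte)'A', (byte)'B', (byte)'C' };

    // Consumes bytes until one of the known identifiers is read.
    // Returns that identifier, or -1 if the stream ends first.
    public static int SkipToNextTag(BinaryReader reader)
    {
        try
        {
            while (true)
            {
                byte b = reader.ReadByte();
                if (Array.IndexOf(KnownTags, b) >= 0)
                    return b;
            }
        }
        catch (EndOfStreamException)
        {
            // ReadByte throws at end of stream; report "nothing found".
            return -1;
        }
    }
}
```

Note that a byte equal to 'A' (0x41) can also occur inside garbage or inside a valid numeric field, so a candidate identifier found this way would still need to be validated (for example by checking that the fields that follow look plausible) before the block is trusted.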
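The remaining part is adding code to each switch case (e.g. 'A') so that it continues without stopping the program. Is there a way to do this without multiple try-catch blocks? [I.e. the block's character identifier is A, but the bytes after A are corrupted. In that case I need to exit the loop OR read (i.e. skip over) a defined number of bytes, which would be known here if the message header correctly identifies the remaining bytes.]
[Note: cases A, B, and so on have different input sizes. In other words, A might be 40 bytes in total, while B is 50 bytes. So using a fixed-size buffer, e.g. inputBuf[1000] or [50] (which would work if they were all the same size), would not work well either, AFAIK.]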
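One possible way to handle per-type sizes is a lookup table keyed by the identifier, sketched below. The sizes used (39 and 49 payload bytes, i.e. 40 and 50 bytes including the identifier) are placeholders taken from the hypothetical figures in the note above, not from any real specification, and the names `RecordSizes` and `TryReadPayload` are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class RecordSizes
{
    // Hypothetical payload sizes (bytes AFTER the identifier); the real
    // values would have to come from the actual message layout.
    static readonly Dictionary<byte, int> PayloadSize = new Dictionary<byte, int>
    {
        { (byte)'A', 39 },
        { (byte)'B', 49 },
    };

    // Reads the whole payload for the given identifier in one call.
    // Returns null if the identifier is unknown or the stream ends early.
    public static byte[] TryReadPayload(BinaryReader reader, byte tag)
    {
        int size;
        if (!PayloadSize.TryGetValue(tag, out size))
            return null;
        // ReadBytes returns fewer bytes than requested at end of stream.
        byte[] payload = reader.ReadBytes(size);
        return payload.Length == size ? payload : null;
    }
}
```

With the payload in memory, individual fields could then be extracted with calls such as `BitConverter.ToUInt64(payload, 0)` or `BitConverter.ToInt32(payload, offset)`, and a block that fails a sanity check can be discarded without a try-catch around every read.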
Any suggestions? Please help! I am relatively new to C# (2 months)...
Update: my entire code is as follows:
using System;
using System.IO;

class Program
{
    const string fileName = "rawbytes.txt";

    static void Main(string[] args)
    {
        try
        {
            var program = new Program();
            program.Parser();
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
        Console.ReadLine();
    }

    public void Parser()
    {
        char inputCharIdentifier = 'Z';
        //only because without initializing inputCharIdentifier I ended up with an error
        //note that in the real code, 'Z' is not a switch-case alphabet
        //it's an "inconsequential character", i.e. i have defined it to be 'Z'
        //just to avoid that error, and to avoid leaving it as a null value
        ulong field1common = 0;
        ulong field2common = 0;
        char[] charArray = new char[10];
        char char1;
        char char2;
        char char3;
        int valint1 = 0;
        int valint2 = 0;
        int valint3 = 0;
        int valint4 = 0;
        int valint5 = 0;
        int valint6 = 0;
        int valint7 = 0;
        double valdouble;
        byte[] filling; //holds the 4 bytes skipped in case 'L'
        /*
        char[] filler = new char[53];
        byte[] filling = new byte[4621];
        byte[] unifiller = new byte[8];
        //these values above were temporary measures to manually filter through
        //null bytes - unacceptable for the final program
        */
        if (File.Exists(fileName))
        {
            using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
            {
                while (reader.BaseStream.Position != reader.BaseStream.Length)
                {
                    //inputCharIdentifier = reader.ReadChar();
                    //if (inputCharIdentifier != null)
                    //{
                    try
                    {
                        inputCharIdentifier = reader.ReadChar();
                        try
                        {
                            switch (inputCharIdentifier)
                            {
                                case 'A':
                                    field1common = reader.ReadUInt64();
                                    field2common = reader.ReadUInt64();
                                    //unifiller = reader.ReadBytes(8);
                                    //charArray = reader.ReadString();
                                    //result.ToString("o");
                                    //Console.WriteLine(result.ToString());
                                    charArray = reader.ReadChars(10);
                                    string charArraystr = new string(charArray);
                                    char1 = reader.ReadChar();
                                    valint1 = reader.ReadInt32();
                                    valint2 = reader.ReadInt32();
                                    valint3 = reader.ReadInt32();
                                    valint4 = reader.ReadInt32();
                                    using (TextWriter writer = File.AppendText("A.txt"))
                                    {
                                        writer.WriteLine(field1common + "," + /*result.ToString("o")*/ field2common + "," + charArraystr + "," + char1 + "," + valint1 + "," + valint2 + "," + valint3 + "," + valint4);
                                        writer.Close();
                                    }
                                    break;
                                case 'B':
                                case 'C':
                                    field1common = reader.ReadUInt64();
                                    field2common = reader.ReadUInt64();
                                    //charArray = reader.ReadString();
                                    charArray = reader.ReadChars(10);
                                    string charArraystr2 = new string(charArray);
                                    char1 = reader.ReadChar();
                                    valint1 = reader.ReadInt32();
                                    valint2 = reader.ReadInt32();
                                    using (TextWriter writer = File.AppendText("C.txt"))
                                    {
                                        writer.WriteLine(field1common + "," + /*result2.ToString("o")*/ field2common + "," + charArraystr2 + "," + char1 + "," + valint1 + "," + valint2);
                                        writer.Close();
                                    }
                                    break;
                                case 'S':
                                    //market status message
                                    field1common = reader.ReadUInt64();
                                    char2 = reader.ReadChar();
                                    char3 = reader.ReadChar();
                                    break;
                                case 'L':
                                    filling = reader.ReadBytes(4);
                                    break;
                                case 'D':
                                case 'E':
                                    field1common = reader.ReadUInt64();
                                    field2common = reader.ReadUInt64();
                                    //charArray = reader.ReadString();
                                    charArray = reader.ReadChars(10);
                                    string charArraystr3 = new string(charArray);
                                    //char1 = reader.ReadChar();
                                    valint1 = reader.ReadInt32();
                                    valint2 = reader.ReadInt32();
                                    valint5 = reader.ReadInt32();
                                    valint7 = reader.ReadInt32();
                                    valint6 = reader.ReadInt32();
                                    valdouble = reader.ReadDouble();
                                    using (TextWriter writer = File.AppendText("D.txt"))
                                    {
                                        writer.WriteLine(field1common + "," + /*result3.ToString("o")*/ field2common + "," + charArraystr3 + "," + valint1 + "," + valint2 + "," + valint5 + "," + valint7 + "," + valint6 + "," + valdouble);
                                        writer.Close();
                                    }
                                    break;
                            }
                        }
                        catch (Exception ex)
                        {
                            Console.WriteLine("Parsing didn't work");
                            Console.WriteLine(ex.ToString());
                            break;
                        }
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine("Here's why the character read attempt didn't work");
                        Console.WriteLine(ex.ToString());
                        continue;
                        //continue;
                    }
                    //}
                }
            }
        }
    }
}
The error I receive is as follows:
Here's why the character read attempt didn't work
System.ArgumentException: The output char buffer is too small to contain the decoded characters, encoding 'Unicode (UTF-8)' fallback 'System.Text.DecoderReplacementFallback'.
Parameter name: chars
at System.Text.Encoding.ThrowCharsOverflow()
at System.Text.Encoding.ThrowCharsOverflow(DecoderNLS decoder, Boolean nothingDecoded)
at System.Text.UTF8Encoding.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, DecoderNLS baseDecoder)
at System.Text.DecoderNLS.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, Boolean flush)
at System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32 byteCount, Char[] chars, Int32 charIndex, Boolean flush)
at System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32 byteCount, Char[] chars, Int32 charIndex)
at System.IO.BinaryReader.InternalReadOneChar()
at System.IO.BinaryReader.Read()
at System.IO.BinaryReader.ReadChar()
at line 69: i.e. inputCharIdentifier = reader.ReadChar();
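The trace above shows `ReadChar` going through the UTF-8 decoder. If the identifiers and text fields are in fact single-byte ASCII (an assumption, not something confirmed here), one way to sidestep the decoder is to read raw bytes and convert them explicitly. The sketch below uses invented names (`RawText`, `ReadTag`, `ReadFixedAscii`) purely for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class RawText
{
    // Reads one byte and widens it to char; no decoder is involved,
    // so arbitrary byte values cannot trigger a decoding exception.
    public static char ReadTag(BinaryReader reader)
    {
        return (char)reader.ReadByte();
    }

    // Reads exactly `length` bytes and decodes them as ASCII.
    // Bytes above 0x7F decode to '?'; throws if the stream ends early.
    public static string ReadFixedAscii(BinaryReader reader, int length)
    {
        byte[] raw = reader.ReadBytes(length);
        if (raw.Length != length)
            throw new EndOfStreamException("Fewer than " + length + " bytes left.");
        return Encoding.ASCII.GetString(raw);
    }
}
```

Unlike `ReadChars(10)`, which reads 10 characters and may therefore consume more than 10 bytes under UTF-8, `ReadFixedAscii(reader, 10)` always consumes exactly 10 bytes, which keeps the position within a fixed-layout block predictable.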
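Update: a sample file that generates the same error as above is at the following link: http://www.wikisend.com/download/106394/rawbytes.txt
Note in particular the 8 unexpected null bytes between consecutive data blocks, even though the data block header (i.e. inputCharIdentifier) is valid. The number of bytes following such a header is always unpredictable and usually varies. My problem is that I need to be able to delete or skip over such a run until the next available non-corrupt data block appears. In the sample file, the last (single) data block occurs after the 8 out-of-place null bytes.
The 8 null bytes can be located in the file as follows: byte counter: 1056, line 2, column 783 (according to Notepad++).
The crux of the problem is that the run of 8 null bytes could be of any size: 3, 7, 15, 50, etc. It is always unknown, as a direct result of data corruption. But unlike "traditional" data corruption, where a fixed number of bytes, say 50, inside a data block might be unreadable and can therefore be skipped (by the exact number of bytes), the data corruption I am facing consists of an unknown number of bytes between valid data blocks.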
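As a diagnostic aid, runs of null bytes and their offsets can also be located programmatically rather than in an editor. The following is a rough sketch only; `NullRuns.Find` and its `minLength` parameter are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;

static class NullRuns
{
    // Returns (offset, length) pairs for every run of at least
    // minLength consecutive zero bytes in the data.
    public static List<KeyValuePair<long, int>> Find(byte[] data, int minLength)
    {
        var runs = new List<KeyValuePair<long, int>>();
        int i = 0;
        while (i < data.Length)
        {
            if (data[i] != 0) { i++; continue; }
            int start = i;
            // Advance to the first non-zero byte (or the end of the data).
            while (i < data.Length && data[i] == 0) i++;
            if (i - start >= minLength)
                runs.Add(new KeyValuePair<long, int>(start, i - start));
        }
        return runs;
    }
}
```

It could be called as `NullRuns.Find(File.ReadAllBytes("rawbytes.txt"), 8)`. Zero bytes are also common inside valid integer fields (a small UInt64 value is mostly zero bytes), so the output only points at candidate locations and does not by itself distinguish corruption from data.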