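I occasionally get data that isn't completely clean, and at runtime I get error messages because the data doesn't match the expected type. For example, sometimes the data has a string where there should be an int, or an int where there should be a date.

Is there a way to scan the data for bad values first, so that I can fix them all at once instead of discovering and fixing them one by one at runtime?

Here is my working code: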

class TestScore{
    public string Name;
    public int Age;
    public DateTime Date;
    public DateTime Time;
    public double Score;
}

//read data
var Data = File.ReadLines(FilePath).Select(line => line.Split('\t')).ToArray();

//select data
var query = (from x in Data
             select new { Name = x[3], Age = x[1], Date = x[2], Time = x[5], Score = x[7] }).ToList();

//create List and put data into List
List<TestScore> Results = new List<TestScore>();

for (int i = 0; i < query.Count; i++)
{
       TestScore TS = new TestScore();

       // every field is read as text; Parse throws a FormatException on bad data
       TS.Name = query[i].Name;
       TS.Age = int.Parse(query[i].Age);
       TS.Date = DateTime.Parse(query[i].Date);
       TS.Time = DateTime.Parse(query[i].Time);
       TS.Score = double.Parse(query[i].Score);

       Results.Add(TS);
}
2 Answers

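"Is there a way to scan the data for bad values first, so that I can fix them all at once instead of discovering and fixing them one by one at runtime?"

Scanning is itself a runtime operation. However, it is fairly simple to implement a solution that gives you enough information to "fix it all at once".

The following code shows a pattern for validating the entire file; it does not attempt to load any data unless validation succeeds completely.

If validation fails, it returns a collection of all the errors encountered.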

internal sealed class ParseStatus
{
    internal bool IsSuccess;
    internal IReadOnlyList<string> Messages;
}

private ParseStatus Load()
{
    string filePath = "foo";

    var data = File.ReadLines( filePath ).Select( line => line.Split( '\t' ) ).ToArray();
    var results = from x in data
                    select new { Name = x[3], Age = x[1], Date = x[2], Time = x[5], Score = x[7] };

    var errors = new List<string>();
    int row = 0;

    // first pass: look for errors by testing each value
    foreach( var line in results )
    {
        row++;

        int dummy;
        if( !int.TryParse( line.Age, out dummy ) )
        {
            errors.Add( "Age couldn't be parsed as an int on line " + row );
        }

        // etc...use exception-free checks on each property
    }

    if( errors.Count > 0 )
    {
        // quit, and return errors list
        return new ParseStatus { IsSuccess = false, Messages = errors };
    }

    // otherwise, it is safe to load all rows

    // TODO: second pass: load the data

    return new ParseStatus { IsSuccess = true };
}
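
As a rough sketch of how the remaining checks and the second pass might be filled in, the variant below tests every field with an exception-free `TryParse` call and hands back the loaded rows only when no errors were found. The `TestScoreLoader` class, the `TryLoad` method, the column-count guard, and the use of `CultureInfo.InvariantCulture` are assumptions made for illustration and are not part of the code above. It also merges the two passes into one by parsing once and discarding the rows if any error was recorded, which is a simplification of the pattern rather than the same thing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class TestScore
{
    public string Name;
    public int Age;
    public DateTime Date;
    public DateTime Time;
    public double Score;
}

public static class TestScoreLoader
{
    // Hypothetical helper (not from the answer above). Returns every error found;
    // 'results' is non-null only when the returned error list is empty.
    public static List<string> TryLoad(IEnumerable<string> lines, out List<TestScore> results)
    {
        var errors = new List<string>();
        var parsed = new List<TestScore>();
        var culture = CultureInfo.InvariantCulture; // assumption: culture-neutral input
        int row = 0;

        foreach (var line in lines)
        {
            row++;
            var x = line.Split('\t');
            if (x.Length < 8)
            {
                // Guard the x[7] access so a short row is reported instead of throwing.
                errors.Add("Line " + row + ": expected at least 8 columns, found " + x.Length);
                continue;
            }

            int age;
            DateTime date, time;
            double score;
            int before = errors.Count;

            if (!int.TryParse(x[1], NumberStyles.Integer, culture, out age))
                errors.Add("Line " + row + ": Age '" + x[1] + "' is not an int");
            if (!DateTime.TryParse(x[2], culture, DateTimeStyles.None, out date))
                errors.Add("Line " + row + ": Date '" + x[2] + "' is not a date");
            if (!DateTime.TryParse(x[5], culture, DateTimeStyles.None, out time))
                errors.Add("Line " + row + ": Time '" + x[5] + "' is not a time");
            if (!double.TryParse(x[7], NumberStyles.Float, culture, out score))
                errors.Add("Line " + row + ": Score '" + x[7] + "' is not a double");

            // Keep the row only if none of its fields added an error.
            if (errors.Count == before)
                parsed.Add(new TestScore { Name = x[3], Age = age, Date = date, Time = time, Score = score });
        }

        // All-or-nothing: a single bad row means nothing is loaded.
        results = errors.Count == 0 ? parsed : null;
        return errors;
    }
}
```

A caller could pass `File.ReadLines(filePath)` as `lines`, print the returned messages when the list is non-empty, and use `results` otherwise.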
answered 2013-10-06T02:33:07.213

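To avoid discovering errors at runtime, the best I can think of is to correct the data manually before the program runs.

But since we are trying to do something constructive, I think using a static readonly field to signal a data error would help. The following is a simple example that leaves out the items that failed to parse; you may need to modify it if you want to do more advanced processing.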

public partial class TestScore {
    public static TestScore Parse(String plainText) {
        var strings=plainText.Split('\t');
        var result=new TestScore();

        if(
            strings.Length<5
            ||
            !double.TryParse(strings[4], out result.Score)
            ||
            !DateTime.TryParse(strings[3], out result.Time)
            ||
            !DateTime.TryParse(strings[2], out result.Date)
            ||
            !int.TryParse(strings[1], out result.Age)
            )
            return TestScore.Error;

        result.Name=strings[0];
        return result;
    }

    public String Name;
    public int Age;
    public DateTime Date;
    public DateTime Time;
    public double Score;

    public static readonly TestScore Error=new TestScore();
}

public static partial class TestClass {
    public static void TestMethod() {
        var path=@"some tab splitted file";

        var lines=File.ReadAllLines(path);

        var format=""
            +"Name: {0}; Age: {1}; "
            +"Date: {2:yyyy:MM:dd}; Time {3:hh:mm}; "
            +"Score: {4}";

        var list=(
            from line in lines
            where String.Empty!=line
            let result=TestScore.Parse(line)
            where TestScore.Error!=result
            select result).ToList();

        foreach(var item in list) {
            Console.WriteLine(
                format,
                item.Name, item.Age, item.Date, item.Time, item.Score
                );
        }
    }
}
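
Because the example above silently drops the lines that fail, one possible modification is to collect those lines together with their line numbers so they can all be corrected in one go. The sketch below is only an illustration: `RejectedLineReport`, `IsValid`, and `FindRejected` are hypothetical names, `IsValid` is a stand-in that repeats the same five-column checks as `TestScore.Parse`, and the use of `CultureInfo.InvariantCulture` is an assumption about the input.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class RejectedLineReport
{
    // Hypothetical stand-in for the checks in TestScore.Parse:
    // Name, Age, Date, Time, Score separated by tabs.
    public static bool IsValid(string line)
    {
        var culture = CultureInfo.InvariantCulture; // assumption: culture-neutral input
        var strings = line.Split('\t');
        int age;
        DateTime date, time;
        double score;

        return strings.Length >= 5
            && int.TryParse(strings[1], NumberStyles.Integer, culture, out age)
            && DateTime.TryParse(strings[2], culture, DateTimeStyles.None, out date)
            && DateTime.TryParse(strings[3], culture, DateTimeStyles.None, out time)
            && double.TryParse(strings[4], NumberStyles.Float, culture, out score);
    }

    // Returns (1-based line number, raw text) for every non-empty line that fails.
    public static List<KeyValuePair<int, string>> FindRejected(IEnumerable<string> lines)
    {
        return lines
            .Select((text, index) => new KeyValuePair<int, string>(index + 1, text))
            .Where(pair => pair.Value.Length != 0 && !IsValid(pair.Value))
            .ToList();
    }
}
```

Empty lines are skipped rather than reported, which mirrors the `String.Empty!=line` filter in the query above.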
answered 2013-10-06T04:11:19.727