c# - Parsing a large delimited file with a dynamic number of columns

Question

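What is the best way to parse a delimited file when the columns are not known before the file is parsed?

The file format is Rightmove v3 (.blm), and the structure looks like this: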

#HEADER#
Version : 3
EOF : '^'
EOR : '~'
#DEFINITION#
AGENT_REF^ADDRESS_1^POSTCODE1^MEDIA_IMAGE_00~ // can be any number of columns
#DATA#
agent1^the address^the postcode^an image~
agent2^the address^the postcode^^~      // the records have to have the same number of columns as specified in the definition, however they can be empty
etc
#END#

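These files can be very large: my sample file is 40 MB, but they could be several hundred megabytes. Below is the code I started writing before I realised the columns were dynamic. I open a file stream and read from it as I go, because I read that this is the best way to handle large files. I'm not sure that putting every record into a list and processing it afterwards is a good idea, though; I don't know whether that will work for files this large.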

List<string> recordList = new List<string>();

try
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        StreamReader file = new StreamReader(fs);
        string line;
        while ((line = file.ReadLine()) != null)
        {
            string[] records = line.Split('~');

            foreach (string item in records)
            {
                if (item != String.Empty)
                {
                    recordList.Add(item);
                }
            }

        }
    }
}
catch (FileNotFoundException ex)
{
    Console.WriteLine(ex.Message);
}

foreach (string r in recordList)
{
    Property property = new Property();

    string[] fields = r.Split('^');

    // can't do this as I don't know which field is the post code
    property.PostCode = fields[2];
    // etc

    propertyList.Add(property);
}

Any ideas on how to do this better? In case it helps, this is C# 3.0 and .NET 3.5.

Thanks,

Annelie

score 1 · Accepted Answer

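You can do this in a few ways.

If the properties on your object have the same names as the columns in the data file, you can use reflection to work out which column matches which property.
If the properties on your object have different names, you can write a custom mapping scheme that says "for column X, assign to property Y".
You can create custom attributes for your object's properties that indicate which column name each one maps to, and use reflection to read those attributes.

All of these approaches assume that a column name in the data file always denotes the same data (i.e., ADDRESS_1 is always the column name for the "address line one" data).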
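
As a rough illustration of the attribute and reflection options, here is a minimal sketch that builds a column-index-to-property lookup from the #DEFINITION# row. BlmColumnAttribute, ColumnMapper and the two properties on Property are hypothetical names invented for this example, and it assumes every mapped property is a string. It sticks to features available in C# 3.0 and .NET 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute naming the .blm column a property is filled from.
[AttributeUsage(AttributeTargets.Property)]
public sealed class BlmColumnAttribute : Attribute
{
    private readonly string name;
    public BlmColumnAttribute(string name) { this.name = name; }
    public string Name { get { return name; } }
}

// Hypothetical target type; only the properties of interest are declared.
public class Property
{
    [BlmColumn("AGENT_REF")] public string AgentRef { get; set; }
    [BlmColumn("POSTCODE1")] public string PostCode { get; set; }
}

public static class ColumnMapper
{
    // Builds the lookup once per file, from the column names in the definition row.
    public static Dictionary<int, PropertyInfo> BuildMap(Type type, string[] columns)
    {
        var byName = new Dictionary<string, PropertyInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (PropertyInfo p in type.GetProperties())
        {
            object[] attrs = p.GetCustomAttributes(typeof(BlmColumnAttribute), false);
            // Fall back to the property name when no attribute is present.
            string key = attrs.Length > 0 ? ((BlmColumnAttribute)attrs[0]).Name : p.Name;
            byName[key] = p;
        }

        var map = new Dictionary<int, PropertyInfo>();
        for (int i = 0; i < columns.Length; i++)
        {
            PropertyInfo p;
            if (byName.TryGetValue(columns[i], out p)) map[i] = p;
        }
        return map;
    }

    // Copies the mapped fields of one record into a new T (string properties only).
    public static T Create<T>(Dictionary<int, PropertyInfo> map, string[] fields)
        where T : class, new()
    {
        T item = new T();
        foreach (KeyValuePair<int, PropertyInfo> pair in map)
        {
            if (pair.Key < fields.Length)
                pair.Value.SetValue(item, fields[pair.Key], null);
        }
        return item;
    }
}
```

Because the reflection work happens once in BuildMap, the per-record cost is one SetValue call per mapped column; columns with no matching property are simply ignored.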

score 1 · Accepted Answer

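If you can strip a few lines at the start (the header content and the #xxx# lines), then what remains is just a CSV file with ^ as the delimiter, so any CSV reader class should do the job.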
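
If you would rather not pull in a CSV library, the same idea can be sketched by hand. The following is an assumption-laden example, not a full .blm parser: BlmReader and ReadRows are made-up names, the delimiters are hard-coded to ^ and ~ as in the sample rather than read from the header, each record is assumed to sit on its own line, and ^ is treated as a separator between fields (a file that also puts ^ after the last field would yield one extra empty element per row).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BlmReader
{
    // Lazily yields one string[] per row; the first array is the column definition,
    // every following array is a data record. Nothing is buffered in a list.
    public static IEnumerable<string[]> ReadRows(TextReader reader)
    {
        string section = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0) continue;
            if (line == "#END#") yield break;

            // Section markers such as #HEADER#, #DEFINITION# and #DATA#.
            if (line.StartsWith("#", StringComparison.Ordinal) &&
                line.EndsWith("#", StringComparison.Ordinal))
            {
                section = line;
                continue;
            }

            // Skip the header lines (Version, EOF, EOR).
            if (section != "#DEFINITION#" && section != "#DATA#") continue;

            // Drop the trailing end-of-record marker, then split on the field separator.
            if (line.EndsWith("~", StringComparison.Ordinal))
                line = line.Substring(0, line.Length - 1);
            yield return line.Split('^');
        }
    }
}
```

Passing File.OpenText(path) as the reader keeps memory use flat for large files, since rows are produced one at a time. The position of a given column can then be looked up from the first row, for example with Array.IndexOf(columns, "POSTCODE1"), instead of hard-coding fields[2].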
