
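This is just a sample of the data I need to format.

The first column is simple; the problem is the second column.

  1. What is the best way to format multiple data fields in one column?
  2. How do I parse this data?

Important*: the second column needs to contain multiple values, as in the example below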

Name       Details

Alex       Age:25
           Height:6
           Hair:Brown
           Eyes:Hazel

3 Answers


A CSV should look like this:

Name,Age,Height,Hair,Eyes
Alex,25,6,Brown,Hazel

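Each cell should be separated from its neighbour by exactly one comma.

You can reformat the data with a simple regular expression that replaces certain line breaks and non-line-break whitespace with commas (each block is easy to find because its first line has a value in both columns).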
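The regular-expression approach described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `RegexCsvSketch` and `ToCsv` are hypothetical names, the header row is passed in by the caller, and the sketch assumes that values contain no spaces, commas, or colons and that every record lists the same keys in the same order.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCsvSketch
{
    // Hypothetical helper: converts the two-column text layout into CSV rows.
    // Assumes values contain no spaces, commas, or colons.
    public static string ToCsv(string input, string header)
    {
        // Normalise line endings so the patterns only deal with "\n".
        string text = input.Replace("\r\n", "\n");

        // Remove each "Name   Details" heading line and any blank lines after it.
        text = Regex.Replace(text, @"^Name[ \t]+Details[ \t]*\n+", "", RegexOptions.Multiline);

        // Continuation lines: newline + indentation + "Key:" collapses to a comma.
        text = Regex.Replace(text, @"\n[ \t]+[^:\s]+:", ",");

        // Record lines: the gap between the name and the first "Key:" becomes a comma.
        text = Regex.Replace(text, @"^(\S+)[ \t]+[^:\s]+:", "$1,", RegexOptions.Multiline);

        return header + "\n" + text.Trim();
    }
}
```

For the sample in the question, calling `RegexCsvSketch.ToCsv(sample, "Name,Age,Height,Hair,Eyes")` yields the header line followed by `Alex,25,6,Brown,Hazel`. Because the keys are discarded, a record that omits a key would shift its remaining values into the wrong columns.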

answered 2012-05-08T12:07:59.270

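CSV files are usually defined as using a comma as the field separator and a line break as the row separator. You are using line breaks inside the second column, which causes the problem. You need to reformat the second column to use some other delimiter between its multiple values. A common alternative delimiter is the | (pipe) character.

Your format would then look like this: Alex,Age:25|Height:6|Hair:Brown|Eyes:Hazel

When parsing, you would first split on the commas (which returns two fields), and then parse the second field as pipe-delimited.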
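The two-pass parse described above might look like the sketch below. `PipeDelimitedParser` and `ParseLine` are hypothetical names chosen for illustration, and the code assumes that names and values never contain a comma, pipe, or colon.

```csharp
using System;
using System.Collections.Generic;

public static class PipeDelimitedParser
{
    // Parses one line such as "Alex,Age:25|Height:6|Hair:Brown|Eyes:Hazel".
    // Assumes names and values contain no commas, pipes, or colons.
    public static KeyValuePair<string, Dictionary<string, string>> ParseLine(string line)
    {
        // First pass: split on the comma into the name and the details field.
        string[] fields = line.Split(new[] { ',' }, 2);
        if (fields.Length != 2)
        {
            throw new FormatException("Expected two comma-separated fields: " + line);
        }

        var details = new Dictionary<string, string>();

        // Second pass: split the details field on the pipe, then each item on its first colon.
        foreach (string item in fields[1].Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] pair = item.Split(new[] { ':' }, 2);
            if (pair.Length != 2)
            {
                throw new FormatException("Expected Key:Value but found: " + item);
            }
            details[pair[0].Trim()] = pair[1].Trim();
        }

        return new KeyValuePair<string, Dictionary<string, string>>(fields[0].Trim(), details);
    }
}
```

Returning the details as a dictionary keeps each value tied to its key, so the order of the pipe-separated items does not matter to the caller.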

answered 2012-05-08T11:38:59.957

This is an interesting one. Files in a custom format can be quite difficult to parse, which is why people often write dedicated classes to deal with them. More conventional formats such as CSV and other delimited formats are easier to read because they follow a consistent structure.

A problem like the above can be addressed in the following way:

1) What should the output look like?

In your case, and this is just a guess, I believe you are aiming for the following:

Name, Age, Height, Hair, Eyes
Alex, 25, 6, Brown, Hazel

In which case, you have to parse out this information based on the structure above. If it's repeated blocks of text like the above then we can say the following:

a. Every person is in a block starting with Name Details

b. The name value is the first text after Details, with the other columns being delimited in the format Column:Value

However, you might also have sections with additional attributes, or with attributes missing if the original input was optional, so tracking each column and its ordinal position would be useful too.

So one approach might look like the following:

//Requires: using System; using System.Collections.Generic; using System.IO; using System.Text;
public void ParseFile(){

        String currentLine;

        bool newSection = false;

        //Store the column names and ordinal position here.
        List<String> nameOrdinals = new List<String>();
        nameOrdinals.Add("Name"); //IndexOf == 0

        Dictionary<Int32, List<String>> nameValues = new Dictionary<Int32 ,List<string>>(); //Use this to store each person's details

        Int32 rowNumber = 0;

        using (TextReader reader = File.OpenText("D:\\temp\\test.txt"))
        {

            while ((currentLine = reader.ReadLine()) != null) //This will read the file one row at a time until there are no more rows to read
            {

                string[] lineSegments = currentLine.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);

                if (lineSegments.Length == 2 && String.Compare(lineSegments[0], "Name", StringComparison.InvariantCultureIgnoreCase) == 0
                    && String.Compare(lineSegments[1], "Details", StringComparison.InvariantCultureIgnoreCase) == 0) //Looking for a Name  Details Line - Start of a new section
                {
                    rowNumber++;
                    newSection = true;
                    continue;
                }

                if (newSection && lineSegments.Length > 1) //We can start adding a new person's details - this is the first data line after the Name Details header
                {
                    nameValues.Add(rowNumber, new List<String>());
                    nameValues[rowNumber].Insert(nameOrdinals.IndexOf("Name"), lineSegments[0]);

                    //Get the first column:value item
                    ParseColonSeparatedItem(lineSegments[1], nameOrdinals, nameValues, rowNumber);

                    newSection = false;
                    continue;
                }

                if (lineSegments.Length > 0 && lineSegments[0] != String.Empty) //Ignore empty lines
                {
                    ParseColonSeparatedItem(lineSegments[0], nameOrdinals, nameValues, rowNumber);
                }

            }
        }


        //At this point we should have collected a big list of items. We can then write out the CSV. We can use a StringBuilder for now, although your requirements will
        //be dependent upon how big the source files are.

        //Write out the columns

        StringBuilder builder = new StringBuilder();

        for (int i = 0; i < nameOrdinals.Count; i++)
        {
            if(i == nameOrdinals.Count - 1)
            {
                builder.Append(nameOrdinals[i]);
            }
            else
            {
                builder.AppendFormat("{0},", nameOrdinals[i]);
            }
        }

        builder.Append(Environment.NewLine);


        foreach (int key in nameValues.Keys)
        {
            List<String> values = nameValues[key];

            for (int i = 0; i < values.Count; i++)
            {
                if (i == values.Count - 1)
                {
                    builder.Append(values[i]);
                }
                else
                {
                    builder.AppendFormat("{0},", values[i]);
                }
            }

            builder.Append(Environment.NewLine);

        }

        //At this point you now have a StringBuilder containing the CSV data you can write to a file or similar




    }


    private void ParseColonSeparatedItem(string textToSeparate, List<String> columns, Dictionary<Int32, List<String>> outputStorage, int outputKey)
    {

        if (String.IsNullOrWhiteSpace(textToSeparate)) { return; }

        string[] colVals = textToSeparate.Split(new[] { ":" }, StringSplitOptions.RemoveEmptyEntries);

        if (colVals.Length < 2) { return; } //Skip items that are not in Column:Value form

        List<String> outputValues = outputStorage[outputKey];

        if (!columns.Contains(colVals[0]))
        {
            //Add the column to the list of expected columns. The index of the column determines its index in the output
            columns.Add(colVals[0]);

        }

        //Pad the row with empty values so that every known column has a slot
        while (outputValues.Count < columns.Count)
        {
            outputValues.Add(String.Empty);
        }

        //Store the value at its column's index. That way a section can miss values yet still line up with the header
        outputValues[columns.IndexOf(colVals[0])] = colVals[1];
    }

After running this against your file, the string builder contains:

"Name,Age,Height,Hair,Eyes\r\nAlex,25,6,Brown,Hazel\r\n"

This matches the output above (\r\n is the Windows newline sequence).

This approach demonstrates how a custom parser might work. It is purposefully verbose, there is plenty of refactoring that could take place, and it is just an example.

Improvements would include:

1) This function assumes there are no spaces in the text items themselves. This is a pretty big assumption and, if wrong, would require a different approach to parsing out the line segments. However, this only needs to change in one place: as you read a line at a time, you could apply a regex, or just read in characters and assume that everything after the first "column:" section is a value, for example.

2) No exception handling

3) Text output is not quoted. You could test each value to see if it's a date or number - if not, wrap it in quotes as then other programs (like Excel) will attempt to preserve the underlying datatypes more effectively.

4) Assumes no column names are repeated. If they are, then you have to check whether a column item has already been added, and then create a ColName2 column in the parsing section.
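Improvements 1 and 3 could be approached with two small helpers, sketched below. `CsvFieldHelpers`, `TrySplitKeyValue`, and `QuoteIfNeeded` are hypothetical names, not part of the parser above. Note that the quoting helper follows the common CSV convention of quoting only fields that contain a comma, quote, or line break, which is a different choice from quoting every non-numeric value as suggested in point 3.

```csharp
using System;

public static class CsvFieldHelpers
{
    // Splits "Key:Value" on the first colon only, so the value may contain
    // spaces or further colons (for example "Hair:Dark Brown").
    public static bool TrySplitKeyValue(string text, out string key, out string value)
    {
        key = null;
        value = null;
        if (String.IsNullOrWhiteSpace(text)) { return false; }

        int colon = text.IndexOf(':');
        if (colon <= 0) { return false; } // No colon, or nothing before it.

        key = text.Substring(0, colon).Trim();
        value = text.Substring(colon + 1).Trim();
        return key.Length > 0;
    }

    // Quotes a field when it contains a comma, quote, or line break,
    // doubling any embedded quotes; other fields are returned unchanged.
    public static string QuoteIfNeeded(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) { return field; }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

`TrySplitKeyValue` works directly on an indented continuation line. On the first line of a section, the name would still have to be separated from the first Column:Value item before calling it.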

answered 2012-05-08T12:19:43.480