I am trying to sort a text file by its third column in C#. I can see 4 rows beneath my column headings, but the text file contains many more rows. I also need one more piece of functionality: removing duplicate data from the display. If the data in the first and second columns of a row is the same as in another row, only one of those rows should be displayed. The comparison is case-sensitive, i.e. `broc` is not the same as `BRoc`. Any help is appreciated. My code is below. Please note that the file is a TSV, not a CSV.

```csharp
using System;
using System.IO;
using System.Linq;

namespace ConsoleApplication4
{
    class Program
    {
        static void Main(string[] args)
        {
            var records = (from l in File.ReadAllLines(@"d:\data\542112107\Desktop\project 1\Project1\Project1\bin\Debug\instance_test.txt")
                           let pieces = l.Split('\t')   // the file is tab-separated (TSV)
                           select new { Col1 = pieces[0], Col2 = pieces[1], Col3 = pieces[2], Col4 = pieces[3] })
                .Skip(1)                 // skip the header row
                .Distinct()              // removes rows that are identical in all four columns
                .OrderBy(c => c.Col3);   // sort by the third column

            foreach (var r in records)
            {
                Console.WriteLine("{0}, {1}, {2}, {3}", r.Col1, r.Col2, r.Col3, r.Col4);
            }

            Console.WriteLine();
            Console.WriteLine("Done");
            Console.ReadLine();
        }
    }
}
```

Here is the sample input (one record per line, columns separated by tabs):

```
Heading 1   Heading 2     Heading3    Heading 4
ascvad3124        adfdasfData            asasffasf       adsfasfasdf
asf123134Data      dasfasdfdasfData        Dasfasfata      asdfasdfadsf
123123fData       asdfdasfsData        asdfasdfasdf          sadvsdfdaf
```

So the first issue is that you're not ordering on the third column's value; you're ordering on the second character of the third column's value. Change `c.Col3[1]` to just `c.Col3` in the `OrderBy` call to actually order on the third column.
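A minimal sketch of the difference, using hypothetical in-memory rows instead of the file (written as C# 9 top-level statements; the names `rows`, `bySecondChar` and `byColumn3` are illustrative only). `c.Col3[1]` is a `char`, so `OrderBy` sorts on that single character, while passing the whole string sorts on the column value. The `StringComparer.Ordinal` argument is an optional choice, not something the answer requires: it makes the sort case-sensitive and culture-independent, whereas the default string comparer is culture-aware.

```csharp
using System;
using System.Linq;

// Hypothetical sample rows; only Col3 matters for this demo.
var rows = new[]
{
    new { Col1 = "a", Col2 = "x", Col3 = "zb" },
    new { Col1 = "b", Col2 = "y", Col3 = "ac" },
    new { Col1 = "c", Col2 = "z", Col3 = "ma" },
};

// Bug: Col3[1] is the second character ('b', 'c', 'a'), not the column value.
var bySecondChar = rows.OrderBy(c => c.Col3[1]).Select(c => c.Col3).ToArray();

// Fix: order on the whole third-column string (ordinal = case-sensitive, culture-independent).
var byColumn3 = rows.OrderBy(c => c.Col3, StringComparer.Ordinal).Select(c => c.Col3).ToArray();

Console.WriteLine(string.Join(", ", bySecondChar)); // ma, zb, ac
Console.WriteLine(string.Join(", ", byColumn3));    // ac, ma, zb
```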

Another issue is that you're grouping on the concatenation of all of the fields, which isn't particularly safe: the row `"ab", "c", "d", "e"` will be considered equal to `"a", "bc", "d", "e"`. You can just call `Distinct` instead of `GroupBy`. Anonymous types properly override `Equals` and `GetHashCode` to be based on the values of their properties, not on the reference, so this will work just fine.
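A minimal sketch of both points, using hypothetical in-memory rows (C# 9 top-level statements). Grouping on a concatenated string merges the first two rows, while `Distinct()` on the anonymous objects compares property by property. Note that `Distinct()` compares all four columns; to keep one row per distinct first-and-second-column pair, as the question asks, one option is to group on an anonymous key of just those two properties and take the first row of each group. Anonymous-type equality uses the default string equality, which is ordinal and case-sensitive, so `broc` and `BRoc` stay separate.

```csharp
using System;
using System.Linq;

// Hypothetical rows illustrating the concatenation pitfall and case sensitivity.
var rows = new[]
{
    new { Col1 = "ab",   Col2 = "c",  Col3 = "d", Col4 = "e" },
    new { Col1 = "a",    Col2 = "bc", Col3 = "d", Col4 = "e" },
    new { Col1 = "broc", Col2 = "x",  Col3 = "1", Col4 = "p" },
    new { Col1 = "BRoc", Col2 = "x",  Col3 = "2", Col4 = "q" },
    new { Col1 = "broc", Col2 = "x",  Col3 = "3", Col4 = "r" },
};

// Concatenated key: "ab"+"c"+"d"+"e" == "a"+"bc"+"d"+"e", so rows 1 and 2 collapse.
int concatGroups = rows.GroupBy(r => r.Col1 + r.Col2 + r.Col3 + r.Col4).Count();

// Distinct() compares every property separately, so all five rows survive.
int distinctRows = rows.Distinct().Count();

// One row per (Col1, Col2) pair; GroupBy keeps groups in first-occurrence order.
var dedupedOnFirstTwo = rows
    .GroupBy(r => new { r.Col1, r.Col2 })
    .Select(g => g.First())
    .ToList();

Console.WriteLine(concatGroups);            // 4
Console.WriteLine(distinctRows);            // 5
Console.WriteLine(dedupedOnFirstTwo.Count); // 4
```

On .NET 6 and later, `rows.DistinctBy(r => new { r.Col1, r.Col2 })` expresses the same "first row per key" idea more directly.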

Next, you can use `File.ReadLines` instead of `File.ReadAllLines` to read the lines lazily, rather than eagerly loading all of them into memory when you are only going to process the data lazily anyway.
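Putting the pieces together, here is a sketch of the whole pipeline built on `File.ReadLines`. It is an illustration under assumptions, not the answerer's code: a temporary file stands in for `instance_test.txt`, the fifth data row is a made-up duplicate on the first two columns, rows with fewer than four fields are skipped (so a blank line cannot throw `IndexOutOfRangeException`), and `StringComparer.Ordinal` is chosen for a case-sensitive sort. Written as C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical temp file standing in for instance_test.txt (header row first, tab-separated).
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[]
{
    "Heading 1\tHeading 2\tHeading3\tHeading 4",
    "ascvad3124\tadfdasfData\tasasffasf\tadsfasfasdf",
    "asf123134Data\tdasfasdfdasfData\tDasfasfata\tasdfasdfadsf",
    "123123fData\tasdfdasfsData\tasdfasdfasdf\tsadvsdfdaf",
    "ascvad3124\tadfdasfData\tzzz\tsame Col1+Col2 as the first data row",
    "",                                              // blank line, filtered out below
});

// File.ReadLines streams the file line by line instead of loading it into an array.
var records = File.ReadLines(path)
    .Skip(1)                                         // skip the header row
    .Select(line => line.Split('\t'))
    .Where(p => p.Length >= 4)                       // ignore blank or short lines
    .Select(p => new { Col1 = p[0], Col2 = p[1], Col3 = p[2], Col4 = p[3] })
    .GroupBy(r => new { r.Col1, r.Col2 })            // case-sensitive key on the first two columns
    .Select(g => g.First())                          // keep one row per key
    .OrderBy(r => r.Col3, StringComparer.Ordinal)    // sort by the whole third column
    .ToList();                                       // materialize before deleting the file

File.Delete(path);

foreach (var r in records)
    Console.WriteLine("{0}, {1}, {2}, {3}", r.Col1, r.Col2, r.Col3, r.Col4);
```

With this input it prints three rows, ordered by `Col3` as `Dasfasfata`, `asasffasf`, `asdfasdfasdf`: an ordinal comparison sorts uppercase letters before lowercase ones. Drop the comparer argument if a culture-aware order is preferred.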

answered 2014-01-13T18:10:11.650