1

I am reading a file whose format is shown below.
INPUT FILE FORMAT

        id          PosScore  NegScore       Word                             SynSet   

        00002098    0         0.75           unable#1                         (usually followed by `to') not having the necessary means or skill or know-how; "unable to get to town without a car"; "unable to obtain funds"
        00002312    0.23      0.43           dorsal#2 abaxial#1               facing away from the axis of an organ or organism; "the abaxial surface of a leaf is the underside or side facing away from the stem"
        00002527    0.14      0.26           ventral#2 adaxial#1              nearest to or facing toward the axis of an organ or organism; "the upper side of a leaf is known as the adaxial surface"
        00002730    0.45      0.32           acroscopic#1                     facing or on the side toward the apex
        00002843    0.91      0.87           basiscopic#1                     facing or on the side toward the base
        00002956    0.43      0.73           abducting#1 abducent#1           especially of muscles; drawing away from the midline of the body or from an adjacent part
        00003131    0.15      0.67           adductive#1 adducting#1 adducent#1  especially of muscles; bringing together or drawing toward the midline of the body or toward an adjacent part    
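In this file, the SynSet column should be removed. Second, if the Word column contains more than one word, the row should be repeated once for each word, with the same id, PosScore, and NegScore on every repeated row. I want the following output for the file above.
OUTPUT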

 id         PosScore      NegScore              Word     
00002098    0             0.75              unable#1    
00002312    0.23          0.43               dorsal#2    
00002312    0.23          0.43               abaxial#1       
00002527    0.14          0.26               ventral#2    
00002527    0.14          0.26               adaxial#1     
00002730    0.45          0.32               acroscopic#1    
00002843    0.91          0.87               basiscopic#1    
00002956    0.43          0.73               abducting#1    
00002956    0.43          0.73               abducent#1    
00003131    0.15          0.67               adductive#1    
00003131    0.15          0.67               adducting#1    
00003131    0.15          0.67               adducent#1    

I wrote the code below, but it gives unexpected results.

 TextWriter tw = new StreamWriter("D:\\output.txt");    
 private void button1_Click(object sender, EventArgs e)
        {

                StreamReader reader = new StreamReader(@"C:\Users\Zia Ur Rehman\Desktop\records.txt");
                string line;
                String lines = "";
                while ((line = reader.ReadLine()) != null)
                {

                    String[] str = line.Split('\t');

                    String[] words = str[4].Split(' ');
                    for (int k = 0; k < words.Length; k++)
                    {
                        for (int i = 0; i < str.Length; i++)
                        {
                            if (i + 1 != str.Length)
                            {
                                lines = lines + str[i] + ",";
                            }
                            else
                            {
                                lines = lines + words[k] + "\r\n";

                            }
                        }
                    }
                }
            tw.Write(lines);
            tw.Close();
            reader.Close();    
        } 

This code gives the following wrong result:

00002098,0,0.75,unable#1,unable#1
00002312,0,0,dorsal#2 abaxial#1,dorsal#2
00002312,0,0,dorsal#2 abaxial#1,abaxial#1
00002527,0,0,ventral#2 adaxial#1,ventral#2
00002527,0,0,ventral#2 adaxial#1,adaxial#1
00002730,0,0,acroscopic#1,acroscopic#1
00002843,0,0,basiscopic#1,basiscopic#1
00002956,0,0,abducting#1 abducent#1,abducting#1
00002956,0,0,abducting#1 abducent#1,abducent#1
00003131,0,0,adductive#1 adducting#1 adducent#1,adductive#1
00003131,0,0,adductive#1 adducting#1 adducent#1,adducting#1
00003131,0,0,adductive#1 adducting#1 adducent#1,adducent#1

2 Answers

2

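So, this is working now, after a long struggle.
Note: if the columns in your input file are not separated by proper tab characters, the result will be incorrect. Do not neglect the tabs.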

  TextWriter tw = new StreamWriter("D:\\output.txt");    
  private void button1_Click(object sender, EventArgs e)
  {
        StreamReader reader = new StreamReader(@"C:\Users\Mohsin\Desktop\records.txt");
        string line;
        String lines = "";
        while ((line = reader.ReadLine()) != null)
        {

            String[] str = line.Split('\t');

            String[] words = str[3].Split(' ');
            for (int k = 0; k < words.Length; k++)
            {
                for (int i = 0; i < 4; i++)
                {
                    if (i + 1 != 4)
                    {
                        lines = lines + str[i] + "\t";
                    }
                    else
                    {
                        lines = lines + words[k] + "\r\n";

                    }
                }
            }
        }
        tw.Write(lines);
        tw.Close();
        reader.Close();
  }
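
The note about tabs can be made checkable. The following is only a minimal sketch, assuming each data line carries at least the four tab-separated columns that the code above reads (id, PosScore, NegScore, Word). The names `TabCheck` and `TrySplitColumns` are hypothetical and are not part of the answer's code.

```csharp
using System;

static class TabCheck
{
    // Hypothetical helper: splits a line on tab characters and reports
    // whether it has at least the four columns the conversion needs
    // (id, PosScore, NegScore, Word).
    public static bool TrySplitColumns(string line, out string[] columns)
    {
        columns = line.Split('\t');
        if (columns.Length < 4)
        {
            return false;
        }
        // Remove stray spaces around each field.
        for (int i = 0; i < columns.Length; i++)
        {
            columns[i] = columns[i].Trim();
        }
        return true;
    }
}

static class Program
{
    static void Main()
    {
        // Sample lines: the first uses tabs, the second uses runs of spaces.
        string good = "00002312\t0.23\t0.43\tdorsal#2 abaxial#1\tfacing away";
        string bad = "00002312    0.23    0.43    dorsal#2 abaxial#1";
        string[] cols;
        Console.WriteLine(TabCheck.TrySplitColumns(good, out cols)); // True
        Console.WriteLine(TabCheck.TrySplitColumns(bad, out cols));  // False
    }
}
```

A line that fails the check could be skipped or reported instead of letting `str[3]` throw an `IndexOutOfRangeException`.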
answered 2013-03-03T20:58:30.307
1

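I simplified your code and got it working. It still lacks validation, and performance could be improved by using a StringBuilder or, better, by writing each line to the file as it is produced instead of appending it to a string. It also lacks exception handling.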

using (TextWriter tw = File.CreateText(@"c:\temp\result.txt"))
using (StreamReader reader = new StreamReader(@"stackov1.txt"))
{
    string line;
    String lines = "";
    while ((line = reader.ReadLine()) != null)
    {

        String[] str = line.Split('\t');

        String[] words = str[3].Split(' ');
        for (int k = 0; k < words.Length; k++)
        {
            lines = lines + str[0] + "\t" + str[1] + "\t" + str[2] + "\t" + words[k] + "\r\n";
        }
    }
    tw.Write(lines);
}
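
The improvements mentioned above (writing each line directly, basic validation, exception handling) might look roughly like this. It is a sketch under assumptions rather than a definitive version: `RowExpander`, `Expand`, and `Convert` are hypothetical names, the paths come from command-line arguments instead of being hard-coded, and lines with fewer than four tab-separated columns are silently skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class RowExpander
{
    // Yields one output row per word in the Word column (index 3).
    // Lines with fewer than four tab-separated columns yield nothing.
    public static IEnumerable<string> Expand(string line)
    {
        string[] cols = line.Split('\t');
        if (cols.Length < 4)
        {
            yield break;
        }
        string[] words = cols[3].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            yield return cols[0] + "\t" + cols[1] + "\t" + cols[2] + "\t" + word;
        }
    }

    // Streams the input to the output one line at a time, so no large
    // string is built up in memory.
    public static void Convert(string inputPath, string outputPath)
    {
        using (StreamReader reader = new StreamReader(inputPath))
        using (StreamWriter writer = File.CreateText(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                foreach (string row in Expand(line))
                {
                    writer.WriteLine(row);
                }
            }
        }
    }
}

static class Program
{
    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("Expected two arguments: input path and output path.");
            return;
        }
        try
        {
            RowExpander.Convert(args[0], args[1]);
        }
        catch (IOException ex)
        {
            // Covers missing files and other I/O failures.
            Console.Error.WriteLine("Could not process file: " + ex.Message);
        }
    }
}
```

Note that `WriteLine` uses the platform's line terminator, whereas the code above always appends `"\r\n"`; on Windows the two are the same.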
answered 2013-03-03T19:22:35.183