
I'm having trouble reading a file (a TextAsset) line by line and getting the results!

This is the file I want to read:

AUTHOR
COMMENT 
INFO      1  X   ARG  0001       0.581   2.180   1.470
INFO      2  X   ARG  0001       1.400   0.974   1.724
INFO      3  X   ARG  0001       2.553   0.934   0.751
INFO      4  X   ARG  0001       3.650   0.494   1.053
INFO      5  X   ARG  0001       1.188   3.073   1.532
INFO      6  X   ARG  0001       2.312   1.415  -0.466
INFO      7  X   ARG  0001      -0.232   2.249   2.180
END

This is the code I am using:

//read file
string[] line = file.text.Split("\n"[0]);

for(int i = 0 ; i < line.Length ; i++)
{
    if(line[i].Contains("INFO"))
    {
        // Replace every run of two or more spaces with a single underscore "_" (this works fine)
        string l = Regex.Replace(line[i]," {2,}","_");

        // This Debug.Log prints the correct result,
        // for example "INFO_1_X_ARG_0001_0.581_2.180_1.470"
        Debug.Log(l);
        string[] subline = Regex.Split(l,"_");
        // Only for the first "INFO" line do I get correct results (INFO,1,X,ARG,0001,0.581,2.180,1.470).
        // For all other "INFO" lines I get incomplete results: the first, fourth and fifth
        // elements are not put into substrings, as if they had disappeared!
        foreach(string s in subline){Debug.Log(s);}
    }
}

Explanation:

I first split the text into lines (this works fine), then I only read the lines that contain INFO.

I loop over all lines containing INFO and replace all runs of spaces with an underscore _ (this works fine).

I split each line containing INFO into substrings at the underscore _.

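When I print out these lines, only the first INFO line seems to have all of its substrings; every following line is not split correctly (the first part, INFO, and the third string are both missing).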

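This seems very unreliable. Is this the right way to handle this kind of thing? Any help is appreciated! This should be simple, so what am I doing wrong?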

Edit:

Something is wrong with this code (it should be simple, but it does not work).

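Here is the updated code. (I just created a List<string> list = new List<string>() and copied all the substrings into it.) I use Unity3D, so the list contents are shown in the Inspector. I was shocked to see all the substrings extracted correctly there, while this simple loop

foreach(string s in list)
 Debug.Log(s);

was indeed missing some values. So I tried different things, and this code: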

for(int x = 0; x < list.Count ; x++)
{
  Debug.Log("List: " + x.ToString() + " " + list[x].ToString());
}

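displays the contents of the list correctly, but this code (note that I only removed x.ToString()) is missing some elements of the list. It does not want to read them!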

for(int x = 0; x < list.Count ; x++)
  Debug.Log("List: " + list[x].ToString());

So I'm not sure what is going on here?!


4 Answers


You may want to try something like this:

for (int i = 0; i < line.Length; i++)
{
    if (line[i].Contains("INFO"))
    {
        string l = Regex.Replace(line[i], @"\p{Zs}{2,}|\t+", "_");

        string[] sublines = l.Split('_');

        // Log each substring for debugging.
        foreach (string s in sublines) Debug.Log(s);
    }
}

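The \p{Zs} class matches any character in the Unicode "Separator, Space" category (e.g. the ordinary space and the non-breaking space). Tabs are not part of that category, which is why the pattern lists \t+ separately. The following reference may be of some help to you: Character Classes in Regular Expressions.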
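
As a possible variation on the pattern above, the intermediate underscore can be skipped by splitting directly on the separator class. The following is only a sketch in plain C# with no Unity dependency. The names ColumnSplitter and SplitColumns are made up for illustration, and the sample line (with a non-breaking space, a tab and a trailing carriage return) is an assumption, not data from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ColumnSplitter
{
    // One or more Unicode space separators (\p{Zs}) or tabs.
    private static readonly Regex Separator = new Regex(@"[\p{Zs}\t]+");

    // Hypothetical helper: trims the line (this also drops a trailing '\r'),
    // then splits it on every run of separator characters.
    public static string[] SplitColumns(string line)
    {
        return Separator.Split(line.Trim());
    }

    public static void Main()
    {
        // Assumed sample: \u00A0 is a non-breaking space, \t is a tab.
        string sample = "INFO      2\u00A0\u00A0X\tARG  0001       1.400   0.974   1.724\r";
        string[] columns = SplitColumns(sample);

        // prints: INFO|2|X|ARG|0001|1.400|0.974|1.724
        Console.WriteLine(string.Join("|", columns));
    }
}
```

Trimming before the split avoids an empty first or last element when a line starts or ends with whitespace.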

Answered 2013-06-10T16:29:05.730

The following seems to be working for me:

using (var fs = new FileStream(filePath, FileMode.Open))
using (var reader = new StreamReader(fs))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.StartsWith("INFO"))
        {
            line = Regex.Replace(line, "[ ]+", "_");
            var subline = line.Split('_');

            foreach (var str in subline)
            {
                Console.Write("{0} ",str);
            }
            Console.WriteLine();
        }

    }

}
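
When the content is already available as a string (as with file.text in the question), the same ReadLine loop can probably be run over a StringReader instead of a FileStream. This is a sketch under that assumption; InfoLineReader and ReadInfoLines are hypothetical names, and the sample text with Windows line endings is invented for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class InfoLineReader
{
    // Hypothetical helper: returns every line of `text` that starts with "INFO".
    // StringReader.ReadLine treats "\n", "\r" and "\r\n" as line breaks,
    // so no stray '\r' is left at the end of a returned line.
    public static List<string> ReadInfoLines(string text)
    {
        var result = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith("INFO", StringComparison.Ordinal))
                {
                    result.Add(line);
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        // Assumed sample with "\r\n" line endings.
        string text = "AUTHOR\r\nINFO      1  X   ARG\r\nINFO      2  X   ARG\r\nEND\r\n";
        foreach (string line in ReadInfoLines(text))
        {
            // Brackets make any leftover trailing character visible.
            Console.WriteLine("[" + line + "]");
        }
    }
}
```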
Answered 2013-06-10T18:10:51.127

Try string.Split("\t"[0]); you may have tab characters between the columns.
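
If it is unclear whether the columns are separated by tabs or by spaces, one option is to split on both and discard the empty entries. The sketch below assumes exactly that; WhitespaceSplit and SplitColumns are illustrative names, and the mixed tab/space sample line is made up.

```csharp
using System;

public static class WhitespaceSplit
{
    // Split on spaces and tabs; '\r' is included so that a Windows line
    // ending left over from splitting on '\n' does not stick to the last column.
    private static readonly char[] Separators = { ' ', '\t', '\r' };

    // Hypothetical helper: returns the non-empty columns of one line.
    public static string[] SplitColumns(string line)
    {
        return line.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Assumed sample mixing tabs and runs of spaces.
        string sample = "INFO\t5\tX \t ARG  0001\t1.188   3.073   1.532\r";
        string[] columns = SplitColumns(sample);

        // prints: 8
        Console.WriteLine(columns.Length);
        // prints: INFO,5,X,ARG,0001,1.188,3.073,1.532
        Console.WriteLine(string.Join(",", columns));
    }
}
```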

Answered 2013-05-04T17:37:49.373

There are a few problems:

1> The Contains method you are using is case-sensitive, i.e. INFO != info

You should use

line[i].ToLower().Contains("info")

2> Is the text always separated by spaces? It could also be separated by tabs. You would be better off using

Regex.Replace(line[i]," {2,}|\t+","_");
// this replaces one or more tabs, or two or more spaces
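
For the case-insensitive check in point 1, an alternative that avoids creating a lowered copy of every line is IndexOf with StringComparison.OrdinalIgnoreCase. This is only a sketch; InfoLineFilter and IsInfoLine are hypothetical names, and the lower-case sample line is an assumption (the file in the question uses upper-case INFO).

```csharp
using System;

public static class InfoLineFilter
{
    // Hypothetical helper: true when the line contains "INFO" in any casing.
    public static bool IsInfoLine(string line)
    {
        return line.IndexOf("INFO", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsInfoLine("INFO      1  X   ARG")); // prints: True
        Console.WriteLine(IsInfoLine("info      1  X   ARG")); // prints: True
        Console.WriteLine(IsInfoLine("COMMENT"));              // prints: False
    }
}
```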
Answered 2013-05-04T17:46:42.203