I want to count the number of missing values in each line of a txt file.

The file foo.txt:

1 1 1 1 1 NA    # so, Missings: 1
1 1 1 NA 1 1    # so, Missings: 1
1 1 NA 1 1 NA   # so, Missings: 2  

But I also want to get the number of elements in the first line (assuming all lines have the same number of elements).

miss = []
with open("foo.txt") as f:
    for line in f:
        miss.append(line.count("NA"))

>>> miss
[1, 1, 2]         # correct

The problem appears when I try to determine the number of elements. I did it with the following code:

miss = []
with open("foo.txt") as f:
    first_line = f.readline()
    elements = first_line.count(" ")  # given that values are separated by space
    for line in f:
        miss.append(line.count("NA"))

>>> (elements + 1)
6   # True, this is correct          
>>> miss 
[1, 2]  # misses the first item because readline() already consumed the first line

How can I read the first line without consuming it, so that it is still available for further processing?

3 Answers

Try f.seek(0). This resets the file position to the beginning of the file.

A complete example would be:

miss = []
with open("foo.txt") as f:
    first_line = f.readline()
    elements = first_line.count(" ")  # given that values are separated by space
    f.seek(0)
    for line in f:
        miss.append(line.count("NA"))
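
To see the effect of f.seek(0) in isolation, here is a minimal self-contained sketch. It writes the sample data from the question to a temporary file; the temporary file and the helper name count_missing_with_seek are assumptions made for illustration, and the element count uses len(first_line.split()) rather than counting spaces.

```python
import os
import tempfile

SAMPLE = "1 1 1 1 1 NA\n1 1 1 NA 1 1\n1 1 NA 1 1 NA\n"

def count_missing_with_seek(path):
    """Return (elements, miss): peek at the first line, then rewind."""
    miss = []
    with open(path) as f:
        first_line = f.readline()
        elements = len(first_line.split())  # number of whitespace-separated items
        f.seek(0)  # rewind so the loop below starts at the first line again
        for line in f:
            miss.append(line.count("NA"))
    return elements, miss

# Write the sample to a temporary file so the sketch does not need foo.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(SAMPLE)
    tmp_path = tmp.name

try:
    elements, miss = count_missing_with_seek(tmp_path)
finally:
    os.remove(tmp_path)

print(elements, miss)  # 6 [1, 1, 2]
```

Without the f.seek(0) call, the loop would start at the second line and miss would come out as [1, 2], which is the behaviour described in the question.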

Even better, read every line, including the first, only once, and determine the number of elements only once:

miss = []
elements = None
with open("foo.txt") as f:
    for line in f:
        if elements is None:
            elements = line.count(" ")  # given that values are separated by space
        miss.append(line.count("NA"))

By the way: shouldn't the number of elements be line.count(" ") + 1?

I recommend using len(line.split()), since it also handles tabs, double spaces, leading/trailing whitespace, etc.
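
A small comparison may make the difference concrete. The three sample lines below are made up for illustration (they are not from foo.txt); each contains six items, but only the first is separated by single spaces.

```python
# Hypothetical sample lines, each holding six items.
lines = [
    "1 1 1 1 1 NA\n",        # single spaces only
    "1  1 1\tNA\t1 1\n",     # a double space and two tabs
    "  1 1 NA 1 1 NA  \n",   # leading and trailing spaces
]

results = []
for line in lines:
    by_count = line.count(" ") + 1   # counts space characters, then adds one
    by_split = len(line.split())     # counts whitespace-separated items
    results.append((by_count, by_split))
    print(repr(line), by_count, by_split)

# by_count gives 6, 5 and 10; by_split gives 6 every time.
```

Called without arguments, str.split() splits on any run of whitespace and ignores leading and trailing whitespace, which is why it stays at 6 for all three lines.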

answered 2013-06-03T08:49:51.140

If all lines have the same number of items, you can simply count the items in the last line:

miss = []
with open("foo.txt") as f:
    for line in f:
        miss.append(line.count("NA"))
    # after the loop, line still holds the last line that was read
    elements = len(line.split())

A better way to count might be:

elements = len(line.split())  

because this also counts items separated by multiple spaces or tabs.

answered 2013-06-03T08:59:48.010

You can also handle the first line separately:

with open("foo.txt") as f:
    first_line = next(f)  # take the first line from the file iterator
    elements = first_line.count(" ")  # given that values are separated by space
    miss = [first_line.count("NA")]   # count the first line here, since the loop skips it
    for line in f:
        miss.append(line.count("NA"))
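
One caveat with next(f) is that it raises StopIteration on an empty file. A hedged sketch of the same idea with a default value follows; the function name count_missing and the use of io.StringIO in place of a real file are assumptions for illustration, and the element count uses len(first_line.split()) instead of counting spaces.

```python
import io

def count_missing(f):
    """Return (elements, miss) for an iterable of lines such as an open file."""
    first_line = next(f, None)  # None instead of StopIteration on empty input
    if first_line is None:
        return 0, []
    elements = len(first_line.split())
    miss = [first_line.count("NA")]  # the first line is counted here
    for line in f:                   # the loop continues from the second line
        miss.append(line.count("NA"))
    return elements, miss

sample = io.StringIO("1 1 1 1 1 NA\n1 1 1 NA 1 1\n1 1 NA 1 1 NA\n")
print(count_missing(sample))            # (6, [1, 1, 2])
print(count_missing(io.StringIO("")))   # (0, [])
```

The same function should work with a real file object, for example inside with open("foo.txt") as f.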
answered 2013-06-03T08:55:42.957