python - Checking for an empty data file (header only) in Python

Question

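What is the most elegant (and/or Pythonic) way to check whether a data file contains only a header before using numpy.loadtxt or numpy.genfromtxt to load the data columns into numpy arrays?

I have a quantum Monte Carlo code that writes the header to disk when it starts executing and sometimes never writes any data (it runs against the cluster's wall-clock limit). I have, of course, written Python scripts to process large numbers of data files at once, and sometimes a few of those files never have data written to them within the allotted time. I need my analysis script to know when a file is empty before I try to load the data and do something with it.

My approach (which works, but may not be the most elegant) is to call a function that looks like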

def checkIfEmpty(fName,n):
    '''
    takes the first non-header line number and returns true or false
    depending upon whether that line is blank or not.
    '''
    Empty = False
    fp = open(fName)
    numLines=0
    for line in fp:
        numLines += 1
    fp.close()

    if n==numLines:
        Empty=True

    return Empty

score 2 · Accepted Answer

EDIT: Since you've indicated the output files may not really be that much bigger than the header-only files, I've thought of a different way to rid yourself of the explicit for loop.

def checkIfEmpty(fname, n):
    # NOTE: n is the file byte position at the end of the header.
    with open(fname, 'r') as file_open:
        file_open.seek(n)
        # The file is "empty" (header only) if nothing can be read past the header.
        return file_open.read(1) == ''

Wherever you currently calculate n in your code, you would just return the byte position instead. file_open.tell() will return this value if you've already read the header lines somewhere else to test your header.
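
If you don't already have n, here is a rough sketch of one way to get it. It assumes every header line starts with '#', which may not match your files, and header_end_position is just a name I made up for illustration. It calls readline() instead of looping with "for line in f", because tell() is disabled on a text-mode file while it is being iterated; opening in binary mode also makes the result a plain byte offset.

```python
def header_end_position(fname, comment=b'#'):
    """Return the byte offset just past the last leading header line.

    Assumption: header lines are the leading lines that start with `comment`.
    """
    with open(fname, 'rb') as f:
        pos = f.tell()  # offset before any line has been read
        while True:
            line = f.readline()
            # Stop at end of file (empty bytes) or at the first non-header line.
            if not line.lstrip().startswith(comment):
                return pos
            pos = f.tell()  # offset just past the header line we consumed
```

The returned value can be passed as n to checkIfEmpty above, or compared with os.path.getsize(fname): if the two are equal, nothing was written after the header.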

END EDIT

How much data is usually in the file?

If the file size differs hugely when the data is missing, you could use:

import os

def checkIfEmpty(fname, header_cutoff):
    # Files smaller than the cutoff (in bytes) are treated as header-only.
    return os.path.getsize(fname) < header_cutoff

Another reason I would prefer this solution is that with a lot of large files, opening and checking them could be slow.
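
For a whole batch of files, the size check could be applied up front without opening anything. This is only a sketch: split_by_size is a hypothetical helper name, and header_cutoff is whatever byte count you decide separates a header-only file from a real one.

```python
import os

def split_by_size(fnames, header_cutoff):
    """Partition file names into (header_only, with_data) by size in bytes."""
    header_only, with_data = [], []
    for fname in fnames:
        # Only the file's metadata is consulted here; the file is never opened.
        if os.path.getsize(fname) < header_cutoff:
            header_only.append(fname)
        else:
            with_data.append(fname)
    return header_only, with_data
```

You would then hand only the second list to numpy.loadtxt or numpy.genfromtxt.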

score 1 · Accepted Answer

Something like:

def is_header_only(fname):
    with open(fname) as fin:
        return next(fin, '').lstrip().startswith('#') and next(fin, None) is None
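
The function above only looks at the first two lines, so it fits a one-line header. If the header can span several lines, a variant along these lines might work. It is a sketch that assumes header lines start with '#' and that blank lines don't count as data; has_data is a made-up name.

```python
def has_data(fname, comment='#'):
    """Return True if the file has at least one non-blank, non-comment line."""
    with open(fname) as fin:
        for line in fin:
            stripped = line.strip()
            # Skip blank lines and header/comment lines.
            if stripped and not stripped.startswith(comment):
                return True  # stop at the first data line
    return False
```

It stops reading at the first data line, so files that do contain data are not read in full.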
