python - Import only the lines with a specific number of columns from a file in Python

Question

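I'm trying to import multiple files into my code in a for loop for analysis, but the files are not all formatted exactly the same way (and there are too many of them to edit by hand).

The data I need is the same in every file: 13 columns, which I import as strings. Here is an example of one file: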

could not open XWindow display
could not open XWindow display

No graphics display available for this session.
Graphics tasks that attempt to plot to an interactive screen will fail.

/data/poohbah/2/asassn/be/F0041-70_2645
###  JD        HJD            UT_date             IMAGE    FWHM  Diff Limit      mag    mag_err       counts   counts_err   flux(mJy)     flux_err
2456784.50841  2456784.50816  2014-05-07.0072681  interp_bf002339_coadd 2.61 -2.65 17.031      15.543  0.093          526.82        44.57   2.328        0.197       
2456789.45407  2456789.45347  2014-05-11.9529421  interp_be003585_coadd 2.26 -2.31 16.869      15.383  0.093          834.50        70.78   2.695        0.229       
2456790.47441  2456790.47419  2014-05-12.9732922  interp_bf004070_coadd 1.72 -2.25 17.246      15.721  0.090          645.67        52.82   1.974        0.162       
...
(data continues)
...
2457895.45745  2457895.45919  2017-05-21.9587133  interp_bf305499_coadd 1.71 -2.45 17.299      15.482  0.068          673.31        42.10   2.461        0.154       
/data/poohbah/1/assassin/bin/./ap_phot_im_cal_test.py:654: RuntimeWarning: invalid value encountered in sqrt
  counts_err_a = np.sqrt( counts_a / options.gain + (area_a * bg_stdev_a **2.0 ) )
/data/poohbah/1/assassin/bin/./ap_phot_im_cal_test.py:369: RuntimeWarning: invalid value encountered in less_equal
  no_detected = np.nonzero( (counts <= limit) & (area >= 0.01) )[0]
/data/poohbah/1/assassin/bin/./ap_phot_im_cal_test.py:367: RuntimeWarning: divide by zero encountered in log10
  maglimit[notbad] = -2.5 * np.log10(limit[notbad]) + def_zeropt

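I only need the data between the last '###' line and the '/data' paths, and in all files this part has exactly the same 13-column format. However, the "comments" at the beginning and end of any particular file can vary. Some don't have 'could not open XWindow display', and some don't have the paths at the end. I tried ignoring lines that start with '#' or '/', but that does nothing for the first lines, or for lines like the 'counts_err_a' one at the end of this particular example.

Is there a way to import the data into Python and only get the lines that contain a specific number of columns? In pseudocode, it might look like: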

open(file_name)
 if column_number = 13
   np.genfromtxt(file_name)
 else skip

score 0 · Accepted Answer

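You won't know how many columns a line has until you've counted them, so you can filter while reading the file, but you still need to split() each line. Something like the following works; you can add further checks if, for example, there are many comment lines.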

saved_lines = []
with open(filename) as f:
    for line in f:
        if len(line.split()) == 13:
            saved_lines.append(line)

Or the equivalent as a list comprehension:

with open(filename) as f:
    saved_lines = [line for line in f if len(line.split()) == 13]
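One case where an extra check seems worthwhile: in the sample file from the question, the traceback line beginning `counts_err_a = np.sqrt(` also splits into exactly 13 whitespace-separated tokens, so a pure column count would keep it. The sketch below adds one possible guard, assuming every real data row starts with a numeric JD value. The function name `read_data_rows` and the `n_columns` parameter are made up for illustration and are not part of the original answer.

```python
def read_data_rows(lines, n_columns=13):
    """Return the split fields of each line that looks like a data row.

    A line is kept only if it has exactly n_columns whitespace-separated
    fields AND its first field parses as a float (assumption: data rows
    start with a numeric JD value, as in the question's sample).
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != n_columns:
            continue  # wrong number of columns: header, warning, blank line
        try:
            float(fields[0])
        except ValueError:
            continue  # right column count, but not numeric data
        rows.append(fields)  # fields stay as strings
    return rows


# Usage, with `filename` standing in for one of the input files:
# with open(filename) as f:
#     rows = read_data_rows(f)
```

Each kept row is a list of 13 strings, which matches the question's wish to import the columns as strings. If other stray lines in the real files also start with a number, a stricter check would be needed.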
