python - Open/edit a UTF-8 FITS header in Python (pyfits)

Question

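I have to process some FITS files that contain UTF-8 text in their headers. This means that basically none of the functions in the pyfits package work. .decode doesn't work either, because the FITS header is a class and not a list. Does anyone know how to decode the header so that I can process the data? The actual content isn't that important, so simply ignoring the offending letters would be fine. My current code looks like this: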

hdulist = fits.open('Jupiter.FIT')
hdu = hdulist[0].header
hdu.decode('ascii', errors='ignore')

I get: AttributeError: 'Header' object has no attribute 'decode'

Calls like:

print (hdu)

return:

ValueError: FITS header values must contain standard printable ASCII characters; "'Uni G\xf6ttingen, Institut f\xfcr Astrophysik'" contains characters/bytes that do not represent printable characters in ASCII.

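I thought about overwriting those entries so that I wouldn't have to care about them. However, I can't even retrieve the entries that contain the bad characters, and I want a batch solution because I have hundreds of files.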

score 1 · Accepted Answer

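As anatoly techtonik pointed out, non-ASCII characters in FITS headers are simply invalid and make the FITS file invalid. That said, it would be nice if astropy.io.fits could at least read the invalid entries. Support for this is currently broken and needs a hero to fix it, but nobody has stepped up, because it's an uncommon problem: most people run into it in one or two files, fix those files, and move on. Hopefully someone will tackle it eventually.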

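In the meantime, since you know exactly which string the file is choking on, I would just open the file in raw binary mode and replace that string. If the FITS file is very large, you can read it one block at a time and do the replacement on each block. FITS files (headers included) are written in 2880-byte blocks, and each header block holds exactly 36 cards of 80 bytes, so a header card never straddles a block boundary: wherever the string appears, it lies entirely within one block, and you don't need to parse the header format at all. Just make sure the replacement string is no longer than the original, and if it is shorter, pad it on the right with spaces. FITS headers are a fixed-width format, so anything that changes the length of the header will corrupt the whole file. For this particular case I would try something like this: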

bad_str = 'Uni Göttingen, Institut für Astrophysik'.encode('latin1')
good_str = 'Uni Gottingen, Institut fur Astrophysik'.encode('ascii')
# In this case I already know the replacement is the same length, so I'm not worried about it
# A more general solution would require fixing the header parser to deal with non-ASCII bytes
# in some consistent manner; I'm also looking for the full string instead of the individual
# characters so that I don't corrupt binary data in the non-header blocks
in_filename = 'Jupiter.FIT'
out_filename = 'Jupiter-fixed.fits'

with open(in_filename, 'rb') as inf, open(out_filename, 'wb') as outf:
    while True:
        block = inf.read(2880)
        if not block:
            break
        block = block.replace(bad_str, good_str)
        outf.write(block)

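This is ugly and may be slow for a very large file, but it's a start. I can think of better solutions, but they are harder to understand and probably not worth the time if you only have a handful of files to fix.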
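Since the question mentions hundreds of files, here is a sketch of how the same block-by-block idea might be generalized so that you don't have to know each bad string in advance. It is not an astropy feature, and it rests on assumptions: the offending text is only in the primary header (as with `hdulist[0]` in the question), and replacing every byte outside printable ASCII (0x20-0x7E) with `?` is acceptable. The replacement is one byte for one byte, so the header length never changes. A two-byte UTF-8 character becomes `??`. `sanitize_primary_header` and `fix_all` are hypothetical helper names.

```python
import glob
import os

BLOCK = 2880  # FITS files are written in 2880-byte blocks
CARD = 80     # each header block holds 36 cards of 80 bytes

# Translation table: keep printable ASCII (0x20-0x7E), map every other byte to b'?'.
# One byte in, one byte out, so the header length is preserved.
_SANITIZE = bytes(b if 0x20 <= b <= 0x7E else ord('?') for b in range(256))


def sanitize_primary_header(in_path, out_path):
    """Copy in_path to out_path, cleaning only the primary header blocks (assumption:
    the bad text lives there). Data and any extensions are copied unchanged."""
    with open(in_path, 'rb') as inf, open(out_path, 'wb') as outf:
        in_header = True
        while True:
            block = inf.read(BLOCK)
            if not block:
                break
            if in_header:
                block = block.translate(_SANITIZE)
                # The primary header ends in the block that contains the END card,
                # which always starts on an 80-byte card boundary.
                for i in range(0, len(block), CARD):
                    if block[i:i + 8] == b'END     ':
                        in_header = False
                        break
            outf.write(block)


def fix_all(pattern):
    """Hypothetical batch driver: writes NAME-fixed.fits next to each matching file."""
    for path in glob.glob(pattern):
        root, _ = os.path.splitext(path)
        sanitize_primary_header(path, root + '-fixed.fits')
```

For example, `fix_all('*.FIT')` would write `Jupiter-fixed.fits` next to `Jupiter.FIT`. Because this replaces bytes blindly, it's worth opening a few of the outputs with `fits.open` before discarding the originals.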

Once you're done, have a stern talk with whoever created the file: they shouldn't be distributing broken FITS files.

score 0 · Accepted Answer

It looks like PyFITS just doesn't support it (yet?).

From https://github.com/astropy/astropy/issues/3497:

FITS predates Unicode and was never updated to support anything beyond printable ASCII characters. There is no way to encode non-ASCII characters in a FITS header.
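Since only printable ASCII is allowed, one way to see exactly which cards break that rule (and so which strings need replacing) is to scan the raw header bytes without going through pyfits at all. This is a minimal sketch that assumes the problem is confined to the primary header. `find_bad_cards` is a hypothetical helper, not part of pyfits or astropy.

```python
BLOCK = 2880  # FITS block size in bytes
CARD = 80     # header card size in bytes


def find_bad_cards(path):
    """Return (card_number, raw_card_bytes) for each primary-header card
    containing a byte outside printable ASCII (0x20-0x7E)."""
    bad = []
    card_number = 0
    with open(path, 'rb') as f:
        while True:
            block = f.read(BLOCK)
            if not block:
                break
            for i in range(0, len(block), CARD):
                card = block[i:i + CARD]
                if any(b < 0x20 or b > 0x7E for b in card):
                    bad.append((card_number, card))
                card_number += 1
                if card[:8] == b'END     ':
                    return bad  # END card: the primary header is finished
    return bad
```

Decoding a reported card with `card.decode('latin1')` always succeeds and shows the text. In the question's error message, `\xf6` and `\xfc` are the Latin-1 codes for ö and ü, which suggests the file was actually written in Latin-1 rather than UTF-8. That is also why the accepted answer encodes its `bad_str` with `'latin1'`.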
