python - Remove the very last character in a file

Question

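After looking all over the Internet, I've come to this.

Let's say I have already made a text file that reads: Hello World

Well, I want to remove the very last character (in this case d) from this text file.

So now the text file should look like this: Hello Worl

But I have no idea how to do this.

All I want, more or less, is a single backspace function for text files on my hard drive.

This needs to work on Linux, as that's what I'm using.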

score 75 · Accepted Answer

Use fileobject.seek() to seek 1 position from the end, then use file.truncate() to remove the remainder of the file:

import os

with open(filename, 'rb+') as filehandle:
    filehandle.seek(-1, os.SEEK_END)
    filehandle.truncate()

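This works fine for single-byte encodings. If you have a multi-byte encoding (such as UTF-16 or UTF-32), you need to seek back enough bytes from the end to account for a single code point.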
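For a fixed-width encoding, the seek distance is simply the width of one code point. Here is a minimal sketch for UTF-32, where every code point takes 4 bytes. The helper name remove_last_utf32_char and the scratch file are made up for illustration, and the file is assumed to contain at least one character:

```python
import os
import tempfile


def remove_last_utf32_char(path):
    """Drop the final code point of a UTF-32 file (hypothetical helper)."""
    with open(path, 'rb+') as filehandle:
        # every UTF-32 code point occupies exactly 4 bytes
        filehandle.seek(-4, os.SEEK_END)
        filehandle.truncate()


# demo on a scratch file; 'utf-32-le' writes no byte order mark
path = os.path.join(tempfile.mkdtemp(), 'utf32.txt')
with open(path, 'w', encoding='utf-32-le') as f:
    f.write('Hello World')

remove_last_utf32_char(path)

with open(path, encoding='utf-32-le') as f:
    result = f.read()
```

UTF-16 is not covered by this sketch: most characters take 2 bytes there, but characters outside the Basic Multilingual Plane take 4.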

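For variable-byte encodings, whether you can use this technique at all depends on the codec. For UTF-8, you need to find the first byte (from the end) where bytevalue & 0xC0 != 0x80 is true, and truncate from that point on. That ensures you don't truncate in the middle of a multi-byte UTF-8 code point: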

import os

with open(filename, 'rb+') as filehandle:
    # move to the last byte, then scan backward until a non-continuation byte is found
    filehandle.seek(-1, os.SEEK_END)
    # read(1) returns a bytes object, so convert it to an int before masking
    while ord(filehandle.read(1)) & 0xC0 == 0x80:
        # we just read 1 byte, which moved the file position forward,
        # skip back 2 bytes to move to the byte before the current.
        filehandle.seek(-2, os.SEEK_CUR)

    # last read byte is our truncation point, move back to it.
    filehandle.seek(-1, os.SEEK_CUR)
    filehandle.truncate()

Note that UTF-8 is a superset of ASCII, so the above works for ASCII-encoded files too.
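To see what the bytevalue & 0xC0 test picks out, here is a small sketch that flags the continuation bytes in a short UTF-8 string. The helper name is_continuation_byte and the sample string are illustrative only:

```python
def is_continuation_byte(value):
    """True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx."""
    return value & 0xC0 == 0x80


# 'a' takes 1 byte, 'я' takes 2 bytes and '€' takes 3 bytes in UTF-8
encoded = 'aя€'.encode('utf-8')
flags = [is_continuation_byte(b) for b in encoded]
print(flags)  # [False, False, True, False, True, True]
```

Each False marks the first byte of a character, which is a safe place to truncate.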

score 11 · Accepted Answer

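The accepted answer by Martijn is simple and mostly works, but it does not account for text files with:

UTF-8 encoding containing non-English characters (UTF-8 is the default encoding for text files in Python 3 on most Linux systems)
a newline character at the end of the file (which is the default in Linux editors such as vim or gedit)

If the text file contains non-English characters, none of the answers provided so far would work.

What follows is an example that solves both problems and also allows removing more than one character from the end of the file: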

import os


def truncate_utf8_chars(filename, count, ignore_newlines=True):
    """
    Truncates last `count` characters of a text file encoded in UTF-8.
    :param filename: The path to the text file to read
    :param count: Number of UTF-8 characters to remove from the end of the file
    :param ignore_newlines: Set to true, if the newline character at the end of the file should be ignored
    """
    with open(filename, 'rb+') as f:
        last_char = None

        size = os.fstat(f.fileno()).st_size

        offset = 1
        chars = 0
        while offset <= size:
            f.seek(-offset, os.SEEK_END)
            b = ord(f.read(1))

            if ignore_newlines:
                if b == 0x0D or b == 0x0A:
                    offset += 1
                    continue

            if b & 0b10000000 == 0 or b & 0b11000000 == 0b11000000:
                # This is the first byte of a UTF8 character
                chars += 1
                if chars == count:
                    # When `count` number of characters have been found, move current position back
                    # with one byte (to include the byte just checked) and truncate the file
                    f.seek(-1, os.SEEK_CUR)
                    f.truncate()
                    return
            offset += 1

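How it works:

Opens the UTF-8 encoded text file in binary mode and reads only its last few bytes
Iterates over the bytes backwards, looking for the start of a UTF-8 character
Once count characters (other than newlines) have been found, truncates the file at the start of the last one found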

Sample text file - bg.txt:

Здравей свят

How to use:

filename = 'bg.txt'
print('Before truncate:', open(filename).read())
truncate_utf8_chars(filename, 1)
print('After truncate:', open(filename).read())

Outputs:

Before truncate: Здравей свят
After truncate: Здравей свя

This works with both UTF-8 and ASCII encoded files.

score 10 · Accepted Answer

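If you are not working with the file in binary mode and only have 'w' access, I can suggest the following.

f.seek(f.tell() - 1, os.SEEK_SET)
f.truncate()

In the code above, f.seek() will only accept a position derived from f.tell(), because you do not have 'b' access. This sets the cursor to the start of the last element, and f.truncate() then deletes it. Writing an empty string with f.write('') at that position would leave the file unchanged. Note that subtracting 1 from f.tell() is only reliable when the last character is encoded as a single byte.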
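As a rough illustration of backing up one character on a text-mode handle, the sketch below writes a string in 'w' mode, steps back one position and calls f.truncate() to drop the final character. It assumes ASCII text, so one character is one byte, and the scratch file path is made up for the demo:

```python
import os
import tempfile

# hypothetical scratch file for the demo
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w', encoding='ascii') as f:
    f.write('Hello World')
    # step back one position from the current end of the text
    f.seek(f.tell() - 1, os.SEEK_SET)
    # cut the file at the current position
    f.truncate()

with open(path, encoding='ascii') as f:
    result = f.read()
```

With multi-byte characters, the arithmetic on f.tell() could land in the middle of a character, so binary mode is the safer choice there.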

score 6 · Accepted Answer

with open(urfile, 'rb+') as f:
    f.seek(0,2)                 # end of file
    size=f.tell()               # the size...
    f.truncate(size-1)          # truncate at that size - how ever many characters

Be sure to use binary mode on Windows, since in text mode the line-ending translation may yield an illegal or incorrect character count.

score 3 · Accepted Answer

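# 'rb+' keeps the existing contents; 'w' would empty the file on open
with open('file.txt', 'rb+') as f:
    f.seek(0, 2)             # seek to end of file; f.seek(0, os.SEEK_END) is equivalent
    f.seek(f.tell() - 2, 0)  # seek to the second-to-last byte; f.seek(f.tell() - 2, os.SEEK_SET) is equivalent
    f.truncate()

This removes the last two bytes, which suits a file that ends with a newline (\n). Whether that is right depends on what the last character of the file is: it could be a newline or anything else.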
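Since the right number of bytes depends on how the file ends, one option is to look at the last byte first. The following is only a sketch, assuming single-byte characters; remove_last_char and demo are hypothetical helper names:

```python
import os
import tempfile


def remove_last_char(path):
    """Drop the last byte, plus a trailing newline if there is one."""
    with open(path, 'rb+') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return  # nothing to remove
        f.seek(size - 1, os.SEEK_SET)
        # cut 2 bytes when the file ends with a newline, otherwise 1
        cut = 2 if f.read(1) == b'\n' else 1
        f.truncate(max(size - cut, 0))


def demo(content):
    """Write content to a scratch file, trim it and return what is left."""
    path = os.path.join(tempfile.mkdtemp(), 'file.txt')
    with open(path, 'wb') as f:
        f.write(content)
    remove_last_char(path)
    with open(path, 'rb') as f:
        return f.read()


with_newline = demo(b'Hello World\n')
without_newline = demo(b'Hello World')
```

Both calls leave b'Hello Worl' behind. Like the snippet above, this drops the trailing newline rather than writing it back.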

score 0 · Accepted Answer

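This is a dirty way (erase and recreate)... I don't advise using it, but it can be done like this:

import os

x = open("file").read()
os.remove("file")
# open in write mode to recreate the file without its last character
open("file", "w").write(x[:-1])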

score 0 · Accepted Answer

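On a Linux system (or Cygwin under Windows), you can use the standard truncate command. You can reduce or increase the size of a file with this command.

To reduce a file by 1G, the command would be truncate -s -1G filename. In the following example I reduce a file called update.iso by 1G.

Note that this operation took less than five seconds.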

chris@SR-ENG-P18 /cygdrive/c/Projects
$ stat update.iso
  File: update.iso
  Size: 30802968576     Blocks: 30081024   IO Block: 65536  regular file
Device: ee6ddbceh/4000177102d   Inode: 19421773395035112  Links: 1
Access: (0664/-rw-rw-r--)  Uid: (1052727/   chris)   Gid: (1049089/Domain Users)
Access: 2020-06-12 07:39:00.572940600 -0400
Modify: 2020-06-12 07:39:00.572940600 -0400
Change: 2020-06-12 07:39:00.572940600 -0400
 Birth: 2020-06-11 13:31:21.170568000 -0400

chris@SR-ENG-P18 /cygdrive/c/Projects
$ truncate -s -1G update.iso

chris@SR-ENG-P18 /cygdrive/c/Projects
$ stat update.iso
  File: update.iso
  Size: 29729226752     Blocks: 29032448   IO Block: 65536  regular file
Device: ee6ddbceh/4000177102d   Inode: 19421773395035112  Links: 1
Access: (0664/-rw-rw-r--)  Uid: (1052727/   chris)   Gid: (1049089/Domain Users)
Access: 2020-06-12 07:42:38.335782800 -0400
Modify: 2020-06-12 07:42:38.335782800 -0400
Change: 2020-06-12 07:42:38.335782800 -0400
 Birth: 2020-06-11 13:31:21.170568000 -0400

The stat command tells you lots of information about a file, including its size.
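For the single character in the question, the shell equivalent would be truncate -s -1 filename. The same size-based cut can be sketched from Python with os.truncate(). The helper name shrink_file and the scratch file are assumptions for illustration, and the sketch counts bytes, not characters:

```python
import os
import tempfile


def shrink_file(path, nbytes):
    """Cut nbytes off the end of the file (hypothetical helper)."""
    new_size = max(os.path.getsize(path) - nbytes, 0)
    os.truncate(path, new_size)
    return new_size


# demo on a scratch file
path = os.path.join(tempfile.mkdtemp(), 'hello.txt')
with open(path, 'wb') as f:
    f.write(b'Hello World')

new_size = shrink_file(path, 1)
with open(path, 'rb') as f:
    result = f.read()
```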

score 0 · Accepted Answer

This may not be optimal, but if the above approaches don't work out, you could do:

with open('myfile.txt', 'r') as file:
    data = file.read()[:-1]
with open('myfile.txt', 'w') as file:
    file.write(data)

The code first opens the file and copies its content (except for the last character) to the string data. Afterwards, the file is truncated to zero length (i.e. emptied), and the content of data is written back to the file under the same name. This is basically the same as vins ms's answer, except that it doesn't use the os package and that it uses the safer 'with open' syntax. It may not be advisable if the text file is huge, since the whole file is read into memory. (I wrote this because none of the above approaches worked out too well for me in Python 3.8.)
