After searching all over the internet, I've ended up here.
Let's say I have already made a text file that reads:
Hello World
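Well, I want to delete the very last character (in this case d) from this text file.
So now the text file should look like this: Hello Worl
But I have no idea how to do this.
All I want, more or less, is a single backspace function for text files on my hard drive.
This needs to work on Linux, as that's what I'm using.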
Use fileobject.seek() to seek 1 position from the end, then use file.truncate() to remove the remainder of the file:
import os

with open(filename, 'rb+') as filehandle:
    filehandle.seek(-1, os.SEEK_END)
    filehandle.truncate()
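This works fine for single-byte encodings. If you have a multi-byte encoding (such as UTF-16 or UTF-32), you need to seek back enough bytes from the end to account for a single code point.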
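For a fixed-width encoding the seek distance is simply the code-unit width. The following is a minimal sketch for UTF-32, where every code point occupies exactly 4 bytes. The function name drop_last_utf32_char is hypothetical, and the sketch assumes the file length is a multiple of 4 and that the last 4 bytes are not a byte order mark.

```python
import os

def drop_last_utf32_char(path):
    """Remove the final code point from a UTF-32 file (4 bytes per code point)."""
    with open(path, 'rb+') as fh:
        # In binary mode, seek() returns the new absolute position,
        # which here is the file size in bytes.
        size = fh.seek(0, os.SEEK_END)
        if size < 4:
            return  # nothing to remove
        fh.truncate(size - 4)
```

UTF-16 needs more care, because characters outside the Basic Multilingual Plane take two 2-byte code units rather than one.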
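For variable-byte encodings, whether you can use this technique at all depends on the codec. For UTF-8, you need to find the first byte (from the end) where bytevalue & 0xC0 != 0x80 is true, and truncate from that point on. That ensures you don't truncate in the middle of a multi-byte UTF-8 code point: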
with open(filename, 'rb+') as filehandle:
    # move to the last byte, then scan backwards until a non-continuation byte is found
    filehandle.seek(-1, os.SEEK_END)
    # read(1) returns a bytes object; index it to get the integer byte value
    while filehandle.read(1)[0] & 0xC0 == 0x80:
        # we just read 1 byte, which moved the file position forward,
        # skip back 2 bytes to move to the byte before the current.
        filehandle.seek(-2, os.SEEK_CUR)

    # last read byte is our truncation point, move back to it.
    filehandle.seek(-1, os.SEEK_CUR)
    filehandle.truncate()
Note that UTF-8 is a superset of ASCII, so the above works for ASCII-encoded files too.
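The same backward scan can be wrapped in a function that also tolerates an empty file. This is only a sketch: the name drop_last_utf8_char is hypothetical, and it tracks the position explicitly instead of using relative seeks, so that it never seeks before the start of the file.

```python
import os

def drop_last_utf8_char(path):
    """Remove the last UTF-8 encoded character from the file at `path`."""
    with open(path, 'rb+') as fh:
        pos = fh.seek(0, os.SEEK_END)  # absolute position == file size
        while pos > 0:
            pos -= 1
            fh.seek(pos)
            byte = fh.read(1)[0]  # integer value of the byte at `pos`
            if byte & 0xC0 != 0x80:
                # Not a continuation byte, so the last character starts here.
                fh.truncate(pos)
                return
        # Empty file (or only continuation bytes): leave it unchanged.
```

Calling it on a file containing Hello World leaves Hello Worl; calling it on an empty file does nothing.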
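Martijn's accepted answer is simple and mostly works, but it does not account for text files that contain non-English characters encoded in UTF-8, or that end with a newline character (the default in Linux editors such as vim or gedit).
If the text file contains non-English characters, none of the answers provided so far would work.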
What follows is an example that solves both problems and also allows removing more than one character from the end of the file:
import os

def truncate_utf8_chars(filename, count, ignore_newlines=True):
    """
    Truncates last `count` characters of a text file encoded in UTF-8.
    :param filename: The path to the text file to read
    :param count: Number of UTF-8 characters to remove from the end of the file
    :param ignore_newlines: Set to true, if the newline character at the end of the file should be ignored
    """
    with open(filename, 'rb+') as f:
        size = os.fstat(f.fileno()).st_size

        offset = 1
        chars = 0
        while offset <= size:
            f.seek(-offset, os.SEEK_END)
            b = ord(f.read(1))

            if ignore_newlines:
                if b == 0x0D or b == 0x0A:
                    offset += 1
                    continue

            if b & 0b10000000 == 0 or b & 0b11000000 == 0b11000000:
                # This is the first byte of a UTF8 character
                chars += 1
                if chars == count:
                    # When `count` number of characters have been found, move current position back
                    # with one byte (to include the byte just checked) and truncate the file
                    f.seek(-1, os.SEEK_CUR)
                    f.truncate()
                    return
            offset += 1
How it works:
Sample text file - bg.txt:
Здравей свят
How to use:
filename = 'bg.txt'
print('Before truncate:', open(filename).read())
truncate_utf8_chars(filename, 1)
print('After truncate:', open(filename).read())
Outputs:
Before truncate: Здравей свят
After truncate: Здравей свя
This works with both UTF-8 and ASCII encoded files.
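To see why the two bit masks identify the start of a character, here is a small sketch that applies the same test to the bytes of a short Cyrillic string. The helper name is_utf8_lead_byte is hypothetical and only restates the condition used in the function above.

```python
def is_utf8_lead_byte(b):
    """True if byte value `b` starts a UTF-8 character.

    That is either an ASCII byte (top bit clear) or the lead byte of a
    multi-byte sequence (top two bits set). Continuation bytes have the
    form 0b10xxxxxx and fail both checks.
    """
    return b & 0b10000000 == 0 or b & 0b11000000 == 0b11000000

encoded = 'свят'.encode('utf-8')  # each of these letters takes 2 bytes
flags = [is_utf8_lead_byte(b) for b in encoded]
print(flags)
```

The number of True entries equals the number of characters in the string, which is what lets the function count characters while walking backwards over raw bytes.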
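In case you are not opening the file in binary mode, I can suggest the following. The file must be opened in a read/write mode such as 'r+', because plain 'w' empties the file as soon as it is opened. The snippet assumes the cursor is already at the end of the file.
f.seek(f.tell() - 1, os.SEEK_SET)
f.truncate()
In the above code, f.seek() is given an absolute position derived from f.tell(), because a text-mode file does not allow a nonzero seek relative to the end. That sets the cursor to the start of the last element, and f.truncate() then removes everything from the cursor onward. Writing an empty string with f.write('') would not remove anything. This only works reliably when the last character is encoded as a single byte.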
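A complete text-mode version might look like the sketch below. The function name drop_last_char_text_mode is hypothetical, and the sketch assumes an ASCII file. The Python documentation describes text-mode seeks to offsets not returned by tell() as undefined behaviour, so treat this as something that happens to work in CPython for single-byte content rather than a guaranteed technique.

```python
import os

def drop_last_char_text_mode(path):
    """Remove the last character of an ASCII file opened in text mode."""
    with open(path, 'r+', encoding='ascii') as f:
        f.seek(0, os.SEEK_END)   # a zero offset from the end is allowed in text mode
        end = f.tell()
        if end == 0:
            return               # empty file, nothing to remove
        f.seek(end - 1, os.SEEK_SET)
        f.truncate()             # cut the file at the current position
```

For anything other than plain ASCII, opening the file in binary mode is the safer choice.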
with open(urfile, 'rb+') as f:
    f.seek(0, 2)          # end of file
    size = f.tell()       # the size...
    f.truncate(size - 1)  # truncate at that size - how ever many characters
Be sure to use binary mode on Windows, since in text mode the handling of line endings may yield an illegal or incorrect character count.
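The same size-minus-one idea can be expressed without opening the file at all, using os.truncate() from the standard library. This is a sketch with a hypothetical function name, drop_last_byte, and like the snippet above it removes one byte, not one character.

```python
import os

def drop_last_byte(path):
    """Shrink the file at `path` by exactly one byte, if it is not empty."""
    size = os.path.getsize(path)
    if size > 0:
        os.truncate(path, size - 1)
```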
with open('file.txt', 'r+') as f:  # 'r+' keeps the contents; 'w' would empty the file on open
    f.seek(0, 2)             # seek to end of file; f.seek(0, os.SEEK_END) is legal
    f.seek(f.tell() - 2, 0)  # seek to the second last char of file; f.seek(f.tell()-2, os.SEEK_SET) is legal
    f.truncate()
Adjust the offset depending on what the last character of the file is, which could be a newline (\n) or anything else.
Here is a dirty way (erase and recreate). I don't advise using it, but it is possible to do it like this:
import os

x = open("file").read()
os.remove("file")
# the file must be reopened in write mode, otherwise write() fails
with open("file", "w") as f:
    f.write(x[:-1])
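On a Linux system (or Cygwin under Windows), you can use the standard truncate command. You can reduce or increase the size of a file with this command.
To reduce a file by 1G, the command would be truncate -s -1G filename. The leading minus sign means "reduce by"; without it, the file size is set to exactly 1G. In the following example, I reduce a file called update.iso by 1G.
Note that this operation took less than five seconds.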
chris@SR-ENG-P18 /cygdrive/c/Projects
$ stat update.iso
File: update.iso
Size: 30802968576 Blocks: 30081024 IO Block: 65536 regular file
Device: ee6ddbceh/4000177102d Inode: 19421773395035112 Links: 1
Access: (0664/-rw-rw-r--) Uid: (1052727/ chris) Gid: (1049089/Domain Users)
Access: 2020-06-12 07:39:00.572940600 -0400
Modify: 2020-06-12 07:39:00.572940600 -0400
Change: 2020-06-12 07:39:00.572940600 -0400
Birth: 2020-06-11 13:31:21.170568000 -0400
chris@SR-ENG-P18 /cygdrive/c/Projects
$ truncate -s -1G update.iso
chris@SR-ENG-P18 /cygdrive/c/Projects
$ stat update.iso
File: update.iso
Size: 29729226752 Blocks: 29032448 IO Block: 65536 regular file
Device: ee6ddbceh/4000177102d Inode: 19421773395035112 Links: 1
Access: (0664/-rw-rw-r--) Uid: (1052727/ chris) Gid: (1049089/Domain Users)
Access: 2020-06-12 07:42:38.335782800 -0400
Modify: 2020-06-12 07:42:38.335782800 -0400
Change: 2020-06-12 07:42:38.335782800 -0400
Birth: 2020-06-11 13:31:21.170568000 -0400
The stat command tells you a lot of information about a file, including its size.
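Applied to the original question, the same command with a size of -1 removes a single byte. The sketch below calls it from Python through subprocess. The function name drop_last_byte_with_truncate is hypothetical, and the sketch assumes the GNU coreutils truncate program is available on the PATH.

```python
import shutil
import subprocess

def drop_last_byte_with_truncate(path):
    """Shrink `path` by one byte using the external truncate command."""
    if shutil.which('truncate') is None:
        raise RuntimeError('the truncate command is not available')
    # "-s -1" means: reduce the current size by 1 byte
    subprocess.run(['truncate', '-s', '-1', path], check=True)
```

As with the other byte-oriented approaches, this is only safe when the last character is a single byte.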
This may not be optimal, but if the above approaches don't work out, you could do:
with open('myfile.txt', 'r') as file:
    data = file.read()[:-1]
with open('myfile.txt', 'w') as file:
    file.write(data)
The code first opens the file and copies its content (except for the last character) to the string data. Afterwards, the file is truncated to zero length (i.e. emptied), and the content of data is saved to the file under the same name.
This is basically the same as vins ms's answer, except that it doesn't use the os package and it uses the safer 'with open' syntax. This may not be recommended if the text file is huge. (I wrote this since none of the above approaches worked out too well for me in Python 3.8.)