python - Strange disappearance of CR in a string built from a copy of a file's content passed to raw_input()

Question

While trying to track down the cause of what looked like a bug, I finally ran into a strange behavior of the raw_input() function in Python 2.7:

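It removes the CR character from CR LF pairs, and only in strings produced by manually copying (via the clipboard) the content of a file. Strings passed to raw_input() that are copies of the displayed form of the same strings do not lose their CR characters. In all cases, lone CR characters are left unchanged. CR (carriage return) is the \r character.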

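Rather than relying on a confusing description, here is a piece of code that shows what must be done to observe the behavior; just follow its instructions.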

The key point is the Text object: it has 7 characters instead of the 8 characters that were passed to raw_input() to create Text.
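The 8-versus-7 count can be checked with plain string arithmetic. The sketch below assumes, as the question reports, that the only change is each CR LF pair collapsing to LF; the variable names are illustrative only.

```python
from __future__ import print_function

# Content written to PRIM.txt by the question's script: A \r B \n C \r \n D
original = 'A\rB\nC\r\nD'

# Assumption (from the question's observations): only CR LF -> LF happened,
# while the lone CR between A and B and the lone LF after B survived.
observed = original.replace('\r\n', '\n')

print(repr(original), len(original))
print(repr(observed), len(observed))
```

Exactly one character disappears, and it is the CR of the single CR LF pair.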

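To verify that the argument passed to raw_input() really has 8 characters, I created another file, PASTED.txt, with the same argument. Being sure of anything in this problem is awkward, because pasting into a Notepad++ window showed me that the various line endings (\r, \n, \r\n) are all displayed as CR LF at the end of the lines in that window.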

I recommend using Ctrl-A to select all of the file's content.

I'm puzzled, and I wonder whether I made a mistake in my code or in my understanding, or whether this is a genuine characteristic of Python.

I would welcome your comments and insight.

with open('PRIM.txt','wb') as f:
    f.write('A\rB\nC\r\nD')
print "  1) A file with name 'PRIM.txt' has just been created with content A\\rB\\nC\\r\\nD"
raw_input("  Open this file and copy manually its CONTENT in the clipboard.\n"+\
          "    --when done, press Enter to continue-- ")


print "\n  2) Paste this CONTENT in a Notepad++ window "+\
      "     and see the symbols at the extremities of the lines."
raw_input("    --when done, press Enter to continue-- ")


Text = raw_input("\n  3) Paste this CONTENT here and press a key : ")
print ("     An object Text has just been created with this pasted value of CONTENT.")


with open('PASTED.txt','wb') as f:
    f.write('')
print "\n  4) An empty file 'PASTED.txt' has just been created."
print "     Paste manually in this file the PRIM's CONTENT and shut this file."
raw_input("     --when done, press Enter to continue-- ")


print "\n  5) Enter the copy of this display of A\\rB\\nC\\r\\nD : \nA\rB\nC\r\nD"
DSP = raw_input('please, enter it on the following line :\n')
print "    An object DSP has just been created with this pasted value of this copied display"


print '\n----------'
with open('PRIM.txt','rb') as fv:
    verif = fv.read()
print "The read content of the file 'PRIM.txt' obtained by open() and read() : "+repr(verif)
print "len of the read content of the file 'PRIM.txt'  ==",len(verif)


print '\n----------'
print "The file PASTED.txt received by pasting the manually copied CONTENT of PRIM.txt"
with open('PASTED.txt','rb') as f:
    cpd = f.read()
    print "The read content of the file 'PASTED.txt' obtained by open() and read() "+\
          "is now : "+repr(cpd)
    print "its len is==",len(cpd)


print '\n----------'
print 'The object Text received through raw_input() the manually copied CONTENT of PRIM.txt'
print "value of Text=="+repr(Text)+\
      "\nText.split('\\r\\n')==",Text.split('\r\n')
print 'len of Text==',len(Text)


print '\n----------'
print "The object DSP received  through raw_input() the copy of the display of A\\rB\\nC\\r\\nD" 
print "value of DSP==",repr(DSP)
print 'len of DSP==',len(DSP)

My OS is Windows. I wonder whether the same thing is observed on other operating systems.

score 2 · Accepted Answer

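sys.stdin is opened in text mode (you can check this by displaying sys.stdin.mode and seeing that it is 'r'). If you open any file in text mode in Python, the platform's native line ending (\r\n on Windows) is converted to a plain newline (\n) in the Python string; a lone \r is not affected.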

You can see the same effect by opening the file PASTED.txt with mode 'r' instead of 'rb'.
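Because Python 3's 'r' mode behaves differently (it uses universal newlines on every platform), a portable way to reproduce the Windows Python 2 'r' result is to read the file in binary and apply the CR LF to LF rule by hand. This is a sketch: `windows_text_translate` is a hypothetical helper that approximates only the newline part of Windows text-mode reads, and the file is a temporary stand-in for PASTED.txt.

```python
from __future__ import print_function
import os
import tempfile


def windows_text_translate(raw):
    """Approximate the newline handling of a Windows text-mode read:
    each CR LF pair becomes LF; lone CR and lone LF are left alone.
    (Hypothetical helper; it models only the newline translation.)"""
    return raw.replace(b'\r\n', b'\n')


# Recreate a PASTED.txt-like file in a temporary directory
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'PASTED.txt')
with open(path, 'wb') as f:
    f.write(b'A\rB\nC\r\nD')

with open(path, 'rb') as f:
    raw = f.read()  # binary mode: bytes exactly as stored on disk

as_text = windows_text_translate(raw)
print(len(raw), len(as_text))

# Clean up the temporary file and directory
os.remove(path)
os.rmdir(tmpdir)
```

The binary read keeps all 8 bytes; the translated version has 7, matching the length of Text in the question.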

score 0 · Accepted Answer

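After posting, I looked further into my code and noticed that the modification of data copied from a file and passed to raw_input() is the same as the newline modification Python performs when it reads data directly from a file, as this demonstrates: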

with open("TestWindows.txt", 'wb') as f:
    f.write("PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  ")

print "\n- Following string have been written in TestWindows.txt in mode 'wb' :\n"+\
      "PACIFIC \\r  ARCTIC \\n  ATLANTIC \\r\\n  "


print "\n- data got by reading the file TestWindows.txt in 'rb' mode :"
with open("TestWindows.txt", 'rb') as f:
    print "    repr(data)==",repr(f.read())

print "\n- data got by reading the file TestWindows.txt in 'r' mode :"
with open("TestWindows.txt", 'r') as f:
    print "    repr(data)==",repr(f.read())

print "\n- data got by reading the file TestWindows.txt in 'rU' mode :"
with open("TestWindows.txt", 'rU') as f:
    print "    repr(data)==",repr(f.read())

Results:

- Following string have been written in TestWindows.txt in mode 'wb' :
PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  

- data got by reading the file TestWindows.txt in 'rb' mode :
    repr(data)== 'PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  '

- data got by reading the file TestWindows.txt in 'r' mode :
    repr(data)== 'PACIFIC \r  ARCTIC \n  ATLANTIC \n  '

- data got by reading the file TestWindows.txt in 'rU' mode :
    repr(data)== 'PACIFIC \n  ARCTIC \n  ATLANTIC \n  '
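For comparison, the io module (available since Python 2.6, and the basis of open() in Python 3) exposes the translation choice through its newline argument: newline='' returns line endings untranslated, while newline=None enables universal-newlines mode, which matches the 'rU' result above. A sketch using a temporary file holding the same string:

```python
from __future__ import print_function
import io
import os
import tempfile

data = b"PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  "

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(data)

# newline='' : line endings are returned to the caller untranslated
with io.open(path, 'r', encoding='ascii', newline='') as f:
    untranslated = f.read()

# newline=None : universal newlines; \r, \n and \r\n all become \n (like 'rU')
with io.open(path, 'r', encoding='ascii', newline=None) as f:
    universal = f.read()

os.remove(path)

print(repr(untranslated))
print(repr(universal))
```

In Python 2 the results are unicode strings, so repr() shows a u prefix; the characters are the same.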

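First, the file PASTED.txt has the same content as the file PRIM.txt, because it was produced by copying PRIM.txt's content and pasting it into PASTED.txt without going through a Python string. So when data is transferred from one file to another through the clipboard alone, it is not modified. This shows that PRIM.txt's content is intact in the clipboard where the copy places it.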
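The claim that the clipboard round trip left the bytes intact can be checked mechanically rather than by eye, for instance with filecmp.cmp(..., shallow=False), which compares file contents. A sketch: `same_bytes` is a hypothetical wrapper, and temporary files stand in for PRIM.txt and PASTED.txt.

```python
from __future__ import print_function
import filecmp
import os
import tempfile


def same_bytes(path_a, path_b):
    """True if both files have identical contents.
    shallow=False makes filecmp compare the bytes, not just os.stat() signatures."""
    return filecmp.cmp(path_a, path_b, shallow=False)


tmpdir = tempfile.mkdtemp()
prim = os.path.join(tmpdir, 'PRIM.txt')
pasted = os.path.join(tmpdir, 'PASTED.txt')
translated = os.path.join(tmpdir, 'TRANSLATED.txt')

# Stand-ins: an untouched copy, and a copy where CR LF was collapsed to LF
for p, content in ((prim, b'A\rB\nC\r\nD'),
                   (pasted, b'A\rB\nC\r\nD'),
                   (translated, b'A\rB\nC\nD')):
    with open(p, 'wb') as f:
        f.write(content)

print(same_bytes(prim, pasted))
print(same_bytes(prim, translated))
```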

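Second, data that goes from a file into a Python string through the clipboard and raw_input() is modified, so the modification happens between the clipboard and the Python string. I therefore think raw_input() may apply to the data it receives from the clipboard the same interpretation that the Python interpreter applies to data it receives when reading a file.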

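Next, I came up with the idea that \r\n is replaced by \n because data of a "Windows nature" becomes data of a "Python nature", and that the clipboard does not modify the data because it is a component controlled by the Windows OS.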

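Alas, data copied from the screen and passed to raw_input() does not undergo the \r\n conversion, even though it also passes through the Windows clipboard, which breaks my little theory.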

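Then I thought that Python recognizes the nature of the data not from its origin but from information contained in the data, a kind of "format". I found the following page about the Windows clipboard, and the clipboard does indeed record data in several formats: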

http://msdn.microsoft.com/en-us/library/ms648709(v=vs.85).aspx

Perhaps the interpretation by which Python modifies \r\n is related to these clipboard formats, perhaps not. I don't know enough about all this to be sure.

Can anyone explain all of the observations above?


Thank you for your answer, ncoghlan. But I don't think that is the reason:

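sys.stdin has no attribute mode.
As far as I know, sys.stdin refers to the keyboard. But in my code, the data does not come from typing on the keyboard; it comes from pasting through the clipboard. That is not the same thing.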
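Whether sys.stdin exposes a mode attribute depends on which object is installed as sys.stdin, and that can differ between a plain console interpreter and a shell that replaces it. A defensive check that works either way (a sketch; `describe_stdin` is a hypothetical name):

```python
from __future__ import print_function
import sys


def describe_stdin(stream=None):
    """Report what kind of object stdin is, without assuming attributes exist."""
    stream = sys.stdin if stream is None else stream
    return {
        'type': type(stream).__name__,
        # getattr with a default avoids AttributeError when 'mode' is missing
        'mode': getattr(stream, 'mode', None),
    }


print(describe_stdin())
```

Running it under each environment in which the question's script was tried would show whether the two stdin objects are the same kind.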

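The point is that I don't understand how the Python interpreter could tell clipboard data copied from a file apart from clipboard data copied from the screen.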
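One way to frame this question: a newline translation applied to a text stream sees only the characters it receives, not where they came from. The sketch below applies a CR LF to LF rule to two hypothetical inputs; only the one that actually contains a CR LF pair changes. `crlf_to_lf` is an illustrative helper, and the second sample is an assumption, since the thread does not establish which characters the screen copy contained.

```python
from __future__ import print_function


def crlf_to_lf(text):
    """Collapse every CR LF pair to LF. The rule inspects only the characters."""
    return text.replace('\r\n', '\n')


# Hypothetical samples for illustration only:
from_file = 'A\rB\nC\r\nD'   # contains one CR LF pair, as PRIM.txt does
from_screen = 'B\nC\nD'      # assumed: a copy whose line breaks are plain LF

for label, sample in (('file copy', from_file), ('screen copy', from_screen)):
    result = crlf_to_lf(sample)
    print(label, len(sample), '->', len(result))
```

Under this assumption, the two inputs would be treated by the same rule and differ only because their characters differ.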
