python - Python: How do I process text pasted from the clipboard?

Question

I'm working with a string like this:

    import csv
    from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

    scrpt = "\tFrame\tX pixels\tY pixels\r\n\t2\t615.5\t334.5\r\n\t3\t615.885\t334.136\r\n\t4\t615.937\t334.087\r\n\t5\t615.917\t334.106\r\n\t6\t615.892\t334.129\r\n\t7\t615.905\t334.117\r\n\t8\t615.767\t334.246\r\n\t9\t615.546\t334.456\r\n\t10\t615.352\t334.643\r\n\r\n"

    infile = StringIO(scrpt)
    #pretend infile was just a regular file...

    r = csv.DictReader(infile, dialect=csv.Sniffer().sniff(infile.read(1000)))
    infile.seek(0)

    Frame, Xco, Yco = [],[],[]
    for row in r:
        Frame.append(row['Frame'])
        Xco.append(row['X pixels'])
        Yco.append(row['Y pixels'])

This works fine. The string variable "scrpt" gets neatly sorted into the variables "Frame", "Xco", and "Yco".

Now, if I do this:

    print(scrpt)

I see everything neatly lined up in tab-separated columns, like this:

    Frame   X pixels    Y pixels

    2   615.5   334.5
    3   615.885 334.136
    4   615.937 334.087
    5   615.917 334.106
    6   615.892 334.129
    7   615.905 334.117
    8   615.767 334.246
    9   615.546 334.456
    10  615.352 334.643

However, if I paste the same string from the clipboard and try to process it, it doesn't work. In that case, if I print it like this:

    print(scrpt)

I get this:

    \tFrame\tX pixels\tY pixels\r\n\t2\t615.5\t334.5\r\n\t3\t615.885\t334.136\r\n\t4\t615.937\t334.087\r\n\t5\t615.917\t334.106\r\n\t6\t615.892\t334.129\r\n\t7\t615.905\t334.117\r\n\t8\t615.767\t334.246\r\n\t9\t615.546\t334.456\r\n\t10\t615.352\t334.643\r\n\r\n

Then, when I go to process it, the csv module doesn't split it into columns.

What am I doing wrong? It looks like I'm using the same data in both cases, but something is different.

score 0 · Accepted Answer

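My guess is that your clipboard contains literal backslash and t characters, not tab characters. For example, if you just copied the first line of your source code, that's exactly what you'd get.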
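One way to test this guess is to check which characters the string actually contains: a real tab (`"\t"`) or a backslash followed by the letter t (`"\\t"`). A minimal sketch; `typed`, `pasted`, and the helper `describe` are hypothetical names used only for illustration:

```python
# Hypothetical samples: a literal as written in source code (escapes interpreted)
# vs. the same characters arriving as plain pasted text.
typed = "\tFrame\tX pixels\r\n"
pasted = r"\tFrame\tX pixels\r\n"  # raw string: backslashes are kept as characters


def describe(s):
    """Report whether s holds real control characters or literal escape text."""
    return {
        "has_tab": "\t" in s,           # a real tab character (U+0009)
        "has_backslash_t": "\\t" in s,  # a backslash followed by the letter t
        "length": len(s),
    }


print(describe(typed))   # {'has_tab': True, 'has_backslash_t': False, 'length': 17}
print(describe(pasted))  # {'has_tab': False, 'has_backslash_t': True, 'length': 21}
```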

In other words, it's as if you had done this:

    scrpt = r"\tFrame\tX pixels\tY pixels\r\n\t2\t615.5\t334.5\r\n\t3\t615.885\t334.136\r\n\t4\t615.937\t334.087\r\n\t5\t615.917\t334.106\r\n\t6\t615.892\t334.129\r\n\t7\t615.905\t334.117\r\n\t8\t615.767\t334.246\r\n\t9\t615.546\t334.456\r\n\t10\t615.352\t334.643\r\n\r\n"

…or, equivalently:

    scrpt = "\\tFrame\\tX pixels\\tY pixels\\r\\n\\t2\\t615.5\\t334.5\\r\\n\\t3\\t615.885\\t334.136\\r\\n\\t4\\t615.937\\t334.087\\r\\n\\t5\\t615.917\\t334.106\\r\\n\\t6\\t615.892\\t334.129\\r\\n\\t7\\t615.905\\t334.117\\r\\n\\t8\\t615.767\\t334.246\\r\\n\\t9\\t615.546\\t334.456\\r\\n\\t10\\t615.352\\t334.643\\r\\n\\r\\n"
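Both spellings produce exactly the same string, and it is longer than the ordinary literal because every escape is two characters instead of one. A quick check on a shortened piece of the data (the variable names are just for illustration):

```python
# A shortened piece of the question's data, written three ways.
normal = "\t2\t615.5\t334.5\r\n"        # escapes interpreted: real tab, CR, LF
raw = r"\t2\t615.5\t334.5\r\n"          # raw string: backslashes kept
doubled = "\\t2\\t615.5\\t334.5\\r\\n"  # each backslash escaped by hand

print(raw == doubled)          # True: identical characters
print(normal == raw)           # False
print(len(normal), len(raw))   # 16 21
```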

If that's the problem, the fix is pretty simple. In Python 2.x:

    scrpt = scrpt.decode('string_escape')

Or, in 3.x (where you can't call decode on a str):

    import codecs
    scrpt = codecs.decode(scrpt, 'unicode_escape')
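Putting the pieces together, here is a minimal Python 3 sketch, assuming the clipboard text is already available as a string (reading the clipboard itself is outside this example). The function name `parse_pasted` and the shortened sample data are hypothetical, and unlike the question's code it passes `delimiter="\t"` explicitly rather than using `csv.Sniffer`:

```python
import codecs
import csv
from io import StringIO

# Assumed clipboard contents: escape sequences as literal text
# (backslash + 't', backslash + 'r', ...), e.g. copied from a line of source code.
pasted = r"\tFrame\tX pixels\tY pixels\r\n\t2\t615.5\t334.5\r\n\t3\t615.885\t334.136\r\n\r\n"


def parse_pasted(text):
    """Turn literal escape sequences into real characters, then parse tab-separated rows."""
    text = codecs.decode(text, "unicode_escape")  # r"\t" -> "\t", r"\r\n" -> "\r\n"
    reader = csv.DictReader(StringIO(text, newline=""), delimiter="\t")
    frame, xco, yco = [], [], []
    for row in reader:  # the trailing blank line is skipped by DictReader
        frame.append(row["Frame"])
        xco.append(row["X pixels"])
        yco.append(row["Y pixels"])
    return frame, xco, yco


print(parse_pasted(pasted))
# (['2', '3'], ['615.5', '615.885'], ['334.5', '334.136'])
```

Decoding an ASCII string that already contains real tabs and no backslashes leaves it unchanged, so the same function also accepts the typed-in version. One caveat: the unicode_escape codec decodes as Latin-1, so non-ASCII characters in the pasted text can come back garbled; `text.encode("latin-1", "backslashreplace").decode("unicode_escape")` is a commonly used variant that keeps them intact.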

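The unicode_escape codec is described in the Standard Encodings list in the documentation for the codecs module. It's defined as:

Produce a string that is suitable as Unicode literal in Python source code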

In other words, if you encode with this codec, it will replace each control character and each non-ASCII character (and the backslash itself) with an escape sequence that you can type into your source code. If you've got a tab character, it'll replace that with a backslash character and a t.
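Encoding in that direction can be tried directly. Note that in Python 3, `codecs.encode(..., "unicode_escape")` returns `bytes`, not `str`:

```python
import codecs

text = "\tFrame\tX pixels\r\n"
encoded = codecs.encode(text, "unicode_escape")  # str -> bytes containing escape sequences
print(encoded)                                   # b'\\tFrame\\tX pixels\\r\\n'
print(codecs.encode("é", "unicode_escape"))      # b'\\xe9' (non-ASCII is escaped too)
```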

You want to do the exact reverse of that: you've got a string you copied out of source code, with source-code-style escape sequences, and you want to interpret it the same way the Python interpreter does. So, you just decode with the same codec. If you've got a backslash followed by a t, it'll replace them with a tab character.
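Decoding with the same codec reverses this. A short sketch on one line of the question's data:

```python
import codecs

pasted = r"\t2\t615.5\t334.5\r\n"  # escape sequences as literal text
decoded = codecs.decode(pasted, "unicode_escape")
print(decoded == "\t2\t615.5\t334.5\r\n")  # True: real tab, CR and LF characters now
print(decoded.split("\t"))                 # ['', '2', '615.5', '334.5\r\n']

# Round trip: encoding turns control characters into escapes, decoding undoes it.
original = "\tFrame\tX pixels"
restored = codecs.decode(codecs.encode(original, "unicode_escape"), "unicode_escape")
print(restored == original)                # True
```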

It's worth playing with this in the interactive interpreter (remember to keep the repr and str representations straight while doing so!) until you get it.
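The repr/str distinction is exactly what makes the two cases look alike: printing the escaped text shows the same characters as the repr of the real string. For example (variable names chosen only for illustration):

```python
real = "\t2\t615.5"       # contains real tab characters
escaped = r"\t2\t615.5"   # contains backslash + 't' pairs

print(real)           # tabs rendered as whitespace
print(repr(real))     # '\t2\t615.5'
print(escaped)        # \t2\t615.5  <- same characters as repr(real), minus the quotes
print(repr(escaped))  # '\\t2\\t615.5'
```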
