
I'm processing a string like this:

    scrpt = "\tFrame\tX pixels\tY pixels\r\n\t2\t615.5\t334.5\r\n\t3\t615.885\t334.136\r\n\t4\t615.937\t334.087\r\n\t5\t615.917\t334.106\r\n\t6\t615.892\t334.129\r\n\t7\t615.905\t334.117\r\n\t8\t615.767\t334.246\r\n\t9\t615.546\t334.456\r\n\t10\t615.352\t334.643\r\n\r\n"

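    # assumes: import csv, plus StringIO (from StringIO in Python 2, from io in Python 3)
    infile = StringIO(scrpt)
    # pretend infile was just a regular file...

    # sniff the dialect from the first 1000 characters, then rewind before reading rows
    dialect = csv.Sniffer().sniff(infile.read(1000))
    infile.seek(0)
    r = csv.DictReader(infile, dialect=dialect)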

    Frame, Xco, Yco = [],[],[]
    for row in r:
        Frame.append(row['Frame'])
        Xco.append(row['X pixels'])
        Yco.append(row['Y pixels'])

This works fine. The string variable "scrpt" gets split nicely into the variables "Frame", "Xco", and "Yco".

Now, if I do this:

print(scrpt)

I see everything neatly lined up in tab-separated columns, like this:

Frame   X pixels    Y pixels

2   615.5   334.5
3   615.885 334.136
4   615.937 334.087
5   615.917 334.106
6   615.892 334.129
7   615.905 334.117
8   615.767 334.246
9   615.546 334.456
10  615.352 334.643

However, if I paste the same string in from the clipboard and try to process it, it doesn't work. In that case, if I print it like this:

print(scrpt)

I get:

\tFrame\tX pixels\tY pixels\r\n\t2\t615.5\t334.5\r\n\t3\t615.885\t334.136\r\n\t4\t615.937\t334.087\r\n\t5\t615.917\t334.106\r\n\t6\t615.892\t334.129\r\n\t7\t615.905\t334.117\r\n\t8\t615.767\t334.246\r\n\t9\t615.546\t334.456\r\n\t10\t615.352\t334.643\r\n\r\n

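Then, when I go to process it, the csv module doesn't split it into columns.

What am I doing wrong? It looks like I'm using the same data in both cases, but something is different.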


1 Answer


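My guess is that your clipboard contains literal backslash and t characters, not tab characters. For example, that's exactly what you'd get if you just copied from the first line of your source code.

In other words, it's as if you had done this: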

scrpt = r"\tFrame\tX pixels\tY pixels\r\n\t2\t615.5\t334.5\r\n\t3\t615.885\t334.136\r\n\t4\t615.937\t334.087\r\n\t5\t615.917\t334.106\r\n\t6\t615.892\t334.129\r\n\t7\t615.905\t334.117\r\n\t8\t615.767\t334.246\r\n\t9\t615.546\t334.456\r\n\t10\t615.352\t334.643\r\n\r\n"

...or, equivalently:

scrpt = "\\tFrame\\tX pixels\\tY pixels\\r\\n\\t2\\t615.5\\t334.5\\r\\n\\t3\\t615.885\\t334.136\\r\\n\\t4\\t615.937\\t334.087\\r\\n\\t5\\t615.917\\t334.106\\r\\n\\t6\\t615.892\\t334.129\\r\\n\\t7\\t615.905\\t334.117\\r\\n\\t8\\t615.767\\t334.246\\r\\n\\t9\\t615.546\\t334.456\\r\\n\\t10\\t615.352\\t334.643\\r\\n\\r\\n"
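One way to check which case you're in is to compare lengths and reprs. This is only a small sketch; the names `real` and `literal` are made up for illustration and don't come from the question.

```python
real = "\tFrame"       # one actual tab character, then "Frame"
literal = r"\tFrame"   # a backslash, the letter t, then "Frame"

# The literal version is one character longer: backslash + t instead of a tab.
print(len(real), len(literal))

# repr shows how each string would have to be typed in source code.
print(repr(real))
print(repr(literal))

# Only the first string contains a real tab for the csv module to split on.
print("\t" in real, "\t" in literal)
```

If the pasted string behaves like `literal` here, the csv module never sees a tab delimiter at all.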

If that's the problem, the fix is very simple:

scrpt = scrpt.decode('string_escape')

Or, in 3.x (where you can't call decode on a str):

import codecs
scrpt = codecs.decode(scrpt, 'unicode_escape')
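Putting the pieces together for Python 3, the whole flow might look like the sketch below. It uses a shortened copy of the sample data, the name `pasted` is hypothetical, and it passes `delimiter="\t"` explicitly instead of using `csv.Sniffer` as the question does, simply to keep the example small.

```python
import codecs
import csv
from io import StringIO

# Shortened stand-in for the clipboard contents: literal backslash escapes.
pasted = r"\tFrame\tX pixels\tY pixels\r\n\t2\t615.5\t334.5\r\n\t3\t615.885\t334.136\r\n"

# Interpret the escape sequences the way the Python parser would.
scrpt = codecs.decode(pasted, "unicode_escape")

# newline="" leaves the \r\n line endings untouched for the csv module.
infile = StringIO(scrpt, newline="")
r = csv.DictReader(infile, delimiter="\t")

Frame, Xco, Yco = [], [], []
for row in r:
    Frame.append(row["Frame"])
    Xco.append(row["X pixels"])
    Yco.append(row["Y pixels"])

print(Frame, Xco, Yco)
```

Because every line starts with a tab, the reader also produces an empty-named first column, which the loop just ignores.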

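The unicode_escape codec is described in the list of Standard Encodings in the codecs module docs. It's defined as:

Produce a string that is suitable as Unicode literal in Python source code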

In other words, if you encode with this codec, it will replace each non-printing Unicode character with an escape sequence that you can type into your source code. If you've got a tab character, it'll replace that with a backslash character and a t.

You want to do the exact reverse of that: you've got a string you copied out of source code, with source-code-style escape sequences, and you want to interpret it the same way the Python interpreter does. So, you just decode with the same codec. If you've got a backslash followed by a t, it'll replace them with a tab character.
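The round trip can be sketched in Python 3 like this. The variable names are illustrative only; note that encoding produces bytes, which `codecs.decode` accepts just as it accepts a str.

```python
import codecs

original = "a\tb\r\n"  # contains a real tab, carriage return, and newline

# Encoding replaces each control character with its source-code escape sequence.
escaped = original.encode("unicode_escape")
print(escaped)

# Decoding with the same codec interprets those escape sequences again.
restored = codecs.decode(escaped, "unicode_escape")
print(restored == original)
```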

It's worth playing with this in the interactive interpreter (remember to keep the repr and str representations straight while doing so!) until you get it.

answered 2013-03-29T22:40:50.960