I have a tuple containing string data (UTF-8), binary flags (true/false/1/0), and integers, and I want to write it as a single line in an output file. Part of my code is:

###  Python 2.7.3

import fileinput
import re
import time
import codecs

uIDfile = '\Python\Fav Test\ppl.ttxt'

InFile = open(uIDfile)
OutFile = codecs.open('C:\Python\Fav Test\S2.ttxt', encoding='utf-8', mode='w')

for user in InFile:
    user = user [:-1]
#   user = unicode(user, 'utf-8').encode('utf-8')

    if 'NNNN' in user:
        break
    else:
        if '@N' in user:
            try:
                Grp = people_getGroups(user_id = user)
                g = 0
                if GetAll:
                    for group in Grp.find('groups').findall('group'):

                        g += 1
                        fErr = ''
                        uID  = user
                        gID  = group.get('ID')
                        gName  =  group.get('name')
                        tup = '\"{0}\"\t\"{2}\"\t\"{1}\"\t''\t{3}\t{4}\t{5}\t{6}\n'.format(uNSID, gNSID, gName, bin1, bin2, int1, int2)
                        OutFile.write(tup.encode('utf-8'))

I tried several different versions of the "OutFile.write()" statement. The error from each one is listed below.

OutFile.write(codecs.utf_8_decode(tup.encode('utf-8')))
    TypeError: coercing to Unicode: need string or buffer, tuple found

OutFile.write('\t'.join(codecs.utf_8_decode(tup.encode('utf-8'))))
    TypeError: sequence item 1: expected string or Unicode, int found

OutFile.write('\t'.join(map(str, codecs.utf_8_decode(tup.encode('utf-8')))))
    tup = '\"{0}\"\t\"{2}\"\t\"{1}\"\t""\t\"{3}\"\t\"{4}\"\t\"{5}\"\t\"{6}\"\n'.format(uNSID, gNSID, gName, str(bin1), str(bin2), str(int1), str(int2))
    UnicodeEncodeError: "'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)"

OutFile.write('\t'.join(map(str, codecs.utf_8_decode(tup.encode('utf-8')))))
    tup = '\"{0}\"\t\"{2}\"\t\"{1}\"\t""\t\"{3}\"\t\"{4}\"\t\"{5}\"\t\"{6}\"\n'.format(uNSID, gNSID, gName, bin1, bin2, int1, int2)
    UnicodeEncodeError: "'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)"

Any help is sincerely appreciated!

1 Answer

If you want to write rows to a file, I recommend using the csv module. Here is an example of how to use it:

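# -*- coding: utf-8 -*-
# Python 2 example
import csv
import os
# Use tempfile instead of a hard-coded path, to be cross-platform :)
import tempfile

fd, tmppath = tempfile.mkstemp()
os.close(fd)  # mkstemp returns an open descriptor; close it and reopen by path
out = open(tmppath, 'wb')  # the Python 2 csv module expects a binary-mode file
writer = csv.writer(out)
text = "Te×t Ðåtå".decode('utf-8')  # decode the UTF-8 byte string to unicode
tup = (text.encode('utf-8'), 42, False)  # the Python 2 csv writer needs byte strings
print(repr(tup))
# OUT: ('Te\xc3\x97t \xc3\x90\xc3\xa5t\xc3\xa5', 42, False)
writer.writerow(tup)
out.close()
print(u"Look at me : {}".format(tmppath))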

You can use the dialect and formatting parameters to define the exact format of the output file.
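
As a rough illustration of those formatting parameters, here is a minimal sketch that produces a tab-separated, quoted line similar to the one in the question. It is written for Python 3, where the csv module accepts text directly; under Python 2.7 the text fields would first have to be encoded to UTF-8 bytes, as in the example above. The sample values and the names `name`, `count` and `flag` are made up for the example.

```python
import csv
import io

# Hypothetical sample values: text, an integer and a boolean flag.
name = "Te×t Ðåtå"
count = 42
flag = False

# io.StringIO stands in for a real file here. For a file on disk one could use
# open(path, 'w', encoding='utf-8', newline='') instead.
buf = io.StringIO()
writer = csv.writer(
    buf,
    delimiter='\t',                # separate fields with tabs
    quoting=csv.QUOTE_NONNUMERIC,  # quote every non-numeric field
    lineterminator='\n',           # end rows with LF instead of the default CRLF
)

# Convert the flag to 1/0 so it is written as a plain number.
writer.writerow((name, count, int(flag)))

line = buf.getvalue()
print(repr(line))
```

With `QUOTE_NONNUMERIC` only the text field ends up wrapped in double quotes, so the numbers stay unquoted without any hand-written format string.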

To avoid trouble with UTF-8, the good practice, as explained in these nice slides, is:

  • Decode early
  • Unicode everywhere
  • Encode late
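
Applied to a line like the one in the question, these three rules could look roughly like the sketch below. It is only an illustration under assumptions: `raw_name`, `flag` and `count` are made-up sample values, and the temporary file stands in for the real output file. Note that in Python 2 a byte-string template such as `'{0}'.format(...)` implicitly encodes unicode arguments as ASCII, which is probably where the UnicodeEncodeError in the question comes from; using a unicode template avoids that.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical raw input: UTF-8 bytes as they might arrive from a file or an API.
raw_name = u"Te×t Ðåtå".encode('utf-8')
flag, count = True, 42

# 1. Decode early: turn bytes into text as soon as they enter the program.
name = raw_name.decode('utf-8')

# 2. Unicode everywhere: build the line from a unicode template (note the u prefix).
line = u'"{0}"\t{1}\t{2}\n'.format(name, int(flag), count)

# 3. Encode late: hand text to the file object and let it encode on write.
fd, path = tempfile.mkstemp()
os.close(fd)
with io.open(path, 'w', encoding='utf-8', newline='') as out:
    out.write(line)  # no manual .encode('utf-8') here

# Read the raw bytes back to check what ended up on disk, then clean up.
with io.open(path, 'rb') as check:
    written = check.read()
os.remove(path)
print(repr(written))
```

The same idea applies to `codecs.open(..., encoding='utf-8')` from the question: the file object already encodes, so it should be given unicode text rather than the result of `tup.encode('utf-8')`.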
answered 2013-02-27T10:28:22.493