-2

This is my code:

# -*- coding: utf-8-*-
array=["à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ð","ñ","ó","ô","õ","ö","ø","ù","ú","û","ü","ý","þ","ÿ"]
array1=["א","ב","ג","ד","ה","ו","ז","ח","ט","י","ך","כ","ל","ם","מ","ן","נ","ס","ע","ף","פ","ץ","צ","ק","ר","ש","ת"]
str="áï éäåãä"
message=""
for i in range(0,len(str)):
   s=str[i]
   index=-1
   for j in range(0,len(array)):
       if(array[j]==s):
           index=j
           break
   if(index!=-1):
       message+=array1[index]
       print array1[index]
print message

The error is:

SyntaxError: EOL while scanning string literal

on line 2.

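I have a Hebrew text file, but it always displays as gibberish no matter which encoding I choose. This is a Python program to convert it to Hebrew. The original file is in ISO-8859-1.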

4

2 Answers

4

As @Martijn suggests, decoding your original file correctly would be a better solution. If your file is Hebrew but displays as the characters in array, it is probably being decoded as latin1 or cp1252. cp1255 (a Hebrew code page) looks like a close match, and perhaps your array1 isn't quite right. Also note that strings are iterable, so you can simplify your arrays to plain strings:

# coding: utf8
array  = u'àáâãäåæçèéêëìíîïðñóôõöøùúûüýþÿ'
array1 = u'אבגדהוזחטיךכלםמןנסעףפץצקרשת'
print(array)
print(array1)
print(array.encode('cp1252').decode('cp1255',errors='replace'))

The last line above reverses the "incorrect" encoding and decodes it with cp1255 (a Hebrew encoding) instead. Output:

àáâãäåæçèéêëìíîïðñóôõöøùúûüýþÿ
אבגדהוזחטיךכלםמןנסעףפץצקרשת
אבגדהוזחטיךכלםמןנסףפץצרשת��‎‏�

It's not a perfect match, but close enough that I think your original file was encoded with cp1255.
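
As a minimal sketch of the same round trip applied to the sample string from the question, assuming the text really is cp1255 data that was mis-decoded as cp1252 (the helper name fix_mojibake and its parameters are made up for illustration):

```python
# -*- coding: utf-8 -*-

def fix_mojibake(text, wrong="cp1252", right="cp1255"):
    """Undo a wrong decode: turn the text back into bytes using the codec
    that was applied by mistake, then decode those bytes with the codec
    that was (presumably) intended."""
    raw = text.encode(wrong)                      # recover the original bytes
    return raw.decode(right, errors="replace")    # unmapped bytes become U+FFFD

sample = "áï éäåãä"        # the string from the question
fixed = fix_mojibake(sample)
print(fixed)
```

If the guess about cp1255 is right, no per-character lookup table is needed at all. Note that encode() will raise UnicodeEncodeError if the garbled text contains a character that the assumed "wrong" codec cannot represent, which is itself a hint that the guess is off.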

answered 2013-08-18T02:36:04.683
4

You used a ' where you should have used a ":

'ÿ"

for the last entry:

array=["à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ð","ñ","ó","ô","õ","ö","ø","ù","ú","û","ü","ý","þ",'ÿ"]

Make that single quote a double quote.

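As for your translation program: it sounds as if your file was either encoded incorrectly or is being decoded incorrectly. Perhaps you should work out the correct encoding, rather than blindly replacing Latin-1 bytes with the UTF-8 sequences of Hebrew code points?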

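If you were to use the codecs module to open the file with the correct codec and decode it to Unicode, you would most likely find that the data is encoded correctly.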
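
A rough sketch of what that could look like, assuming the correct codec turns out to be cp1255 (a guess, not something confirmed for your file). It uses io.open, which accepts an encoding argument in the same way codecs.open does, and it writes a small temporary file first so the example is self-contained:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Stand-in for the real file: these are the bytes that a Latin-1 viewer
# displays as "áï éäåãä". The real file's contents are not known here.
raw = b"\xe1\xef \xe9\xe4\xe5\xe3\xe4"

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Open with an explicit codec so the bytes are decoded to Unicode on read.
# "cp1255" is an assumption; substitute whatever encoding the file really uses.
with io.open(path, encoding="cp1255") as f:
    text = f.read()

os.remove(path)
print(text)
```

If the result still looks wrong, another Hebrew codec such as iso8859_8 may be worth trying in place of cp1255.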

Before you continue, I strongly recommend that you read up on Unicode, codecs, and Python:

answered 2013-08-17T22:04:44.077