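I want to store some data obtained from Google Translate. For example: http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello

After receiving the result, I want to insert it into a MySQL table, so I wrote the following code: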

$link     = "http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=";

$server   = "127.0.0.1";
$username = "AliAhmadi";
$password = "AliAhmadi";
$database = "AliAhmadi";

$conn     = mysql_pconnect($server, $username, $password);
if (!$conn)
     die("Bye Bye");
mysql_select_db($database, $conn);
mysql_set_charset('utf8',$conn);
$ch       = curl_init();
$url          = $link."hello";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$WebContent   = curl_exec($ch);
$update_query = 'update `en_db` SET `meaning`="'.mysql_real_escape_string($WebContent).'" where `id`=1';
mysql_query($update_query,$conn);
mysql_close($conn);

Google returns the following text:

[[["سلام","hello","",""]],[["interjection",["سلام","هالو","الو"],[["سلام",["hello","hi","aloha","all hail"]],["هالو",["hello","hello","hello"]],["الو",["hello"]]]]],"en",,[["سلام",[5],0,0,1000,0,1,0]],[["hello",4,,,""],["hello",5,[["سلام",1000,0,0],["خوش",0,0,0],["میهمان گرامی",0,0,0],["خوش آمدید",0,0,0],["درود کاربر",0,0,0]],[[0,5]],"hello"]],,,[["en"]],74]

But only the first part of the string is saved in the table:

[[["

I think the problem comes from Unicode, because when I comment out mysql_set_charset('utf8',$conn); it does save something in the table, but it looks like this:

[[["Èå","to","",""]],[["preposition",["Èå","ÈÑÇ\u06CC","ÏÑ","ÏÑ ÈÑÇÈÑ","\u06CCÔ","Óæ\u06CC","äÒÏ","ØÑÝ","ÈÓæ\u06CC","ÊÇ äÓÈÊ Èå","ÈÑ ÍÓÈ","ÈØÑÝ","ÑæÈØÑÝ"],[["Èå",["to","into","in","on","at","against"]],["ÈÑÇ\u06CC",["for","to","on","in order to","toward","for the sake of"]],["ÏÑ",["at","to","about","unto"]],["ÏÑ ÈÑÇÈÑ",["against","versus","to","for","unto"]],["\u06CCÔ",["before","to","with","unto"]],["Óæ\u06CC",["to","unto"]],["äÒÏ",["to","near","about"]],["ØÑÝ ",["towards","to"]],["ÈÓæ\u06CC",["toward","to","into","off","unto","at"]],["ÊÇ äÓÈÊ Èå",["to","unto"]],["ÈÑ ÍÓÈ",["according to","in","at","to"]],["ÈØÑÝ",["toward","at","unto","to","in","into"]],["ÑæÈØÑÝ",["unto","to"]]]],["",["ÚáÇãÊ ãÕÏÑ Çäá \u06CCÓ\u06CC ÇÓÊ"],[["ÚáÇãÊ ÕÏÑ Çäá\u06CCÓ\u06CC ÇÓÊ",["to"]]]]],"en",,[["Èå",[5],0,0,1000,0,1,0]],[["to",4,,,""],["to",5,[["Èå",1000,0,0],["ÈÑÇ\u06CC",0,0,0],["ÊÇ",0,0,0],["ÑÇ Èå",0,0,0],["Èå ãäÙæÑ",0,0,0]],[[0,2]],"to"]],,, ,5]

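Which Unicode encoding does Google Translate return, and where is the problem in my code? I have switched the collation between utf8_unicode_ci, utf8_general_ci, and utf8_persian_ci, but the problem persists.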

2 Answers

2

I believe your en_db.meaning column is defined with the default collation latin1_swedish_ci. That collation uses the ISO-8859-1 (Latin-1) encoding, which cannot store Arabic characters.

(When you remove the mysql_set_charset call, MySQL misinterprets your UTF-8 Arabic as Latin characters, which do fit in the column but look completely wrong.)
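
Before changing the table, it may also be worth checking that the bytes cURL received are valid UTF-8 at all; bytes that are not, sent over a connection declared as utf8, are one plausible reason for a value being cut off at the first non-ASCII character. The following is only a sketch: ensureUtf8 is a hypothetical helper, Windows-1256 is an assumed fallback encoding (a guess, not something established in this thread), and it relies on the mbstring and iconv extensions being enabled.

```php
<?php
// Hypothetical helper: returns $bytes as UTF-8. The fallback encoding is an
// assumption for illustration; adjust it to whatever the server really sends.
function ensureUtf8(string $bytes, string $assumedLegacy = 'WINDOWS-1256'): string
{
    // Already valid UTF-8 (plain ASCII included): return unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Otherwise reinterpret the bytes as the assumed legacy encoding.
    $converted = iconv($assumedLegacy, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $assumedLegacy to UTF-8");
    }
    return $converted;
}
```

In the question's code this would be applied to $WebContent before it is escaped and stored, for example $WebContent = ensureUtf8($WebContent);.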

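Make sure you specify a collation that uses UTF-8 when you create the table, for example CREATE TABLE en_db (...) COLLATE utf8_general_ci, or more generally (...) CHARACTER SET utf8 (or utf8mb4 for astral-plane support, if available).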

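You can change the collation of an existing table and all of its text columns with ALTER TABLE en_db CONVERT TO CHARACTER SET utf8, but if the table already contains non-ASCII characters, they may all come out wrong.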
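
Once the column can hold UTF-8, the connection has to be told to use it as well. Here is a minimal sketch of the same update using the mysqli extension instead of the old mysql_* functions (which were removed in PHP 7). The table and column names come from the question; the function name saveMeaning is hypothetical, and utf8mb4 assumes a server that supports it.

```php
<?php
// Sketch only: stores one translation result in en_db.meaning over a UTF-8 connection.
function saveMeaning(mysqli $db, int $id, string $meaning): bool
{
    // Declare that this connection sends and expects UTF-8 (4-byte capable).
    if (!$db->set_charset('utf8mb4')) {
        return false;
    }

    // A prepared statement replaces manual escaping of the value.
    $stmt = $db->prepare('UPDATE `en_db` SET `meaning` = ? WHERE `id` = ?');
    if ($stmt === false) {
        return false;
    }

    $stmt->bind_param('si', $meaning, $id);
    $ok = $stmt->execute();
    $stmt->close();

    return $ok;
}
```

It would be called with an open connection, for example saveMeaning(new mysqli($server, $username, $password, $database), 1, $WebContent);.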

answered 2012-05-26T08:13:59.710
-3
<?php
// At the beginning of the PHP code, send the page itself as UTF-8.
header("Content-Type: text/html; charset=UTF-8");

// Create the connection first; the charset statements need an open link.
$CNN = mysql_connect("localhost", "usr_urdu", "123") or die('Unable to Connect');
$DB  = mysql_select_db('db_urdu', $CNN) or die('Unable to select DB');

// Then switch the connection to UTF-8.
mysql_query("SET NAMES 'utf8'", $CNN);
mysql_query('SET CHARACTER SET utf8', $CNN);
answered 2014-01-31T17:52:04.633