
I'm trying to send document.sdf (JSON) to Amazon CloudSearch. Everything works until the text contains certain special characters:

Found Unicode characters that are not legal for Cloud Search:\n Illegal Unicode character '\u0002'\n Illegal Unicode character '\u0010'\n Illegal Unicode character '\u0001'\n Illegal Unicode character '\b'

The error is caused by this text:

...sadad<br \/>\n;color:G\u0002% k\u0010>\u0001\b? X_? p>", ...

This text comes from document.sdf, which is generated by a PHP script and encoded with json_encode.

The original text for the above:

;color:G% k>? X_? p>


1 Answer


It may be worth stripping all invalid characters from the text with a regular expression such as:

[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF]
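As a rough sketch of how that pattern might be applied in PHP before calling json_encode: PCRE does not accept the `\uXXXX` notation, so the version below rewrites the same ranges with `\x{...}` and the `/u` modifier, and expresses the surrogate-pair part as the code point range U+10000 to U+10FFFF. The function name `strip_invalid_chars` and the sample string are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): removes every character
// that falls outside the ranges allowed by the pattern above.
function strip_invalid_chars(string $text): string
{
    // With the /u modifier PCRE treats pattern and subject as UTF-8, so
    // supplementary characters are matched as code points U+10000..U+10FFFF
    // instead of as UTF-16 surrogate pairs.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    $clean = preg_replace($pattern, '', $text);

    // preg_replace returns null on failure, e.g. when $text is not valid UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

// Sample input modelled on the text from the question, with the control
// characters U+0002, U+0010, U+0001 and U+0008 embedded.
$raw = ";color:G\x02% k\x10>\x01\x08? X_? p>";
$clean = strip_invalid_chars($raw);

// Encode only after cleaning, so no \u0002-style escapes end up in the SDF.
echo json_encode($clean), PHP_EOL;
```

Tab, line feed, and carriage return are kept, since the pattern allows them; whether the fields should retain them is a separate decision.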

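However, when I ran into a similar problem, the cause was simply that I had not explicitly specified the character encoding when doing the POST. For example: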

// $cloudsearch_url and $post_data are assumed to be defined earlier in the script.
$curl = curl_init($cloudsearch_url);
// State the charset explicitly; without this it defaults to ISO10646 (I think).
curl_setopt($curl, CURLOPT_HTTPHEADER,
            array('Content-Type: application/json; charset=UTF-8'));
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $post_data);
curl_exec($curl);
Answered 2013-07-25T00:31:43.717