
I have just moved one of our core applications from Windows+IIS+ColdFusion to Ubuntu+Apache+Lucee. The first big problem is the URI encoding of non-Latin characters.

For example, requesting the URL http://www.example.com/ru/Солнцезащитные-очки/saint-laurent/ produces this entry in the Apache access.log:

http://www.example.com/ru/%D0%A1%D0%BE%D0%BB%D0%BD%D1%86%D0%B5%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%BD%D1%8B%D0%B5-%D0%BE%D1%87%D0%BA%D0%B8/saint-laurent/
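As a quick sanity check, the logged path can be compared with the UTF-8 percent-encoding of the original path segment. This is only a sketch in Python (not part of the Apache/Lucee setup); the variable names are illustrative.

```python
from urllib.parse import quote

# Path segment as typed in the browser, and as it appears in access.log.
path_segment = "Солнцезащитные-очки"
logged = (
    "%D0%A1%D0%BE%D0%BB%D0%BD%D1%86%D0%B5%D0%B7%D0%B0%D1%89%D0%B8"
    "%D1%82%D0%BD%D1%8B%D0%B5-%D0%BE%D1%87%D0%BA%D0%B8"
)

# Percent-encode every non-ASCII character using its UTF-8 bytes.
encoded = quote(path_segment, safe="", encoding="utf-8")

# True when the log entry is plain UTF-8 percent-encoding.
matches = encoded == logged
print(matches)
```

If the two strings agree, the request reaches Apache correctly encoded, and the damage happens later in the chain.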

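Fine, I believe that is the correct URL encoding. I then use a rewrite rule in my .htaccess file to capture that part of the URL (the Cyrillic part) into a query string parameter (say, "foo").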

Dumping that parameter with cflog, I see this in the application log:

/index.cfm?foo=оÑки-длÑ-зÑениÑ&

...which is obviously wrong, because what I need is the original string, in UTF-8 Cyrillic.
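Garbling of this kind is typical of UTF-8 bytes being decoded as ISO-8859-1 somewhere along the way. The following Python sketch reproduces the effect under that assumption; it uses the word "очки" as a sample and does not claim to reproduce the exact log line above.

```python
from urllib.parse import unquote

# UTF-8 percent-encoding of the sample word "очки".
encoded = "%D0%BE%D1%87%D0%BA%D0%B8"

# Decoding the bytes as UTF-8 gives the intended text.
correct = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 turns each byte into its own
# character, so every Cyrillic letter becomes two garbage characters.
mangled = unquote(encoded, encoding="latin-1")

# While no bytes have been lost, the damage can be undone by going
# back to the raw bytes and decoding them as UTF-8.
repaired = mangled.encode("latin-1").decode("utf-8")

print(len(correct), len(mangled), repaired == correct)
```

The repair step only works if every byte survived; once characters have been dropped or replaced, the original text cannot be recovered this way.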

I tried adding the URIEncoding attribute to the HTTP connector in Tomcat's server.xml, but it made no difference:

<Connector port="8888" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />

How can I get my URL parameter in UTF-8?


2 Answers


I found the solution myself.

Source: http://blogs.warwick.ac.uk/kieranshaw/entry/utf-8_internationalisation_with

Apache

Generally you don't need to worry about Apache, as it shouldn't be messing with your HTML or URLs. However, if you are doing some proxying with mod_proxy, then you might need to think about this. We use mod_proxy to proxy from Apache through to Tomcat. If you've got encoded characters in a URL that you need to convert into a query string for your underlying app, then you're going to have a strange little problem.

If you have a URL coming into Apache that looks like this:

http://mydomain/%E4%B8%AD.doc

and you have a mod_rewrite/proxy rule like this:

RewriteRule ^/(.*) http://mydomain:8080/filedownload/?filename=$1 [QSA,L,P]

Unfortunately, the $1 is going to get mangled during the rewrite. QSA (query string append) handles these characters just fine and passes them through untouched, but when you capture part of the URL, such as $1 here, Apache unescapes it on its own, treating the bytes as ISO-8859-1. Because the data is actually UTF-8, the result is garbled. So, to keep the special characters in UTF-8, we escape the value again.

RewriteMap escape int:escape
RewriteRule ^/(.*) http://mydomain:8080/filedownload/?filename=${escape:$1} [QSA,L,P]
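The idea behind the escape map can be sketched in Python: the captured segment arrives already unescaped as raw bytes, and percent-encoding those bytes again restores a form the backend can decode as UTF-8. This is an illustration of the round trip only, not Apache's actual implementation, and the function names are made up for the example.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

def unescape_capture(segment: str) -> bytes:
    """Stand-in for the unescaping applied to the captured $1: raw bytes."""
    return unquote_to_bytes(segment)

def escape_again(raw: bytes) -> str:
    """Stand-in for the escape map: percent-encode the raw bytes again."""
    return quote(raw, safe="")

incoming = "%E4%B8%AD.doc"            # path segment from the example URL
raw = unescape_capture(incoming)      # b'\xe4\xb8\xad.doc'
forwarded = escape_again(raw)         # back to pure-ASCII percent-encoding

# The backend can now decode the parameter as UTF-8.
filename = unquote(forwarded, encoding="utf-8")
print(forwarded, filename)
```

Because the forwarded value is pure ASCII again, no intermediate layer has to guess a character set. Note that RewriteMap can only be declared in the server or virtual-host configuration, not in .htaccess, so the rule may need to move there.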

Take a look at your rewrite logs to see if this is working.

This was really hard to find.

answered 2015-05-26T07:23:22.477

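It is better not to use Cyrillic in URIs under any circumstances. Having anything other than ASCII in them is very bad practice. I am telling you this as a native Russian speaker from Moscow, Russia.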

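There is a so-called transliteration (romanization) of Russian, in which each of the 33 letters can be converted directly to Latin characters. You can apply such a transliteration behind the scenes to convert Russian to Latin and back.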

Something like this:

hostname:8888/index.cfm?foo=Solntsezaschitnye-ochki
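A minimal Python sketch of such a transliteration follows. The mapping table is an illustrative assumption chosen to reproduce the example above, not an official romanization standard, and `transliterate` is a hypothetical helper name.

```python
# Illustrative Russian-to-Latin table (all 33 letters, lower case).
# The hard and soft signs are simply dropped in this sketch.
TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "yo", "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k",
    "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts",
    "ч": "ch", "ш": "sh", "щ": "sch", "ъ": "", "ы": "y", "ь": "",
    "э": "e", "ю": "yu", "я": "ya",
}

def transliterate(text: str) -> str:
    """Replace Russian letters with Latin ones; keep everything else."""
    out = []
    for ch in text:
        latin = TABLE.get(ch.lower())
        if latin is None:
            out.append(ch)                  # hyphens, digits, Latin letters
        elif ch.isupper():
            out.append(latin.capitalize())  # "Щ" -> "Sch"
        else:
            out.append(latin)
    return "".join(out)

slug = transliterate("Солнцезащитные-очки")
print(slug)
```

With a table like this, the reverse direction is ambiguous (for example, "y" stands for both "й" and "ы"), so in practice it is safer to store the Latin slug next to the original text than to convert it back.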

Or, if possible, just use ID numbers instead of text.

answered 2015-05-16T13:22:32.293