
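I have a Ruby on Rails application that acts as the server for Java and .Net applications. I use a custom header to send some data, but when the data reaches the Ruby on Rails application, Rails reads the value as UTF-8 and then complains that the value is not a valid UTF-8 string.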

For example, if I send JÜRGENELITE-HP, I get:

#<ActiveRecord::StatementInvalid: PGError: ERROR:  invalid byte sequence for encoding "UTF8": 0xdc52
: SELECT * FROM "replicas" WHERE ("replicas"."identification" = 'J?RGENELITE-HP') AND ("replicas".user_id = 121)  LIMIT 1>
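The byte pair in the PostgreSQL error, 0xdc52, is consistent with Ü being sent as the single ISO-8859-1 byte 0xDC followed by R (0x52), and that sequence is not valid UTF-8. A minimal Ruby sketch, assuming the client really did send ISO-8859-1; the `raw` string is a hypothetical stand-in for the header bytes Rails receives:

```ruby
# Hypothetical raw header bytes as Rails might receive them:
# "J", 0xDC (Ü in ISO-8859-1), then "RGENELITE-HP".
raw = "J\xDCRGENELITE-HP".b

# Read as UTF-8, the sequence is invalid: 0xDC starts a two-byte
# sequence, but the next byte (0x52, "R") is not a continuation byte.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?   # prints false

# Assumption: the client wrote the header in ISO-8859-1. Relabel the
# bytes with that encoding, then transcode them to UTF-8.
fixed = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts fixed.valid_encoding?     # prints true
puts fixed                     # prints JÜRGENELITE-HP
```

Relabelling like this is only correct if the sender actually used ISO-8859-1; it guesses at the sender's encoding rather than removing the ambiguity.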

The Java HTTP Client library clearly prints the data correctly to the console:

DEBUG [main] (DefaultClientConnection.java:268) - >> POST /ze/api/files.json HTTP/1.1
DEBUG [main] (DefaultClientConnection.java:271) - >> X-Replica: JÜRGENELITE-HP
DEBUG [main] (DefaultClientConnection.java:271) - >> Authorization: Basic bWxpbmhhcmVzOjEyMzQ1Njc4

DEBUG [main] (DefaultClientConnection.java:271) - >> Content-Length: 0
DEBUG [main] (DefaultClientConnection.java:271) - >> Host: localhost:3000
DEBUG [main] (DefaultClientConnection.java:271) - >> Connection: Keep-Alive
DEBUG [main] (DefaultClientConnection.java:271) - >> User-Agent: Apache-HttpClient/4.1.2 (java 1.5)

But when it reaches Rails, it breaks. What encoding does HTTP use to encode header values?


US-ASCII

If you look at section 2.2 of RFC 2616:

2.2 Basic Rules

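The following rules are used throughout this specification to describe
basic parsing constructs. The US-ASCII coded character set is defined
by ANSI X3.4-1986 [21].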

   OCTET          = <any 8-bit sequence of data>
   CHAR           = <any US-ASCII character (octets 0 - 127)>
   UPALPHA        = <any US-ASCII uppercase letter "A".."Z">
   LOALPHA        = <any US-ASCII lowercase letter "a".."z">
   ALPHA          = UPALPHA | LOALPHA
   DIGIT          = <any US-ASCII digit "0".."9">
   CTL            = <any US-ASCII control character
                    (octets 0 - 31) and DEL (127)>
   CR             = <US-ASCII CR, carriage return (13)>
   LF             = <US-ASCII LF, linefeed (10)>
   SP             = <US-ASCII SP, space (32)>
   HT             = <US-ASCII HT, horizontal-tab (9)>
   <">            = <US-ASCII double-quote mark (34)>

The rest of that section contains more specific information about protocol headers and other elements.

You have to jump around the spec quite a bit to find all the relevant BNF definitions, but section 4.2 contains the definition of a header:

   message-header = field-name ":" [ field-value ]
   field-name     = token
   field-value    = *( field-content | LWS )
   field-content  = <the OCTETs making up the field-value
                    and consisting of either *TEXT or combinations
                    of token, separators, and quoted-string>
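The message-header rule can be applied to a single header line such as the `X-Replica` line in the question's log. A minimal Ruby sketch; `token?` and `parse_header` are hypothetical names, the `token` and `separators` definitions come from section 2.2 of RFC 2616 (not quoted above), and folded multi-line headers are deliberately not handled:

```ruby
# separators as defined in RFC 2616, section 2.2.
SEPARATORS = "()<>@,;:\\\"/[]?={} \t".bytes.freeze

# token = 1*<any CHAR except CTLs or separators>
# CHAR minus CTLs leaves the printable octets 32-126.
def token?(str)
  return false if str.empty?
  str.bytes.all? { |octet| octet.between?(32, 126) && !SEPARATORS.include?(octet) }
end

# message-header = field-name ":" [ field-value ]
# Simplifications: surrounding whitespace is stripped from the value, and
# folded (multi-line) header values are not handled.
def parse_header(line)
  name, colon, value = line.partition(":")
  raise ArgumentError, "missing ':'" if colon.empty?
  raise ArgumentError, "field-name is not a token: #{name.inspect}" unless token?(name)
  [name, value.strip]
end

p parse_header("X-Replica: JURGENELITE-HP")   # prints ["X-Replica", "JURGENELITE-HP"]
p token?("X Replica")                          # prints false: SP is a separator
```

Only the field-name is constrained to `token`; the value is checked separately against TEXT.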

TEXT is defined in section 2.2:

   TEXT           = <any OCTET except CTLs,
                    but including LWS>
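Because TEXT excludes only control octets, the grammar does not by itself reject octets 128-255; it simply gives them no UTF-8 meaning. Section 2.2 of RFC 2616 also states that words of *TEXT may contain characters from character sets other than ISO-8859-1 only when encoded according to RFC 2047. A minimal Ruby sketch, assuming both ends agree to wrap non-ASCII values in an RFC 2047 "B" encoded-word; `text?`, `encode_word` and `decode_word` are hypothetical helpers, not Rails or Java APIs, and the Java client would need a matching encoder:

```ruby
# TEXT from RFC 2616, section 2.2: any OCTET except CTLs, but including LWS.
# Simplification: HT is the only CTL allowed (as part of LWS); CRLF folding
# is not handled.
def text?(str)
  str.bytes.all? { |octet| octet == 9 || (octet > 31 && octet != 127) }
end

# Hypothetical helper: wrap a value in an RFC 2047 "B" encoded-word so the
# header contains only US-ASCII. Simplification: ignores RFC 2047's
# 75-character limit per encoded-word.
def encode_word(value)
  "=?UTF-8?B?#{[value.encode(Encoding::UTF_8)].pack('m0')}?="
end

# Hypothetical helper: undo encode_word; other values are returned unchanged.
def decode_word(header)
  m = /\A=\?UTF-8\?B\?([A-Za-z0-9+\/=]*)\?=\z/i.match(header)
  return header unless m
  m[1].unpack1('m0').force_encoding(Encoding::UTF_8)
end

header = encode_word("JÜRGENELITE-HP")
puts header.ascii_only?         # prints true
puts decode_word(header)        # prints JÜRGENELITE-HP
puts text?("JÜRGENELITE-HP")    # prints true: 0xC3 and 0x9C are not CTLs
```

The last line shows why the raw UTF-8 bytes are not a grammar error as such: the problem is that the receiver has no agreed way to interpret them, which the encoded-word makes explicit.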
answered 2012-04-27T19:42:35.083