forms - Is there a function that decodes an encoded Unicode UTF-8 string, such as one from a form?

Question

I want to store some data using an HTML form and Rebol CGI. My form looks like this:

<form action="test.cgi" method="post" >

     Input:

     <input type="text" name="field"/>
     <input type="submit" value="Submit" />

</form>

But for Unicode characters such as Chinese, the data arrives in percent-encoded form, for example %E4%BA%BA.

(This is for the Chinese character "人"... its UTF-8 form as a Rebol binary literal is #{E4BABA})

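Is there a function in the system, or an existing library, that can decode this directly? dehex does not currently seem to cover this case. For now I am decoding it manually by removing the percent signs and constructing the corresponding binary, like this: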

data: to-string read system/ports/input
print data

;-- this prints "field=%E4%BA%BA"

k-v: parse data "="
print k-v

;-- this prints ["field" "%E4%BA%BA"]

v: append insert replace/all k-v/2 "%" "" "#{" "}"
print v

;-- This prints "#{E4BABA}" ... a string!, not binary!
;-- LOAD will help construct the corresponding binary
;-- then TO-STRING will decode that binary from UTF-8 to character codepoints

write %test.txt to-string load v

score 3 · Accepted Answer

I have a library called AltWebForm that encodes and decodes percent-encoded web form data:

do http://reb4.me/r3/altwebform
load-webform "field=%E4%BA%BA"

The library is described here: Rebol and Web Forms.

score 2 · Accepted Answer

This looks related to ticket #1986, which discusses whether this is a "bug" or whether the Internet changed out from under its own spec:

Have DEHEX convert UTF-8 sequences from browsers as Unicode.

If you have specific experience with what has become the standard for Chinese and want to weigh in, that would be valuable.

Incidentally, the specific case above can alternatively be handled in PARSE like this:

key-value: {field=%E4%BA%BA}

utf8-bytes: copy #{}

either parse key-value [
    copy field-name to {=}
    skip
    some [
        and {%}
        copy enhexed-byte 3 skip (
            append utf8-bytes dehex enhexed-byte
        )
    ]
] [
    print [field-name {is} to string! utf8-bytes]
] [
    print {Malformed input.}
]

This will output:

field is 人

The same code with some comments:

key-value: {field=%E4%BA%BA}

;-- Generate empty binary value by copying an empty binary literal     
utf8-bytes: copy #{}

either parse key-value [

    ;-- grab field-name as the chars right up to the equals sign
    copy field-name to {=}

    ;-- skip the equal sign as we went up to it, without moving "past" it
    skip

    ;-- apply the enclosed rule SOME (non-zero) number of times
    some [
        ;-- match a percent sign as the immediate next symbol, without
        ;-- advancing the parse position
        and {%}

        ;-- grab the next three chars, starting with %, into enhexed-byte
        copy enhexed-byte 3 skip (

            ;-- If we get to this point in the match rule, this parenthesized
            ;-- expression lets us evaluate non-dialected Rebol code to 
            ;-- append the dehexed byte to our utf8 binary
            append utf8-bytes dehex enhexed-byte
        )
    ]
] [
    print [field-name {is} to string! utf8-bytes]
] [
    print {Malformed input.}
]
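The PARSE rule above only accepts a value made entirely of %XX sequences. As a rough sketch of how the same idea might extend to values that mix literal characters with encoded bytes, the following assumes Rebol 3 (where TO STRING! decodes a binary! as UTF-8, as the question's code already relies on). The function name decode-form-value and its local words are hypothetical and are not part of Rebol or AltWebForm; treating "+" as a space is an assumption based on how browsers encode form posts.

```rebol
;-- Hypothetical helper: decode one percent-encoded form value into a
;-- string!, collecting raw bytes first and decoding them as UTF-8 last.
decode-form-value: func [
    value [string!]
    /local utf8-bytes hex-digit hex-pair plain-char
][
    utf8-bytes: copy #{}
    hex-digit: charset "0123456789ABCDEFabcdef"

    parse value [
        any [
            ;-- "%" followed by two hex digits is one encoded byte
            "%" copy hex-pair 2 hex-digit (
                append utf8-bytes debase/base hex-pair 16
            )
            ;-- assumption: form encoding sends a space as "+"
            | "+" (append utf8-bytes #{20})
            ;-- anything else is a literal character; keep its UTF-8 bytes
            | copy plain-char skip (
                append utf8-bytes to binary! plain-char
            )
        ]
    ]

    ;-- decode the collected bytes from UTF-8 in one step
    to string! utf8-bytes
]

print decode-form-value "a+b%3D%E4%BA%BA"
```

Collecting bytes and converting once at the end keeps multi-byte sequences such as %E4%BA%BA intact, which is the part DEHEX was reported not to handle. A stray "%" that is not followed by two hex digits falls through to the last alternative and is kept as a literal character.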

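(Also note that "simple parse" is getting the axe in favor of enhancements to SPLIT. So code like parse data "=" can now be written as split data "=", or as other cool variants if you look into them... samples are in the ticket.)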
