
I want to store some data using an HTML form and a Rebol CGI script. My form looks like this:

<form action="test.cgi" method="post" >

     Input:

     <input type="text" name="field"/>
     <input type="submit" value="Submit" />

</form>

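But for Unicode characters such as Chinese, I get the data in percent-encoded form, for example %E4%BA%BA.

(This is for the Chinese character "人"... its UTF-8 form as a Rebol binary literal is #{E4BABA}.)

Is there a function in the system, or an existing library, that can decode this directly? dehex does not currently seem to cover this case. For now I am decoding it manually by removing the percent signs and constructing the corresponding binary, like this: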

data: to-string read system/ports/input
print data

;-- this prints "field=%E4%BA%BA"

k-v: parse data "="
print k-v

;-- this prints ["field" "%E4%BA%BA"]

v: append insert replace/all k-v/2 "%" "" "#{" "}"
print v

;-- This prints "#{E4BABA}" ... a string!, not binary!
;-- LOAD will help construct the corresponding binary
;-- then TO-STRING will decode that binary from UTF-8 to character codepoints

write %test.txt to-string load v

2 Answers


I have a library called AltWebForm that encodes and decodes percent-encoded web form data:

do http://reb4.me/r3/altwebform
load-webform "field=%E4%BA%BA"

The library is described here: Rebol and Web Forms

Answered 2013-08-20T15:47:40.797

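This looks related to ticket #1986, which discusses whether this is a "bug" or whether the Internet has changed out from under its own spec:

Have DEHEX convert UTF-8 sequences from browsers as Unicode

If you have specific experience with what has become the standard for Chinese and want to weigh in, that would be valuable.

Incidentally, the specific case above can alternatively be handled in PARSE as: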

key-value: {field=%E4%BA%BA}

utf8-bytes: copy #{}

either parse key-value [
    copy field-name to {=}
    skip
    some [
        and {%}
        copy enhexed-byte 3 skip (
            append utf8-bytes dehex enhexed-byte
        )
    ]
] [
    print [field-name {is} to string! utf8-bytes]
] [
    print {Malformed input.}
]

This will output:

field is 人

The same code with some comments added:

key-value: {field=%E4%BA%BA}

;-- Generate empty binary value by copying an empty binary literal     
utf8-bytes: copy #{}

either parse key-value [

    ;-- grab field-name as the chars right up to the equals sign
    copy field-name to {=}

    ;-- skip the equal sign as we went up to it, without moving "past" it
    skip

    ;-- apply the enclosed rule SOME (non-zero) number of times
    some [
        ;-- match a percent sign as the immediate next symbol, without
        ;-- advancing the parse position
        and {%}

        ;-- grab the next three chars, starting with %, into enhexed-byte
        copy enhexed-byte 3 skip (

            ;-- If we get to this point in the match rule, this parenthesized
            ;-- expression lets us evaluate non-dialected Rebol code to 
            ;-- append the dehexed byte to our utf8 binary
            append utf8-bytes dehex enhexed-byte
        )
    ]
] [
    print [field-name {is} to string! utf8-bytes]
] [
    print {Malformed input.}
]
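
The rule above only accepts a value made up entirely of percent-encoded bytes. As a rough sketch of how the same PARSE idea might be extended to values that mix plain characters with encoded ones, here is a hypothetical helper. The name decode-percent is my own invention, not part of Rebol or of any library mentioned here. It assumes a Rebol 3 build where TO STRING! decodes a binary as UTF-8, and it uses DEBASE/BASE rather than DEHEX to turn each pair of hex digits into a byte. Treating "+" as a space is the usual form-encoding convention and is also an assumption about your input.

```rebol
decode-percent: func [
    "Hypothetical helper: decode percent-encoded UTF-8 text (sketch only)"
    text [string!]
    /local bytes hex ch hex-digit
][
    bytes: copy #{}
    hex-digit: charset "0123456789ABCDEFabcdef"
    parse text [
        any [
            ;-- "%" followed by two hex digits becomes one raw byte
            "%" copy hex 2 hex-digit (append bytes debase/base hex 16)

            ;-- assumption: form-encoded "+" stands for a space
            | "+" (append bytes #{20})

            ;-- anything else (including a stray "%") is kept literally,
            ;-- converted to its UTF-8 bytes
            | copy ch skip (append bytes to binary! ch)
        ]
    ]
    ;-- decode the accumulated bytes from UTF-8 in one step
    to string! bytes
]

print decode-percent "field=%E4%BA%BA"
```

Collecting raw bytes first and decoding once at the end avoids having to interpret a multi-byte UTF-8 sequence piece by piece.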

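(Also note that "simple parse" is getting the axe in favor of enhancements to SPLIT. So code that you would write today as parse data "=" can instead be expressed as split data "=", or as other cool variants if you check them out... samples are in the ticket.)

Answered 2013-08-20T17:29:19.577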
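
As a small illustration of the SPLIT form mentioned above, the following sketch breaks a form body into pairs and then into key and value. It assumes a Rebol 3 build whose SPLIT accepts a string delimiter, and the sample input string is made up for the example.

```rebol
;-- hypothetical form body with two fields
body: "name=rebol&field=%E4%BA%BA"

foreach pair split body "&" [
    ;-- each pair splits into a two-element block: key, then value
    set [key value] split pair "="
    print [key "->" value]
]
```

The values are still percent-encoded at this point; decoding them is a separate step.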