Rebol - parse and charset: why doesn't my script work?

Question

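I only want to extract the values of attribute1 and attribute3. I don't understand why charset doesn't seem to "skip" the other attributes in my case (attribute3 is not extracted as I want):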

content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}


attribute1: [{attribute1="} copy valueattribute1 to {"} thru {"}]
attribute3: [{attribute3="} copy valueattribute3 to {"} thru {"}]

spacer: charset reduce [tab newline #" "]
letter: complement spacer 
to-space: [some letter | end]

attributes-rule: [(valueattribute1: none valueattribute3: none) [attribute1 | none] any letter [attribute3 | none] (print valueattribute1 print valueattribute3)
| [attribute3 | none] any letter [attribute1 | none] (print valueattribute3 print valueattribute1
valueattribute1: none valueattribute3: none
)
| none
]

rule: [any [to {<tag } thru {<tag } attributes-rule {>} to {</tag>} thru {</tag>}] to end]

parse content rule

The output is:

>> parse content rule
valueattribute1
none
== true
>>

score 1 · Accepted Answer

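First of all, you're not using parse/all. In Rebol 2, that means whitespace is effectively stripped out before the parse runs. That isn't the case in Rebol 3: if your parse rules are in block form (as they are here), then /all is implied.

(Note: there seems to have been consensus that Rebol 3 would throw out the non-block form of parse rules, in favor of the split function for those "minimal" parse scenarios. That would get rid of /all entirely. Unfortunately, no action has been taken on this yet.)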
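As a rough illustration of the whitespace point, here is a minimal sketch assuming Rebol 2 semantics as described above. The variable name text and the sample string are made up for the example. Per the description, the first parse should succeed even though no rule mentions the space, the second should fail, and the third should succeed again.

```rebol
text: "a b"

; Rebol 2, no /all: whitespace between matched items is skipped,
; so no rule for the space is needed.
print parse text ["a" "b"]

; With /all, the space is part of the input and has to be matched.
print parse/all text ["a" "b"]
print parse/all text ["a" " " "b"]
```

Under Rebol 3, where /all is implied for block rules, the first call would be expected to behave like the second.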

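Secondly, your code has bugs, which I'm not going to spend time sorting out. (That's mostly because I think using Rebol's parse to process XML/HTML is a rather foolish idea :P)

But don't forget you have an important tool. If you use a set-word in a parse rule, it captures the current parse position into a variable. You can then print it and see where you are. Change the part of attributes-rule where you first say any letter to pos: (print pos) any letter and you'll see this: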

>> parse/all content rule
 attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>

valueattribute1
none
== true

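See that leading space? The rules right before any letter leave you sitting at a space... and since you said any letter, zero letters is an acceptable match, so nothing is consumed and everything after that is thrown off.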
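The same set-word trick can be tried on a much smaller input. This is only a sketch: the sample string and the attribute names a and b are invented for the example, and mold is used so that a leading space in the remaining input is easy to spot.

```rebol
text: {<tag a="1" b="2">}

parse/all text [
    thru {a="} to {"} skip     ; consume the first attribute
    pos: (print mold pos)      ; capture and show the remaining input
    to end
]
```

The printed remainder should begin with the space that separates the two attributes, which is exactly the character any letter refuses to consume.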

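(Note: Rebol 3 has a better debugging tool: the word ??. When you put it in a parse block, it tells you which token/rule is currently being processed, as well as the state of the input. With this tool you can find out what's going on much more easily: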

>> parse "hello world" ["hello" ?? space ?? "world"]
space: " world"
"world": "world"
== true

...although it's really buggy on r3 mac intel right now.)

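Additionally, if you're not using copy, then your to X thru X pattern is unnecessary; you can just use thru X. If you do want a copy, you can write the briefer copy Y to X X, or, if X is a single character, the clearer copy Y to X skip

In places where you see yourself writing repetitive code, remember that Rebol can take it a step further with compose etc: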
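A minimal sketch of those shorter forms, using a made-up input string and the hypothetical names text and val:

```rebol
text: {key="abc" tail}

; thru alone replaces "to X thru X" when nothing is being copied.
; "copy val to X skip" copies up to the quote, then steps over it.
parse/all text [thru {key="} copy val to {"} skip to end]

print val
```

If everything matches as intended, val should hold the text between the quotes.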

>> temp: [thru (rejoin [{attribute} num {="}]) 
          copy (to-word rejoin [{valueattribute} num]) to {"} thru {"}]

>> num: 1
>> attribute1: compose temp
== [thru {attribute1="} copy valueattribute1 to {"} thru {"}]

>> num: 2
>> attribute2: compose temp
== [thru {attribute2="} copy valueattribute2 to {"} thru {"}]
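The template idea can also be wrapped in a function so that num does not have to be set globally. This is only a sketch under the same naming scheme; make-attribute-rule is a hypothetical helper, not something Rebol provides, and the sample input is invented.

```rebol
; Hypothetical helper: builds a rule that finds attributeN="..."
; and copies its value into the word valueattributeN.
make-attribute-rule: func [num [integer!]] [
    compose [
        thru (rejoin [{attribute} num {="}])
        copy (to-word rejoin [{valueattribute} num]) to {"} skip
    ]
]

attribute3: make-attribute-rule 3
parse/all {attribute2="b" attribute3="c"} attribute3
print valueattribute3
```

Because the generated rule starts with thru, it skips over any attributes that come before the one it is looking for.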

score 1 · Accepted Answer

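Short answer: [any letter] eats your attribute3="..." because the #"^"" character is a 'letter by your definition. You may also run into trouble when there is no attribute2: your generic second-attribute rule will then eat attribute3, and your attribute3 rule will have nothing left to match. It's better to state explicitly that there is either an optional attribute2 or an optional anything-but-attribute3.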

attribute1="foo"       attribute2="bar" attribute3="foobar" 
<- attribute1="..." -> <-     any letter                 -> <- attribute3="..." ->

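Also, 'parse without the /all refinement ignores whitespace (or is at least very unwieldy where whitespace is concerned). I strongly recommend using /all for this type of parsing.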
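One way to make the "anything but attribute3" case explicit is sketched below. The names name-char, other-attribute and v3 are hypothetical, and the input is a cut-down attribute list rather than the full tag. The attribute3 alternative is tried first, so the generic rule never gets a chance to swallow it.

```rebol
spacer: charset reduce [tab newline #" "]

; Characters allowed in an attribute name: not whitespace, not = and not "
name-char: complement union spacer charset {="}

; Any attribute of the form name="value"
other-attribute: [some name-char {="} thru {"}]

v3: none
parse/all {attribute2="bar" attribute3="foobar"} [
    any [
        {attribute3="} copy v3 to {"} skip   ; the attribute we want
        | other-attribute                    ; any other attribute
        | spacer                             ; whitespace between attributes
    ]
]
print v3
```

Because the quote character is excluded from name-char, the generic rule can no longer run past the end of an attribute name into its value.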

score 0 · Accepted Answer

Adding parse/all didn't seem to change anything. In the end this seems to work (using a set-word really did help a lot with debugging!!!). What do you think?

content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}


attribute1: [to {attribute1="} thru {attribute1="} copy valueattribute1 to {"} thru {"}]
attribute3: [to {attribute3="} thru {attribute3="} copy valueattribute3 to {"} thru {"}]

letter: charset reduce ["ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890="]

attributes-rule: [(valueattribute1: none valueattribute3: none) 
[attribute1 | none] any letter pos: 
[attribute3 | none] (print valueattribute1 print valueattribute3)
| [attribute3 | none] any letter [attribute1 | none] (print valueattribute3 print valueattribute1
valueattribute1: none valueattribute3: none
)
| none
]

rule: [any [to {<tag } thru {<tag } attributes-rule {>} to {</tag>} thru {</tag>}] to end]

parse/all content rule

Output:

>> parse/all content rule
valueattribute1
valueattribute3
valueattribute11
none
== true
>>
