ruby - RegexpError in Ruby when parsing the \p{IsBasicLatin} character property

Question

I am using JRuby 1.7.18, and I have also tried this on JRuby 9000 (the latest version) with the same error. I am using the soap-4r and nokogiri libraries to parse a WSDL XML file.

When parsing the following part of the WSDL

<xs:pattern value="[\p{IsBasicLatin}]*"/>

I get the following error

RegexpError: (RegexpError) invalid character property name <IsBasicLatin>: /\A[\p{IsBasicLatin}]*\z/n
nokogiri/XmlSaxParserContext.java:252:in `parse_with'
nokogiri/XmlSaxParserContext.java:252:in `parse_with'
nokogiri/XmlSaxParserContext.java:252:in `parse_with'

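For Ruby 1.9, which is one of the Ruby versions JRuby 1.7.18 is compatible with, I read that character blocks such as \p{IsBasicLatin} are not supported, but that similar scripts such as \p{Latin} are. I tried changing IsBasicLatin to Latin, and even tried a few other variants such as InBasicLatin and InBasic_Latin, but they all return the same error.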

This happens in both JRuby 1.7.18 and JRuby 9000, the latest release.

What is going wrong here, and how can I fix it?

score 0 · Accepted Answer

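As mentioned in the comments, the name of the character property is actually In_Basic_Latin, not IsBasicLatin. Modern versions of Ruby (specifically MRI, also known as CRuby) use the regular expression library Onigmo. The official Ruby documentation does not list all Unicode properties, but fortunately Onigmo does.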
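To see which property names a given interpreter accepts, one option is to try compiling each name and rescue the RegexpError. This is only a sketch: the helper name property_supported? is made up for illustration, and the result for In_Basic_Latin depends on the interpreter you run it on.

```ruby
# Hypothetical helper (not part of any library): report whether the running
# interpreter accepts \p{name} as a character property.
def property_supported?(name)
  Regexp.new("\\p{#{name}}")
  true
rescue RegexpError
  false
end

# Names taken from the question and the answer.
%w[IsBasicLatin In_Basic_Latin Latin].each do |name|
  puts "#{name}: #{property_supported?(name)}"
end
```

Running this on both MRI and JRuby should make it clear whether the failure comes from the property name or from the interpreter.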

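Apparently, JRuby does not implement Unicode blocks (at least). However, the information about the blocks (names and ranges) is publicly available. \p{In_Basic_Latin} is therefore equivalent to [\u0000-\u007F], and so is [[:ascii:]].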
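A minimal sketch of the two replacements, assuming the goal is only to match strings made up entirely of Basic Latin characters. The constant names and the basic_latin? helper are illustrative; how to get the substituted pattern into the WSDL parsing path is not covered here.

```ruby
# Explicit code point range for the Basic Latin block (U+0000..U+007F).
BASIC_LATIN_RANGE = /\A[\u0000-\u007F]*\z/
# POSIX bracket expression covering the same characters.
BASIC_LATIN_POSIX = /\A[[:ascii:]]*\z/

# Hypothetical helper: true only when both patterns accept the whole string.
def basic_latin?(str)
  !(str =~ BASIC_LATIN_RANGE).nil? && !(str =~ BASIC_LATIN_POSIX).nil?
end

puts basic_latin?("hello wsdl")  # ASCII only
puts basic_latin?("caf\u00E9")   # contains U+00E9, outside the block
puts basic_latin?("")            # empty string matches because of the * quantifier
```

Neither pattern relies on \p{...}, so they avoid the property name lookup that raises the error.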
