
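I have a fairly simple regular expression that works fine in my Ruby code but refuses to work in my Lisp code. I'm just trying to match a URL path (a slash followed by a word, and nothing more). This is the regex I use in Ruby: ^\/\w*$

I want this to match "/" and "/foo", but not "/foo/bar".

I tried the following: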

(cl-ppcre:scan "^/\w*$" "/") ;works
(cl-ppcre:scan "^/\w*$" "/foo") ;doesn't work!
(cl-ppcre:scan "^/\w*$" "/foo/bar") ;works, ie doesn't match

Can anyone help?


2 Answers


In Common Lisp string syntax, the backslash (\) is, by default, the single escape character: it suppresses any special processing of the character that follows it, so it can be used to include a double quote (") inside a string literal, like this: "\"".

Thus, when you pass the literal string "^/\w*$" to cl-ppcre:scan, the actual string that is passed will be "^/w*$", i.e. the backslash will just be removed. You can verify this by evaluating (cl-ppcre:scan "^/\w*$" "/w"), which will match.
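One way to confirm what the reader produces, without involving CL-PPCRE at all, is to inspect the strings themselves. The following is a minimal sketch in plain Common Lisp; the variable names *single* and *double* are made up for illustration.

```lisp
;; Inside a string literal the reader drops a single backslash and keeps
;; the character that follows it.
(defparameter *single* "^/\w*$")   ; five characters: ^ / w * $
(defparameter *double* "^/\\w*$")  ; six characters:  ^ / \ w * $

(length *single*)               ; => 5
(length *double*)               ; => 6
(string= *single* "^/w*$")      ; => T, the backslash is gone
(char= (char *double* 2) #\\)   ; => T, the backslash survived
(princ *double*)                ; prints ^/\w*$
```

Printing with princ shows the string as the regex engine will see it, which is often the quickest sanity check.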

To include the backslash character in your regular expression, you need to escape it with a second backslash, like so: "^/\\w*$".
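Applied to the three inputs from the question, the escaped version behaves as intended. This sketch assumes Quicklisp is installed so that CL-PPCRE can be loaded, and that the forms are evaluated one by one (at the REPL or via load); *path-regex* is a made-up name.

```lisp
(ql:quickload :cl-ppcre) ; assumption: Quicklisp is available

;; Hypothetical variable holding the corrected pattern.
(defparameter *path-regex* "^/\\w*$")

;; SCAN returns the match start as its primary value, or NIL on failure.
(cl-ppcre:scan *path-regex* "/")         ; => 0 (match ends at 1)
(cl-ppcre:scan *path-regex* "/foo")      ; => 0 (match ends at 4)
(cl-ppcre:scan *path-regex* "/foo/bar")  ; => NIL, no match
```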

If you work with literal regular expressions a lot, the required quoting of strings can become tedious and hard to read. Have a look at CL-INTERPOL for a library that adds a nicer syntax for regular expressions to the Lisp reader.

answered 2018-12-27T07:27:47.880

If you are in doubt about a regular expression, you can also check how it is parsed with ppcre:parse-string:

CL-USER> (ppcre:parse-string "^/\w*$")
(:SEQUENCE :START-ANCHOR #\/ (:GREEDY-REPETITION 0 NIL #\w) :END-ANCHOR)

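The output above tells us that backslash-w is being interpreted as a literal w character.

Compare this with the expression you actually want to use: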

CL-USER> (ppcre:parse-string "^/\\w*$")
(:SEQUENCE :START-ANCHOR #\/ (:GREEDY-REPETITION 0 NIL :WORD-CHAR-CLASS) :END-ANCHOR)

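The return value is a tree that represents the regular expression. In fact, you can use this same representation anywhere CL-PPCRE expects a regular expression. Although it is somewhat verbose, it helps when composing values into a regular expression, without having to worry about nesting strings within strings or about special characters: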

(defun maybe (regex)
  `(:greedy-repetition 0 1 ,regex))

(defparameter *simple-floats*
  (let ((digits '(:register (:greedy-repetition 1 nil :digit-class))))
    (ppcre:create-scanner `(:sequence
                             (:register (:regex "[+-]?"))
                             ,digits
                             ,(maybe `(:sequence "." ,digits))))))

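In the above, the dot "." is read literally, not as a regular expression. This means you can match strings like "(^.^)" or "[]", which would be hard to write and read with escape characters in a plain string regex. You can fall back to a regular expression written as a string by using the (:regex "...") form.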
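As a rough illustration of both points, the sketch below passes a parse tree directly to ppcre:scan and compares a literal string inside a tree with its escaped string-regex equivalent. It assumes Quicklisp is available and that the forms are evaluated one by one; *path-tree* is a made-up name.

```lisp
(ql:quickload :cl-ppcre) ; assumption: Quicklisp is available

;; Hypothetical variable: the tree that PARSE-STRING returns for "^/\\w*$".
(defparameter *path-tree*
  '(:sequence :start-anchor
              #\/
              (:greedy-repetition 0 nil :word-char-class)
              :end-anchor))

;; A parse tree is accepted wherever a regex string would be.
(ppcre:scan *path-tree* "/foo")      ; => 0 (match ends at 4)
(ppcre:scan *path-tree* "/foo/bar")  ; => NIL

;; Inside a tree, a string matches itself literally, so no escaping is needed.
(ppcre:scan '(:sequence "(^.^)") "hello (^.^) world")  ; => 6 (ends at 11)

;; The same match written as a string regex needs every special character
;; escaped, and each backslash doubled for the Lisp reader.
(ppcre:scan "\\(\\^\\.\\^\\)" "hello (^.^) world")     ; => 6 (ends at 11)
```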

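CL-PPCRE has an optimization whereby constant regular expressions are turned into scanners at load time using load-time-value. If your regular expression is not a trivial constant, that optimization may not be applied, so you may want to wrap your own scanner in a load-time-value form. Just make sure that enough definitions are available at load time, such as the auxiliary maybe function.

answered 2018-12-27T17:43:45.813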
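A minimal sketch of that wrapping might look like the following. It assumes Quicklisp is available and that the forms are evaluated in order (at the REPL or via load); signed-integer-p is a hypothetical function invented for this example, not part of CL-PPCRE.

```lisp
(ql:quickload :cl-ppcre) ; assumption: Quicklisp is available

;; MAYBE is called inside LOAD-TIME-VALUE, so it must already be defined
;; when that form is evaluated. With COMPILE-FILE, the safest arrangement
;; is to keep such helpers in a file that is loaded earlier.
(defun maybe (regex)
  `(:greedy-repetition 0 1 ,regex))

(defun signed-integer-p (string)
  "Hypothetical predicate: true if STRING is an optionally signed integer."
  (let ((scanner (load-time-value
                  (ppcre:create-scanner
                   `(:sequence :start-anchor
                               ,(maybe '(:char-class #\+ #\-))
                               (:greedy-repetition 1 nil :digit-class)
                               :end-anchor)))))
    ;; SCAN returns a start index (possibly 0) or NIL; normalize to T/NIL.
    (and (ppcre:scan scanner string) t)))

(signed-integer-p "-42")  ; => T
(signed-integer-p "4x2")  ; => NIL
```

The scanner is built once, when the function definition is loaded, rather than on every call.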