0

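I'm not very familiar with regular expressions, so I need some help. I'm using the jQuery DynaCloud plugin, and it breaks at a certain point in my code whenever a regular expression match occurs. I need someone to help me figure out what this regular expression matches: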

/^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,}

Any help would be appreciated!

4

5 Answers

1

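I suggest you take a look at Expresso. Note that your expression is missing its closing parenthesis; with that added, Expresso gives the following result:

[Image: Expresso's breakdown of the regular expression]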

Answered 2012-08-14T09:34:03.913
1

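^ start of the string (or of each line when the m flag is set)

[...] a class of allowed characters

a-z a range (abcde...yz)

\xE4 the character with that hexadecimal code (here ä; strictly a Latin-1/Unicode code rather than ASCII)

{n,m} between n and m occurrences

* equivalent to {0,}

+ equivalent to {1,}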
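
As a quick sanity check of the notation above, here is a minimal JavaScript sketch. The variable names and sample strings are illustrative assumptions, not part of the plugin; the snippet only confirms that \xE4 denotes ä and that * and + behave like {0,} and {1,}.

```javascript
// \xE4 is the character with hexadecimal code E4 (decimal 228), i.e. "ä".
const aUmlaut = "\xE4";
console.log(aUmlaut === "\u00E4", aUmlaut.charCodeAt(0)); // true 228

// A character class combining a range with an escaped character.
const lowerClass = /^[a-z\xE4]$/;
console.log(lowerClass.test("q"), lowerClass.test("\u00E4"), lowerClass.test("Q")); // true true false

// * behaves like {0,} and + behaves like {1,} on these sample inputs.
const inputs = ["", "a", "aaa", "b"];
const starSame = inputs.every(s => /^a*$/.test(s) === /^a{0,}$/.test(s));
const plusSame = inputs.every(s => /^a+$/.test(s) === /^a{1,}$/.test(s));
console.log(starSame, plusSame); // true true
```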

Answered 2012-08-14T09:35:57.423
1

The \x** parts each translate to a special character. If you replace them, you basically get:

/^[a-zäöü]*[A-ZÄÖÜ]([A-ZÄÖÜß]+|[a-zäöüß]{3,})/

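Let me break it down for you:

^ start of the string

[a-zäöü] character set: any character from a to z, or ä, ö, ü; the * means zero or more times

[A-ZÄÖÜ] character set: any character from A to Z, or Ä, Ö, Ü; exactly once

( start of a group

[A-ZÄÖÜß] another character set, you should get the idea by now :) ; the + means one or more times

| or

[a-zäöüß] character set; {3,} means three or more times

) end of the group

Also, you are missing a ) at the end. The / at the start and at the end mean that what lies between them is a regular expression.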
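
To make the breakdown concrete, here is a small JavaScript sketch that runs the completed regex against a few words. The sample words are made up purely to exercise each part of the pattern.

```javascript
// The regex from the question, with the missing ")" and closing "/" restored.
const word = /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/;

// Hypothetical sample words ("\u00C4pfel" is "Äpfel").
const samples = ["Hello", "iPhone", "USA", "\u00C4pfel", "hello", "Hi", "Abc"];
for (const s of samples) {
  console.log(s, word.test(s));
}
// Hello  true   H followed by "ello" (at least three lowercase letters)
// iPhone true   leading lowercase "i", then P followed by "hone"
// USA    true   U followed by "SA" (one or more uppercase letters)
// Äpfel  true   Ä followed by "pfel"
// hello  false  no uppercase letter at all
// Hi     false  only one lowercase letter after H
// Abc    false  only two lowercase letters after A
```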

Answered 2012-08-14T09:36:28.923
0

Assuming this is your regex:

/^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/

Here is an explanation of the regex:

"^" +                              // Assert position at the beginning of the string (no m flag here, so not after line breaks)
"[a-z\xE4\xF6\xFC]" +              // Match a single character present in the list below
                                      // A character in the range between “a” and “z”
                                      // Latin-1 character 0xE4 (228 decimal), "ä"
                                      // Latin-1 character 0xF6 (246 decimal), "ö"
                                      // Latin-1 character 0xFC (252 decimal), "ü"
   "*" +                              // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"[A-Z\xC4\xD6\xDC]" +              // Match a single character present in the list below
                                      // A character in the range between “A” and “Z”
                                      // Latin-1 character 0xC4 (196 decimal), "Ä"
                                      // Latin-1 character 0xD6 (214 decimal), "Ö"
                                      // Latin-1 character 0xDC (220 decimal), "Ü"
"(" +                              // Match the regular expression below and capture its match into backreference number 1
                                      // Match either the regular expression below (attempting the next alternative only if this one fails)
      "[A-Z\xC4\xD6\xDC\xDF]" +          // Match a single character present in the list below
                                            // A character in the range between “A” and “Z”
                                            // Latin-1 character 0xC4 (196 decimal), "Ä"
                                            // Latin-1 character 0xD6 (214 decimal), "Ö"
                                            // Latin-1 character 0xDC (220 decimal), "Ü"
                                            // Latin-1 character 0xDF (223 decimal), "ß"
         "+" +                              // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   "|" +                              // Or match regular expression number 2 below (the entire group fails if this one fails to match)
      "[a-z\xE4\xF6\xFC\xDF]" +          // Match a single character present in the list below
                                            // A character in the range between “a” and “z”
                                            // Latin-1 character 0xE4 (228 decimal), "ä"
                                            // Latin-1 character 0xF6 (246 decimal), "ö"
                                            // Latin-1 character 0xFC (252 decimal), "ü"
                                            // Latin-1 character 0xDF (223 decimal), "ß"
         "{3,}" +                           // Between 3 and unlimited times, as many times as possible, giving back as needed (greedy)
")"  
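
The explanation mentions a capturing group and an ordered alternation. The following sketch shows both using exec; the helper name matchParts and the sample inputs are assumptions made for illustration only.

```javascript
const word = /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/;

// Hypothetical helper: returns [whole match, capture group 1], or null when there is no match.
function matchParts(s) {
  const m = word.exec(s);
  return m ? [m[0], m[1]] : null;
}

console.log(JSON.stringify(matchParts("iPhone"))); // ["iPhone","hone"]
console.log(JSON.stringify(matchParts("USA")));    // ["USA","SA"]
// The first alternative (uppercase run) is tried first and succeeds on "B",
// so the lowercase alternative is never attempted for this input.
console.log(JSON.stringify(matchParts("ABcdef"))); // ["AB","B"]
console.log(JSON.stringify(matchParts("Hi")));     // null
```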
Answered 2012-08-14T12:18:27.163
0

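I assume the )/ missing from your regex is just a cut-and-paste error; both are present in the DynaCloud source code. What is not present is an end anchor ($), which I find surprising. Here is the relevant code: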

var elems = jQuery(this).text()
            .replace(/[^A-Z\xC4\xD6\xDCa-z\xE4\xF6\xFC\xDF0-9_]/g, ' ')
            .replace(jQuery.dynaCloud.stopwords, ' ')
            .split(' ');
var word = 
  /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/;

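The first statement filters out unwanted characters but leaves digits and underscores alone. The second statement tries to match words made up of ASCII letters plus some non-ASCII letters used in (for example) German. However, because the regex is not anchored at the end, once it runs out of matching letters the rest of the word can be any characters at all, not just those listed in the first regex. Also, any digits or underscores in a word will cause that word to be split into two or more words.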
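
The behaviour of the two statements can be tried outside the plugin with a rough sketch like the one below. It assumes a plain string in place of jQuery(this).text() and leaves out the stopwords step, since jQuery.dynaCloud.stopwords is not shown here; the sample text is made up.

```javascript
// Plain-string stand-in for jQuery(this).text(); the sample text is hypothetical.
const text = "Hello, World_1 foo";

// Same filter as in the plugin code: every character outside the listed set becomes a space.
const elems = text
  .replace(/[^A-Z\xC4\xD6\xDCa-z\xE4\xF6\xFC\xDF0-9_]/g, " ")
  .split(" ");
console.log(JSON.stringify(elems)); // ["Hello","","World_1","foo"]

const word = /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/;

// There is no end anchor, so the test succeeds as soon as a prefix matches,
// and the matched text covers only the leading letters.
console.log(word.test("World_1"));     // true
console.log(word.exec("World_1")[0]);  // World
console.log(word.exec("Hello123")[0]); // Hello
```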

I would try anchoring the regex at the end and adding support for digits and underscores, like this:

/^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF0-9_]+|[a-z\xE4\xF6\xFC\xDF0-9_]{3,})$/g

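This regex is for illustration only; it is not a solution. For one thing, I made a wild guess about where digits and underscores belong. For another, it can now match words that end with digits or underscores, which you probably don't want.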
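
For anyone who wants to experiment with the anchored variant, here is a hedged sketch. The sample words are made up, and the pattern is written without the g flag for the first checks; the second half shows a general JavaScript behaviour worth keeping in mind, namely that test() on a regex with the g flag keeps state in lastIndex between calls.

```javascript
// The illustrative anchored pattern, without the g flag.
const anchored = /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF0-9_]+|[a-z\xE4\xF6\xFC\xDF0-9_]{3,})$/;

console.log(anchored.test("Hello"));      // true
console.log(anchored.test("Hello123"));   // true
console.log(anchored.test("Word_"));      // true, ends with an underscore
console.log(anchored.test("HelloWorld")); // false, a second capital is not allowed by this pattern

// Same pattern with the g flag: test() resumes searching at lastIndex.
const anchoredG = new RegExp(anchored.source, "g");
console.log(anchoredG.test("Hello")); // true, lastIndex is now 5
console.log(anchoredG.test("Hello")); // false, ^ cannot match at index 5; lastIndex resets to 0
console.log(anchoredG.test("Hello")); // true
```

If the regex is only ever used with test() on whole words, dropping the g flag avoids this alternating result.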

Answered 2012-08-14T12:47:18.420