
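I'm trying to find a regular expression (ColdFusion or Java) that will correctly get me the contents between the brackets of each (param \d+). I've tried dozens of different regexes, and the closest I've gotten is this one: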

\(param \d+\) = \[(type='[^']*', class='[^']*', value='(?:[^']|'')*', sqltype='[^']*')\]

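This would be perfect if the string I get back from CF escaped the single quotes in the value parameter. It doesn't, though, and it fails miserably. Going the negative-lookahead route, like this: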

\[(type='[^']*', class='[^']*', value='(?:(?!', sqltype).)*', sqltype='[^']*')\]

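works great, unless for some unnatural reason a value actually contains ", sqltype". I find it hard to believe that I can't simply tell a regex to dig out the contents of every opening and closing bracket it finds, but then again, I don't know enough regex to know its limits.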

Here's a sample string I'm trying to parse:

(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']
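To make the two failure modes concrete, here is a minimal Java sketch (the class and method names are made up for illustration) that runs both attempts above against this sample and counts the matches. Because the quotes in O'Reilly are not doubled, the escaped-quote regex should only get param 1; the lookahead version also gets param 2, but not param 3, whose value contains "', sqltype= '".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionRegexDemo {
    // Sample string from the question, on one line.
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , "
      + "(param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , "
      + "(param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']";

    // First attempt: only works if quotes inside value are doubled ('').
    static final String ESCAPED_QUOTES =
        "\\(param \\d+\\) = \\[(type='[^']*', class='[^']*', value='(?:[^']|'')*', sqltype='[^']*')\\]";

    // Second attempt: value runs until the first "', sqltype".
    static final String LOOKAHEAD =
        "\\[(type='[^']*', class='[^']*', value='(?:(?!', sqltype).)*', sqltype='[^']*')\\]";

    // Counts how many non-overlapping matches find() reports.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("escaped-quote regex: " + countMatches(ESCAPED_QUOTES, SAMPLE) + " of 3");
        System.out.println("lookahead regex:     " + countMatches(LOOKAHEAD, SAMPLE) + " of 3");
    }
}
```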

For the curious, this is a sub-problem of Copyable Coldfusion SQL Exception.

EDIT

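Here's my attempt at implementing @Mena's answer in CF9.1. Sadly, it doesn't finish processing the string. I had to replace the \\ with \ just to get it to run in the first place, but my implementation may still be faulty.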

Here's the given string (the pipes only mark its boundaries):

| (param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly], really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype ', sqltype='cf_sql_varchar'] | 

Here's my implementation:

    <cfset var outerPat = createObject("java","java.util.regex.Pattern").compile(javaCast("string", "\((.+?)\)\s?\=\s?\[(.+?)\](\s?,|$)"))>
    <cfset var innerPat = createObject("java","java.util.regex.Pattern").compile(javaCast("string", "(.+?)\s?\=\s?'(.+?)'\s?,\s?"))>
    <cfset var outerMatcher = outerPat.matcher(javaCast("string", arguments.params))>

    <cfdump var="Start"><br />
    <cfloop condition="outerMatcher.find()">     
        <cfdump var="#outerMatcher.group(1)#"> (<cfdump var="#outerMatcher.group(2)#">)<br />
        <cfset var innerMatcher = innerPat.matcher(javaCast("string", outerMatcher.group(2)))>
        <cfloop condition="innerMatcher.find()">
            <cfoutput>|__</cfoutput><cfdump var="#innerMatcher.group(1)#"> --> <cfdump var="#innerMatcher.group(2)#"><br />
        </cfloop>
        <br />
    </cfloop>
    <cfabort>

And here's what gets printed:

Start 
param 1 ( type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer' )
|__ type --> IN 
|__ class --> java.lang.Integer 
|__ value --> 47 

param 2 ( type='IN', class='java.lang.String', value='asf , O'Reilly )
|__ type --> IN 
|__ class --> java.lang.String 

End
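One thing that seems to cut param 2 short in the edited string: the outer pattern's lazy \[(.+?)\](\s?,|$) stops at the first ] that is followed by an optional space and a comma, and this value contains "O'Reilly],". The small Java sketch below (hypothetical class name) uses the same outer pattern and shows the captured contents for param 2 ending at O'Reilly. The inner pattern then finds no closing quote and comma after the value, so only type and class are printed for param 2. This sketch does not try to explain why param 3 is missing from the CF output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OuterPatternTruncation {
    // The string from the edit, without the boundary pipes. Note "O'Reilly]," in param 2's value.
    static final String EDIT_INPUT =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , "
      + "(param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly], really?', sqltype='cf_sql_varchar'] , "
      + "(param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype ', sqltype='cf_sql_varchar']";

    // Same outer pattern as the implementation: group 2 is lazy and ends at the first ']'
    // followed by an optional space and a comma, or by the end of input.
    static final Pattern OUTER = Pattern.compile("\\((.+?)\\)\\s?\\=\\s?\\[(.+?)\\](\\s?,|$)");

    // Collects group 2 (the square-bracket contents) of every outer match.
    static List<String> bracketContents(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = OUTER.matcher(input);
        while (m.find()) {
            groups.add(m.group(2));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Contents captured for param 2, cut off right after "O'Reilly".
        System.out.println(bracketContents(EDIT_INPUT).get(1));
    }
}
```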

2 Answers


Here's a Java regex pattern that works for your sample input.

(?x)

# lookbehind to check for start of string or previous param
# java lookbehinds must have max length, so limits sqltype
(?<=^|sqltype='cf_sql_[a-z]{1,16}']\ ,\ )

# capture the full string for replacing in the orig sql
# and just the position to verify against the match position
(\(param\ (\d+)\))

\ =\ \[

# type and class won't contain quotes
   type='([^']++)'
,\ class='([^']++)'

# match any non-quote, then lazily keep going
,\ value='([^']++.*?)'

# sqltype is always alphanumeric
,\ sqltype='cf_sql_[a-z]+'

\]

# lookahead to check for end of string or next param
(?=$|\ ,\ \(param\ \d+\)\ =\ \[)

(The (?x) flag enables comment mode, which ignores unescaped whitespace as well as everything from a hash to the end of the line.)
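For use outside CFML, here is a minimal Java sketch of the same pattern, assuming plain java.util.regex (the class and method names are hypothetical). Each regex line is a separate string joined with "\n", so every # comment ends where it should under (?x), and "\\ " stays a literal space. On the sample string from the question it should find three params with the value in group 5; because sqltype has to be cf_sql_ followed by letters, the stray "', sqltype= " text is kept as part of param 3's value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamPatternDemo {
    // The answer's pattern, one regex line per string; "\n" terminates each (?x) comment.
    static final Pattern PARAM = Pattern.compile(String.join("\n",
        "(?x)",
        "# lookbehind: start of string or end of the previous param",
        "(?<=^|sqltype='cf_sql_[a-z]{1,16}']\\ ,\\ )",
        "# group 1: the whole (param N) token, group 2: N",
        "(\\(param\\ (\\d+)\\))",
        "\\ =\\ \\[",
        "# groups 3 and 4: type and class, which contain no quotes",
        "   type='([^']++)'",
        ",\\ class='([^']++)'",
        "# group 5: value, at least one non-quote, then lazily onward",
        ",\\ value='([^']++.*?)'",
        ",\\ sqltype='cf_sql_[a-z]+'",
        "\\]",
        "# lookahead: end of string or the start of the next param",
        "(?=$|\\ ,\\ \\(param\\ \\d+\\)\\ =\\ \\[)"));

    // Sample string from the question.
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , "
      + "(param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , "
      + "(param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']";

    // Collects group 5 (the value) of every match, in order.
    static List<String> values(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            result.add(m.group(5));
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = PARAM.matcher(SAMPLE);
        while (m.find()) {
            System.out.println("param " + m.group(2) + " -> [" + m.group(5) + "]");
        }
    }
}
```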

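Here's the pattern implemented in CFML (tested on CF9,0,1,274733). It uses cfRegex (a library that makes Java regex easier to use from CFML) to get the pattern's matches, then runs some checks to make sure the expected number of params was found.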

<cfsavecontent variable="Input">
(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer']
 , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar']
 , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']
</cfsavecontent>
<cfset Input = trim(Input).replaceall('\n','')>

<cfset cfcatch = 
    { params = input
    , sql = 'SELECT stuff FROM wherever WHERE (param 3) is last param'
    }/>

<cfsavecontent variable="ParamRx">(?x)

    # lookbehind to check for start or previous param
    # java lookbehinds must have max length, so limits sqltype
    (?<=^|sqltype='cf_sql_[a-z]{1,16}']\ ,\ )

    # capture the full string for replacing in the orig sql
    # and just the position to verify against the match position
    (\(param\ (\d+)\))

    \ =\ \[

    # type and class won't contain quotes
       type='([^']++)'
    ,\ class='([^']++)'

    # match any non-quote, then lazily keep going if needed
    ,\ value='([^']++.*?)'

    # sqltype is always alphanumeric
    ,\ sqltype='cf_sql_[a-z]+'

    \]

    # lookahead to check for end or next param
    (?=$|\ ,\ \(param\ \d+\)\ =\ \[)

</cfsavecontent>

<cfset FoundParams = new Regex(ParamRx).match
    ( text = cfcatch.params
    , returntype = 'full'
    )/>

<cfset LastParamPos = cfcatch.sql.lastIndexOf('(param ') + 7 />
<cfset LastParam = ListFirst( Mid(cfcatch.sql,LastParamPos,3) , ')' ) />

<cfif LastParam NEQ ArrayLen(FoundParams) >
    <cfset ProblemsDetected = true />
<cfelse>
    <cfset ProblemsDetected = false />

    <cfloop index="i" from=1 to=#ArrayLen(FoundParams)# >

        <cfif i NEQ FoundParams[i].Groups[2] >
            <cfset ProblemsDetected = true />
        </cfif>

    </cfloop>
</cfif>

<cfif ProblemsDetected>
    <big>Something went wrong!</big>
<cfelse>
    <big>All seems fine</big>
</cfif>

<cfdump var=#FoundParams# />

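This will actually work if you embed a whole param inside another param's value. If you try two (or more), it will fail, but at least the checks should detect that failure.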
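The count check in the CFML above could be mirrored in Java roughly as follows. This is a sketch that assumes you already have the param number (group 2) of each match, in order; the names are hypothetical, and the last param number is read with a small regex rather than with lastIndexOf and Mid.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamCountCheck {
    private static final Pattern PARAM_REF = Pattern.compile("\\(param (\\d+)\\)");

    // Number of the last "(param N)" occurring in the SQL text, or 0 if there is none.
    static int lastParamInSql(String sql) {
        Matcher m = PARAM_REF.matcher(sql);
        int last = 0;
        while (m.find()) {
            last = Integer.parseInt(m.group(1));
        }
        return last;
    }

    // Returns true when something looks wrong: the number of params found differs from the
    // last param in the SQL, or the found numbers are not exactly 1, 2, ..., N in order.
    static boolean problemsDetected(String sql, List<Integer> foundNumbers) {
        if (lastParamInSql(sql) != foundNumbers.size()) {
            return true;
        }
        for (int i = 0; i < foundNumbers.size(); i++) {
            if (foundNumbers.get(i) != i + 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String sql = "SELECT stuff FROM wherever WHERE (param 3) is last param";
        System.out.println(problemsDetected(sql, List.of(1, 2, 3))); // every param found
        System.out.println(problemsDetected(sql, List.of(1, 3)));    // param 2 was swallowed
    }
}
```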

The dump output should look like this:

[Image: dump output]

Hopefully everything here makes sense; let me know if you have any questions.

answered 2013-08-09T19:45:56.873

I'd probably use a dedicated parser for this, but here's an example of how to do it with two Patterns and nested loops:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamParser {
    public static void main(String[] args) {
        // the input String
        String input = "(param 1) = " +
                "[type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , " +
                "(param 2) = " +
                "[type='IN', class='java.lang.String', value='asf , O'Reilly, really?', " +
                "sqltype='cf_sql_varchar'] , " +
                "(param 3) = " +
                "[type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , " +
                "[]can break it ', sqltype= ', sqltype='cf_sql_varchar']";

        // the Pattern defining the round-bracket expression and the following
        // square-bracket list. Both values within the brackets are grouped for back-reference.
        // Note that what prevents the 3rd case from breaking is that the closing square bracket
        // is expected to be followed either by an optional space + comma, or by the end of input
        Pattern outer = Pattern.compile("\\((.+?)\\)\\s?\\=\\s?\\[(.+?)\\](\\s?,|$)");

        // the Pattern defining the key-value pairs within the square-bracket groups;
        // both key and value are grouped for back-reference
        Pattern inner = Pattern.compile("(.+?)\\s?\\=\\s?'(.+?)'\\s?,\\s?");
        Matcher outerMatcher = outer.matcher(input);
        // iterating over the outer Pattern (param x) = [myKey = myValue, ad lib.], or end of input
        while (outerMatcher.find()) {
            System.out.println(outerMatcher.group(1));
            Matcher innerMatcher = inner.matcher(outerMatcher.group(2));
            // iterating over the inner Pattern myKey = myValue
            while (innerMatcher.find()) {
                System.out.println("\t" + innerMatcher.group(1) + " --> " + innerMatcher.group(2));
            }
        }
    }
}

Output:

param 1
    type --> IN
    class --> java.lang.Integer
    value --> 47
param 2
    type --> IN
    class --> java.lang.String
    value --> asf , O'Reilly, really?
param 3
    type --> IN
    class --> java.lang.String
    value --> Th[is]is Ev'ery'thing That , []can break it 
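Note that the inner pattern requires a comma after every pair, so the last pair, sqltype, never matches, which is why it is missing from the output. Assuming sqltype is always the final pair and always cf_sql_ plus lowercase letters (as in every sample here), one hypothetical way to pick it up is a separate pattern anchored at the end of the outer group(2):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SqlTypeExtraction {
    // Assumption: sqltype is the last pair and looks like cf_sql_<letters>.
    static final Pattern SQLTYPE = Pattern.compile("sqltype='(cf_sql_[a-z]+)'$");

    // bracketContents is what the outer pattern captures in group(2); returns null if absent.
    static String sqlType(String bracketContents) {
        Matcher m = SQLTYPE.matcher(bracketContents);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(sqlType(
            "type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'"));
        // The decoy "sqltype= '" inside the value is skipped because of the '$' anchor.
        System.out.println(sqlType(
            "type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , "
            + "[]can break it ', sqltype= ', sqltype='cf_sql_varchar'"));
    }
}
```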
answered 2013-08-09T17:53:47.663