I need to parse, in JavaScript, the value a user enters in an HTML text field.

This is my first experience with regular expressions.

Here is my code:

var s = 'research library "not available" author:"Bernard Shaw"';
var tableau = s.split(/(?:[^\s"]+|"[^"]*")/);
for (var i=0; i<tableau.length; i++) {
    document.write("tableau[" + i + "] = " + tableau[i] + "<BR>");
}

I am expecting to see something like this:

tableau[0] = research
tableau[1] = library
tableau[2] = "not available"
tableau[3] = author:
tableau[4] = "Bernard Shaw"

But instead I got this:

tableau[0] =
tableau[1] =
tableau[2] =
tableau[3] =
tableau[4] =
tableau[5] = 
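The empty entries appear because String.prototype.split() treats every regex match as a separator and discards it. The pattern above describes the tokens themselves, so split() keeps only what lies between them: empty strings and single spaces. A minimal sketch (run in Node.js, with console.log standing in for document.write) contrasting split() and match() on the same pattern:

```javascript
// Same input and pattern as in the question.
const s = 'research library "not available" author:"Bernard Shaw"';
const token = /(?:[^\s"]+|"[^"]*")/;

// split() uses each match as a delimiter and throws it away,
// leaving only the text between tokens (six entries: "" and " ").
const pieces = s.split(token);
console.log(pieces);

// match() with the global flag returns the matches themselves instead.
const tokens = s.match(new RegExp(token.source, "g"));
console.log(tokens);
```

With the g flag, match() returns exactly the five tokens first expected above; merging author: with "Bernard Shaw" is what the answers address.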

Actually, what I really need is to split this value:

research library "not available" author:"Bernard Shaw"

into this array:

tableau[0] = research
tableau[1] = library
tableau[2] = "not available"
tableau[3] = author:"Bernard Shaw"

But I think the problem has something to do with positive lookbehind in JavaScript, or something like that.

I have tried many variations, without success.

I really need some help...

3 Answers

It seems you want to split on the whitespace outside the double quotes. In that case, you can try this regex:

var tableau = s.split(/\s(?=(?:[^"]*"[^"]*")*[^"]*$)/);

This splits on whitespace that is followed by an even number of double quotes (counted up to the end of the string), that is, whitespace that is not inside a quoted section.

Explanation:

\s          # Split on whitespace
(?=         # Followed by
   (?:      # Non-capture group with 2 quotes
     [^"]*  # 0 or more non-quote characters
     "      # 1 quote
     [^"]*  # 0 or more non-quote characters
     "      # 1 quote
   )*       # 0 or more repetitions of the previous group (a multiple of 2 quotes is always even)
   [^"]*    # Finally, 0 or more non-quote characters
   $        # Up to the end of the string (necessary: without it, the lookahead always succeeds)
)      

This will give you your final desired output:

tableau[0] = research
tableau[1] = library
tableau[2] = "not available"
tableau[3] = author:"Bernard Shaw"
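The same lookahead can be wrapped in a small function. This is a sketch under two assumptions: quotes are balanced and never escaped, and runs of whitespace should count as a single separator (hence \s+ instead of \s, plus trim()). The name splitOutsideQuotes is hypothetical.

```javascript
// Split on whitespace that lies outside double quotes.
// Assumes balanced, unescaped quotes (no \" inside a quoted part).
function splitOutsideQuotes(input) {
  // \s+ so that runs of spaces do not produce empty entries; the lookahead
  // accepts a split point only if an even number of '"' characters remain
  // between it and the end of the string.
  const separator = /\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/;
  return input.trim().split(separator);
}

const s = 'research library "not available" author:"Bernard Shaw"';
console.log(splitOutsideQuotes(s));
```

Because the lookahead rescans the rest of the string at every whitespace character, the cost grows quadratically with input length; that is harmless for a search box but worth knowing for long inputs.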
answered 2013-09-18T16:03:35.397

Regex might not be the way to go. Instead, you might write a tiny parser that marches along a character at a time and builds an array. Something like this (http://jsfiddle.net/WTMct/1):

function parse(str) {
    var arr = [];
    var quote = false;  // true means we're inside a quoted field

    // iterate over each character (c), keeping track of the current field index (i)
    for (var i = 0, c = 0; c < str.length; c++) {
        var cc = str[c];                 // current character
        arr[i] = arr[i] || '';           // create a new array value (start with empty string) if necessary

        // If it's a quotation mark, begin/end a quoted field (the mark itself is dropped)
        if (cc === '"') { quote = !quote; continue; }

        // If it's a space and we're not in a quoted field, move on to the next field
        if (cc === ' ' && !quote) { ++i; continue; }

        // Otherwise, append the current character to the current field
        arr[i] += cc;
    }

    return arr;
}

Then

parse('research library "not available" author:"Bernard Shaw"')

returns ["research", "library", "not available", "author:Bernard Shaw"].
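The parser above drops the quotation marks and, since every space starts a new field, turns consecutive spaces into empty strings (parse('a  b') gives ["a", "", "b"]). If the quotes should be kept, as in the question's desired output, one possible variant is sketched below. The name parseKeepQuotes is hypothetical, and treating any whitespace (not just ' ') as a separator is an assumption.

```javascript
// Variant of the character-by-character parser: keeps the quotation marks
// and ignores runs of whitespace instead of creating empty fields.
function parseKeepQuotes(str) {
  const fields = [];
  let current = "";
  let inQuote = false; // true while inside a "..." section

  for (const ch of str) {
    if (ch === '"') {
      inQuote = !inQuote;
      current += ch; // keep the quote character itself
    } else if (/\s/.test(ch) && !inQuote) {
      // Whitespace outside quotes ends the current field, if there is one.
      if (current !== "") {
        fields.push(current);
        current = "";
      }
    } else {
      current += ch;
    }
  }
  if (current !== "") fields.push(current); // flush the last field
  return fields;
}

console.log(parseKeepQuotes('research library "not available" author:"Bernard Shaw"'));
```

A hand-written loop like this is also the easier place to add features later, such as escaped quotes, which are awkward to express in a single regex.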

answered 2013-09-18T16:13:29.100

Alternatively, you can use match() to extract the tokens instead of splitting:

var output=s.match(/"[^"]*"|\S+/g);
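With that pattern, \S+ also swallows the quote in author:"Bernard Shaw", so the term comes back as two tokens, author:"Bernard and Shaw". One possible adjustment (not the answer's own code) is to let a token be unquoted runs and quoted sections glued together; the helper name tokenize is hypothetical:

```javascript
// Adjusted version of the match() approach above.
function tokenize(input) {
  // A token is one or more adjacent pieces, each of which is either
  //   [^\s"]+   a run of characters that are neither whitespace nor '"', or
  //   "[^"]*"   a complete double-quoted section (spaces allowed inside).
  // Repeating the group with + keeps author:"Bernard Shaw" in one token.
  // match() returns null when nothing matches, hence the fallback to [].
  return input.match(/(?:[^\s"]+|"[^"]*")+/g) || [];
}

const s = 'research library "not available" author:"Bernard Shaw"';
console.log(tokenize(s));
```

One caveat: an unbalanced quote is silently dropped, so tokenize('a "b c') yields ["a", "b", "c"].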
answered 2013-09-18T16:17:40.713