
I am trying to use nom 6.1.2 to parse a relatively simple Lisp-like language, where I need to capture identifiers of the form [A-Za-z][a-zA-Z0-9_\-\.]*. I tried re_match, but it expects the whole input to match, not just the beginning of the string. I want to match these identifiers as part of a larger input, so the parser should return the matched identifier together with the remaining input, which can then be passed on to other parsers in the usual parser-combinator fashion.

fn name(input: &str) -> IResult<&str, &str, VerboseError<&str>> {
    let re = Regex::new(r"^[A-Za-z][a-zA-Z0-9_\.\-]*$").unwrap();
    context("name", re_match(re))(input)
}

The test I want this to pass is the following:

#[test]
fn test_name() {
    assert_eq!(name("test"), Ok(("", "test")));
    assert_eq!(name("test1-test2"), Ok(("", "test1-test2")));
    assert_eq!(name("test1.test2"), Ok(("", "test1.test2")));
    assert_eq!(name("test1_test2"), Ok(("", "test1_test2")));
    assert_eq!(name("Test1_Test2"), Ok(("", "Test1_Test2")));
    assert!(name("123Test").is_err());

    //this last assertion fails
    assert_eq!(name("test1 test2$!%"), Ok((" test2$!%", "test1")));
}

The last assertion in the above test fails.

thread 'parser::tests::test_name' panicked at 'assertion failed: `(left == right)`
  left: `Err(Error(VerboseError { errors: [("test1 test2$!%", Nom(RegexpMatch)), ("test1 test2$!%", Context("name"))] }))`,
 right: `Ok((" test2$!%", "test1"))`', src\parser.rs:69:9
stack backtrace:

If I were using one of nom's built-in parsers such as alphanumeric1, this would work fine, but those don't accept the character set my regular expression describes.

I could hand-code this by checking the first character and then the subsequent ones, but I have several other places where I need to parse different regular expressions, and that approach would become unmaintainable.

What is the right way to take the part that matches a regular expression and continue with parsing the rest of the larger input?


1 Answer


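I tried this out, and part of the problem is the $ at the end of the regex. It anchors the match to the end of the input, so the pattern only matches when the entire input is an identifier; with any trailing text there is no match at all. Dropping the $ while keeping the leading ^ gives ^[A-Za-z][a-zA-Z0-9_\.\-]*, which matches an identifier at the start of the input and nothing more.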

The other issue is with re_match() itself. According to the nom docs, re_match returns the whole input if a match is found, regardless of how many characters actually matched. What you want is re_find(), which returns only the first match and leaves everything after it as the remaining input.
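
To make the expected result concrete, here is a minimal, dependency-free sketch of the contract that a prefix-matching parser should satisfy: it returns the remaining input first and the matched identifier second, in the same order as nom's IResult. The helper name take_name is hypothetical and is not part of nom or the regex crate; it hard-codes the identifier rule purely for illustration, whereas in the real parser the anchored regex passed to re_find() would do this work.

```rust
/// Hypothetical helper (not part of nom): splits `input` into
/// `(remaining, matched)`, where `matched` is the longest prefix of the
/// form [A-Za-z][A-Za-z0-9_.-]*. Returns `None` when the input does not
/// start with an ASCII letter.
fn take_name(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();

    // The first character must be an ASCII letter.
    match chars.next() {
        Some((_, c)) if c.is_ascii_alphabetic() => {}
        _ => return None,
    }

    // Find the byte offset of the first character that is not allowed in
    // an identifier; if there is none, the identifier runs to the end.
    let end = chars
        .find(|&(_, c)| !(c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-')))
        .map(|(i, _)| i)
        .unwrap_or(input.len());

    // Same ordering as nom: remaining input first, matched slice second.
    Some((&input[end..], &input[..end]))
}
```

For example, take_name("test1 test2$!%") yields Some((" test2$!%", "test1")), which is the shape the failing assertion in the question expects, while take_name("123Test") yields None.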

Answered 2021-03-16T18:31:13.543