
I'm learning Rust and the nom crate. I have a string that can take any of the following forms:

* abc
* abc, test.txt
* abc, def, test.txt
* abc, test.txt, test.txt

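I want to write a parser that returns the trailing file name and everything before it as a tuple. For the sample inputs above, the expected output is: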

* abc                      -> ("abc",           "")
* abc, test.txt            -> ("abc",           "test.txt")
* abc, def, test.txt       -> ("abc, def",      "test.txt")
* abc, test.txt, test.txt  -> ("abc, test.txt", "test.txt")

Here is the code I have so far:


use nom::{
    bytes::complete::{is_a, tag},
    character::complete::space0,
    combinator::{all_consuming, recognize},
    multi::many_till,
    sequence::{preceded, tuple},
    IResult,
};

fn test(input: &str) -> (String, String) {
    let result: IResult<&str, _, nom::error::Error<&str>> =
        all_consuming(
            many_till(
                is_a("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._,"),
                preceded(
                    tuple((space0, tag(","), space0)),
                    recognize(is_a("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._")),
                )))(input);

    result.map(|(_, x)| {
        println!("{:?}", x);
        return (x.0.join(""), x.1.to_string());
    }).unwrap_or_default()
}


#[test]
fn test1() {
    assert_eq!(test("test\\test2"), ("test\\test2".to_string(), "".to_string()));
    assert_eq!(test("test2\\a.txt, file1"), ("test2\\a.txt".to_string(), "file1".to_string()));
    assert_eq!(test("abc"), ("abc".to_string(), "".to_string()));
    assert_eq!(test("abc, test.txt"), ("abc".to_string(), "test.txt".to_string()));
    assert_eq!(test("abc, def, test.txt"), ("abc, def".to_string(), "test.txt".to_string()));
    assert_eq!(test("abc, test.txt, test.txt"), ("abc, test.txt".to_string(), "test.txt".to_string()));
}

When I run `cargo test`, I get the following error:

running 1 test
test test1 ... FAILED

failures:

---- test1 stdout ----
thread 'test1' panicked at 'assertion failed: `(left == right)`
  left: `("", "")`,
 right: `("test\\test2", "")`', src\main.rs:138:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    test1

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

What's wrong here? Thanks.


1 Answer


You don't really need nom for this. You can simply split the string at the last separator:

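fn split_std(input: &str) -> (String, String) {
    // Search from the end so that only the last ", " separates off the file name.
    match input.rfind(", ") {
        // No separator: the whole input is the first part.
        None => (input.to_owned(), String::new()),
        // `idx` is the byte offset of ", "; skip its 2 bytes to get the file name.
        Some(idx) => (input[..idx].to_owned(), input[idx + 2..].to_owned()),
    }
}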
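On Rust 1.52 or newer, `str::rsplit_once` expresses the same idea more directly. The sketch below is a hypothetical variant (`split_borrowed` is a name chosen here, not part of the answer) that returns slices borrowed from the input instead of allocating two `String`s. Like `split_std`, it assumes the separator is always exactly `", "`.

```rust
/// Hypothetical helper: splits `input` at the last ", " separator and
/// returns slices into `input` rather than newly allocated Strings.
fn split_borrowed(input: &str) -> (&str, &str) {
    // rsplit_once searches from the end, so only the last separator is used.
    // If there is no separator at all, the whole input is the first part.
    input.rsplit_once(", ").unwrap_or((input, ""))
}
```

Callers that need owned values can still call `.to_owned()` on each half. Borrowing simply defers that cost to the places that actually need it.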

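Doing this with nom is considerably more complicated and slower. You have to parse a `separated_list0()`, then join the list elements back together and handle the various corner cases and errors: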


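use nom::{bytes::complete::tag, IResult};

fn split_nom(input: &str) -> (String, String) {
    // Parse the whole input as file-name-like tokens separated by ", ".
    let result: IResult<_, _, nom::error::Error<_>> =
        nom::combinator::all_consuming(nom::multi::separated_list0(
            tag(", "),
            nom::bytes::complete::take_while(|ch: char| {
                // `is_ascii_alphanumeric` avoids truncating non-ASCII chars via `as u8`.
                ch.is_ascii_alphanumeric() || ch == '_' || ch == '\\' || ch == '.'
            }),
        ))(input);

    match result {
        // Unexpected characters: fall back to returning the input unchanged.
        Err(_) => (input.to_owned(), "".to_owned()),
        Ok((_, parts)) => {
            // A single element has no trailing file name.
            if parts.len() == 1 {
                return (parts[0].to_owned(), "".to_owned());
            }

            // Re-join everything except the last element with the original separator.
            (
                parts[..parts.len() - 1].join(", "),
                parts[parts.len() - 1].to_owned(),
            )
        }
    }
}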
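Most of the corner-case handling in `split_nom` sits in the post-processing step: joining every element except the last. A std-only sketch of just that step, using a hypothetical `join_parts` helper that receives the already-split pieces, makes the single-element case explicit through `slice::split_last`:

```rust
/// Hypothetical helper: given the pieces found between ", " separators,
/// rebuild (everything before the last piece, last piece).
fn join_parts(parts: &[&str]) -> (String, String) {
    match parts.split_last() {
        // No pieces at all; kept for completeness, since the parser in
        // `split_nom` should always yield at least one element.
        None => (String::new(), String::new()),
        // A single piece has no trailing file name, so it goes first.
        Some((only, [])) => (only.to_string(), String::new()),
        // Otherwise re-join the leading pieces with the original separator.
        Some((last, rest)) => (rest.join(", "), last.to_string()),
    }
}
```

Without the single-element arm, `["abc"]` would come out as `("", "abc")`, which is exactly the case `split_nom` guards against with its `parts.len() == 1` check.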

Tests:

#[cfg(test)]
mod tests {
    use crate::{split_nom, split_std};

    #[test]
    fn test_split_std() {
        verify(split_std);
    }

    #[test]
    fn test_split_nom() {
        verify(split_nom);
    }

    fn verify(f: fn(&str) -> (String, String)) {
        assert_eq!(
            f("test\\test2"),
            ("test\\test2".to_string(), "".to_string())
        );
        assert_eq!(
            f("test2\\a.txt, file1"),
            ("test2\\a.txt".to_string(), "file1".to_string())
        );
        assert_eq!(f("abc"), ("abc".to_string(), "".to_string()));
        assert_eq!(
            f("abc, test.txt"),
            ("abc".to_string(), "test.txt".to_string())
        );
        assert_eq!(
            f("abc, def, test.txt"),
            ("abc, def".to_string(), "test.txt".to_string())
        );
        assert_eq!(
            f("abc, test.txt, test.txt"),
            ("abc, test.txt".to_string(), "test.txt".to_string())
        );
    }
}

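PS: Your test cases don't match your code. For example, your test cases contain the \ character, which is not in your parser's list of accepted characters. As a result the parser fails on "test\\test2", and `unwrap_or_default()` silently turns that error into ("", ""), which is exactly the left-hand value in the failed assertion.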
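To confirm which character trips the original parser, here is a small std-only check. `first_rejected` is a hypothetical helper, and `ACCEPTED` is copied from the first `is_a(...)` call in the question. It reports the first character at which that `is_a` would stop matching.

```rust
/// Character set copied from the question's first `is_a(...)` call.
const ACCEPTED: &str =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._,";

/// Hypothetical helper: returns the first character of `input` that is
/// not in `ACCEPTED`, or `None` if every character is accepted.
fn first_rejected(input: &str) -> Option<char> {
    input.chars().find(|c| !ACCEPTED.contains(*c))
}
```

For the first test input this returns `Some('\\')`, i.e. the backslash after "test".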

answered 2021-08-25T11:19:59.883