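I wrote a program that parses text using regular expressions. The regular expression should come from the user. I intend to use glob syntax for user input and convert the glob string into a regular expression internally. For example: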

"foo.? bar*" 

应转换为

"^.*foo\.\w\bar\w+.*"

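Somehow I need to escape all characters in the string that have a special meaning in regex, and then replace the glob * and ? characters with the appropriate regex syntax. What is the most convenient way to do this?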

6 Answers

No need for incomplete or unreliable hacks. Python includes a function for exactly this:

>>> import fnmatch
>>> fnmatch.translate( '*.foo' )
'.*\\.foo$'
>>> fnmatch.translate( '[a-z]*.txt' )
'[a-z].*\\.txt$'
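fnmatch.translate only returns the pattern string; to use it you still compile it with re. The sketch below wraps that in a hypothetical helper, glob_matches. Because the exact string translate returns has changed between Python versions, it checks matching behaviour rather than the pattern text. The translated pattern is anchored at the end, and re.match anchors it at the start, so the whole string must match.

```python
import fnmatch
import re

def glob_matches(glob, text):
    # translate() yields a regex anchored at the end of the string;
    # re.match anchors it at the start, so the whole text must match the glob.
    regex = re.compile(fnmatch.translate(glob))
    return regex.match(text) is not None

print(glob_matches('*.foo', 'bar.foo'))          # True
print(glob_matches('*.foo', 'bar.foo.txt'))      # False
print(glob_matches('[a-z]*.txt', '1notes.txt'))  # False: '1' is not in [a-z]

# To find the glob anywhere in a longer text (like the question's leading ^.*),
# one option is to surround it with stars.
print(glob_matches('*foo.? bar*', 'gazonk foo.c bar.m m.bar'))  # True
```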
answered 2009-10-12T17:07:01.437

I'm not sure I fully understand the requirements. If I assume the users want to find text "entries" where their search matches, then I think this brute-force approach would work as a start.

First escape everything that has a special meaning in regex. Then use plain (non-regex) string replacements to turn the (now escaped) glob characters into regex syntax and build the regular expression. Like so in Python:

regexp = re.escape(search_string).replace(r'\?', '.').replace(r'\*', '.*?')

For the search string in the question, this builds a regexp that looks like so (raw):

foo\..\ bar.*?

Used in a Python snippet:

import re

search = "foo.? bar*"
text1 = 'foo bar'
text2 = 'gazonk foo.c bar.m m.bar'

# Escape regex metacharacters, then turn the escaped glob wildcards into regex syntax
searcher = re.compile(re.escape(search).replace(r'\?', '.').replace(r'\*', '.*?'))

for text in (text1, text2):
    if searcher.search(text):
        print('Match: "%s"' % text)

Produces:

Match: "gazonk foo.c bar.m m.bar"

Note that by examining the match object you can find out more about the match, such as where it starts and ends, and use that for highlighting or similar purposes.
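As a sketch of that idea, assuming the same escape-and-replace conversion, the hypothetical highlight helper below uses finditer and each match object's span() to wrap every match in markers.

```python
import re

def highlight(glob, text, left='[', right=']'):
    """Wrap every non-overlapping match of the glob in left/right markers."""
    # Same conversion as above: '?' -> '.', '*' -> lazy '.*?'
    regex = re.escape(glob).replace(r'\?', '.').replace(r'\*', '.*?')
    out = []
    last = 0
    for m in re.finditer(regex, text):
        start, end = m.span()  # character offsets of this match within text
        out.append(text[last:start])
        out.append(left + m.group(0) + right)
        last = end
    out.append(text[last:])
    return ''.join(out)

print(highlight("foo.? bar*", 'gazonk foo.c bar.m m.bar'))
# gazonk [foo.c bar].m m.bar
```

Because the trailing * becomes a lazy .*?, it matches as little as possible (here nothing), so the highlighted span stops right after "bar"; a greedy .* would extend it to the end of the text.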

Of course, there might be more to it, but it should be a start.

answered 2009-01-15T09:47:16.547

jPaq's RegExp.fromWildExp function does something similar. The following is taken from the examples on the site's home page:

// Find a first substring that starts with a capital "C" and ends with a
// lower case "n".
alert("Where in the world is Carmen Sandiego?".findPattern("C*n"));

// Finds two words (first name and last name), flips their order, and places
// a comma between them.
alert("Christopher West".replacePattern("(<*>) (<*>)", "p", "$2, $1"));

// Finds the first number that is at least three numbers long.
alert("2 to the 64th is 18446744073709551616.".findPattern("#{3,}", "ol"));
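The snippet does not show what jPaq actually returns, and its extra wildcards (<, >, #) follow its own rules. As a rough illustration of why the quantifier chosen for * matters in a search like findPattern("C*n"), here is a Python sketch with a hypothetical find_pattern helper that can map * either lazily or greedily.

```python
import re

def find_pattern(glob, text, lazy=True):
    # Hypothetical helper: escape the glob, then map '*' to '.*?' (lazy)
    # or '.*' (greedy) and '?' to '.'; return the first match or None.
    star = '.*?' if lazy else '.*'
    regex = re.escape(glob).replace(r'\*', star).replace(r'\?', '.')
    m = re.search(regex, text)
    return m.group(0) if m else None

sentence = "Where in the world is Carmen Sandiego?"
print(find_pattern("C*n", sentence))              # Carmen
print(find_pattern("C*n", sentence, lazy=False))  # Carmen San
```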
answered 2011-03-14T16:48:04.507

Jakarta ORO has a Java implementation.

answered 2009-01-15T07:44:46.713

I wrote my own function using C++ and boost::regex:

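#include <iterator>
#include <sstream>
#include <string>
#include <boost/algorithm/string/trim.hpp>
#include <boost/regex.hpp>

std::string glob_to_regex(std::string val)
{
    boost::trim(val);
    // Alternatives: (1) '*', (2) '?', (3) a blank, (4) a regex metacharacter to escape (including '|')
    const char* expression = "(\\*)|(\\?)|([[:blank:]])|(\\.|\\+|\\^|\\$|\\[|\\]|\\(|\\)|\\{|\\}|\\\\|\\|)";
    // Conditional format: '*' -> \w+, '?' -> '.', blank -> \s*, metacharacter -> backslash + itself
    const char* format = "(?1\\\\w+)(?2\\.)(?3\\\\s*)(?4\\\\$&)";
    std::stringstream final;
    final << "^.*";
    std::ostream_iterator<char, char> oi(final);
    boost::regex re;
    re.assign(expression);
    boost::regex_replace(oi, val.begin(), val.end(), re, format, boost::match_default | boost::format_all);
    // No std::ends here: it would append a '\0' character to the returned string
    final << ".*";
    return final.str();
}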

It seems to work fine.
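To see what that function produces, here is a rough Python sketch that mirrors the mapping as read from the C++ code (a port, not the author's code): * becomes \w+, ? becomes ., a blank becomes \s*, regex metacharacters are escaped, and the result is wrapped in ^.* and .*.

```python
import re

# Same alternatives as the C++ expression: '*', '?', a blank, or a metacharacter
# (also escaping '|' so it cannot act as alternation)
_TOKEN = re.compile(r'(\*)|(\?)|([ \t])|([.+^$\[\](){}\\|])')

def glob_to_regex(val):
    def repl(m):
        if m.group(1):
            return r'\w+'         # '*' -> one or more word characters
        if m.group(2):
            return '.'            # '?' -> any single character
        if m.group(3):
            return r'\s*'         # blank -> optional whitespace
        return '\\' + m.group(4)  # metacharacter -> escaped literal
    return '^.*' + _TOKEN.sub(repl, val.strip()) + '.*'

print(glob_to_regex("foo.? bar*"))  # ^.*foo\..\s*bar\w+.*
```

Note that ? maps to . rather than the \w the question asked for, and the leading ^.* and trailing .* still let the pattern match anywhere in the text.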

answered 2009-01-15T08:16:17.150

In R, the glob2rx function is included in the base distribution:

http://stat.ethz.ch/R-manual/R-devel/library/utils/html/glob2rx.html

answered 2015-06-10T06:16:10.770