0

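I would like to know whether it is possible to take a wildcard expression that uses * and ? and convert it into a regular expression, so that I can check whether it matches certain strings.

In other words, if I apply the filter *bl?e* (case-insensitive) to these strings: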

["Blue", "Black", "Red", "Light blue", "Light black"]

I would like to get:

["Blue", "Light blue"]

Does anyone know how to do this with a regular expression? Is there a better way than using a regular expression?

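Added to better clarify my idea...

OK!... As usual, I thought I had asked a very clear question, and realized from the answers that I had completely botched it. I want to write a function that filters a collection according to an expression (passed as a parameter to my function) that follows the same rules as DOS ('*' and '?'). I thought using a regular expression would be a good idea. Am I right, and what would the regular expression be? Also... I am using C#, and I wonder whether I have access to anything that does this job directly?

I also took a look at (quite a good answer) "How do I specify a wildcard (for any character) in a C# regex statement?"

In the end I used the Glob class from the .NET Patterns and Practices library.

But for reference, here is my code to convert a glob expression to a RegEx: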

using System.Text;
using System.Text.RegularExpressions;

namespace HQ.Util.General
{
    public class RegexUtil
    {
        public const string RegExMetaChars = @"*?(){}[]+-^$.|\"; // Do not change the order: the algorithm relies on the first two characters being the DOS-style wildcards

        // ******************************************************************
        /// <summary>
        /// Convert a filter expression with '*' (wildcard: any number of characters) and '?' (wildcard: one character) into a valid regex,
        /// escaping any other special regex character
        /// </summary>
        /// <param name="dosLikeExpressionFilter"></param>
        /// <returns></returns>
        public static string DosLikeExpressionFilterToRegExFilterExpression(string dosLikeExpressionFilter)
        {
            StringBuilder regex = new StringBuilder();
            regex.Append("(?i)"); // Case insensitive

            int startIndex = 0;
            int count = dosLikeExpressionFilter.Length;
            while (startIndex < count)
            {
                int metaIndex = RegExMetaChars.IndexOf(dosLikeExpressionFilter[startIndex]);
                if (metaIndex >= 0)
                {
                    if (metaIndex == 0)
                    {
                        regex.Append(".*");
                    }
                    else if (metaIndex == 1)
                    {
                        regex.Append(".");
                    }
                    else
                    {
                        regex.Append("\\");
                        regex.Append(dosLikeExpressionFilter[startIndex]);
                    }
                }
                else
                {
                    regex.Append(dosLikeExpressionFilter[startIndex]);
                }
                startIndex++;
            }

            return regex.ToString();
        }

        // ******************************************************************
        /// <summary>
        /// See 'DosLikeExpressionFilterToRegExFilterExpression' description to see what type of Regex is returned
        /// </summary>
        /// <param name="dosLikeExpressionFilter"></param>
        /// <returns></returns>
        public static Regex DosLikeExpressionFilterToRegEx(string dosLikeExpressionFilter)
        {
            return new Regex(DosLikeExpressionFilterToRegExFilterExpression(dosLikeExpressionFilter));
        }

        // ******************************************************************
    }
}
4 Answers

2
               Any single character    Any number of characters   Character range
Glob syntax            ?                           *                    [0-9]
Regex syntax           .                           .*                   [0-9]

So Bl?e (glob) becomes Bl.e (regex), and *Bl?e* becomes .*Bl.e.*

As Joey correctly pointed out, you can (usually, depending on the regex engine) prepend (?i) to your regex to make it case-insensitive.
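As a minimal sketch of the translation above, the hand-converted pattern can be run against the sample list from the question. The names `GlobDemo` and `Filter` are made up for this illustration; the only library calls are `Regex.IsMatch` and LINQ.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GlobDemo
{
    // Hand-translated from the glob *Bl?e*; the inline (?i) flag makes the match case-insensitive.
    private const string Pattern = "(?i).*Bl.e.*";

    // Keeps only the items that the pattern matches.
    public static List<string> Filter(IEnumerable<string> items)
    {
        return items.Where(s => Regex.IsMatch(s, Pattern)).ToList();
    }
}
```

Calling `GlobDemo.Filter` on ["Blue", "Black", "Red", "Light blue", "Light black"] keeps "Blue" and "Light blue".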

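Note, however, that many characters that have no special meaning in glob patterns do have a special meaning in regular expressions, so you cannot simply do a search-and-replace from glob to regex.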
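One way to deal with that caveat, sketched here under the assumption that only * and ? are wildcards (no [0-9] ranges, no escaping of the wildcards themselves), is to let `Regex.Escape` neutralize every metacharacter first and then turn the escaped wildcards back into their regex counterparts. `WildcardSketch` and `ToRegex` are hypothetical names, not part of any library.

```csharp
using System.Text.RegularExpressions;

public static class WildcardSketch
{
    public static Regex ToRegex(string glob, bool ignoreCase = true)
    {
        // Regex.Escape turns '*' into "\*" and '?' into "\?", and also escapes
        // '.', '(', '[', '+' and the other characters that are special in a regex.
        string escaped = Regex.Escape(glob);

        // Turn the two escaped wildcards back into their regex equivalents.
        string body = escaped.Replace(@"\*", ".*").Replace(@"\?", ".");

        // Anchor the pattern so that the whole string has to match, as with DOS wildcards.
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        return new Regex("^" + body + "$", options);
    }
}
```

With this sketch, `WildcardSketch.ToRegex("a.b*")` matches "a.bcd" but not "axbcd", because the dot in the glob is treated as a literal dot.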

answered 2012-05-11T20:21:12.300
1

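I needed to solve the same problem (filtering an arbitrary list of strings with * and ? wildcard patterns taken from user input), but with the extension that the pattern may also contain escaped stars or question marks that should be searched for literally.

Since the SQL LIKE operator (where the wildcards are % and _) usually offers a backslash for escaping, I took the same approach. This complicates the usual recipe of calling Regex.Escape() and then replacing * with .* and ? with . to obtain the regular expression (see the many other answers to this problem).

The following code outlines a method that provides a regular expression for a given wildcard pattern. It is implemented as an extension method on the C# string type. The documentation tags and comments should fully explain the code: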

using System.Text.RegularExpressions;

public static class MyStringExtensions
{
    /// <summary>Interpret this string as wildcard pattern and create a corresponding regular expression. 
    /// Rules for simple wildcard matching are:
    /// * Matches any character zero or more times.
    /// ? Matches any character exactly one time.
    /// \ Backslash can be used to escape above wildcards (and itself) for an explicit match,
    /// e.g. \* would then match a single star, \? matches a question mark and \\ matches a backslash.
    /// If \ is not followed by star, question mark or backslash it also matches a single backslash.
    /// Character set matching (by use of square brackets []) is NOT supported by this implementation.
    /// </summary>
    /// <param name="wildPat">This string to be used as wildcard match pattern.</param>
    /// <param name="caseSens">Optional parameter for case sensitive matching - default is case insensitive.</param>
    /// <returns>New instance of a regular expression performing the requested matches.
    /// If input string is null or empty, null is returned.</returns>
    public static Regex CreateWildcardRegEx(this string wildPat, bool caseSens = false)
    {
        if (string.IsNullOrEmpty(wildPat))
           return null;

        // 1. STEP: Escape all special characters used in Regex later to avoid unwanted behavior.
        // Regex.Escape() prepends a backslash to any of following characters: \*+?|{[()^$.# and white space 
        wildPat = Regex.Escape(wildPat);

        // 2. STEP: Replace all three possible occurring escape sequences defined for our 
        // wildcard pattern with temporary sub strings that CANNOT exist after 1. STEP anymore.
        // Prepare some constant strings used below - @ in C# makes literal strings really literal - a backslash needs not be repeated!
        const string esc    = @"\\";    // Matches a backslash in a Regex
        const string any    = @"\*";    // Matches a star in a Regex
        const string sgl    = @"\?";    // Matches a question mark in a Regex
        const string tmpEsc = @"||\";   // Instead of doubled | any character Regex.Escape() escapes would do (except \ itself!)
        const string tmpAny =  "||*";   // See comment above
        const string tmpSgl =  "||?";   // See comment above

        // Watch that string.Replace() in C# will NOT stop replacing after the first match but continues instead...
        wildPat = wildPat.Replace(Regex.Escape(esc), tmpEsc)
                         .Replace(Regex.Escape(any), tmpAny)
                         .Replace(Regex.Escape(sgl), tmpSgl);

        // 3. STEP: Substitute our (in 1. STEP escaped) simple wildcards with the Regex counterparts.
        const string regAny = ".*";             // Matches any character zero or more times in a Regex
        wildPat = wildPat.Replace(any, regAny)
                         .Replace(sgl, ".");    // . matches any character in a Regex

        // 4. STEP: Revert the temporary replacements of 2. STEP (in reverse order) and replace with what a Regex really needs to match
        wildPat = wildPat.Replace(tmpSgl, sgl)
                         .Replace(tmpAny, any)
                         .Replace(tmpEsc, esc);

        // 5. STEP: (Optional, for performance) - Simplify repeated * wildcards (cases of ******* or similar)
        // Replace with the regAny string - Use a single Regex.Replace() instead of string.Contains() with string.Replace() in a while loop 
        wildPat = Regex.Replace(wildPat, @"(\.\*){2,}", regAny);

        // 6. STEP: Finalize the Regex with begin and end line tags
        return new Regex('^' + wildPat + '$', caseSens ? RegexOptions.None : RegexOptions.IgnoreCase);

        // 2. and 4. STEP would be obsolete if we don't wanted to have the ability to escape * and ? characters for search
    }
}
answered 2021-05-14T12:52:41.417
0

Try this regular expression:

^([\w,\s]*bl\we[\w,\s]*) 

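It basically recognizes any sequence of words and whitespace that contains a word starting with "bl" and ending with "e", with exactly one character in between. Or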

^([\w,\s]*bl(\w+)e[\w,\s]*)

if you want to recognize any word that starts with "bl" and ends with "e".
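As a rough check, both patterns can be tried against the sample list from the question. This sketch assumes the match is made case-insensitive through `RegexOptions.IgnoreCase`, which the patterns themselves do not specify; `WordPatternDemo` and `Filter` are illustrative names only.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WordPatternDemo
{
    // Returns the items matched by the given pattern, ignoring case.
    public static string[] Filter(string[] items, string pattern)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        return items.Where(s => regex.IsMatch(s)).ToArray();
    }
}
```

For the five sample strings, both `^([\w,\s]*bl\we[\w,\s]*)` and `^([\w,\s]*bl(\w+)e[\w,\s]*)` keep "Blue" and "Light blue". Note that these patterns are written for this specific filter and are not a general translation of * and ?.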

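Another option is to use some inexact (fuzzy) matching algorithm on the strings. I am not sure whether that is exactly what you are looking for.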

answered 2012-05-11T20:27:03.640
0

For reference... this is the code I actually used:

using System.Text;
using System.Text.RegularExpressions;

namespace HQ.Util.General
{
    /*
        Usage:

           _glob = new FilterGlob(filterExpression, _caseSensitive);            


            public bool IsMatch(string s)
            {
                return _glob.IsMatch(s);
            }
    */


    /// <summary>
    /// Glob stands for pattern matching. The supported wildcard characters are "?" and "*".
    /// </summary>
    public class FilterGlob
    {
        private readonly Regex pattern;

        /// <summary>
        /// Constructs a new <see cref="T:Microsoft.Practices.Unity.InterceptionExtension.Glob"/> instance that matches the given pattern.
        /// 
        /// </summary>
        /// <param name="pattern">The pattern to use. See <see cref="T:Microsoft.Practices.Unity.InterceptionExtension.Glob"/> summary for
        ///             details of the patterns supported.</param><param name="caseSensitive">If true, perform a case sensitive match.
        ///             If false, perform a case insensitive comparison.</param>
        public FilterGlob(string pattern, bool caseSensitive = true)
        {
            this.pattern = FilterGlob.GlobPatternToRegex(pattern, caseSensitive);
        }

        /// <summary>
        /// Checks to see if the given string matches the pattern.
        /// 
        /// </summary>
        /// <param name="s">String to check.</param>
        /// <returns>
        /// True if it matches, false if it doesn't.
        /// </returns>
        public bool IsMatch(string s)
        {
            return this.pattern.IsMatch(s);
        }

        private static Regex GlobPatternToRegex(string pattern, bool caseSensitive)
        {
            StringBuilder stringBuilder = new StringBuilder(pattern);
            string[] strArray = new string[9]
            {
                "\\",
                ".",
                "$",
                "^",
                "{",
                "(",
                "|",
                ")",
                "+"
            };

            foreach (string oldValue in strArray)
            {
                stringBuilder.Replace(oldValue, "\\" + oldValue);
            }

            stringBuilder.Replace("*", ".*");
            stringBuilder.Replace("?", ".");
            stringBuilder.Insert(0, "^");
            stringBuilder.Append("$");

            RegexOptions options = caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase;

            return new Regex(stringBuilder.ToString(), options);
        }

    }
}
answered 2016-02-01T15:31:58.807