1

假设您有数千个按以下方式组织的文件:首先按文件名对它们进行排序(区分大小写,以便大写文件排在小写之前),然后将它们分组到包含第一个名称和该文件夹中的最后一个文件。例如,文件夹可能如下所示:

Abel -> Cain
Camel -> Sloth
Stork -> basket
basking -> sleuth
tiger -> zebra

现在,给定一个不区分大小写的搜索字符串s,确定哪些文件夹可以包含匹配的文件s。您不能也不必查看文件夹内部 - 该文件实际上不必存在。

一些例子:

("Abel", "Cain")    matches s = "blue",   since it contains "Blue"
("Stork", "basket") matches s = "arctic", since it contains "arctic"
("FA", "Fb")        matches s = "foo",    since it contains "FOo"
("Fa", "Fb") does NOT match s = "foo"

正式地:给定一个封闭的范围[a,b]和一个小写的字符串s,确定是否有任何字符串c[a,b]这样的lower(c) = s

我的第一个预感是对范围的边界进行不区分大小写的搜索。但是从上一个例子中很容易看出这是不正确的。

强力解决方案是生成所有潜在的文件名。例如,输入字符串"abc"将产生候选者"ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc"。然后,您只需针对界限进行测试。下面是这个蛮力解决方案的一个例子。这是O(2^n)虽然。

我的问题是,是否有一种既快速又正确的算法?


Clojure 中的蛮力解决方案:

(defn range-contains 
  [first last string]
  (and (<= (compare first string) 0)
       (>= (compare last string) 0)))

(defn generate-cases
  "Generates all lowercase/uppercase combinations of a word"
  [string]
  (if (empty? string)
    [nil]
    (for [head [(java.lang.Character/toUpperCase (first string))
                (java.lang.Character/toLowerCase (first string))]
          tail (generate-cases (rest string))]
      (cons head tail))))

(defn range-contains-insensitive 
  [first last string]
  (let [f (fn [acc candidate] (or acc (range-contains first last (apply str candidate))))]
    (reduce f false (generate-cases string))))

(fact "Range overlapping case insensitive"
  (range-contains-insensitive "A" "Z" "g") => true
  (range-contains-insensitive "FA" "Fa" "foo") => true
  (range-contains-insensitive "b" "z" "a") => false
  (range-contains-insensitive "B" "z" "a") => true)
4

2 回答 2

1

我认为不是创建所有大小写组合,而是可以通过分别检查每个字符的大写然后小写来解决,这会将 2^N 更改为 2N。

思路如下:

  • 保留 "lowdone" 和 "highdone" 标志,这表明 s 是否可以肯定地出现在下限之后,同时仍然可能出现在上限之前,反之亦然
  • 逐个字符地遍历字符串
  • 检查当前字母的大写版本是否可以出现在相应的下限字母之后,同时出现在上限字母之前,然后检查该字母的小写版本,如果两个字母都不满足两个条件,则返回 false (如果“lowdone”为真,则不检查下限,如果“highdone”为真,则不检查上限 - 在比较 ABC 和 ACA 时,一旦我们超过第二个字母,我们就不关心第三个字母)
  • 如果一个案例同时满足这两个条件,检查它是否严格在下限字母之后或者下限太短而没有对应的字母,如果是,lowdone = true
  • 类似于 highdone = true

这听起来不错?C# 中的代码(可能写得更简洁):

        public Bracket(string l, string u)
        {
            Low = l;
            High = u;
        }

        public bool IsMatch(string s)
        {
            string su = s.ToUpper();
            string sl = s.ToLower();

            bool lowdone = false;
            bool highdone = false;
            for (int i = 0; i < s.Length; i++)
            {
                char[] c = new char[]{su[i], sl[i]};

                bool possible = false;
                bool ld = lowdone;
                bool hd = highdone;
                for (int j = 0; j < 2; j++)
                {
                    if ((lowdone || i >= Low.Length || c[j] >= Low[i]) && (highdone || i >= High.Length || c[j] <= High[i]))
                    {
                        if (i >= Low.Length || c[j] > Low[i])
                            ld = true;

                        if (i >= High.Length || c[j] < High[i])
                            hd = true;

                        possible = true;
                    }
                }
                lowdone = ld;
                highdone = hd;

                if (!possible)
                    return false;
            }

            if (!lowdone && Low.Length > s.Length)
                return false;

            return true;
        }
    }
于 2013-07-02T07:59:02.827 回答
0

本着完全公开的精神,我想我还应该添加我想出的算法(Java,使用番石榴):

public static boolean inRange(String search, String first, String last) {
    int len = search.length();
    if (len == 0) {
        return true;
    }

    char low = Strings.padEnd(first, len, (char) 0).charAt(0);
    char high = Strings.padEnd(last, len, (char) 0).charAt(0);

    char capital = Character.toLowerCase(search.charAt(0));
    char small = Character.toUpperCase(search.charAt(0));

    if (low == high) {
        if (capital == low || small == low) {
            // All letters equal - remove first letter and restart
            return inRange(search.substring(1), first.substring(1), last.substring(1));
        }
        return false;
    }

    if (containsAny(Ranges.open(low, high), capital, small)) {
        return true; // Definitely inside
    }

    if (!containsAny(Ranges.closed(low, high), capital, small)) {
        return false; // Definitely outside
    }

    // Edge case - we are on a bound and the bounds are different
    if (capital == low || small == low) {
        return Ranges.atLeast(first.substring(1)).contains(search.substring(1).toLowerCase());
    }
    else {
        return Ranges.lessThan(last.substring(1)).contains(search.substring(1).toUpperCase());
    }
}

private static <T extends Comparable<T>> boolean containsAny(Range<T> range, T value1, T value2) {
    return range.contains(value1) || range.contains(value2);
}
于 2013-07-02T09:04:48.970 回答