假设您有数千个按以下方式组织的文件:首先按文件名对它们进行排序(区分大小写,以便大写文件排在小写之前),然后将它们分组到包含第一个名称和该文件夹中的最后一个文件。例如,文件夹可能如下所示:
Abel -> Cain
Camel -> Sloth
Stork -> basket
basking -> sleuth
tiger -> zebra
现在,给定一个不区分大小写的搜索字符串s
,确定哪些文件夹可以包含匹配的文件s
。您不能也不必查看文件夹内部 - 该文件实际上不必存在。
一些例子:
("Abel", "Cain") matches s = "blue", since it contains "Blue"
("Stork", "basket") matches s = "arctic", since it contains "arctic"
("FA", "Fb") matches s = "foo", since it contains "FOo"
("Fa", "Fb") does NOT match s = "foo"
正式地:给定一个封闭的范围[a,b]
和一个小写的字符串s
,确定是否有任何字符串c
在[a,b]
这样的lower(c) = s
。
我的第一个预感是对范围的边界进行不区分大小写的搜索。但是从上一个例子中很容易看出这是不正确的。
强力解决方案是生成所有潜在的文件名。例如,输入字符串"abc"
将产生候选者"ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc"
。然后,您只需针对界限进行测试。下面是这个蛮力解决方案的一个例子。这是O(2^n)
虽然。
我的问题是,是否有一种既快速又正确的算法?
Clojure 中的蛮力解决方案:
(defn range-contains
[first last string]
(and (<= (compare first string) 0)
(>= (compare last string) 0)))
(defn generate-cases
"Generates all lowercase/uppercase combinations of a word"
[string]
(if (empty? string)
[nil]
(for [head [(java.lang.Character/toUpperCase (first string))
(java.lang.Character/toLowerCase (first string))]
tail (generate-cases (rest string))]
(cons head tail))))
(defn range-contains-insensitive
[first last string]
(let [f (fn [acc candidate] (or acc (range-contains first last (apply str candidate))))]
(reduce f false (generate-cases string))))
(fact "Range overlapping case insensitive"
(range-contains-insensitive "A" "Z" "g") => true
(range-contains-insensitive "FA" "Fa" "foo") => true
(range-contains-insensitive "b" "z" "a") => false
(range-contains-insensitive "B" "z" "a") => true)