arrays - 算法 - 检查字符串数组中的任何字符串是否是同一数组中任何其他字符串的前缀

Question

我想检查字符串数组中的任何字符串是否是同一数组中任何其他字符串的前缀。我在考虑基数排序，然后单次通过数组。

有人有更好的主意吗？

score 1 · Accepted Answer

To achieve a time complexity close to O(N²): compute hash values for each string.

Come up with a good hash function that looks something like:

A mapping from [a-z]->[1,26]
A modulo operation(use a large prime) to prevent overflow of integer

So something like "ab" gets computed as "12"=1*27+ 2=29

A point to note:

Be careful what base you compute the hash value on.For example if you take a base less than 27 you can have two strings giving the same hash value, and we don't want that.

Steps:

Compute hash value for each string
Compare hash values of current string with other strings:I'll let you figure out how you would do that comparison.Once two strings match, you are still not sure if it is really a prefix(due to the modulo operation that we did) so do a extra check to see if they are prefixes.
Report answer

score 1 · Accepted Answer

如果对它们进行排序，则只需要检查每个字符串是否是下一个字符串的前缀。

score 1 · Accepted Answer

我认为，可以修改基数排序以即时检索前缀。我们所要做的就是按行的第一个字母对行进行排序，将它们的副本存储在每个单元格中没有第一个字母。然后如果单元格包含空行，则此行对应一个前缀。如果单元格只包含一个条目，那么其中当然没有可能的行前缀。

在这里，这可能比我的英语更干净：

lines = [
"qwerty",
"qwe",
"asddsa",
"zxcvb",
"zxcvbn",
"zxcvbnm"
]

line_lines = [(line, line) for line in lines]

def find_sub(line_lines):
    cells = [ [] for i in range(26)]
    for (ine, line) in line_lines:
        if ine == "":
            print line
        else:
            index = ord(ine[0]) - ord('a')
            cells[index] += [( ine[1:], line )]
    for cell in cells:
        if len(cell) > 1:
            find_sub( cell )

find_sub(line_lines)

arrays - 算法 - 检查字符串数组中的任何字符串是否是同一数组中任何其他字符串的前缀

3 回答 3

Related

Reference