.net - 用于获取数组中字符串序列在另一个数组上的重叠的算法

Question

我想建立一个算法，它能够给出两个数组中元素的重叠索引。

例如我有两个字符串数组

Array1: {"a","c","h", "d","a","o","m" }

Array2: { "d","a","o","m" ,"c","e","o","m","c","z","e","l" ,"p","v","c","z","c"}

它应该以这种方式返回 array1 和 array2 的索引

x1,y1={3,6} *x2,y2={0,3}*

这意味着数组中字符串的序列应该匹配，只要字符串值匹配，我们就需要这样做，并确保每条记录在它们之前的记录匹配中都是唯一的。

等待答案和回复，如果有问题，请告诉我。

举个例子，我在数据库中有一个表，我在其中插入记录，并且我要插入的每条记录都应该是非常独特的。所以在插入时，我们确实有一个需要批量插入的记录数组。因此，如果我说，我确实有数据库表中的记录格式为：

列 1 哈希 1 哈希 2 哈希 3 哈希 4 哈希 5

我想在数据库中插入一堆记录，格式为： hash3 hash4 hash5 hash6 hash7 hash8

那么表格的结果应该看起来像这样 colum1 hash1 hash2 hash3 hash4 hash5 hash6 hash7 hash8

但是，如果需要插入数据库的数组的格式为 hash2 hash3 hash4 hash6

那么它应该被视为一个全新的条目，并将作为一个整体插入到数据库中。

希望这次我清楚阐述我的问题

score 2 · Accepted Answer

The problem you're referring to is called the Longest Common Substring problem (not to be confused with another problem about strings and an LCS acronym - longest common subsequence). As usual, the best explanation is in Wikipedia: http://en.wikipedia.org/wiki/Longest_common_substring_problem :)

In short - if the strings are really short and you're a very lazy programmer, the fastest way of doing that is to check all substrings of arr2 against arr1. This will take about n^2*m time (if arr2 if of length n, and arr1 is of length m) - which is a lot of time for long strings.

If your strings are longer / you are less lazy, the best algorithm using suffix trees will give you linear running time.

.net - 用于获取数组中字符串序列在另一个数组上的重叠的算法

1 回答 1

Related

Reference