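I'm working on a piece of code that essentially tries to recursively reduce a list of strings down to a single string.
I have an internal database made up of arrays of matching strings of varying lengths (say, array lengths of 2-4).
An example input string array would be: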
{"The", "dog", "ran", "away"}
My database, for example, could be made up of string arrays like this:
(length 2) {{"The", "dog"},{"dog", "ran"}, {"ran", "away"}}
(length 3) {{"The", "dog", "ran"}.... and so on
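
To make the layout concrete, here is a minimal sketch of how the database is grouped by length. It is written in Python rather than my actual code just to keep it short, and the names `build_sequence_table`, `MIN_SIZE` and `MAX_SIZE` are made up for illustration. Slot 0 holds the sequences of length 2, slot 1 those of length 3, and so on.

```python
MIN_SIZE = 2  # assumed shortest sequence length
MAX_SIZE = 4  # assumed longest sequence length


def build_sequence_table(sequences, min_size=MIN_SIZE, max_size=MAX_SIZE):
    """Group known sequences by length; slot 0 holds sequences of length min_size."""
    table = [set() for _ in range(max_size - min_size + 1)]
    for seq in sequences:
        if min_size <= len(seq) <= max_size:
            table[len(seq) - min_size].add(tuple(seq))
    return table


# Illustrative data only, taken from the example above.
known = [
    ("The", "dog"), ("dog", "ran"), ("ran", "away"),  # length 2
    ("The", "dog", "ran"),                            # length 3
]
sequences_by_length = build_sequence_table(known)
```

Looking up the candidates for a given size is then `sequences_by_length[size - MIN_SIZE]`.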
So what I'm trying to do is recursively reduce my input string array down to a single token. Ideally it would parse something like this:
1) {"The", "dog", "ran", "away"}
Say that (seq1) = {"The", "dog"} and (seq2) = {"ran", "away"}
2) { (seq1), "ran", "away"}
3) { (seq1), (seq2)}
In my sequence database I know that, for instance, seq3 = {(seq1), (seq2)}
4) { (seq3) }
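
The four steps above can be sketched as follows. This is only meant to pin down the output I expect, not my actual implementation: it is in Python, it uses a plain loop instead of recursion, and `reduce_tokens`, `find_first_match` and the `named` dictionary are hypothetical names. It assumes each known sequence maps to the token that replaces it.

```python
def find_first_match(tokens, named_sequences, min_size, max_size):
    """Return (start, size) of the first known window, trying shorter sizes first."""
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            if tuple(tokens[start:start + size]) in named_sequences:
                return start, size
    return None


def reduce_tokens(tokens, named_sequences, min_size=2, max_size=4):
    """Repeatedly replace a known window with its token; return every intermediate list."""
    tokens = list(tokens)
    steps = [list(tokens)]
    while len(tokens) > 1:
        match = find_first_match(tokens, named_sequences, min_size, max_size)
        if match is None:
            break  # nothing left to reduce
        start, size = match
        window = tuple(tokens[start:start + size])
        tokens[start:start + size] = [named_sequences[window]]
        steps.append(list(tokens))
    return steps


# Illustrative data matching the walk-through above.
named = {
    ("The", "dog"): "(seq1)",
    ("ran", "away"): "(seq2)",
    ("(seq1)", "(seq2)"): "(seq3)",
}
steps = reduce_tokens(["The", "dog", "ran", "away"], named)
for step in steps:
    print(step)
```

With this data the printed lists correspond to steps 1) through 4), ending with the single token `(seq3)`.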
So once it boils down to a single token, I'm happy and the function should end.
Here is an outline of my current program logic:
    public void Tokenize(List<string> string_array, int current_size)
    {
        // retrieve all known sequences of length [current_size] (from global list array)
        // sequences of length 2 are stored in position 0 and so on
        loc_sequences_by_length = sequences_by_length[current_size - min_size];

        // escape cases
        if (string_array.Count == 1)
        {
            // finished successfully
            return;
        }
        else if (string_array.Count < current_size)
        {
            // checking sequences of greater length than input string, bail
            return;
        }
        else
        {
            // split input string into chunks of size [current_size] and compare to local database
            // of known sequences
            // (splitting code works fine)
            foreach (comparison)
            {
                if (match_found)
                {
                    // update input string and recall function to find other matches
                    string_array[found_array_position] = new_sequence;
                    string_array.RemoveRange(found_array_position + 1, new_sequence.Length - 1);
                    Tokenize(string_array, current_size);
                }
            }
        }

        // ran through unsuccessfully, increment length and try again for new sequence group
        current_size++;
        if (current_size > MAX_SIZE)
            return;
        else
            Tokenize(string_array, current_size);
    }
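I thought this would be straightforward, but I'm getting some strange results. In general it appears to work, but on closer inspection of my output data I'm seeing some issues. Mainly, it appears to work up to a certain point, at which time my "current_size" counter resets to the minimum value.
So it goes through sizes 2, then 3, then 4, and then resets to 2. My assumption was that it would run up to my predetermined maximum size and then bail completely.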
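
To check whether this comes from the logic itself rather than from my data, here is a minimal Python transliteration of the outline that records the size each call is entered with. It is a sketch under assumptions: the single-letter tokens and the `table` contents are made up, the chunk comparison is a simple sliding window, and `tokenize` and `trace` are hypothetical names.

```python
def tokenize(tokens, size, table, trace, min_size=2, max_size=4):
    """Mirror of the outline; appends the size of every call to `trace`."""
    trace.append(size)
    if len(tokens) == 1 or len(tokens) < size:
        return  # finished, or the window is longer than the input
    known = table[size - min_size]
    start = 0
    while start + size <= len(tokens):
        window = tuple(tokens[start:start + size])
        if window in known:
            # replace the matched window in place, then recurse at the same size
            tokens[start:start + size] = [known[window]]
            tokenize(tokens, size, table, trace, min_size, max_size)
        start += 1  # the loop resumes here after the nested call returns
    size += 1  # changes only this call's own copy of the size
    if size <= max_size:
        tokenize(tokens, size, table, trace, min_size, max_size)


# Made-up data: slot 0 holds length-2 sequences, slot 1 length-3, slot 2 length-4.
table = [
    {("a", "b"): "X", ("q", "Y"): "Z"},
    {("c", "d", "e"): "Y"},
    {},
]
tokens = ["p", "a", "b", "q", "c", "d", "e"]
trace = []
tokenize(tokens, 2, table, trace)
print(trace)
print(tokens)
```

With this data the recorded sizes climb to 4 and then a 2 shows up again. The size is passed by value, so each call has its own copy: when the nested calls return, the outer size-2 call resumes its loop where it left off and can recurse again at size 2. I have not confirmed that this is what happens in my real code.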
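I tried to simplify my code as much as possible, so there may be some simple syntax errors from the transcription. If there are any other details that would help the eagle-eyed SO users, please let me know and I'll edit.
Thanks in advance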