c++ - 给定一个单词和一个文本，返回文本中单词的字谜出现次数

Question

例如。word isfor和 text is , will be , ,等的forxxorfxdofr字谜。所以答案就是这个特定的例子。forofrorffro3

这是我想出的。

#include<iostream>
#include<cstring>

using namespace std;

int countAnagram (char *pattern, char *text)
{
    int patternLength = strlen(pattern);
    int textLength = strlen(text);

    int dp1[256] = {0}, dp2[256] = {0}, i, j;

    for (i = 0; i < patternLength; i++)
    {
        dp1[pattern[i]]++;
        dp2[text[i]]++;
    }

    int found = 0, temp = 0;

    for (i = 0; i < 256; i++)
    {
        if (dp1[i]!=dp2[i])
        {
            temp = 1;
            break;
        }
    }

    if (temp == 0)
        found++;


    for (i = 0; i < textLength - patternLength; i++)
    {
        temp = 0;
        dp2[text[i]]--;
        dp2[text[i+patternLength]]++;
        for (j = 0; j < 256; j++)
        {
            if (dp1[j]!=dp2[j])
            {
                temp = 1;
                break;
            }
        }
        if (temp == 0)
            found++;
    }
    return found;
}


int main()
{
    char pattern[] = "for";
    char text[] = "ofrghofrof";

    cout << countAnagram(pattern, text);

}

是否存在针对上述问题的更快算法？

score 0 · Accepted Answer

大部分时间都会花在搜索上，因此为了使算法更高效，目标是减少搜索量或优化搜索。

方法一：搜索起始位置表。

创建一个列表向量，为字母表中的每个字母创建一个向量槽。这可以在以后进行空间优化。

每个插槽将包含文本的索引列表。

示例文本：forxxorfxdofr

Slot  List  
'f'    0 --> 7 --> 11  
'o'    1 --> 5 --> 10  
'r'    2 --> 6 --> 12

对于每个单词，查找向量中的字母以获取文本的索引列表。对于列表中的每个索引，将列表项中的文本字符串位置与单词进行比较。

因此，对于上表和单词“ofr”，第一次比较发生在索引 1，第二次比较发生在索引 5，最后一次比较发生在索引 10。

您可以消除文本索引的近端（索引 + 字长 > 文本长度）。

score 0 · Accepted Answer

您可以使用乘法的交换性以及原始分解的唯一性。这取决于我之前的回答

创建从每个字符到素数列表的映射（尽可能小）。例如a-->2、b-->3、c-->5 等。这可以保存在一个简单的数组中。

现在，将给定的单词转换为匹配其每个字符的素数的乘积。该结果将等于该单词的任何字谜的类似乘法。

现在扫描数组，在任何给定的步骤中，保持与最后 L 个字符匹配的素数相乘（其中 L 是单词的长度）。所以每次你前进时，你都会这样做

mul = mul * char2prime(text[i]) / char2prime(text[i-L])

每当这个乘法等于你的话 - 增加总计数器，你就完成了

请注意，此方法适用于短词，但素数乘法会很快溢出 64b var（约 9-10 个字母），因此您必须使用大量数学库来支持更长的词。

score -1 · Accepted Answer

如果要转换的模式非常短，以至于搜索它的最佳方法是简单地扫描它，则该算法相当有效。为了允许更长的模式，这里由 ' for jj' 和 ' for mm' 循环表示的扫描可以用更复杂的搜索技术代替。

// sLine -- string to be searched
// sWord -- pattern to be anagrammed
// (in this pseudo-language, the index of the first character in a string is 0)
// iAnagrams -- count of anagrams found

iLineLim = length(sLine)-1
iWordLim = length(sWord)-1

// we need a 'deleted' marker char that will never appear in the input strings
chNil = chr(0)

iAnagrams = 0 // well we haven't found any yet have we
// examine every posn in sLine where an anagram could possibly start
for ii from 0 to iLineLim-iWordLim do {
  chK = sLine[ii]
  // does the char at this position in sLine also appear in sWord
  for jj from 0 to iWordLim do {
    if sWord[jj]=chK then { 
      // yes -- we have a candidate starting posn in sLine

      // is there an anagram of sWord at this position in sLine
      sCopy = sWord // make a temp copy that we will delete one char at a time
      sCopy[jj] = chNil // delete the char we already found in sLine
      // the rest of the anagram would have to be in the next iWordLim positions
      for kk from ii+1 to ii+iWordLim do {
        chK = sLine[kk]
        cc = false
        for mm from 0 to iWordLim do { // look for anagram char
          if sCopy[mm]=chK then { // found one
            cc = true
            sCopy[mm] = chNil // delete it from copy
            break // out of 'for mm'
          }
        }
        if not cc then break // out of 'for kk' -- no anagram char here
      }
      if cc then { iAnagrams = iAnagrams+1 }

      break // out of 'for jj'
    }
  }
}

-阿尔。

c++ - 给定一个单词和一个文本，返回文本中单词的字谜出现次数

3 回答 3

方法一：搜索起始位置表。

Related

Reference