c++ - 从过去的工作面试中检查我的字谜代码

Question

前段时间有以下作为面试问题，并且在基本语法上窒息得很厉害，以至于我无法进步（一旦肾上腺素开始发挥作用，编码就会消失。）

给定一个字符串列表，返回一个字符串集合列表，这些字符串集合是输入集合的变位词。即 "dog","god","foo" 应该返回 {"dog","god"}。之后，我自己创建了代码作为健全性检查，现在它已经存在了一段时间。我欢迎对此提出意见，看看我是否遗漏了什么，或者我是否可以更有效地完成它。以此为契机提高自己并学习其他技巧：


void Anagram::doWork(list input, list> &output)
{
  typedef list < pair < string, string>> SortType;
  SortType sortedInput;

  // sort each string and pair it with the original
  for(list< string >::iterator i = input.begin(); i != input.end(); ++i)
  {
    string tempString(*i);
    std::sort(tempString.begin(), tempString.end());
    sortedInput.push_back(make_pair(*i, tempString));
  }

  // Now step through the new sorted list
  for(SortType::iterator i = sortedInput.begin(); i != sortedInput.end();)
  {
    set< string > newSet;

    // Assume (hope) we have a match and pre-add the first.
    newSet.insert(i->first);

    // Set the secondary iterator one past the outside to prevent
    // matching the original
    SortType::iterator j = i;
    ++j;

    while(j != sortedInput.end())
    {
      if(i->second == j->second)
      {
        // If the string matches, add it to the set and remove it
        // so that future searches need not worry about it
        newSet.insert(j->first);
        j = sortedInput.erase(j);
      }
      else
      {
        // else, next element
        ++j;
      }
    }

    // If size is bigger than our original push, we have a match 
    // - save it to the output
    if(newSet.size() > 1)
    {
       output.push_back(newSet);
    }

    // erase this element and update the iterator
    i = sortedInput.erase(i);
  }
}

在查看评论并了解更多信息后，这是第二次通过：


void doBetterWork(list input, list> &output)
{
  typedef std::multimap< string, string > SortedInputType;
  SortedInputType sortedInput;
  vector< string > sortedNames;

  for(vector< string >::iterator i = input.begin(); i != input.end(); ++i)
  {
    string tempString(*i);
    std::sort(tempString.begin(), tempString.end());
    sortedInput.insert(make_pair(tempString, *i));
    sortedNames.push_back(tempString);
  }

  for(list< string >::iterator i = sortedNames.begin(); i != sortedNames.end(); ++i)
  {
    pair< SortedInputType::iterator,SortedInputType::iterator > bounds;
    bounds = sortedInput.equal_range(*i);

    set< string > newSet;
    for(SortedInputType::iterator j = bounds.first; j != bounds.second; ++j)
    {
      newSet.insert(j->second);
    }

    if(newSet.size() > 1)
    {
      output.push_back(newSet);
    }

    sortedInput.erase(bounds.first, bounds.second);
  }
}

score 14 · Accepted Answer

对字谜进行分组的最佳方法是将字符串映射到某种直方图表示。

 FUNCTION histogram
 [input] -> [output]

 "dog"   -> (1xd, 1xg, 1xo)
 "god"   -> (1xd, 1xg, 1xo)
 "foo"   -> (1xf, 2xo)

基本上，通过对字符串进行线性扫描，您可以生成它包含的每个字母的数量的直方图表示。一个小而有限的字母表使这更容易（例如，使用A-Z，您只有一个由 26 个数字组成的数组，每个字母一个）。

现在，字谜只是具有相同直方图的单词。

然后，您可以拥有一个多图数据结构，将直方图映射到具有该直方图的单词列表。

MULTIMAP
[key]           => [set of values]

(1xd, 1xg, 1xo) => { "dog", "god" }
(1xf, 2xo)      => { "foo" }

规范形式技巧

除了处理直方图，您还可以处理字符串的“规范形式”。基本上，您为每个字符串定义其规范形式是什么，如果两个词具有相同的规范形式，则它们是字谜。

一种方便的规范形式是将字符串中的字母按排序顺序排列。

FUNCTION canonize
[input]  -> [output]

"dog"    -> "dgo"
"god"    -> "dgo"
"abracadabra" -> "aaaaabbcdrr"

请注意，这只是直方图方法之后的一步：您实际上是在进行计数排序以对字母进行排序。

这是针对您的问题的实际编程语言中最实用的解决方案。

复杂

产生一个单词的直方图/规范形式实际上是O(1)（有限的字母大小，有限的最大字长）。

具有良好的哈希实现，get并且put在多图上是O(1).

您甚至可以拥有多个多重映射，每个单词长度一个。

因此，如果有N单词，则将所有单词放入多重映射中O(N)；然后输出每个字谜组只是转储多图中的值。这也可以在O(N).

这肯定比检查每对单词是否是字谜（一种O(N^2)算法）要好。

score 1 · Accepted Answer

我只是将它视为一个免费功能get_anagrams，因为class Anagram它似乎没有做任何其他事情。

list和的选择set并不比其他任何东西都好，所以当我在做的时候，我会把它们都做vector。实际上，输出可以只是一个平面序列。所以叫它anagram_sort。我将把“list”和“set”的规范作为基本结构而不是文字类模板。“给定……返回”也可以粗略地解释为就地修改输入。调用者可以根据需要创建一个副本。

struct anagram_less { bool operator()( string const &l, string const &r ) {
    string sl( l ), sr( r );
    sort( sl.begin(), sl.end() );
    sort( sr.begin(), sr.end() );
    return sl < sr;
} };

void anagram_sort( vector< string > &v ) { // "the answer"
    sort( v.begin(), v.end(), anagram_less );
} // 10 lines

   // usage:

void print_anagrams( vector< string > &&v ) { // destructive => use rvalue ref
    anagram_sort( v );

    for ( vector< string >::iterator i = v.begin(); i != v.end(); /*++i*/ ) {

        vector< string >::iterator e // find end of run of anagrams
            = adjacent_find( i, v.end(), anagram_less() );
        if ( e != v.end() ) ++ e; // usually need this after adjacent_find :v(

        cout << "( ";
        for ( ; i != e; ++ i ) {
            cout << * i << " ";
        }
        cout << ")" << endl;
    }
}

这是次优的，因为它重复地对单词进行排序。不过，我宁愿在面试问题中减少脂肪，也不愿在“纸上”做出快速的东西。

为了挤出最后一点性能，我可能会制作一个附加了序列号索引的输入副本，对字符串的字符进行排序，对字符串进行排序，然后使用索引使用该算法对输入向量进行重新排序. 最终运行时应该是低系数 O(n log n)。

score 1 · Accepted Answer

检查两个字符串是否是彼此的字谜的算法如下

以仅包含字母的方式转换两个字符串（因为婆婆和女性希特勒也是字谜）。还使两个字符串的大小写相等意味着两个字符串都是大写或小写。
现在对两个字符串中的字符进行排序。
比较两个字符串是否相等意味着它们是彼此的字谜

这是这个算法的代码

bool checkanagrams(string first,string second)
{

    string tempfirst,tempsecond;    
    int countfirst  = 0;  
    int countsecond = 0;        
   // only storing characters removing other junk things  like -
   for(int i=0;i<first.length();i++)
   {
      if(isalpha(first[i])
      { 
        tempfirst[countfirst] =first[i];
    countfirst++; 
      }

   }
   for(int i=0;i<second.length();i++)
   {
      if(isalpha(second[i])
      { 
        tempsecond[countsecond] =second[i];
    countsecond++; 
      }        

   } 
   sort(tempfirst.begin(),tempfirst.end());  
   sort(tempsecond.begin(),tempsecond.end());
   if(!strcmp(tempfirst.c_str(),tempsecond.c_str())
    return true;
   else
    return false;     

}

score 0 · Accepted Answer

我会把所有东西都放在一个哈希表中。然后对于每个字符串，我将其还原并检查它是否存在于哈希表中，如果存在，则在集合中返回反转的字符串和自身。

这种方法还可以在集合中找到回文。

c++ - 从过去的工作面试中检查我的字谜代码

4 回答 4

规范形式技巧

复杂

Related

Reference