c++ - Which data structure is better for checking whether a sentence consists of unique characters?

Question

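I am trying to solve a task, but I am not sure I am using a suitable data structure. The task is to check whether a sentence consists only of unique characters and to return a boolean accordingly.

Here is my function: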

bool use_map(string sentence) {
    map<int, string> my_map;

    // i < length(): using <= would also visit the terminating '\0'
    for (string::size_type i = 0; i < sentence.length(); i++) {
        unsigned int index = (int)sentence[i];
        if (my_map.find(index) != my_map.end())
            return false;
        my_map[index] = sentence[i];
    }

    return true;
}

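The only structure I found that fits is a map. Maybe I am missing something?

Maybe something like PHP's dynamic arrays would be a better fit?

I am trying to use a hash-table solution.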

score 4 · Accepted Answer

A very simple (but rather memory-hungry) approach is:

bool use_map(const std::string& sentence)
{
    std::set<char> chars(sentence.begin(), sentence.end());
    return chars.size() == sentence.size();
}

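If there are no repeated characters, the string and the set have the same size.

@Jonathan Leffler raised a good point in the comments: a sentence usually contains several spaces, so this will return false. You need to filter out the spaces. Still, std::set should be your container of choice.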
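To illustrate the filtering mentioned above, here is a minimal sketch that keeps the size-comparison idea but counts only the characters that were actually inserted. The function name unique_ignoring_spaces is made up for this example, and it assumes that only the plain space character ' ' should be ignored:

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not from the original answer):
// true when no character other than ' ' occurs more than once.
bool unique_ignoring_spaces(const std::string& sentence)
{
    std::set<char> chars;
    std::size_t counted = 0;          // number of non-space characters seen

    for (char c : sentence) {
        if (c == ' ') continue;       // assumption: only plain spaces are skipped
        chars.insert(c);              // duplicates are silently ignored by the set
        ++counted;
    }

    // Equal sizes mean every counted character was distinct.
    return chars.size() == counted;
}
```

Other characters (tabs, punctuation) could be skipped the same way by extending the test inside the loop.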

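Edit:

Here is an idea for an O(n) solution that needs no extra memory beyond a fixed 256-entry lookup table. Simply use the table to mark whether a character has been seen before: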

bool no_duplicates(const std::string& sentence)
{
    static bool table[256];
    std::fill(table, table+256, 0);

    for (char c : sentence) {

        // don't test spaces
        if (c == ' ') continue;
        // add more tests if needed

        // convert to unsigned char so the value is a valid index in [0, 255]
        const unsigned char uc = static_cast<unsigned char>(c);
        if (table[uc]) return false;
        table[uc] = true;
    }
    return true;
}

score 4 · Accepted Answer

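Other answers suggest std::set, which is one solution. However, they copy all the characters into the std::set and then take the size of the set. You don't really need that; you can avoid it by using the return value of std::set::insert. Something like: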

std::set< char > my_set;
for (std::string::size_type ii = 0; ii < sentence.size(); ++ii) 
{
    // insert() returns a pair; .second is false if the character was already in the set
    if( ! my_set.insert( sentence[ ii ] ).second )
    {
        return false;
    }
}
return true;

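This way you will:

stop at the first repeated character, and you will not copy the whole string (which is unnecessary)
avoid the unnecessary cast to int in your code
save memory, if you don't really need the second member of your std::map< int, std::string >

Also, decide whether you need to "count" all chars or whether you want to skip some of them (spaces, commas, question marks, etc.).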

score 3 · Accepted Answer

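I guess a simple way is to store all the characters in an associative container that does not allow duplicates, such as std::set, and check whether its size equals the length of the string: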

#include <set>
#include <string>

bool has_unique_character(std::string const& str)
{
    std::set<char> s(begin(str), end(str));
    return (s.size() == str.size());
}

score 2 · Accepted Answer

What about this? Of course there is still the issue of letter case...

bool use_map(const std::string& sentence)
{
    std::vector<bool> chars(26, false);
    for(std::string::const_iterator i = sentence.begin(); i != sentence.end(); ++i) {
        if(*i == ' ' || *i - 'a' > 25 || *i - 'a' < 0) {
            continue;
        } else if(chars[*i - 'a']) {
            return false;
        } else {
            chars[*i - 'a'] = true;
        }
    }

    return true;
}
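One possible way to handle the case issue is to fold every letter to lower case before indexing. The sketch below is an illustration rather than part of the answer: the name unique_letters_ignoring_case is made up, it uses std::bitset<26> instead of std::vector<bool>, and it assumes an ASCII-compatible character set in which 'a' through 'z' are contiguous:

```cpp
#include <bitset>
#include <cctype>
#include <string>

// Hypothetical helper: true when no letter a-z repeats, treating
// upper and lower case as the same letter. Non-letters are ignored.
bool unique_letters_ignoring_case(const std::string& sentence)
{
    std::bitset<26> seen;             // one flag per letter, all initially false

    for (char c : sentence) {
        // <cctype> functions require a value representable as unsigned char.
        const unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isalpha(uc)) continue;

        const int lower = std::tolower(uc);
        // Guard against locale-specific letters outside a-z.
        if (lower < 'a' || lower > 'z') continue;

        const int idx = lower - 'a';  // assumes contiguous a-z (ASCII)
        if (seen[idx]) return false;  // letter already seen in either case
        seen[idx] = true;
    }
    return true;
}
```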

score 1 · Accepted Answer

Sort the characters, then look for a pair of equal adjacent characters that are letters. Something like this:

std::string my_sentence = /* whatever */;
std::sort(my_sentence.begin(), my_sentence.end());
// plain iterator: adjacent_find needs both arguments to have the same type
std::string::iterator it =
    std::adjacent_find(my_sentence.begin(), my_sentence.end());
// skip duplicated pairs that are not letters (spaces, punctuation, ...)
while (it != my_sentence.end() && !isalpha((unsigned char)*it))
    it = std::adjacent_find(++it, my_sentence.end());
if (it == my_sentence.end())
    std::cout << "No duplicates.\n";
else
    std::cout << "Duplicated '" << *it << "'.\n";

score 0 · Accepted Answer

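If you are allowed to use extra memory, use a hash table:
iterate over the array and check whether the current element has already been hashed. If it has, you have found a duplicate. If not, add it to the hash. This is linear in time but needs extra memory.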
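The hash-table idea described above could be sketched with std::unordered_set, the standard hash-based set since C++11. The function name has_duplicate_hashed is made up for this illustration:

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: true when some character occurs more than once.
bool has_duplicate_hashed(const std::string& s)
{
    std::unordered_set<char> seen;    // hash set of characters seen so far

    for (char c : s) {
        // insert() returns a pair; .second is false if c was already present.
        if (!seen.insert(c).second)
            return true;              // first duplicate found, stop early
    }
    return false;
}
```

Each insert takes constant time on average, so the whole check is linear on average in the length of the string.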

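If the range of the elements in the original sequence is very small, you can simply use an array the size of that range instead of a hash, and do something similar to bucket sort. For example: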

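bool hasDuplicate( string s )
{
   int n = s.size();
   vector<char> v( 256, 0 );
   for( int i = 0; i < n; ++i )
   {
      // char may be signed: convert to unsigned char to get a valid index
      unsigned char c = static_cast<unsigned char>( s[ i ] );
      if( v[ c ] ) // v[ hash( s[i] ) ] here in case of hash usage
         return true;
      else
         v[ c ] = 1; // and here too
   }
   return false;
}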

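Finally, if you are not allowed to use extra memory, you can sort the sequence and check in a single pass whether two adjacent elements are equal. This takes O(n log n) time. No set or map needed :)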
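A minimal sketch of the sort-then-scan variant follows. The name has_duplicate_in_place is made up, and the sketch assumes the caller does not mind the string being reordered, because it sorts the caller's string directly instead of working on a copy:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: sorts s in place, then reports whether any two
// neighbouring characters are equal. The caller's string is modified.
bool has_duplicate_in_place(std::string& s)
{
    std::sort(s.begin(), s.end());    // O(n log n) comparisons
    // After sorting, equal characters are adjacent, so one pass suffices.
    return std::adjacent_find(s.begin(), s.end()) != s.end();
}
```

Passing the string by value instead would preserve the original order, at the cost of the copy that this approach is meant to avoid.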

score 0 · Accepted Answer

Here is the fastest solution:

bool charUsed[256];
bool isUnique(string sentence) {
    int i;
    // reset the table before every call
    for(i = 0; i < 256; ++i) {
        charUsed[i] = false;
    }

    int n = sentence.size();
    for(i = 0; i < n; ++i) {
        if (charUsed[(unsigned char)sentence[i]]) {
            return false;
        }
        charUsed[(unsigned char)sentence[i]] = true;
    }
    return true;
}
