0

I have this program that needs to do some comparisons of strings in an array. The first thought one would have is of course to just use strcmp to check whether two strings in the array are the same. Now consider the option that you just need to compare the pointers to the strings. This would involve some preparations to map each element that are literally the same to the same place in memory.

I've done all this, by preparing with strcmp, and now with strstr (which I believe is faster). But because I need to check every string to map them to their first occurences, I get horribly long preparation-times. I should mention that this array is several MB large.

Here is an example of what I want to do:

[0x0: "I", 0x1: "am", 0x2: "done", 0x3: "here.", 0x4: "I", 0x5: "have", 0x6: "done", 0x7: "everything!"]

[0x10: 0x0, 0x11: 0x1, 0x12: 0x2, 0x13: 0x3, »0x14: 0x0«, 0x15: 0x5, »0x16: 0x2«, 0x17: 0x07]

So now to the question: Is there another way to do this kind of mapping faster than I am already doing?

4

1 回答 1

1

如果只是为了查看是否存在重复项......您可以qsort()在字符串数组上运行 a ,如果您的排序函数发现重复项,您可以提前保释。或者,如果您需要删除重复项,则让排序完成并从列表底部线性迭代并在找到它们时将它们拉出(因为所有重复项将彼此相邻)。

如果字符串相对不同,strcmp()则实际上只需要检查前几个字符,然后才能排除失败的匹配。所以它可能没有你想象的那么糟糕。

当然,这样做的难易程度取决于字符串是如何真正存储在内存中的。

更新:

好的,根据您的更新...马特对哈希表的建议可能效果最好:

  • 一个一个地遍历你的列表
  • 散列字符串
  • 检查表中是否已经存在
  • 如果没有,请将其添加到表中并继续
  • 如果是这样,请使用表中的现有索引
  • ...然后继续下一个。

我想这应该会给你相对不错的表现,总的来说。

于 2013-08-08T04:03:16.917 回答