python - 两个字符串列表的交集

Question

我有一个面试问题大致如下：

给定两个无序客户列表，返回两个列表的交集。也就是说，返回出现在两个列表中的客户列表。

我建立的一些东西：

假设每个客户都有一个唯一的名字
如果两个列表中的名称相同，则为同一客户
姓名的格式为名字姓氏
没有 II、Jr、怪异角色等的诡计。

我认为关键是要找到一种有效的算法/数据结构的使用来尽可能有效地做到这一点。

我的进度是这样的：

将一个列表读入内存，然后一次读取另一个列表一项，看看是否有匹配项
按字母顺序排列两个列表，然后从一个列表的顶部开始，查看每个项目是否出现在另一个列表中
将两个列表放入有序列表，然后使用较短的列表逐项检查（这样，一个列表有 2 个项目，您只检查这 2 个项目）
将一个列表放入哈希中，并检查另一个列表中是否存在键

面试官一直在问，“下一步是什么？”，所以我想我错过了其他东西。

任何其他技巧可以有效地做到这一点？

旁注，这个问题是在 python 中的，我刚刚读过 about sets，它似乎尽可能有效地做到这一点。知道数据结构/算法sets是什么吗？

score 5 · Accepted Answer

It really doesnt matter how its implemented ... but I believe it is implemented in C so it is faster and better set([1,2,3,4,5,6]).intersection([1,2,5,9]) is likely what they wanted

In python readability counts for alot! and set operations in python are used extensively and well vetted...

that said another pythonic way of doing it would be

list_new = [itm for itm in listA if itm in listB]

or

list_new = filter(lambda itm:itm in listB,listA)

basically I believe they were testing if you were familliar with python, not if you could implement the algorithm. since they asked a question that is so well suited to python

score 1 · Accepted Answer

将一个列表放入布隆过滤器并使用它来过滤第二个列表。
将过滤后的第二个列表放入布隆过滤器并使用它来过滤第一个列表。
对两个列表进行排序，并通过上述方法之一找到交集。

这种方法的好处（除了让你在面试中正确使用半模糊的数据结构）是它不需要任何 O(n) 存储，直到你（很有可能）减少了问题的大小。

面试官一直在问，“下一步是什么？”，所以我想我错过了其他东西。

也许他们会一直问，直到你没有答案。

http://code.google.com/p/python-bloom-filter/是布隆过滤器的python 实现。

python - 两个字符串列表的交集

2 回答 2

Related

Reference