I have around 80,000 text files and I want to run an advanced search on them. Say I have two lists of keywords: I want to return every file that contains at least one keyword from the first list and at least one from the second list. Is there already a library that does this? I don't want to rewrite it if it exists.
Viewed 4740 times
2 answers
4
Since you need to search the documents multiple times, you most likely want to index the text files so that such searches are as fast as possible.
Implementing a reasonable index yourself is certainly possible, but a quick search led me to:
Take a look at the documentation. Achieving the desired behaviour should hopefully be fairly straightforward.
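To illustrate what indexing buys you, here is a minimal hand-rolled sketch of an inverted index using only the standard library. It is not the library mentioned above; the names `tokenize`, `build_index` and `search`, and the sample file names, are made up for this example. It assumes single-word keywords and that all files fit in memory as a name-to-text mapping.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Return the set of lowercase words in text."""
    return set(re.findall(r"\w+", text.lower()))


def build_index(docs):
    """Map each word to the set of document names containing it.

    docs is a dict of document name -> document text (hypothetical layout).
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, first_keywords, second_keywords):
    """Documents with at least one keyword from each of the two lists."""
    def any_of(keywords):
        hits = set()
        for kw in keywords:
            # .get avoids adding empty entries to the defaultdict
            hits |= index.get(kw.lower(), set())
        return hits

    return any_of(first_keywords) & any_of(second_keywords)


docs = {
    "a.txt": "red apple",
    "b.txt": "green pear",
    "c.txt": "red pear",
}
index = build_index(docs)
print(sorted(search(index, ["red", "blue"], ["pear", "plum"])))  # ['c.txt']
```

The index is built once; each query is then a handful of set unions and one intersection instead of a scan over every file. A real library would additionally handle persistence, phrases and stemming.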
answered 2013-03-31T00:30:16.157
0
I have a feeling you want MapReduce-style processing for this search. It should scale very well, and Python should have MapReduce packages.
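As a rough sketch of what the MapReduce shape could look like for this query, the following uses only the built-in `map` and `functools.reduce` in a single process. The function names and sample data are hypothetical; for real parallelism the `map` step could be handed to something like `multiprocessing.Pool.map` or a dedicated MapReduce framework, which is not shown here.

```python
import re
from functools import reduce


def map_file(item, first_keywords, second_keywords):
    """Map step: emit {name} if the file matches both keyword sets."""
    name, text = item
    words = set(re.findall(r"\w+", text.lower()))
    if words & first_keywords and words & second_keywords:
        return {name}
    return set()


def reduce_matches(acc, partial):
    """Reduce step: merge per-file results into one set of names."""
    return acc | partial


def mapreduce_search(files, first_keywords, second_keywords):
    """files is a dict of file name -> text (hypothetical layout)."""
    first = {k.lower() for k in first_keywords}
    second = {k.lower() for k in second_keywords}
    mapped = map(lambda item: map_file(item, first, second), files.items())
    return reduce(reduce_matches, mapped, set())
```

Each file is examined independently in the map step, which is what makes the work easy to spread across processes or machines. Unlike an index, though, every query still reads every file.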
answered 2013-04-01T16:22:11.913