
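I'm looking for a string-processing algorithm. I have searched but could not find one that meets my requirements, so I'll explain what it should do with an example.

There are two sets of words, defined as follows: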

**Main_Words**: swimming, driving, playing
**Words_in_front**: I am, I enjoy, I love, I am going to go

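The program searches through a large body of words. Once it finds a word defined in Main_Words, it checks the words in front of that word to see whether they match any entry defined in Words_in_front.

That is, if the program encounters the word "swimming", it must check whether the words in front of "swimming" are one of the following: I am, I enjoy, I love, I am going to go.

Is there an algorithm that can do this?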


3 Answers


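Use a map/dictionary/hash/associative array (whatever your language calls it) keyed by the entries of Main_Words, with the Words_in_front phrases stored as a linked list attached to the entry each key points to. Whenever you encounter a word that matches a key, look it up in the table and check whether the attached list contains a phrase that matches the words in front of it.

That is the basic idea; it can be optimized for both speed and space.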
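A minimal Python sketch of the table lookup described above. It assumes the text is tokenized by whitespace and compared case-insensitively, and the names `build_table` and `find_matches` are made up for illustration. A plain list stands in for the linked list.

```python
MAIN_WORDS = ["swimming", "driving", "playing"]
WORDS_IN_FRONT = ["I am", "I enjoy", "I love", "I am going to go"]


def build_table(main_words, words_in_front):
    """Map each main word to the list of phrases allowed in front of it."""
    # Each phrase is stored as a list of lowercase words.
    phrases = [p.lower().split() for p in words_in_front]
    # Here every key shares the same list; per-key lists would work the same way.
    return {w.lower(): phrases for w in main_words}


def find_matches(text, table):
    """Return (phrase, main_word) pairs found in text."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    matches = []
    for i, tok in enumerate(tokens):
        # table.get returns the attached list only when tok is a key.
        for phrase in table.get(tok, ()):
            n = len(phrase)
            # Compare the n words immediately before position i.
            if n <= i and tokens[i - n:i] == phrase:
                matches.append((" ".join(phrase), tok))
    return matches


table = build_table(MAIN_WORDS, WORDS_IN_FRONT)
print(find_matches("I love swimming but I am going to go driving", table))
```

Punctuation attached to a word (for example `swimming,`) would not match with this tokenization; stripping punctuation first is one way to handle that.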

answered 2013-03-30T09:36:33.587

You should be able to build a regular expression along the following lines:

I (am|enjoy|love|am going to go) (swimming|driving|playing)
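
As a rough illustration, a pattern of this shape could be generated from the two word sets with Python's `re` module. The helper name `build_pattern` is hypothetical, and the sketch assumes words are separated by exactly one space and that matching is case-sensitive.

```python
import re

MAIN_WORDS = ["swimming", "driving", "playing"]
WORDS_IN_FRONT = ["I am", "I enjoy", "I love", "I am going to go"]


def build_pattern(main_words, words_in_front):
    """Compile one regex that matches '<phrase> <main word>'."""
    # re.escape guards against regex metacharacters in the inputs.
    front = "|".join(re.escape(p) for p in words_in_front)
    main = "|".join(re.escape(w) for w in main_words)
    # \b keeps the match from starting or ending in the middle of a word.
    return re.compile(rf"\b(?:{front}) (?:{main})\b")


pattern = build_pattern(MAIN_WORDS, WORDS_IN_FRONT)
text = "I am going to go swimming, and I enjoy driving."
print([m.group(0) for m in pattern.finditer(text)])
```

If "I am" fails to be followed by a main word, the regex engine backtracks and tries the longer alternative "I am going to go", so the order of the alternatives does not change which phrases are found here.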
answered 2013-03-30T10:33:12.667

A straightforward way to do this would be to just do a linear scan through the text, always keeping track of the last N+1 words (or characters) you see, where N is the number of words (or characters) in the longest phrase contained in your words_in_front collection. When you have a "main word", you can just check whether the sequence of N words/characters before it ends with any of the prefixes you have.

This would be a bit faster if you transformed your words_in_front set into a nicer data structure, such as a hashmap (perhaps keyed by the last letter of the phrase) or a prefix/suffix tree of some sort, so you wouldn't have to do an .endsWith over every single member of the set of prefixes each time you hit a matching "main word". As was stated in another answer, there is much room for optimization and there are a few other possible implementations, but this is a start.
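
One way the linear scan might look in Python, as a sketch rather than a tuned implementation. It keeps the last N words in a `collections.deque` and, instead of an `.endsWith` over every phrase, looks up the trailing words in a set of tuples, once per distinct phrase length. The function name `scan` and the lowercase comparison are assumptions, and words_in_front is assumed to be non-empty.

```python
from collections import deque


def scan(words, main_words, words_in_front):
    """Yield-style scan over an iterable of words; returns (position, phrase, main_word) hits."""
    main = {w.lower() for w in main_words}
    # Phrases as tuples of words so they can be hashed and looked up directly.
    phrases = {tuple(p.lower().split()) for p in words_in_front}
    lengths = sorted({len(p) for p in phrases})
    # The window holds at most N words, N being the longest phrase length.
    window = deque(maxlen=max(lengths))
    hits = []
    for pos, word in enumerate(words):
        word = word.lower()
        if word in main:
            recent = tuple(window)  # words seen just before this one
            for n in lengths:
                tail = recent[len(recent) - n:]
                if n <= len(recent) and tail in phrases:
                    hits.append((pos, " ".join(tail), word))
        window.append(word)
    return hits


text = "I am going to go swimming and I am playing"
print(scan(text.split(), ["swimming", "driving", "playing"],
           ["I am", "I enjoy", "I love", "I am going to go"]))
```

Because `words` can be any iterable, the same loop works on a stream of tokens without holding the whole text in memory.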

answered 2013-03-30T15:27:52.473