string - String processing algorithm

Question

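I am looking for a string processing algorithm. I have searched but could not find one that meets my requirements, so I will use an example to explain what the algorithm should do.

There are two sets of words, defined as follows: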

**Main_Words**: swimming, driving, playing
**Words_in_front**: I am, I enjoy, I love, I am going to go

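The program will search through a large body of words. Whenever it finds a word defined in Main_Words, it checks the words in front of that word to see whether they match any of the entries defined in Words_in_front.

That is, if the program encounters the word "swimming", it must check whether the words in front of "swimming" are one of the following: I am, I enjoy, I love, I am going to go.

Is there an algorithm that can do this?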

score 1 · Accepted Answer

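Create a map/dictionary/hash/associative array (whatever your language calls it) whose keys are the entries of Main_Words, with the Words_in_front phrases stored as a linked list attached to the entry each key points to. Whenever you encounter a word that matches a key, go to the table and see whether the attached list contains a phrase that matches the words in front of it.

This is the basic idea; it can be optimized for speed and space.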
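The lookup-table idea above could be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the names `table` and `find_matches` are hypothetical, the text is assumed to be tokenized by plain whitespace splitting (punctuation and case are not handled), and every main word is assumed to accept the same list of phrases.

```python
# Example data taken from the question.
MAIN_WORDS = ["swimming", "driving", "playing"]
WORDS_IN_FRONT = ["I am", "I enjoy", "I love", "I am going to go"]

# Map each main word to the phrases allowed in front of it.
# Assumption: all main words share the same phrase list.
table = {
    word: [phrase.split() for phrase in WORDS_IN_FRONT]
    for word in MAIN_WORDS
}

def find_matches(text):
    """Return (phrase, main_word) pairs found in whitespace-tokenized text."""
    tokens = text.split()
    matches = []
    for i, token in enumerate(tokens):
        # Only main words have an entry in the table.
        for phrase in table.get(token, []):
            start = i - len(phrase)
            # Compare the tokens directly in front of the main word.
            if start >= 0 and tokens[start:i] == phrase:
                matches.append((" ".join(phrase), token))
    return matches

print(find_matches("Today I am going to go swimming and I love driving"))
```

A Python list stands in for the linked list here; the answer's structure only requires that the phrases be reachable from the key.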

score 1 · Accepted Answer

You should be able to build a regular expression along these lines:

I (am|enjoy|love|am going to go) (swimming|driving|playing)
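One way to generate such a pattern from the two word sets, rather than writing it by hand, is sketched below in Python. The helper names `build_pattern` and `find_matches_regex` are hypothetical, and the sketch assumes that the phrase and the main word are separated by exactly one space.

```python
import re

# Example data taken from the question.
MAIN_WORDS = ["swimming", "driving", "playing"]
WORDS_IN_FRONT = ["I am", "I enjoy", "I love", "I am going to go"]

def build_pattern(words_in_front, main_words):
    """Compile a regex of the form (phrase1|phrase2|...) (word1|word2|...)."""
    # re.escape guards against regex metacharacters in the word lists.
    front = "|".join(re.escape(p) for p in words_in_front)
    main = "|".join(re.escape(w) for w in main_words)
    # \b keeps the match aligned to whole words.
    return re.compile(rf"\b({front}) ({main})\b")

PATTERN = build_pattern(WORDS_IN_FRONT, MAIN_WORDS)

def find_matches_regex(text):
    """Return (phrase, main_word) pairs for every match in text."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]

print(find_matches_regex("Today I am going to go swimming and I love driving"))
```

Because the regex engine backtracks, "I am going to go swimming" still matches even though the shorter alternative "I am" is listed first.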

score 1 · Accepted Answer

A straightforward way to do this would be to just do a linear scan through the text, always keeping track of the last N+1 words (or characters) you see, where N is the number of words (or characters) in the longest phrase contained in your words_in_front collection. When you have a "main word", you can just check whether the sequence of N words/characters before it ends with any of the prefixes you have.

This would be a bit faster if you transformed your words_in_front set into a nicer data structure, such as a hashmap (perhaps keyed by the last letter of the phrase) or a prefix/suffix tree of some sort, so you wouldn't have to do an .endsWith check against every member of the set of prefixes each time you hit a matching "main word." As was stated in another answer, there is much room for optimization and there are a few other possible implementations, but this is a start.
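The linear scan with a bounded window might look like the following Python sketch. It is an illustration under assumptions: the names `by_last_word` and `scan` are hypothetical, the window holds words rather than characters, and the phrases are indexed by their last word (a variation on the "keyed by last letter" suggestion). The window stores the N words before the current one, so together with the current word it covers the N+1 words described above.

```python
from collections import defaultdict, deque

# Example data taken from the question.
MAIN_WORDS = {"swimming", "driving", "playing"}
WORDS_IN_FRONT = ["I am", "I enjoy", "I love", "I am going to go"]

# Index the phrases by their last word so that only plausible
# candidates are compared when a main word is found.
by_last_word = defaultdict(list)
for phrase in WORDS_IN_FRONT:
    words = tuple(phrase.split())
    by_last_word[words[-1]].append(words)

# N is the number of words in the longest phrase.
N = max(len(phrase.split()) for phrase in WORDS_IN_FRONT)

def scan(tokens):
    """Yield (phrase, main_word) pairs while making one pass over tokens."""
    window = deque(maxlen=N)  # the last N words seen before the current one
    for token in tokens:
        if token in MAIN_WORDS and window:
            recent = tuple(window)
            for words in by_last_word.get(recent[-1], []):
                # Check whether the window ends with this phrase.
                if recent[-len(words):] == words:
                    yield " ".join(words), token
        window.append(token)

text = "Today I am going to go swimming and I love driving"
print(list(scan(text.split())))
```

Since `scan` accepts any iterable of words and keeps only N of them in memory, it can also be fed a stream of tokens instead of a fully loaded text.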
