A straightforward way to do this would be to just do a linear scan through the text, always keeping track of the last N+1 words (or characters) you see, where N is the number of words (or characters) in the longest phrase contained in your words_in_front
collection. When you have a "main word", you can just check whether the sequence of N words/characters before it ends with any of the prefixes you have.
This would be a bit faster if you transformed your words_in_front
set into a nicer data structure, such as a hashmap (perhaps keyed by last letter in the phrase..) or a prefix/suffix tree of some sort, so you wouldn't have to do an .endsWith
over every single member of the set of prefixes each time you have a matching "main word." As was stated in another answer, there is much room for optimization and a few other possible implementations, but there's a start.