I am looking for an algorithm or method to identify general phrases in a corpus of text written in a particular dialect (the corpus comes from a specific domain, but for my purposes it can be treated as a dialect of English). For example, the following fragment could come from a larger corpus about World of Warcraft, or perhaps about MMORPGs in general:
players control a character avatar within a game world in third person or first person view, exploring the landscape, fighting various monsters, completing quests, and interacting with non-player characters (NPCs) or other players. Also similar to other MMORPGs, World of Warcraft requires the player to pay for a subscription, either by buying prepaid game cards for a selected amount of playing time, or by using a credit or debit card to pay on a regular basis
As output from the above, I would like to identify the following general phrases:
- first person
- World of Warcraft
- prepaid game cards
- debit card
Notes:
There are previous questions similar to mine here and here, but to clarify, mine differs in the following ways:
a. I am trying to use an existing toolkit such as NLTK, OpenNLP, etc.
b. I am not interested in identifying other Parts of Speech in the sentence
c. I can use human intervention: the algorithm presents the identified noun phrases to a human expert, who then confirms or rejects each one. However, we do not have the resources to train a language model on hand-annotated data
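
To make point (c) concrete, here is a rough Python sketch of the review loop I have in mind. It does not use NLTK or OpenNLP; it is only a stdlib placeholder that splits text at punctuation and stopwords to produce candidates, so the candidate step is exactly the part I would like to replace with something better. The names `STOPWORDS`, `candidate_phrases`, and `review` are my own hypothetical helpers, and the stopword list is a tiny illustrative assumption.

```python
import re

# Tiny illustrative stopword list (an assumption, not a real resource).
STOPWORDS = {
    "a", "an", "the", "of", "or", "and", "in", "to", "for", "by", "on",
    "with", "within", "either", "also", "other",
}

def candidate_phrases(text, min_words=2):
    """Split text at punctuation and stopwords; keep the multi-word runs left over."""
    candidates = []
    # Any punctuation other than a hyphen ends a candidate phrase.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        run = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword closes the current run of content words.
                if len(run) >= min_words:
                    candidates.append(" ".join(run))
                run = []
            else:
                run.append(word)
        if len(run) >= min_words:
            candidates.append(" ".join(run))
    return candidates

def review(candidates, accept):
    """Show each distinct candidate once; `accept` stands in for the human expert."""
    seen = set()
    kept = []
    for phrase in candidates:
        if phrase in seen:
            continue
        seen.add(phrase)
        if accept(phrase):
            kept.append(phrase)
    return kept

if __name__ == "__main__":
    sample = ("by buying prepaid game cards for a selected amount of playing "
              "time, or by using a credit or debit card to pay on a regular basis")
    found = candidate_phrases(sample)
    # In practice `accept` would prompt the expert; here it is a fixed whitelist.
    approved = review(found, lambda p: p in {"prepaid game cards", "debit card"})
    print(found)
    print(approved)
```

This placeholder is noisy (it proposes "buying prepaid game cards" rather than "prepaid game cards") and it can never propose "World of Warcraft", because "of" is treated as a boundary. That is the gap I am hoping an existing toolkit can fill without hand-annotated training data.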