I would like to create a simple application in C# that takes in a group of words and returns all the groupings of words that occur with them in a data set.
For example, given "car" and "bike", it should return a list of the word groups/combinations found with them in the data set, along with the number of times each combination was found.
To clarify further: given a category named "car", I would like to see a list of the word groupings that appear with the word "car". The category could also be several words rather than just one.
With a sample data set of:
CAR:
- Another car for sale
- Blue car on the horizon
- For Sale - used car
- this car is painted blue
it should return:
car : for sale : 2
car : blue : 2
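One way to frame this in plain C# is as n-gram (word-grouping) counting, followed by a pruning rule. This is a swapped-in technique, not something from the question. The sketch below is minimal and rests on several assumptions:

- Words are runs of letters or digits, compared case-insensitively.
- A text belongs to the category only if it contains every category word.
- A grouping is 1 to `maxWords` consecutive words that do not include a category word.
- A count is the number of texts containing the grouping.
- A grouping is dropped when a longer grouping containing it has the same count. This rule is what keeps "for" and "sale" out of the expected output.

`PhraseCounter`, `Count`, `minCount` and `maxWords` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch only (hypothetical names): counts word groupings (n-grams) that
// co-occur with a category, under the assumptions described in the text.
public static class PhraseCounter
{
    // A "word" is a run of letters or digits; punctuation such as "-" is ignored.
    private static readonly Regex WordPattern = new Regex(@"[\p{L}\p{N}]+");

    private static string[] Tokenize(string text) =>
        WordPattern.Matches(text.ToLowerInvariant())
                   .Cast<Match>()
                   .Select(m => m.Value)
                   .ToArray();

    // Returns grouping -> number of category texts containing it.
    public static Dictionary<string, int> Count(
        IEnumerable<string> texts, string category, int minCount, int maxWords = 3)
    {
        var categoryWords = new HashSet<string>(Tokenize(category));
        var counts = new Dictionary<string, int>();

        foreach (string text in texts)
        {
            string[] words = Tokenize(text);

            // A text belongs to the category only if it contains every category word.
            if (!categoryWords.All(w => words.Contains(w)))
                continue;

            // Collect each grouping once per text, so counts are "texts containing it".
            var seen = new HashSet<string>();
            for (int start = 0; start < words.Length; start++)
            {
                for (int len = 1; len <= maxWords && start + len <= words.Length; len++)
                {
                    // Stop extending the window once it would include a category word.
                    if (categoryWords.Contains(words[start + len - 1]))
                        break;
                    seen.Add(string.Join(" ", words, start, len));
                }
            }

            foreach (string phrase in seen)
                counts[phrase] = counts.TryGetValue(phrase, out int n) ? n + 1 : 1;
        }

        // Apply the threshold.
        var frequent = counts.Where(kv => kv.Value >= minCount)
                             .ToDictionary(kv => kv.Key, kv => kv.Value);

        // Drop a grouping when a longer frequent grouping contains it with the
        // same count, e.g. "for" and "sale" are already covered by "for sale".
        return frequent
            .Where(kv => !frequent.Any(other =>
                other.Key.Length > kv.Key.Length &&
                other.Value == kv.Value &&
                (" " + other.Key + " ").Contains(" " + kv.Key + " ")))
            .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

Calling `PhraseCounter.Count(rows, "car", minCount: 2)` on the four sample rows returns `for sale` → 2 and `blue` → 2. This matches the expected output above. For the real data, `minCount: 20` would be the threshold. A multi-word category such as "used car" is passed as the same string.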
I'd like to set a threshold, say 20, so that if a grouping appears with "car" 20 or more times, it is displayed as category, words, count, where only the category is known; the words and count are determined by the algorithm.
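The output row described above (category, words, count) could be modelled as a small record, with the threshold applied as a LINQ filter. This is a sketch assuming C# 9 or later (for `record`) and assuming the per-grouping counts have already been computed. `CategoryPhrase` and `ThresholdFilter` are hypothetical names, and "20 or more" is read as `>= threshold`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One output row: the known category plus the words and count found by the algorithm.
public sealed record CategoryPhrase(string Category, string Words, int Count)
{
    public override string ToString() => $"{Category} : {Words} : {Count}";
}

public static class ThresholdFilter
{
    // Keeps groupings seen at least `threshold` times, most frequent first,
    // ties broken alphabetically (ordinal) for a stable order.
    public static List<CategoryPhrase> Apply(
        string category, IReadOnlyDictionary<string, int> phraseCounts, int threshold)
    {
        return phraseCounts
            .Where(kv => kv.Value >= threshold)
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => new CategoryPhrase(category, kv.Key, kv.Value))
            .ToList();
    }
}
```

Keeping the threshold separate from the counting step makes it easy to experiment with different cut-offs without recounting.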
The data set is in a SQL Server 2008 table, and I was hoping to use something like a .NET implementation of R to accomplish this.
I am guessing that the R programming language may be the best way to do this, and I am only now looking at R.NET.
I would prefer to do this in .NET, as that is what I am most familiar with, but I am open to suggestions.
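Because the rows live in SQL Server, plain ADO.NET may be enough to get them into .NET before counting them in C#. The sketch below is written against the `System.Data` interfaces, so an already-opened `SqlConnection` can be passed in. The table `dbo.Listings` and column `Title` are hypothetical placeholders for the real schema. `LIKE '%car%'` is only a coarse prefilter (it also matches words such as "scarf"), so exact word matching still has to happen in code.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class CategoryRowLoader
{
    // Hypothetical table and column names; replace with the real schema.
    public const string Query =
        "SELECT Title FROM dbo.Listings WHERE Title LIKE @pattern";

    // Runs the query on an already-open ADO.NET connection (e.g. a SqlConnection).
    // The parameter keeps the category text out of the SQL string itself.
    public static List<string> Load(IDbConnection connection, string categoryWord)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = Query;
            IDbDataParameter pattern = command.CreateParameter();
            pattern.ParameterName = "@pattern";
            pattern.Value = "%" + categoryWord + "%";
            command.Parameters.Add(pattern);

            using (IDataReader reader = command.ExecuteReader())
                return ReadTexts(reader);
        }
    }

    // Copies the first column of every row, skipping NULLs.
    public static List<string> ReadTexts(IDataReader reader)
    {
        var texts = new List<string>();
        while (reader.Read())
        {
            if (!reader.IsDBNull(0))
                texts.Add(reader.GetString(0));
        }
        return texts;
    }
}
```

The resulting list of strings can then be handed to whatever counting code is used, all without leaving .NET.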
Can someone with experience in this area point me in the right direction?
Thanks.