I've been working on a project to data-mine a large amount of short texts and categorize these based on a pre-existing large list of category names. To do this I had to figure out how to first create a good text corpus from the data in order to have reference documents for the categorization and then to get the quality of the categorization up to an acceptable level. This part I am finished with (luckily categorizing text is something that a lot of people have done a lot of research into).
Now my next problem, I'm trying to figure out a good way of linking the various categories to each other computationally. That is to say, to figure out how to recognize that "cars" and "chevrolet" are related in some way. So far I've tried utilizing the N-Gram categorization methods described by, among others, Cavnar and Trenkle for comparing the various reference documents I've created for each category. Unfortunately it seems the best I've been able to get out of that method is approximately 50-55% correct relations between categories, and those are the best relations, overall it's around 30-35% which is miserably low.
I've tried a couple of other approaches as well but I've been unable to get much higher than 40% relevant links (an example of a non-relevant relation would be the category "trucks" being strongly related to the category "makeup" or the category "diapers" while weakly (or not at all) related to "chevy").
Now, I've tried looking for better methods for doing this but it just seems like I can't find any (yet I know others have done better than I have). Does anyone have any experience with this? Any tips on usable methods for creating relations between categories? Right now the methods I've tried either don't give enough relations at all or contain way too high a percentage of junk relations.