I am hoping someone can answer this for me as I am stuck.
What methodology does RapidMiner use in its correlation matrix? An answer covering all combinations of data types would be nice, but most importantly, what does it use for pairs of nominal/categorical attributes?
I am using RapidMiner to build a correlation matrix and have been careful to label all attributes with the proper type (numeric, binominal, polynominal, etc.). My matrix shows negative correlations for some of the nominal/nominal attribute pairs, which doesn't make sense given the measures I would normally expect to be chosen for this (Phi, Cramér's V, contingency coefficient). I thought the values from these measures had to be non-negative, and a "negative" correlation between categories like gender and city doesn't make sense to me, as that would suggest an order in the data.
Is a different measure used, or is the nominal data dummy coded (or something similar) first? And if dummy coding is used, how reliable is the value obtained?
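To illustrate what I suspect might be happening, here is a minimal Python sketch. It is not RapidMiner's actual implementation, which I don't know. It assumes the categories are simply mapped to integers and fed to a Pearson correlation, and compares that with Cramér's V. The data, the `encode` helper, and the category orderings are all made up for the example.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Plain Pearson correlation between two numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def cramers_v(a, b):
    """Cramér's V from the chi-square statistic (no bias correction)."""
    n = len(a)
    joint, ca, cb = Counter(zip(a, b)), Counter(a), Counter(b)
    chi2 = 0.0
    for x in ca:
        for y in cb:
            expected = ca[x] * cb[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(ca), len(cb)) - 1
    return math.sqrt(chi2 / (n * k))

def encode(values, order):
    """Hypothetical integer coding: position of each label in `order`."""
    mapping = {label: i for i, label in enumerate(order)}
    return [mapping[v] for v in values]

# Made-up toy data, not from my real data set.
gender = ["F", "F", "F", "M", "M", "M", "F", "M"]
city = ["Austin", "Austin", "Boston", "Boston",
        "Boston", "Austin", "Austin", "Boston"]

city_codes = encode(city, ["Austin", "Boston"])
r_fm = pearson(encode(gender, ["F", "M"]), city_codes)  # F=0, M=1
r_mf = pearson(encode(gender, ["M", "F"]), city_codes)  # M=0, F=1
v = cramers_v(gender, city)

print(r_fm, r_mf, v)
```

In this sketch the sign of the Pearson value flips when the two gender labels swap codes, while Cramér's V stays the same and is never negative. If something like integer or dummy coding is what happens internally, that would explain the negative values I am seeing, and the sign would just be an artifact of the coding.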
Thank you in advance to anyone who can help me. Hate to admit when I am lost, but here I am needing a map :)