Right now I have the following code, which reads feature and label data from a CSV file and uses it to create and fit a DecisionTreeClassifier model:
import csv
from sklearn import tree
import joblib  # sklearn.externals.joblib is no longer available in recent scikit-learn releases
mycsv = csv.reader(open('postsBase2.csv'))
features = []
labels = []
for row in mycsv:
    features.append([row[2], row[3], row[6]])
    labels.append(row[8])
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
The CSV also has other fields I would like to load that hold categorical data. They are at indexes 7 and 8 of each row (row[7] and row[8]). The field at index 7 can be one of 4 categories, and the field at index 8 can be one of 5 categories.
I want to add these to my features and then pass them through scikit-learn's OneHotEncoder so they are encoded in a form the model can be fitted on. The updated code, with some pseudocode for what I want to do, is below:
import csv
from sklearn import tree
import joblib  # sklearn.externals.joblib is no longer available in recent scikit-learn releases
mycsv = csv.reader(open('postsBase2.csv'))
features = []
labels = []
for row in mycsv:
    features.append([row[2], row[3], row[6], row[7], row[8]])
    labels.append(row[8])
# Pseudocode: here I want to one-hot encode the values from row[7] and row[8]
# (e.g. with OneHotEncoder) so that DecisionTreeClassifier will accept them
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
How can I do this?
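One possible approach, as a minimal sketch rather than a definitive answer: keep the numeric and categorical columns separate, one-hot encode only the categorical ones with OneHotEncoder, and stack the two parts back together before fitting. The in-memory `rows` below are made-up stand-ins for the contents of postsBase2.csv, and `LABEL_COL = 9` is a hypothetical choice that assumes the label lives in its own column, separate from the categorical features.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical rows standing in for what csv.reader would yield from the file.
rows = [
    ["1", "a", "10", "0.5", "x", "y", "3", "red", "small", "yes"],
    ["2", "b", "20", "1.5", "x", "y", "1", "blue", "large", "no"],
    ["3", "c", "30", "2.5", "x", "y", "4", "green", "small", "yes"],
    ["4", "d", "40", "3.5", "x", "y", "2", "red", "medium", "no"],
]

NUMERIC_COLS = [2, 3, 6]   # same indexes as in the question
CATEGORICAL_COLS = [7, 8]  # the two categorical fields
LABEL_COL = 9              # assumption: label is in a separate column

# csv.reader yields strings, so convert the numeric fields explicitly.
numeric = np.array([[float(row[i]) for i in NUMERIC_COLS] for row in rows])
categorical = np.array([[row[i] for i in CATEGORICAL_COLS] for row in rows])
labels = [row[LABEL_COL] for row in rows]

# Each categorical column becomes one 0/1 column per distinct category.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(categorical).toarray()

# Numeric columns first, then the one-hot columns.
features = np.hstack([numeric, encoded])

clf = DecisionTreeClassifier()
clf = clf.fit(features, labels)
```

When predicting on new rows, the same fitted `encoder` would need to be reused via `encoder.transform(...)` (not refitted), so the one-hot columns line up with those seen during training.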