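I have a CSV file with features (people's names) and labels (people's ethnicities). I can set up the data frame with Python pandas, but when I try to link it with the NLTK module to run naive Bayes, I get the following error: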
Traceback (most recent call last):
  File "C:\Users\Desktop\file.py", line 19, in <module>
    classifier = nbc.train(train_set)
  File "E:\Program Files Extra\Python27\lib\site-packages\nltk\classify\naivebayes.py", line 194, in train
    for fname, fval in featureset.items():
AttributeError: 'str' object has no attribute 'items'
Here is my code:
import pandas as pd
from pandas import DataFrame
import re
import numpy as np
import nltk
from nltk.classify import NaiveBayesClassifier as nbc
data = pd.read_csv("C:\Users\KubiK\Desktop\OddNames_sampleData3.csv")
frame = DataFrame(data)
frame.columns = ["feature", "label"]
feature = frame.feature
label = frame.label
# Extract features.
featuresets = [(feature, label) for index, (feature, label) in frame.iterrows()]
# Split train and test set
train_set, test_set = featuresets[:400], featuresets[400:]
# Train a classifier
classifier = nbc.train(train_set)
# Test classifier on "Silva"
print classifier.classify(ethnic_features('Silva'))
Sample data:
Name                 Ethnicity
J-b'te Letourneau    Scotish
Jane Mc-earthar      French
Li Chen              Chinese
Amabil?? Bonneau     English
Emma Lef??c          French
C., Akeefe           African
D, James Matheson    English
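
The traceback stops at `featureset.items()`, which suggests that `nbc.train` expects every training item to be a `(dict, label)` pair, while the list comprehension above hands it the raw name string. Below is a minimal sketch of the shape that should satisfy it. It assumes a hypothetical `ethnic_features` helper (the code above calls it but never defines it), and a few hard-coded rows stand in for `frame.iterrows()`. The features chosen are purely illustrative, not a recommendation.

```python
def ethnic_features(name):
    """Hypothetical feature extractor: turn a name into a dict of features."""
    cleaned = name.strip().lower()
    return {
        "first_letter": cleaned[:1],
        "last_letter": cleaned[-1:],
        "last_two": cleaned[-2:],
        "length": len(cleaned),
    }

# Stand-in for the (feature, label) pairs yielded by frame.iterrows().
rows = [
    ("Li Chen", "Chinese"),
    ("Jane Mc-earthar", "French"),
    ("D, James Matheson", "English"),
]

# Each training item is (feature dict, label), not (raw string, label).
featuresets = [(ethnic_features(name), label) for name, label in rows]

# Train only if NLTK is installed, so the sketch still runs without it.
try:
    from nltk.classify import NaiveBayesClassifier as nbc
except ImportError:
    nbc = None

if nbc is not None:
    classifier = nbc.train(featuresets)
    # classify() also takes a feature dict, not a raw string.
    print(classifier.classify(ethnic_features("Silva")))
```

With the real data, the same comprehension would presumably read `[(ethnic_features(feature), label) for index, (feature, label) in frame.iterrows()]`.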