I want to join all the text columns of my DataFrame so that I can pass the result to CountVectorizer.

def populate_distance_metrics(in_df, col_list, prim_col):

    vect_data=in_df[col_list[0]].map(str)
    print (type(vect_data))
    for col,idx in enumerate(col_list):
        if idx==0:
            continue;
        vect_data = vect_data + " " + in_df[col]

    cv = CountVectorizer(stop_words='english', max_features=1000)
    # Learn a vocabulary dictionary of all tokens 
    cv.fit(vect_data)
    print ('cv fit')

in_df is the source DataFrame and col_list is a list of column names, e.g. ['a','b','c',...]; I want to keep that flexibility. The type of

vect_data=in_df[col_list[0]].map(str)

is

<class 'pandas.core.series.Series'>

The code above fails at the line vect_data = vect_data + " " + in_df[col], with this traceback:

vect_data = vect_data + " " + in_df[col]
  File "asd/asd/dsfg/lib/python3.6/site-packages/pandas/core/frame.py", line 2059, in __getitem__
    return self._getitem_column(key)
  File "asd/asd/dsfg/lib/python3.6/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
    return self._get_item_cache(key)
  File "asd/asd/dsfg/lib/python3.6/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
    values = self._data.get(item)
  File "asd/asd/dsfg/lib/python3.6/site-packages/pandas/core/internals.py", line 3543, in get
    loc = self.items.get_loc(item)
  File "asd/asd/dsfg/lib/python3.6/site-packages/pandas/indexes/base.py", line 2136, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)

  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)

  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)

  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)

KeyError: 0

However, it works when I do this:

cv.fit(in_df['a']+ ' '+ in_df['b']+ in_df['c'])

What am I doing wrong?
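
One likely cause, going by the KeyError: 0 in the traceback: enumerate yields (index, item) pairs, so `for col,idx in enumerate(col_list)` binds col to 0, 1, 2, ... and idx to the column names, and in_df[col] then looks up a column named 0. The sketch below shows that unpacking order and one possible way to do the join while keeping col_list flexible. The helper name join_text_columns and the sample data are hypothetical and only for illustration; the cast to str on every column is an added assumption so that non-string columns concatenate cleanly.

```python
import pandas as pd

# enumerate yields (index, item), so the first name unpacked is the index.
pairs = list(enumerate(["a", "b", "c"]))  # [(0, 'a'), (1, 'b'), (2, 'c')]


def join_text_columns(in_df, col_list):
    """Hypothetical helper: join the given columns row by row with spaces."""
    # Start from the first column, cast to str so concatenation is safe.
    vect_data = in_df[col_list[0]].astype(str)
    # Iterate over the remaining column names directly; no index is needed.
    for col in col_list[1:]:
        vect_data = vect_data + " " + in_df[col].astype(str)
    return vect_data


# Made-up sample data, only to exercise the helper.
sample = pd.DataFrame({"a": ["red", "blue"], "b": ["apple", "sky"], "c": [1, 2]})
joined = join_text_columns(sample, ["a", "b", "c"])
```

The resulting Series should be usable in place of vect_data in the call to cv.fit. If the index really is needed, writing the loop as `for idx, col in enumerate(col_list)` would keep the original structure.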
