I don't know how to clean and vectorize my data.

train=pd.read_csv('longilati.csv',encoding='mac_roman')
train.columns
Index(['Comment ', 'Polarity'], dtype='object')

My dataframe contains the following data:

(image of the dataframe)

However, whenever I try to clean the data with the following code

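import re
import numpy as np

def remove_pattern(text, pattern):
    # Find every substring matching the pattern, then delete each one from the text
    r = re.findall(pattern, text)
    for i in r:
        text = re.sub(i, "", text)
    return text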
train['Tidy'] = np.vectorize(remove_pattern)(train['Comment'],"@[\w]*")
train

I get the error KeyError: 'Comment'. Here is the full stack trace:

KeyError                                  Traceback (most recent call last)
F:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

F:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Comment'

1 Answer

Your column name 'Comment ' has a trailing space, so the key 'Comment' does not match it. Rename the column with

train.rename({'Comment ':'Comment'}, axis=1, inplace=True)

or refer to the column by its exact name, including the space:

np.vectorize(remove_pattern)(train['Comment '],"@[\w]*")
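
If more than one column might carry stray whitespace, stripping every column label at once may be simpler than renaming them one by one. The sketch below is only an illustration under stated assumptions: the small DataFrame is a hypothetical stand-in for longilati.csv, and pandas' own Series.str.replace is swapped in for np.vectorize(remove_pattern). It should give the same result for the @mention pattern but is not the code from the question.

```python
import pandas as pd

# Hypothetical stand-in for longilati.csv; note the trailing space in 'Comment '.
train = pd.DataFrame({
    "Comment ": ["@user1 great product", "no mention here", "thanks @bob_2 !"],
    "Polarity": [1, 0, 1],
})

# Strip leading/trailing whitespace from every column label in one step.
train.columns = train.columns.str.strip()

# Remove @mentions with pandas' vectorized string method
# (used here instead of np.vectorize(remove_pattern)).
train["Tidy"] = train["Comment"].str.replace(r"@\w*", "", regex=True)

print(train.columns.tolist())
print(train["Tidy"].tolist())
```

After the strip, train['Comment'] resolves without a KeyError, and the original remove_pattern call from the question would work unchanged as well.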
Answered 2020-12-24T17:15:35.343