python - Searching for a string in a pandas column

Question

I'm trying to find a substring in the hard_skills_name column below; for example, I want all rows that have "Apple Products" as a hard skill.

I tried the following code:

df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]

But I get this error:

KeyError                                  Traceback (most recent call last)
<ipython-input-49-acdcdfbdfd3d> in <module>
----> 1 df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    877 
    878             maybe_callable = com.apply_if_callable(key, self.obj)
--> 879             return self._getitem_axis(maybe_callable, axis=axis)
    880 
    881     def _is_scalar_access(self, key: Tuple):

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1097                     raise ValueError("Cannot index with multidimensional key")
   1098 
-> 1099                 return self._getitem_iterable(key, axis=axis)
   1100 
   1101             # nested tuple slicing

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1035 
   1036         # A collection of keys
-> 1037         keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
   1038         return self.obj._reindex_with_indexers(
   1039             {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1252             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1253 
-> 1254         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1255         return keyarr, indexer
   1256 

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan],\n             dtype='float64')] are in the [index]"

score 2 · Accepted Answer

Try using .str.join() to (temporarily) convert each list of strings into a comma-separated string before the string search:

df[df['hard_skills_name'].str.join(', ').str.contains("Apple Products", case=False)]

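The problem arises because the strings you want to search are contained in lists. You can't use .str.contains() directly to search strings inside a list. To get around this, first convert each list of strings into one long string with .str.join() (for example, separating the substrings with commas), and then run the string search.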
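As a minimal sketch of the above, assuming each cell of hard_skills_name holds a Python list of strings (the sample data here is hypothetical, not the asker's data), the following shows that the direct search yields only NaN while the joined search produces a usable boolean mask. The na=False argument is an addition not present in the answer; it treats missing cells as non-matches.

```python
import pandas as pd

# Hypothetical sample data: each cell holds a list of skill names.
df = pd.DataFrame({
    "hard_skills_name": [
        ["Apple Products", "Python"],
        ["SQL", "Excel"],
        ["apple products", "Java"],
    ]
})

# Searching the list cells directly gives NaN for every row,
# because the elements are lists rather than strings.
direct = df["hard_skills_name"].str.contains("Apple Products", case=False)

# Join each list into one string first, then search that string.
# na=False (an assumption added here) counts missing cells as non-matches.
mask = (
    df["hard_skills_name"]
    .str.join(", ")
    .str.contains("Apple Products", case=False, na=False)
)
matches = df[mask]
print(matches)
```

Note that this is a substring match on the joined text, so a skill such as "Apple Products Support" would also match.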

score 1 · Accepted Answer

Your index has null values. You're going to have to make a boolean mask for this. Directly answering your question:

df.loc[(df.index.notnull()) & (df['hard_skills_name'].str.contains("Apple Products", case=False))]

This should exclude any row that has a null index value and keep only the rows whose hard_skills_name contains the given string.

However, I suspect that this will also exclude some data that you're looking for. The solution in that case would be to change your index to not have any NaNs. Whether that means replacing it with a placeholder value or creating a brand new index, that's up to you.
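The two options mentioned here could look roughly like this. It is only a sketch under the assumption that the index really does contain NaN labels; the sample frame is hypothetical, and it uses plain strings in the column to keep the focus on the index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two of the three index labels are NaN.
df = pd.DataFrame(
    {"hard_skills_name": ["Apple Products", "Python", "Excel, Apple Products"]},
    index=[np.nan, 1.0, np.nan],
)

# Option 1: keep only the rows whose index label is not null.
# This drops data, as the answer warns.
labelled = df[df.index.notnull()]

# Option 2: discard the old index and build a fresh 0..n-1 index,
# so no rows are lost.
clean = df.reset_index(drop=True)
found = clean[clean["hard_skills_name"].str.contains("Apple Products", case=False)]
print(found)
```

With a fresh index, the search keeps both matching rows, whereas filtering on the null index labels first would have dropped them.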
