python - Why does `filter` work on one list but fail on another with "The truth value of an array with more than one element is ambiguous."?

Question

I have the following helper functions:

import numpy as np

flatten = lambda sentences : [word for sentence in sentences for word in sentence] # flattens list of lists
get_feature = lambda f, input: np.array(list(map(f, input))) # applies f() to each element of input and returns a numpy array of the results
is_numeric = lambda words: get_feature(lambda word: word.isnumeric(), words) # boolean array, one entry per element of words

Then I run the following:

ls = ['a','1','b','2','c','d','e']
print(type(ls))
print(list(filter(is_numeric, ls)))
print(type(train_tokens))
print(train_tokens[:10])
print(list(filter(is_numeric, train_tokens))[:10])

which gives the following output:

<class 'list'>
['1', '2']
<class 'list'>
['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.', 'Peter']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-59836dbfb218> in <module>()
     11 print(type(train_tokens))
     12 print(train_tokens[:10])
---> 13 print(list(filter(is_numeric, train_tokens))[:10])

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Both ls and train_tokens are lists, so I don't understand why filter works on ls but not on train_tokens.

What silly mistake am I making?

PS: This is how I build train_tokens:

!pip install datasets
from datasets import load_dataset
conll2003dataset = load_dataset("conll2003")

train_tokens = flatten(conll2003dataset['train']['tokens'])

Here is the link to the notebook.

score 0 · Accepted Answer

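The function you are passing to filter does not do what you expect. filter applies the function to each element of its input iterable, one at a time. With your function, that means each word is treated as a sequence of characters: every character is checked for being numeric, and the result is a boolean numpy array with one entry per character. filter can only use that array when it has length 1, that is, when the word consists of a single character. This is why ls works: every element in it is one character long. For longer words you get a longer array, and as the error message says, you cannot evaluate a numpy array with more than one element in a boolean context.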
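To make the difference concrete, here is a minimal sketch that calls the question's is_numeric on a single-character string and on a longer word, the way filter does internally. The variable names single, multi, and raised are only for illustration.

```python
import numpy as np

# Helpers copied from the question.
get_feature = lambda f, input: np.array(list(map(f, input)))
is_numeric = lambda words: get_feature(lambda word: word.isnumeric(), words)

# filter passes ONE element at a time, so is_numeric receives a single string
# and iterates over its characters.
single = is_numeric('1')    # one character  -> boolean array of length 1
multi = is_numeric('EU')    # two characters -> boolean array of length 2

print(single.shape, multi.shape)

# A length-1 array has an unambiguous truth value, so filter can use it.
print(bool(single))

# A longer array does not, which is the ValueError from the question.
try:
    bool(multi)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

Every element of ls is a single character, so only the first case ever occurs there; train_tokens contains longer words such as 'EU', which trigger the second case.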

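If you really want to use filter, you don't need to write your own function (certainly not one as complicated as the one you've shown). Just pass str.isnumeric as the function; it checks each whole word at once (rather than letter by letter).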

print(list(filter(str.isnumeric, train_tokens)))
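As a small self-contained illustration of this suggestion, the sketch below uses a made-up sample list (sample_tokens is hypothetical, not taken from the dataset) so it can run without downloading anything. It also shows the equivalent list comprehension.

```python
# Hypothetical sample standing in for train_tokens.
sample_tokens = ['EU', 'rejects', '1996', 'call', '22', '.']

# str.isnumeric is called once per word and returns a plain bool.
numeric_words = list(filter(str.isnumeric, sample_tokens))
print(numeric_words)

# Equivalent list comprehension.
numeric_words_lc = [word for word in sample_tokens if word.isnumeric()]
print(numeric_words_lc)
```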

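On the other hand, if you really want to use numpy with your own numeric-checking code, you can drop filter and call your is_numeric lambda on the whole list to get a boolean numpy array that can be used as a mask. If you also convert the input to an array, you can index it with the mask to get only the numeric entries: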

numeric_mask = is_numeric(train_tokens)   # this is calling your function is_numeric
train_tokens_array = np.asarray(train_tokens)
print(train_tokens_array[numeric_mask][:10])
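The same masking approach can be tried on a small list without loading the dataset. This is a sketch under the assumption that a made-up sample_tokens list stands in for train_tokens; the helpers are the ones from the question.

```python
import numpy as np

# Helpers copied from the question.
get_feature = lambda f, input: np.array(list(map(f, input)))
is_numeric = lambda words: get_feature(lambda word: word.isnumeric(), words)

# Hypothetical sample standing in for train_tokens.
sample_tokens = ['EU', 'rejects', '1996', 'call', '22', '.']

# Called on the whole list, is_numeric checks each WORD (not each character)
# and returns one boolean per word.
numeric_mask = is_numeric(sample_tokens)

# Boolean indexing needs a numpy array, not a plain list.
sample_array = np.asarray(sample_tokens)
numeric_entries = sample_array[numeric_mask]
print(numeric_entries)
```

Here the mask has one entry per word, so it lines up with the array being indexed.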
