I have the following helper functions:
import numpy as np
flatten = lambda sentences : [word for sentence in sentences for word in sentence] # flattens list of lists
get_feature = lambda f, input: np.array(list(map(f, input))) # applies f() to each element of input list and returns list of resultant elements
is_numeric = lambda words: get_feature(lambda word: word.isnumeric(), words)
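The helpers above are vectorized: `is_numeric` expects a *list* of words and returns one boolean per word. The sketch below (reusing the question's helpers verbatim; the sample inputs are illustrative) shows what happens when it is handed a single string instead, which is what `filter()` does. `map()` then iterates over the string's characters, so the result is an array whose length equals the length of the word.

```python
import numpy as np

# Helpers copied from the question.
flatten = lambda sentences: [word for sentence in sentences for word in sentence]
get_feature = lambda f, input: np.array(list(map(f, input)))
is_numeric = lambda words: get_feature(lambda word: word.isnumeric(), words)

# Called on a list, is_numeric returns one boolean per word.
list_result = is_numeric(['a', '1', 'EU'])
print(list_result.tolist())  # [False, True, False]

# Called on a single string (what filter() passes in), map() walks
# the string character by character instead.
single_char = is_numeric('1')   # one character  -> array of length 1
multi_char = is_numeric('EU')   # two characters -> array of length 2
print(single_char.tolist(), multi_char.tolist())  # [True] [False, False]
```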
Then I run the following:
ls = ['a','1','b','2','c','d','e']
print(type(ls))
print(list(filter(is_numeric, ls)))
print(type(train_tokens))
print(train_tokens[:10])
print(list(filter(is_numeric, train_tokens))[:10])
This gives the following output:
<class 'list'>
['1', '2']
<class 'list'>
['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.', 'Peter']
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-59836dbfb218> in <module>()
11 print(type(train_tokens))
12 print(train_tokens[:10])
---> 13 print(list(filter(is_numeric, train_tokens))[:10])
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
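The error can be reproduced without the dataset. `filter()` calls `bool()` on whatever the predicate returns. A one-element NumPy array has a well-defined truth value, but an array with two or more elements does not. Every token in `ls` is a single character, so `is_numeric` always returns a length-1 array there. `'EU'`, the first token of `train_tokens`, yields a length-2 array. This minimal sketch shows the difference:

```python
import numpy as np

# One-element arrays convert to bool without complaint...
print(bool(np.array([True])))   # True
print(bool(np.array([False])))  # False

# ...but a multi-element array (e.g. the result for 'EU') is ambiguous.
error_message = None
try:
    bool(np.array([False, False]))
except ValueError as err:
    error_message = str(err)

print(error_message)
```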
Both ls and train_tokens are lists. So I don't understand why filter works on ls but not on train_tokens.
What silly mistake am I making?
PS: This is how I built train_tokens:
!pip install datasets
from datasets import load_dataset
conll2003dataset = load_dataset("conll2003")
train_tokens = flatten(conll2003dataset['train']['tokens'])
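Two ways around this are sketched below, assuming the goal is simply to keep the numeric tokens. Either give `filter()` a predicate that works on *one* word (`str.isnumeric`), or keep `is_numeric` vectorized and use its output as a boolean mask over a NumPy array. To stay runnable offline, `tokens` here is a hypothetical stand-in for `train_tokens`: the ten tokens printed in the question followed by the contents of `ls`.

```python
import numpy as np

get_feature = lambda f, input: np.array(list(map(f, input)))
is_numeric = lambda words: get_feature(lambda word: word.isnumeric(), words)

# Hypothetical stand-in for train_tokens (no dataset download here).
tokens = ['EU', 'rejects', 'German', 'call', 'to', 'boycott',
          'British', 'lamb', '.', 'Peter'] + ['a', '1', 'b', '2', 'c', 'd', 'e']

# Option 1: per-word predicate, so filter() gets a plain bool for each word.
numeric_1 = list(filter(str.isnumeric, tokens))

# Option 2: call is_numeric once on the whole list and use it as a mask.
numeric_2 = np.array(tokens)[is_numeric(tokens)].tolist()

print(numeric_1)  # ['1', '2']
print(numeric_2)  # ['1', '2']
```

Option 1 is the smallest change to the original `filter` call. Option 2 matches how `get_feature` was designed to be used, since it evaluates all words in a single pass.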
Here is the link to the notebook.