I have the following code:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("sagorsarker/codeswitch-spaeng-lid-lince")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/codeswitch-spaeng-lid-lince")
# Named ner_pipeline so it does not shadow the imported pipeline() factory
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)
sentence = "some example sentence here"
results = ner_pipeline(sentence)
```
This works fine. However, instead of a `str`, I'd like to pass a `list` of tokens. How can I do that?
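One possible approach, sketched here under assumptions rather than as a confirmed answer, is to skip `pipeline` and call the model directly. Calling the tokenizer with `is_split_into_words=True` treats the input list as already-split words. `encoding.word_ids()` then maps each sub-token back to the index of the word it came from, which lets predictions be reported per original token. `word_ids()` only exists on fast tokenizers, which `AutoTokenizer` returns by default when one is available. The helper names `labels_per_word` and `tag_pretokenized` are hypothetical. The sketch also assumes that the first sub-token's prediction can stand for the whole word, and that this checkpoint's tokenizer accepts pre-split input without extra options. RoBERTa-style tokenizers, for example, additionally need `add_prefix_space=True`.

```python
from typing import Dict, List, Optional


def labels_per_word(
    word_ids: List[Optional[int]],
    pred_ids: List[int],
    id2label: Dict[int, str],
    n_words: int,
) -> List[Optional[str]]:
    """Collapse sub-token predictions into one label per input word.

    Special tokens ([CLS], [SEP], ...) have a word id of None and are skipped.
    When a word is split into several sub-tokens, the first sub-token's label
    is used. Words cut off by truncation keep the label None.
    """
    labels: List[Optional[str]] = [None] * n_words
    for token_idx, word_idx in enumerate(word_ids):
        if word_idx is not None and labels[word_idx] is None:
            labels[word_idx] = id2label[pred_ids[token_idx]]
    return labels


def tag_pretokenized(
    words: List[str],
    model_name: str = "sagorsarker/codeswitch-spaeng-lid-lince",
) -> List[Optional[str]]:
    # Imported lazily so labels_per_word can be used without torch/transformers
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)  # fast tokenizer: needed for word_ids()
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    model.eval()

    # is_split_into_words=True: `words` is already the word list, so it is not re-split on spaces
    encoding = tokenizer(words, is_split_into_words=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits
    pred_ids = logits.argmax(dim=-1)[0].tolist()  # best label id for each sub-token
    return labels_per_word(
        encoding.word_ids(batch_index=0), pred_ids, model.config.id2label, len(words)
    )
```

With this, `tag_pretokenized(["It", "is", "n't", "here"])` returns one label per input token, so `n't` is labelled on its own and is never merged back into `isn't`. Unlike the `"ner"` pipeline, this does not return scores or character offsets. Those would have to be added from `logits` if you need them.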
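The reason I want to do this is that my sentences are already tokenized, and a simple `" ".join()` does not reproduce the original sentence correctly. For example, `isn't` has been tokenized as `is` and `n't`, but a simple `" ".join()` produces `is n't`.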
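If you would rather keep passing a string to the pipeline, a rough alternative is to rebuild the sentence with a small detokenizer instead of a plain `" ".join()`. This detokenizer glues clitics and punctuation onto the preceding token. The sketch below is a hypothetical heuristic. The `_ATTACH_LEFT` set is an assumption covering common Penn-Treebank-style splits such as `n't` and `'s`, not a complete rule set. The pipeline's character offsets would still have to be mapped back to your original tokens yourself.

```python
from typing import List

# Hypothetical heuristic: tokens that attach to the previous token with no space
_ATTACH_LEFT = {"n't", "'s", "'re", "'ve", "'ll", "'d", "'m",
                ".", ",", "!", "?", ";", ":", ")", "%"}


def detokenize(tokens: List[str]) -> str:
    """Join tokens with spaces, except that clitics/punctuation stick to the left."""
    out: List[str] = []
    for tok in tokens:
        if out and tok.lower() in _ATTACH_LEFT:
            out[-1] += tok  # e.g. "is" + "n't" -> "isn't"
        else:
            out.append(tok)
    return " ".join(out)
```

For example, `detokenize(["It", "is", "n't", "here", "."])` gives `"It isn't here."`, where `" ".join()` would give `"It is n't here ."`.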