I am running a sentence-transformers model and trying to truncate my tokens, but it doesn't seem to work. My code is:
from transformers import AutoModel, AutoTokenizer
model_name = "sentence-transformers/paraphrase-MiniLM-L6-v2"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
text_tokens = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
text_embedding = model(**text_tokens)["pooler_output"]
I keep getting the following warning:
Token indices sequence length is longer than the specified maximum sequence length
for this model (909 > 512). Running this sequence through the model will result in
indexing errors
I'm wondering why setting truncation=True does not truncate my text to the expected length?
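
For reference, this is a minimal sketch of the behavior I would expect, assuming the 512-token limit from the warning applies; the text value is just a placeholder for my long input, and I pass max_length explicitly to try to force truncation:

from transformers import AutoTokenizer

model_name = "sentence-transformers/paraphrase-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder for my long input (909 tokens according to the warning)
text = "..."

# Explicit max_length, which I expect to cap the sequence at 512 tokens
text_tokens = tokenizer(
    text,
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
print(text_tokens["input_ids"].shape)  # I would expect the second dimension to be <= 512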