I want to apply a translation model to every row of a DataFrame column.

This is the translation code I'm using:

from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "allenai/wmt19-de-en-6-6-big"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
# Loop here over all rows in the German_Text column

# `text` stands for one German sentence (the original used `input`, which shadows the builtin)
input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)

I want to apply this model to the column below and then create a new column with the translations:

German_Text                     English_Text
Wie geht es dir heute
mir geht es gut

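The English_Text column should hold the text translated by the model above, so I want to apply the model to every row of the German_Text column and store the corresponding translation in English_Text.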

1 Answer

All you need to do is wrap these steps in a function and pass it to the column's apply method:

import pandas as pd
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "allenai/wmt19-de-en-6-6-big"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

df = pd.DataFrame(['Wie geht es dir heute', 'mir geht es gut'], columns=['German_Text'])

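def translationPipeline(text):
    # Encode one German sentence as a PyTorch tensor of token ids
    input_ids = tokenizer.encode(text, return_tensors="pt")
    # Generate the English token ids
    outputs = model.generate(input_ids)
    # Decode the first (only) generated sequence back to a string
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return decoded

# Call the function once per row and store the results in a new column
df['English_Text'] = df['German_Text'].apply(translationPipeline)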
print(df)

Output:

             German_Text             English_Text
0  Wie geht es dir heute  How are you doing today
1        mir geht es gut                 I'm fine
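
Because apply calls the model once per row, a long column can be slow. Below is a minimal sketch of a batched alternative, assuming every value in German_Text is a string (no missing values). The helper names translate_in_batches and make_fsmt_translator and the batch_size parameter are hypothetical and not part of transformers or pandas. The sketch tokenizes several sentences at once with padding and translates repeated sentences only once.

```python
from typing import Callable, List, Sequence


def translate_in_batches(
    texts: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,  # hypothetical default; tune for your hardware
) -> List[str]:
    """Translate texts chunk by chunk, translating each distinct text once."""
    unique = list(dict.fromkeys(texts))  # drop duplicates, keep first-seen order
    cache = {}
    for start in range(0, len(unique), batch_size):
        chunk = unique[start:start + batch_size]
        for source, target in zip(chunk, translate_batch(chunk)):
            cache[source] = target
    # Map the translations back onto the original order, duplicates included
    return [cache[t] for t in texts]


def make_fsmt_translator(mname: str = "allenai/wmt19-de-en-6-6-big"):
    """Build a batch translator around the FSMT model from the question."""
    # Imported here so the helper above works without transformers installed
    import torch
    from transformers import FSMTForConditionalGeneration, FSMTTokenizer

    tokenizer = FSMTTokenizer.from_pretrained(mname)
    model = FSMTForConditionalGeneration.from_pretrained(mname)
    model.eval()

    def translate_batch(chunk: List[str]) -> List[str]:
        # padding=True pads the sentences in the chunk to a common length
        batch = tokenizer(chunk, return_tensors="pt", padding=True)
        with torch.no_grad():  # inference only, no gradients needed
            outputs = model.generate(**batch)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)

    return translate_batch
```

With the df from the answer, this could be used as df['English_Text'] = translate_in_batches(df['German_Text'].tolist(), make_fsmt_translator()). Whether batching is actually faster depends on the hardware and sentence lengths, so it is worth timing on your own data.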
Answered 2021-02-14T14:15:46.297