I am trying to use pre-trained Hugging Face models to translate a pandas column of text from Dutch to English. My input is simple:
Dutch_text
Hallo, het gaat goed
Hallo, ik ben niet in orde
Stackoverflow is nuttig
I am using the code below to translate this column, and I want to store the result in a new column, ENG_Text. The output should look like this:
ENG_Text
Hello, I am good
Hi, I'm not okay
Stackoverflow is helpful
The code that I am using is as follows:
#https://huggingface.co/Helsinki-NLP for other pretrained models
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-nl-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-nl-en")
input_1 = df['Dutch_text']
input_ids = tokenizer("translate English to Dutch: "+input_1, return_tensors="pt").input_ids # Batch size 1
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)
Any help would be appreciated!
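One possible direction, offered as a sketch rather than a definitive fix: the tokenizer expects a string or a list of strings rather than a pandas Series, and as far as I know the opus-mt models do not use a T5-style "translate ... to ...:" prefix, so the column can be converted to a list and translated in batches. The helper names `batched`, `make_marian_translator`, and `translate_texts` and the default `batch_size=16` are my own assumptions for illustration; only the tokenizer call, `generate`, and `batch_decode` come from the transformers library.

```python
def batched(items, size):
    """Yield consecutive chunks of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def make_marian_translator(model_name="Helsinki-NLP/opus-mt-nl-en"):
    """Build a function that translates a list of strings with a Marian model.

    transformers is imported lazily so the batching helpers below can be
    used (and tested) without loading the model.
    """
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def translate_batch(texts):
        # Pass a plain list of strings; pad so the batch forms one tensor.
        encoded = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**encoded)
        # Decode every sequence in the batch, not only outputs[0].
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)

    return translate_batch


def translate_texts(texts, translate_batch, batch_size=16):
    """Translate `texts` chunk by chunk and return the results in input order."""
    results = []
    for chunk in batched(list(texts), batch_size):
        results.extend(translate_batch(chunk))
    return results
```

With the DataFrame from the question, this might be used as `df["ENG_Text"] = translate_texts(df["Dutch_text"].astype(str).tolist(), make_marian_translator())`. Batching keeps memory use bounded on long columns; the exact English output depends on the model and may differ from the expected text above.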