
I'm trying to train a chatbot, and most of my data is in text files.

I pull this line:

Matt said you have a "shit load" of dining dollars. I have almost none so if you're willing to sell, I'm willing to buy.

from a text file, but when the chatterbot corpus tries to train the bot, it reads the line above as:

'Matt said you have a "shit load" of dining dollars\\ I have almost none so if you\'re willing to sell, I\'m willing to buy\\\n'

How can I fix this?

Here is my code:

def train_from_text():
    #chatbot.set_trainer(ListTrainer)
    directory = basedir + "Text Trainers"
    files = find_files_in_directory(directory)
    for file in files:
        conversation = []
        file_name = directory+"/"+file
        with open(file_name, 'r') as to_read:
            for line in to_read:
                conversation.append(line)
        chatbot.train(conversation)

Please excuse the swearing; this is the data I was given.

Edit: full error

Traceback (most recent call last):
  File "E:/Jason Chatterbot/Jason Chat.py", line 102, in <module>
    control()
  File "E:/Jason Chatterbot/Jason Chat.py", line 96, in control
    train_from_text()
  File "E:/Jason Chatterbot/Jason Chat.py", line 58, in train_from_text
    chatbot.train(conversation)
  File "C:\Python27\lib\site-packages\chatterbot\trainers.py", line 119, in train
    corpora = self.corpus.load_corpus(corpus_path)
  File "C:\Python27\lib\site-packages\chatterbot_corpus\corpus.py", line 98, in load_corpus
    corpus_data = self.read_corpus(file_path)
  File "C:\Python27\lib\site-packages\chatterbot_corpus\corpus.py", line 63, in read_corpus
    with io.open(file_name, encoding='utf-8') as data_file:
IOError: [Errno 22] Invalid argument: 'Matt said you have a "shit load" of dining dollars\\ I have almost none so if you\'re willing to sell, I\'m willing to buy\\\r\n'

1 Answer


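Without seeing a larger subset of the data, it looks like single quotes (') are being replaced with escaped single quotes (\'), actual newlines with escaped newlines (\n), and periods with double backslashes (\\).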

A simple string replacement may solve this for you, depending on how badly the data is mangled. Try changing

conversation.append(line)

to

conversation.append(line.replace("\\'","'").replace('\\\\','.').replace("\\n","\n"))

We are basically trying to reverse the substitutions that were made automatically.
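
As a minimal sketch, the same replacements could be pulled into a small helper so they are easier to test on their own. The name `clean_line` is made up for illustration, and the sketch assumes the backslash sequences are literally present in the data. If the output in the question is only Python's `repr` of the string, then `\'` and `\\` are display escapes and these replacements may not match anything, so it is worth printing a raw line first to check.

```python
def clean_line(line):
    """Reverse the three substitutions described above (hypothetical helper).

    Assumes the escape sequences are literal characters in the text.
    """
    # Order matters: handle the doubled backslash before the literal "\n",
    # so a run like \\\n becomes "." followed by a real newline.
    return (
        line.replace("\\'", "'")     # backslash + quote -> plain quote
            .replace("\\\\", ".")    # two backslashes  -> period
            .replace("\\n", "\n")    # backslash + "n"  -> real newline
    )


if __name__ == "__main__":
    sample = "so if you\\'re willing to sell, I\\'m willing to buy\\\\\\n"
    print(repr(clean_line(sample)))
```

In the question's loop this would be used as `conversation.append(clean_line(line))`.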

Answered 2017-11-28T04:44:44.077