python - Newline characters when using sklearn CountVectorizer

Asked 2017-04-28T14:53:29.377

Viewed 369 times

I have a list of strings, for example:

docs = ['this is a line\nthis is another line', 'this is another doc']

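I want CountVectorizer to find all character n-grams within a given range without excluding the \n character. That is, one token could be 'a line\nthis'. The default preprocessor does not seem to do this: \n is always treated as whitespace. I tried replacing the preprocessor with an identity function, but that did not work either.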
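To make the desired behavior concrete, here is a minimal sketch of a character n-gram extractor that leaves every character, including \n, untouched. The function name char_ngrams and its min_n/max_n parameters are hypothetical and not part of scikit-learn. It only illustrates which tokens are wanted and is not a confirmed fix.

```python
from collections import Counter


def char_ngrams(text, min_n, max_n):
    """Return every character n-gram of length min_n..max_n.

    Hypothetical helper: no lowercasing, stripping, or whitespace
    normalization is applied, so '\n' survives inside the n-grams.
    """
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the raw string.
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams


docs = ['this is a line\nthis is another line', 'this is another doc']

# Count the 11-character n-grams of the first document and check
# that the token spanning the newline is among them.
counts = Counter(char_ngrams(docs[0], 11, 11))
print('a line\nthis' in counts)
```

CountVectorizer accepts a callable for its analyzer parameter. Such a callable replaces the built-in preprocessing, tokenization, and n-gram steps, so ngram_range is ignored. Assuming that route, the helper could be plugged in as CountVectorizer(analyzer=lambda doc: char_ngrams(doc, 1, 11)), with the range chosen to suit the data.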

0 Answers