regex - Why does Python give the error "TypeError: expected string or buffer" when using re?

Question

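I don't know what small mistake I've made, but I feel there is a simple solution here that I'm not seeing. From a log file, I'm trying to read only the lines that end with "start". Each line carries a lot of information, so I simplified my regex to "(.*)start$", which I believe is correct.
An example line is: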

05/06/2013 12:06:58 AM | null | com.skldfjs : start

import pandas as pd
s=pd.read_csv('Log_file.csv')
s
import re
items=re.findall("(.*)start$",s,re.MULTILINE)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

Does anyone know how to fix this, or why it happens? Thanks! Kelsey

score 0 · Accepted Answer

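The error occurs because pandas.read_csv() does not return a string. It returns a tabular, spreadsheet-like object (a DataFrame), while re.findall() expects a string as its second argument.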
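To illustrate the point without needing pandas, here is a minimal sketch that passes a non-string object to re.findall(). FakeTable and try_findall are hypothetical names made up for this example; FakeTable only stands in for whatever read_csv() returns. The exact wording of the message differs between Python versions (newer ones say "expected string or bytes-like object"), so the sketch only checks the exception type.

```python
import re


class FakeTable:
    """Hypothetical stand-in for a non-string object such as a DataFrame."""


def try_findall(obj):
    # Run the regex from the question and report a TypeError instead of crashing.
    try:
        return re.findall(r"(.*)start$", obj, re.MULTILINE)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


result_table = try_findall(FakeTable())            # a string describing the TypeError
result_text = try_findall("com.skldfjs : start")   # a list with the captured prefix
print(result_table)
print(result_text)
```

The same pattern works as soon as the argument is an actual string, which suggests the regex itself is not the problem.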

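I have never used pandas, so I can't offer more detail on that side. However, if you don't strictly need pandas, you can read the file as plain text and then parse it with re.findall().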

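import re

# Read the whole file into one string so re.findall() receives text.
with open("file.csv") as f:
    content = f.read()

# With re.MULTILINE, $ matches at the end of every line, not only at the end of the string.
regex = r"(.*)start$"
items = re.findall(regex, content, re.MULTILINE)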
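Note that because the pattern has a capture group, re.findall() returns only the part of each line before "start", not the full line. If the whole line is wanted, one possible alternative is to filter line by line with str.endswith(). This is only a sketch: the helper name lines_ending_with and the sample log text (including the second "stop" line) are assumptions invented for the example, and io.StringIO stands in for the real file.

```python
import io


def lines_ending_with(lines, suffix="start"):
    """Return the lines (without trailing whitespace) that end with suffix."""
    matches = []
    for line in lines:
        stripped = line.rstrip()  # drop the newline and any trailing spaces
        if stripped.endswith(suffix):
            matches.append(stripped)
    return matches


# Made-up sample data; in practice this would be open("file.csv").
sample_log = io.StringIO(
    "05/06/2013 12:06:58 AM | null | com.skldfjs : start\n"
    "05/06/2013 12:07:01 AM | null | com.skldfjs : stop\n"
)

start_lines = lines_ending_with(sample_log)
print(start_lines)
```

Iterating over the file object reads one line at a time, so this approach also avoids loading a large log into memory at once.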
