python - Find and print quoted text from a text file using Python

Question

I'm a Python beginner and want Python to capture all the text inside quotation marks from a text file. I tried the following:

filename = raw_input("Enter the full path of the file to be used: ")
input = open(filename, 'r')
import re
quotes = re.findall(ur'"[\^u201d]*["\u201d]', input)
print quotes

I get the error:

Traceback (most recent call last):
  File "/Users/nithin/Documents/Python/Capture Quotes", line 5, in <module>
    quotes = re.findall(ur'"[\^u201d]*["\u201d]', input)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

Can anyone help me?

score 2 · Accepted Answer

As Bakuriu pointed out, you need to add .read(), like this:

quotes = re.findall(ur'[^\u201d]*[\u201d]', input.read())

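open() only returns a file object, whereas f.read() returns a string. Also, I'm guessing you want everything between two quotation marks, not zero or more occurrences of [\^u201d] before a quotation mark. So I would try this: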

quotes = re.findall(ur'[\u201d][^\u201d]*[\u201d]', input.read(), re.U)

The re.U flag accounts for unicode. Alternatively (if you don't have two sets of right double quotation marks and don't need unicode):

quotes = re.findall(r'"[^"]*"', input.read(), re.U)

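Finally, you may want to pick a variable name other than input, because input is the name of a built-in function in Python and assigning to it shadows that built-in.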

Your result might look like this:

>>> input2 = """
cfrhubecf "ehukl wehunkl echnk
wehukb ewni; wejio;"
"werulih"
"""
>>> quotes = re.findall(r'"[^"]*"', input2, re.U)
>>> print quotes
['"ehukl wehunkl echnk\nwehukb ewni; wejio;"', '"werulih"']
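For Python 3, where raw_input, the ur'' prefix and the print statement no longer exist, the pieces above might be combined roughly as follows. This is only a sketch: the names QUOTE_PATTERN, find_quotes and find_quotes_in_file are made up for illustration, UTF-8 is an assumed file encoding, and treating \u201c as the opening curly quote goes beyond the patterns shown above.

```python
import re

# Either a straight-quoted span "..." or a curly-quoted span \u201c...\u201d.
# Assumption: curly quotes come in proper opening/closing pairs.
QUOTE_PATTERN = re.compile(r'"[^"]*"|\u201c[^\u201d]*\u201d')


def find_quotes(text):
    """Return every quoted span in text, quote marks included."""
    return QUOTE_PATTERN.findall(text)


def find_quotes_in_file(path, encoding="utf-8"):
    """Read the whole file into a string first, then search that string."""
    with open(path, encoding=encoding) as handle:
        # re.findall needs a string, not the file object itself.
        return find_quotes(handle.read())


sample = 'cfrhubecf "ehukl wehunkl echnk\nwehukb ewni; wejio;"\n"werulih"\n'
print(find_quotes(sample))
```

The with block closes the file automatically, and calling the variable handle rather than input leaves the built-in input() usable.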

score 0 · Accepted Answer

Instead of using regular expressions, you could try some of Python's built-in functions. I'll let you do the hard work:

message = '''
"some text in quotes", some text not in quotes. Some more text 'In different kinds of quotes'.
'''
list_of_single_quote_items = message.split("'")
list_of_double_quote_items = message.split('"')

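The challenging part will be interpreting what the split lists mean and handling all the edge cases (only one quotation mark in the string, escape sequences, etc.).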
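One possible way to interpret the split list, as a rough sketch only: after splitting on a quote character, the odd-numbered pieces are the ones that sat between a pair of quotes. The helper name quoted_items is hypothetical, and the sketch handles only the unbalanced-quote case; escape sequences and apostrophes inside words (such as "don't") are not dealt with.

```python
def quoted_items(text, quote='"'):
    """Return the pieces of text that lie between pairs of the quote character."""
    parts = text.split(quote)
    # An even number of pieces means an odd number of quote marks, so the
    # last opening quote was never closed; drop that dangling piece.
    if len(parts) % 2 == 0:
        parts = parts[:-1]
    # Pieces at odd indexes were enclosed by a pair of quote marks.
    return parts[1::2]


message = '''
"some text in quotes", some text not in quotes. Some more text 'In different kinds of quotes'.
'''
print(quoted_items(message, '"'))
print(quoted_items(message, "'"))
```

Unlike the regex results above, the returned strings do not include the surrounding quote marks.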
