python - How do I get my RegEx to capture the text on both sides of a colon?

Question

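I'm trying to parse some input text from a file. The text was originally pulled from the Twitter API. The file is plain text, and in this case I'm not actually grabbing the JSON. Here is a snippet of the input text: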

.....HootSuite</a>", "text": "For independent news reports on the crisis in #Japan, 
see @DemocracyNow News Archive: http://ow.ly/4ht9Q
#nuclear #Fukushima #rdran #japon", "created_at": "Sat Mar 19.....

Basically, I need to grab this:

"text": "For independent news reports "on" the crisis in #Japan, see @DemocracyNow 
News Archive: http://ow.ly/4ht9Q #nuclear #Fukushima #rdran #japon"

Here are the two patterns I'm trying to get working, but I'm running into some trouble:

    re.findall('"text":[^_]*',line)
    re.findall('"text":[^:}]+',line)

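The first one grabs everything after the part I want, all the way up to "created". The second one also works, but when the text itself contains a ":", it stops there instead of reaching the end of the text.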
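To see why both patterns stop in the wrong place, here is a small sketch that runs them against an abbreviated version of the sample line (the `sample` string is a shortened, illustrative excerpt, not the full file). `[^_]*` runs until the first underscore, which happens to be inside `created_at`, while `[^:}]+` stops at the first colon inside the tweet, the one after "Archive".

```python
import re

# Abbreviated, illustrative excerpt of one line from the input file
sample = ('HootSuite</a>", "text": "For independent news reports on the crisis '
          'in #Japan, see @DemocracyNow News Archive: http://ow.ly/4ht9Q '
          '#nuclear #Fukushima #rdran #japon", "created_at": "Sat Mar 19')

# Pattern 1: [^_]* consumes everything up to the first underscore,
# which is the one in "created_at", so it overshoots into the next key.
first = re.findall(r'"text":[^_]*', sample)

# Pattern 2: [^:}]+ stops at the first colon, which here is the colon
# after "Archive" inside the tweet text, so it undershoots.
second = re.findall(r'"text":[^:}]+', sample)

print(first[0].endswith('"created'))
print(second[0].endswith('News Archive'))
```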

Does anyone with some RegEx experience have a pointer in the right direction?

score 1 · Accepted Answer

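If you're using the Twitter API, I assume it returns JSON to you. JSON supports arbitrary nesting, which a regular expression can never parse correctly in every case, so you'd be better off using a JSON parser. Since YAML is a superset of JSON (strictly speaking YAML 1.2 is; PyYAML implements YAML 1.1, which still reads typical JSON), you can also use a YAML parser. I'd take a look at PyYaml. (It's the one I know of; there are also dedicated JSON parsers, such as the `json` module in Python's standard library.)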

Then parsing is as simple as:

    import yaml

    # safe_load parses plain data; recent PyYAML versions require a Loader for yaml.load()
    results = yaml.safe_load(twitter_response)
    print(results["text"])  # This would contain the string you're interested in.
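If the data is valid JSON, the standard-library `json` module does the same job without a third-party dependency. A minimal sketch, assuming each tweet is a complete JSON object; the `tweet` string is a hypothetical, trimmed-down stand-in for a real API response. It shows that a colon inside a string value is no problem for a real parser.

```python
import json

# Hypothetical, trimmed-down tweet object; a real API response has many more keys.
tweet = ('{"source": "HootSuite", '
         '"text": "For independent news reports on the crisis in #Japan, '
         'see @DemocracyNow News Archive: http://ow.ly/4ht9Q '
         '#nuclear #Fukushima #rdran #japon", '
         '"created_at": "Sat Mar 19"}')

results = json.loads(tweet)
# Colons inside the string value are just characters to the parser.
print(results["text"])
```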

score 0 · Accepted Answer

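Use simplejson to parse the JSON. (The standard library's `json` module, added in Python 2.6, is based on simplejson and offers the same `loads`/`dumps` functions.)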

Follow this tutorial: http://blogs.openshine.com/pvieytes/2011/05/18/parsing-twitter-user-timeline-with-python/
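Because simplejson and the standard `json` module share the same `loads` function, a common idiom is to prefer simplejson when it is installed and fall back to `json` otherwise. The sketch below assumes, hypothetically, that the input file holds one complete JSON object per line; `iter_tweet_texts` is an illustrative helper name, not part of either library.

```python
import io

try:
    import simplejson as json  # third-party; used if installed
except ImportError:
    import json  # standard library; same loads() interface

def iter_tweet_texts(lines):
    """Yield the "text" field of each line that parses as a JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except ValueError:  # decode errors in both libraries subclass ValueError
            continue  # skip lines that are not complete JSON
        if isinstance(obj, dict) and "text" in obj:
            yield obj["text"]

# In-memory stand-in for a hypothetical open("tweets.txt")
sample_file = io.StringIO(
    '{"text": "Archive: http://ow.ly/4ht9Q", "created_at": "Sat Mar 19"}\n'
    '\n'
    '{"text": "second tweet", "lang": "en"}\n'
)
texts = list(iter_tweet_texts(sample_file))
print(texts)
```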

score 0 · Accepted Answer

JSON is a simple enough format that you don't always need a parser if you're only doing something trivial. Consider this example line:

>>> line = """{ "text" : "blah blah foo", "other" : "blah blah bar" }"""

Here are two ways to do what you want.

Using a regular expression:

>>> import re
>>> m = re.search(r'"text"\ *:\ *"([^"]*)', line)
>>> m.group()
'"text" : "blah blah foo'
>>> m.group(1)
'blah blah foo'
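The character class `[^"]*` stops at the first double quote, so a value containing an escaped quote (`\"`) would be cut short. A slightly sturdier sketch, still a regex rather than a parser and still blind to nesting, captures the key on one side of the colon and the value on the other, letting a backslash escape any character. The test line and the `PAIR` name are hypothetical; `json.loads` is used only to decode the escape sequences in each captured value.

```python
import json
import re

# Group 1: the key; group 2: a JSON string body in which a backslash escapes any char.
PAIR = re.compile(r'"(\w+)"\s*:\s*"((?:[^"\\]|\\.)*)"')

line = r'{ "text" : "she said \"hi\": ok", "other" : "blah blah bar" }'
pairs = PAIR.findall(line)

# The captured values still contain escapes; wrap in quotes and let json decode them.
decoded = {key: json.loads('"' + value + '"') for key, value in pairs}
print(decoded["text"])
```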

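Using eval (JSON looks a lot like a Python dict literal). This only works while the data contains no JSON `true`, `false`, or `null`, and eval should never be run on untrusted input such as an API response: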

>>> d = eval(line)
>>> d['text']
'blah blah foo'
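eval only works above because the sample happens to contain nothing but strings. A sketch comparing the alternatives on a hypothetical line that includes a JSON boolean: `json.loads` maps `false` to `False`, while `ast.literal_eval` (safe, since it executes no code) only understands Python literals and rejects the bare name `false`.

```python
import ast
import json

line = '{ "text" : "blah blah foo", "truncated" : false }'

# json.loads understands JSON literals and maps false -> False.
d = json.loads(line)
print(d["text"], d["truncated"])

# ast.literal_eval never runs code, but `false` is not a Python literal,
# so it raises ValueError here (plain eval would raise NameError instead).
try:
    ast.literal_eval(line)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False
print("literal_eval accepted JSON:", literal_eval_ok)
```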
