
Given a file with the following contents:

{
    "title": "Pilot",
    "image": [
        {
            "resource": "http://images2.nokk.nocookie.net/__cb20110227141960/notr/images/8/8b/pilot.jpg",
            "description": "not yet implemented"
        }
    ],
    "content": "<p>The pilot ...</p>"
},
{
    "title": "Special Christmas (Part 1)",
    "image": [
        {
            "resource": "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg",
            "description": "not yet implemented"
        }
    ],
    "content": "<p>Last comment...</p>"
}

I have this script that replaces every resource value, such as this one,

"resource": "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg"

with a local path like this: "../img/SpecialChristmas.jpg"

from StringIO import StringIO    
import re
import urllib

infile = open('test2.txt')
outfile = open('test3.txt', 'w')

pattern = r'"resource": ".+/(.+).jpg"'
replacement = '"resource": "../img/\g<1>.jpg"'
prog = re.compile(".+/(.+).jpg")

for line in infile:
    if prog.match(line):
        print (line) #this prints nothing
    text = re.sub(pattern, replacement, line)
    outfile.write(text)
infile.close()
outfile.close

But I also want to print the value of each resource inside the loop, like this:

"http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg"
"http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg"

What I am doing doesn't work, so what is the correct way to print each resource value to the console?

Thanks in advance!


3 Answers

from json import dumps, loads
with open('that_file') as datfile:
  dat = loads('[' + datfile.read() + ']') # Wrap in square brackets so the objects form a valid JSON array
for item in dat:
  for img in item['image']:
    if 'resource' in img:
      # You may want to do a more sophisticated test here
      # but this will do for an example
      img['resource'] = 'http://example.org'
with open('that_file', 'w') as datfile:
  datfile.write(dumps(dat, indent=4).strip('[]')) # Strip outer array braces in keeping with input. (Shrug)
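
The question also asks to print each resource value. A minimal sketch of how the same JSON-based approach could do both, assuming Python 3 and that every resource is an ordinary URL whose last path segment is the file name. The `rewrite_resources` helper and the inline `raw` sample are hypothetical stand-ins for reading the real file:

```python
import json
import posixpath
from urllib.parse import urlparse

# Inline copy of the question's data: comma-separated objects, no enclosing array.
raw = '''
{
    "title": "Pilot",
    "image": [
        {
            "resource": "http://images2.nokk.nocookie.net/__cb20110227141960/notr/images/8/8b/pilot.jpg",
            "description": "not yet implemented"
        }
    ],
    "content": "<p>The pilot ...</p>"
},
{
    "title": "Special Christmas (Part 1)",
    "image": [
        {
            "resource": "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg",
            "description": "not yet implemented"
        }
    ],
    "content": "<p>Last comment...</p>"
}
'''


def rewrite_resources(text):
    """Return (original_urls, rewritten_json_text). Hypothetical helper."""
    items = json.loads('[' + text + ']')  # wrap so the objects parse as one array
    seen = []
    for item in items:
        for img in item.get('image', []):
            if 'resource' in img:
                seen.append(img['resource'])
                # Keep only the last path segment of the URL, e.g. "pilot.jpg".
                filename = posixpath.basename(urlparse(img['resource']).path)
                img['resource'] = '../img/' + filename
    return seen, json.dumps(items, indent=4)


urls, rewritten = rewrite_resources(raw)
for url in urls:
    print(url)
```

Parsing the data avoids depending on how each line happens to be indented or quoted, at the cost of reformatting the output file.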
answered 2013-10-15T00:06:48.763

You can nest groups inside groups; just modify your original pattern regex. This can get a bit confusing, so it is easier to use named groups, i.e. (?P<group_name>pattern).

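import re

infile = open('test2.txt')
outfile = open('test3.txt', 'w')

# Outer group captures the whole URL; inner group captures the bare file name.
pattern = r'"resource": "(?P<path>.+/(?P<filename>.+)\.jpg)"'
replacement = r'"resource": "../img/\g<filename>.jpg"'  # raw string keeps \g intact
prog = re.compile(pattern)

for line in infile:
    # search(), not match(): the lines are indented, and match() only
    # succeeds at the very start of the string.
    match = prog.search(line)
    if match:
        print(match.group('path'))
    text = prog.sub(replacement, line)
    outfile.write(text)
infile.close()
outfile.close()  # the parentheses are required to actually close the file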
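
Because the lines in the sample file are indented, it matters whether the compiled pattern is applied with `match` or `search`. A small sketch on a single line copied from the question's data shows the difference and what each named group captures. The variable names `line`, `anchored`, `found`, and `rewritten` are only illustrative:

```python
import re

# One line as it appears in the question's file, including leading indentation.
line = ('            "resource": "http://images1.nat.nocookie.net/'
        '__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg",\n')

pattern = re.compile(r'"resource": "(?P<path>.+/(?P<filename>.+)\.jpg)"')

# match() only succeeds at the very start of the string, so the leading
# spaces make it return None here; search() scans the whole line.
anchored = pattern.match(line)
found = pattern.search(line)

print(anchored)                  # None
print(found.group('path'))       # the full original URL
print(found.group('filename'))   # the file name without its extension

# \g<filename> refers back to the inner named group in the replacement.
rewritten = pattern.sub(r'"resource": "../img/\g<filename>.jpg"', line)
print(rewritten, end='')
```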
answered 2013-10-15T00:24:09.843

I ended up doing this:

import re

infile = open('test2.txt')
outfile = open('test4.txt', 'w')

pattern = r'"resource": ".+/(.+)\.jpg"'
replacement = r'"resource": "../img/\g<1>.jpg"'
prog = re.compile(pattern)

for line in infile:
    if prog.search(line):
        # Take the text after the key, then drop the leading ' "' and the trailing '",\n'.
        url = line.split('"resource":')[1][2:][:-3]
        print(url)
    text = prog.sub(replacement, line)
    outfile.write(text)
infile.close()
outfile.close()

It works, but I don't think it looks Pythonic at all.
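
One possibly more idiomatic variant, offered only as a sketch: `re.sub` accepts a function as the replacement, so the original URL can be captured and the line rewritten in one pass, without slicing the string by hand. The `localize` helper, the `RESOURCE` pattern, and the inline `sample` text are hypothetical, and the pattern assumes the URLs contain no double quotes:

```python
import re

# Outer group: the whole URL. Inner group: the file name including ".jpg".
RESOURCE = re.compile(r'"resource": "(?P<url>[^"]+/(?P<name>[^"/]+\.jpg))"')


def localize(text, seen):
    """Rewrite resource URLs to ../img/<file> and append the originals to `seen`."""
    def repl(match):
        seen.append(match.group('url'))
        return '"resource": "../img/{}"'.format(match.group('name'))
    return RESOURCE.sub(repl, text)


# Two lines in the shape used by the question's file.
sample = (
    '            "resource": "http://images2.nokk.nocookie.net/'
    '__cb20110227141960/notr/images/8/8b/pilot.jpg",\n'
    '            "description": "not yet implemented"\n'
)

urls = []
result = localize(sample, urls)
for url in urls:
    print(url)
```

Since the callback works on arbitrary text, the same function could be applied to the whole file contents at once instead of line by line.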

answered 2013-10-14T23:46:26.733