
I'm having trouble looping over a list that I load with pickle. The end goal of the code is to iterate over every item and return each item's ID number.

# Open the file and load it into a list
with open('TEMP_ITEMS.txt', 'rb') as openfile:
    items = pickle.load(openfile)

My attempt to loop through and find the ID numbers is based on some old XML-scraping techniques, but for some reason that logic doesn't work here.

for item in enumerate(items):

    pattern0 = re.compile('ID: (.*?) <br>')
    idnumber = float(re.findall(pattern0, items[0])[0])
    print "ID Number: ",idnumber 

Sample contents of TEMP_ITEMS.txt:

(lp0
S'\n                <item>\n                    <title>Timmy</title>\n                    <link>caturl</link>\n                    <description><![CDATA[\n                                Timmy <br>\n                                ID: 3712 <br>\n                                Age: 10 <br>\n                                Weight: 7lbs <br>\n                                Time: 17:23 <br>\n                                Cat Name: Timmy <br>\n\n                    ]]></description>\n                    <guid isPermaLink="false">04e72b29-065d-4893-a4d2-f16ff30a283e</guid>\n                    <pubDate>Fri, 21 Jun 2013 01:09:05 GMT</pubDate>\n                </item>'
p1
aS'\n                <item>\n                    <title>George</title>\n                    <link>caturl</link>\n                    <description><![CDATA[\n                                George <br>\n                                ID: 4124 <br>\n                                Age: 14 <br>\n                                Weight: 8lbs <br>\n                                Time: 15:41 <br>\n                                Cat Name: George <br>\n\n                    ]]></description>\n                    <guid isPermaLink="false">212f9fbf-564b-470a-a64a-ef51036ff06a</guid>\n                    <pubDate>Fri, 21 Jun 2013 01:28:20 GMT</pubDate>\n                </item>'
p2
a.

Any help or advice on this would be much appreciated. Kind regards, AEA

Code used following falsetru's suggestion, which returns an error:

import pickle
import re

with open('TEMP_RSS_ITEMS.txt', 'rb') as temp_rss_items_open4:
    items = pickle.load(temp_rss_items_open4)        
    print items
    for item in enumerate(items):
        pattern0 = re.compile('ID: (.*) <br>')
        for idnumber in re.findall(pattern0, item):
            print idnumber

The error it produces:

Traceback (most recent call last):
  File "C:/Sharing/test1.py", line 9, in <module>
    for idnumber in re.findall(pattern0, item):
  File "C:\Python27\lib\re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer
>>> 

2 Answers


Try the non-greedy version of .*, which is .*?:

pattern0 = re.compile(r'ID: (.*?) <br>')

Or use \d+ if the ID consists only of digits:

pattern0 = re.compile(r'ID: (\d+)')
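
To make the difference between these patterns concrete, here is a minimal sketch. The `sample` string is made up for illustration and deliberately puts two `<br>` tags on one line; in the pickled data each field sits on its own line, and since `.` does not match a newline by default, the greedy and non-greedy forms would return the same ID there.

```python
import re

# Hypothetical one-line sample (not taken from the pickled file):
# two <br> tags on the same line, so greedy matching can overshoot.
sample = "ID: 3712 <br> Age: 10 <br>"

# Greedy: .* runs to the LAST " <br>" on the line.
greedy = re.findall(r'ID: (.*) <br>', sample)

# Non-greedy: .*? stops at the FIRST " <br>".
non_greedy = re.findall(r'ID: (.*?) <br>', sample)

# Digits only: \d+ stops as soon as a non-digit appears.
digits_only = re.findall(r'ID: (\d+)', sample)

print(greedy)       # ['3712 <br> Age: 10']
print(non_greedy)   # ['3712']
print(digits_only)  # ['3712']
```

The `\d+` form is probably the most robust choice here, since it does not depend on what follows the number.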

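Update

enumerate(items) yields (index, item) tuples, so findall receives a tuple instead of a string and raises the TypeError. Iterate over items directly: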

import pickle
import re

pattern0 = re.compile('ID: (.*) <br>')
with open('TEMP_RSS_ITEMS.txt', 'rb') as f:
    items = pickle.load(f)        
    for item in items:
        for idnumber in pattern0.findall(item):
            print idnumber
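
If you want to try the loop above without the file, the following sketch may help. It assumes the pickled object is a plain list of strings, as the sample contents suggest; `sample_items` is a shortened, made-up stand-in for that data, and the pickle round trip happens in memory rather than through TEMP_RSS_ITEMS.txt. It uses the print() call form so that it runs on Python 3 as well as 2.7.

```python
import pickle
import re

# Hypothetical stand-in for the pickled file: two strings shaped
# like the sample items, shortened for readability.
sample_items = [
    "<item>\n  Timmy <br>\n  ID: 3712 <br>\n  Age: 10 <br>\n</item>",
    "<item>\n  George <br>\n  ID: 4124 <br>\n  Age: 14 <br>\n</item>",
]

# Round-trip through pickle in memory instead of reading a file.
items = pickle.loads(pickle.dumps(sample_items))

# Compile once, outside the loop.
pattern = re.compile(r'ID: (\d+)')

id_numbers = []
for item in items:  # each item is a plain string, not a tuple
    id_numbers.extend(pattern.findall(item))

print(id_numbers)
```

Note that findall returns the IDs as strings; wrap them in int() if you need numbers.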
answered 2013-06-21T02:00:30.540

Try replacing items[0] with item:

# Iterate over the list directly; enumerate() would yield (index, item) tuples
for item in items:
    pattern0 = re.compile('ID: (.*?) <br>')
    idnumber = float(re.findall(pattern0, item)[0])

If you are looping over each item, why not use each item?
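
One thing to watch for when using the loop variable: enumerate() wraps every element in an (index, element) tuple, and passing that tuple to findall is what raises the TypeError in the question. A small sketch, with a made-up two-element `items` list standing in for the unpickled data, shows the failure and how to unpack the tuple if the index is actually wanted. It uses int() rather than float(), on the assumption that the IDs are whole numbers.

```python
import re

# Hypothetical stand-in for the unpickled list.
items = ["ID: 3712 <br>", "ID: 4124 <br>"]
pattern = re.compile(r'ID: (.*?) <br>')

# enumerate() yields (index, item) tuples, not bare strings.
first = next(iter(enumerate(items)))

# Passing the tuple to findall raises TypeError.
try:
    pattern.findall(first)
    raised = False
except TypeError:
    raised = True

# If the index is needed, unpack the tuple in the for statement.
ids_by_index = {}
for index, item in enumerate(items):
    ids_by_index[index] = int(pattern.findall(item)[0])

print(first)
print(raised)
print(ids_by_index)
```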

answered 2013-06-21T02:41:54.583