
I downloaded the Wikipedia dumps (the first torrent on this page) and tried to index all the links by storing them in a Python dictionary: each key is the current page and the value is a list of link destinations. As I processed the dump I hit a MemoryError, so I assigned each page an integer ID instead of storing titles. That got me further, but I still ended up with a MemoryError. What can I do to process this without running out of memory? I would prefer to keep it all in memory. As my code is reasonably long, I posted it here.
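For reference, a minimal sketch of the approach described above (this is not the posted code; the names are placeholders): each page title gets a small integer ID, and the link graph is kept as lists of destination IDs keyed by source ID.

    from collections import defaultdict

    page_ids = {}               # title -> integer id
    links = defaultdict(list)   # source id -> list of destination ids

    def get_id(title):
        """Assign the next integer id to a title the first time it is seen."""
        if title not in page_ids:
            page_ids[title] = len(page_ids)
        return page_ids[title]

    def add_page(title, destinations):
        """Record all outgoing links from one page."""
        links[get_id(title)] = [get_id(d) for d in destinations]

Even with integer IDs, the whole graph still has to fit in RAM, which is where the MemoryError comes from.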


1 Answer


You should start looking at a database to index your IDs and the associated links.

To start, you could try Sqlite or MySQL.

Here is a starting point for database handling in python.

Personally, I like Postgresql in combination with the python module psycopg2.
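As a rough illustration of the Sqlite route (a sketch only, with made-up table and file names, not code from the question), the link graph can be kept on disk so memory use stays bounded:

    import sqlite3

    conn = sqlite3.connect("wikilinks.db")
    cur = conn.cursor()

    # Map page titles to integer ids, and store links as (src, dst) id pairs.
    cur.execute("CREATE TABLE IF NOT EXISTS pages (id INTEGER PRIMARY KEY, title TEXT UNIQUE)")
    cur.execute("CREATE TABLE IF NOT EXISTS links (src INTEGER, dst INTEGER)")

    def page_id(title):
        """Return the integer id for a title, inserting it if unseen."""
        cur.execute("INSERT OR IGNORE INTO pages (title) VALUES (?)", (title,))
        cur.execute("SELECT id FROM pages WHERE title = ?", (title,))
        return cur.fetchone()[0]

    def add_links(src_title, dst_titles):
        """Record all outgoing links from one page."""
        src = page_id(src_title)
        cur.executemany("INSERT INTO links (src, dst) VALUES (?, ?)",
                        [(src, page_id(d)) for d in dst_titles])

    # Example usage while iterating over the dump:
    add_links("Python (programming language)", ["Guido van Rossum", "CPython"])
    conn.commit()
    conn.close()

Committing in batches rather than per page keeps insertion reasonably fast; the same schema carries over to MySQL or Postgresql if Sqlite becomes a bottleneck.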

Answered 2013-04-21T15:12:54.297