I downloaded the Wikipedia dumps (the first torrent on this page) and tried to index all the links by storing them in a Python dictionary. I stored the links as a list of destinations in a dictionary keyed by the current page. However, as I processed the dump I ran into a MemoryError, so I decided to assign each page an integer ID instead of using the title strings. This got me farther, but I still ended up with a MemoryError. What can I do to process the dump without running out of memory? I would prefer to keep it all in memory. As my code is reasonably long, I posted it here.
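For reference, the approach I describe above (interning each page title to a compact integer ID and storing the link graph as ID → list of destination IDs) looks roughly like this minimal sketch; the titles and the `add_link` helper are just illustrative, not my actual parsing code:

```python
from collections import defaultdict

ids = {}                   # page title -> integer ID
links = defaultdict(list)  # source page ID -> list of destination page IDs

def intern_id(title):
    """Map a page title to a compact integer ID, assigning a new one on first sight."""
    if title not in ids:
        ids[title] = len(ids)
    return ids[title]

def add_link(src_title, dst_title):
    """Record one link from src_title to dst_title using integer IDs."""
    links[intern_id(src_title)].append(intern_id(dst_title))

# Hypothetical example links:
add_link("Python (programming language)", "Guido van Rossum")
add_link("Python (programming language)", "CPython")
```

This saves memory over storing full title strings in every destination list, since each title string is kept only once as a dictionary key, but as noted it was still not enough for the full dump.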