I'm stuck on a very simple import operation in MongoDB. I have a 200MB file in JSON format. It's a feed dump, formatted as: {"some-headers":"", "dump":[{"item-id":"item-1"},{"item-id":"item-2"},...]}
The feed contains text in languages other than English too, such as Chinese and Japanese.
I tried a mongoimport as: mongoimport --db testdb --collection testcollection --file dump.json
but possibly because the data is nested, it treats the entire dump array
as a single field, resulting in an error due to the 4MB document size limit.
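One thing that might make mongoimport usable is converting the wrapped dump into newline-delimited JSON first, since mongoimport ingests one document per line by default. Below is a sketch of such a converter (to_ndjson and the file names are my own, hypothetical choices); note that json.load still reads the whole file into memory, so this suits the smaller dump better than a multi-GB one:

```python
import json

def to_ndjson(src_path, dst_path):
    """Unwrap the "dump" array into newline-delimited JSON,
    one document per line, which mongoimport ingests directly.
    Note: json.load reads the whole file into memory."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for item in data["dump"]:
            # ensure_ascii=False keeps Chinese/Japanese text as-is
            dst.write(json.dumps(item, ensure_ascii=False) + "\n")
```

After that, something like mongoimport --db testdb --collection testcollection --file dump.ndjson should insert each item as its own document, well under the size limit.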
I then tried a Python script:
import simplejson
import pymongo

conn = pymongo.Connection("localhost", 27017)
db = conn.testdb
c = db.testcollection

o = open("dump.json")
s = simplejson.load(o)
for x in s['dump']:
    c.insert(x)
o.close()
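One idea I'm considering is parsing the dump incrementally instead of loading it all at once, using the stdlib json.JSONDecoder.raw_decode, and batching the inserts. This is only a sketch (iter_dump_items and load_into_mongo are my own names, and it assumes the first "dump" in the file is the array key, as in the format above), but since it streams it might also cover the future GB-sized dumps:

```python
import json

def iter_dump_items(path, chunk_size=65536):
    """Yield items of the top-level "dump" array one at a time,
    reading the file in fixed-size chunks instead of loading the
    whole 200MB into memory.
    Assumes the first occurrence of "dump" in the file is the key
    of the array, as in the feed format above."""
    decoder = json.JSONDecoder()
    buf = ""
    with open(path, encoding="utf-8") as f:
        # Scan forward to the opening bracket of the "dump" array.
        while True:
            i = buf.find('"dump"')
            if i >= 0:
                j = buf.find('[', i)
                if j >= 0:
                    buf = buf[j + 1:]
                    break
            chunk = f.read(chunk_size)
            if not chunk:
                return
            buf += chunk
        # Decode one array element at a time, refilling the buffer
        # whenever a decode fails on a truncated item.
        while True:
            buf = buf.lstrip().lstrip(',').lstrip()
            if buf.startswith(']'):
                return
            try:
                item, end = decoder.raw_decode(buf)
            except ValueError:
                chunk = f.read(chunk_size)
                if not chunk:
                    return
                buf += chunk
                continue
            yield item
            buf = buf[end:]

def load_into_mongo(path, batch_size=1000):
    # pymongo assumed installed; MongoClient and insert_many replace
    # the deprecated Connection and per-document insert.
    from pymongo import MongoClient
    coll = MongoClient("localhost", 27017).testdb.testcollection
    batch = []
    for item in iter_dump_items(path):
        batch.append(item)
        if len(batch) >= batch_size:
            coll.insert_many(batch)
            batch = []
    if batch:
        coll.insert_many(batch)
```

Batching with insert_many should also cut down the per-document round trips compared to my one-insert-per-item loop above.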
The Python process gets killed while running this, probably because loading the whole file into memory exhausts the very limited resources I'm working with.
I reduced the file size by getting a new JSON dump at 50MB, but now Python trips over ASCII/Unicode encoding errors instead.
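For what it's worth, my understanding is that the non-ASCII feed text should round-trip cleanly as long as files are opened with an explicit UTF-8 encoding and the serializer isn't forced to escape it; a tiny check of that assumption (the sample strings are made up):

```python
import json

# Non-ASCII text survives a JSON round trip when handled as UTF-8;
# ensure_ascii=False keeps the characters literal instead of
# escaping them to \uXXXX sequences.
item = {"item-id": "item-1", "title": "日本語テキスト"}
encoded = json.dumps(item, ensure_ascii=False)
decoded = json.loads(encoded)
```

So the encoding errors are presumably about how the file is being read, not about the CJK content itself.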
I am looking for solutions both with mongoimport
and with the above Python script. Any other approaches would also be greatly appreciated.
Also, the JSON dump may someday grow to several GBs, so if there is a different solution I should consider at that scale, please do highlight it.