
In your experience, when faced with a 'big(ish) data' problem, how far can you get with Python's multiprocessing module compared to something like Hadoop?

I ask because, apart from being proficient in Python and SQL, I am a programming novice. Hadoop is pretty impenetrable to me at the moment, so I hope to get quite far with the multiprocessing module on the large-data problems that come my way, at least until I can get my head around Hadoop.

EDIT for clarity: If I have a 4-core computer, are there any advantages to running Hadoop on 4 nodes rather than using the multiprocessing module?
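To make the 4-core scenario concrete, here is a minimal sketch of what "using the multiprocessing module" might look like for a map-reduce style job on one machine. It splits a list of text lines across 4 worker processes with `multiprocessing.Pool`, counts words in each chunk (the "map" step), and merges the partial counts in the parent process (the "reduce" step). The word-count task and the helper names `count_words`, `chunked` and `word_count` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from multiprocessing import Pool


def count_words(chunk):
    """Map step: count the words in one chunk of lines (runs in a worker process)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def chunked(lines, n_chunks):
    """Split a list of lines into at most n_chunks roughly equal slices."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def word_count(lines, processes=4):
    """Run the map step in parallel, then reduce the partial counts in the parent."""
    with Pool(processes=processes) as pool:
        partials = pool.map(count_words, chunked(lines, processes))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adding Counters sums the per-word counts
    return total


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the end"]
    print(word_count(sample).most_common(1))  # [('the', 3)]
```

Everything here runs on a single machine, so the input list has to fit in that machine's memory (or be read and fed to the pool in chunks).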

Thanks.

