I know, Hadoop is not only alternative for semistructured data processing in general — I can do many things with plain tab-separated data and a bunch of unix tools (cut, grep, sed, ...) and hand-written python scripts. But sometimes I get really big amounts of data and processing time goes up to 20-30 minutes. It's unacceptable to me, because I want experiment with dataset dynamically, running some semi-ad-hoc queries and etc.
So, what amount of data do you consider enough to setting Hadoop cluster in terms of cost-results of this approach?