I'm trying to re-write famous example of Spark's text classification (http://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/) on Java 8.
I have a problem - in this code I'm making some data preparations for getting idfs of all words in all files:
termDocsRdd.collect().stream().flatMap(doc -> doc.getTerms().stream()
.map(term -> new ImmutableMap.Builder<String, String>()
.put(doc.getName(),term)
.build())).distinct()
And I'm stuck on the groupBy operation. (I need to group this by term, so each term must be a key and the value must be a sequence of documents). In Scala this operation looks very simple - .groupBy(_._2). But how can I do this in Java?
I tried to write something like:
.groupingBy(term -> term, mapping((Document) d -> d.getDocNameContainsTerm(term), toList()));
but it's incorrect...
Somebody knows how to write it in Java?
Thank You very much.