So I have a MapReduce job that takes in multiple news articles and outputs the following key value pairs.
.
.
.
<article_id, social_tag.name, social_tag.isCompany, social_tag.code>
<article_id2, social_tag2.name, social_tag2.isCompany, social_tag.code>
<article_id, topic_code.name, topic_code.isCompany, topic_code.rcsCode>
<article_id3, social_tag3.name, social_tag3.isCompany, social_tag.code>
<article_id2, topic_code2.name, topic_code2.isCompany, topic_code2.rcsCode>
.
.
.
As you can see, there are two main different types of data rows that I am currently outputting and right now, these get mixed up in the flat files outputted by mapreduce. Is there anyway I can simply output social_tags to file1 and topic_codes to file2 OR maybe output social_tags to a specified group of files(social1.txt, social2.txt ..etc) and topic_codes to another group (topic1.txt, topic2.txt...etc)
The reason I'm asking this is so that I can store all these into a Hive table later on easily. I preferably would want to have a separate table for each different data type(topic_code, social_tag,... etc.) If any of you guys know a simple way to achieve this without separating the mapreduce output to different files, that would be really helpful too.
Thanks in advance!