
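I am using the Hadoop-Vertica connector to load a large file into Vertica, and I am trying to do it with a map-only Hadoop job (no Reducer). However, the Vertica output table does not seem to get initialized during the map phase, and the job always fails with an error.

The documentation does not say that we can write to Vertica during the map phase, so I am wondering whether this is possible at all.

Thanks!

Edit

Here is the documentation for the Hadoop Vertica connector.

Error: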

java.io.IOException: Cannot set record by name if names not initialized
at com.vertica.hadoop.VerticaRecord.set(VerticaRecord.java:270)
at com.vertica.hadoop.VerticaWordCount$TokenizerMapper.map(VerticaWordCount.java:92)
at com.vertica.hadoop.VerticaWordCount$TokenizerMapper.map(VerticaWordCount.java:60)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doA

Looking at the source of VerticaWordCount.java, I found that the output table's list of column names is never initialized.
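
To illustrate the failure mode I suspect, here is a minimal sketch, assuming the record keeps a list of column names that is only filled in when it is built from the output table's metadata. `SketchRecord` and `trySet` are hypothetical stand-ins for illustration only; they are not the connector's classes and say nothing certain about how `VerticaRecord` is implemented internally.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a record addressed by column name.
// This is NOT the connector's VerticaRecord; it only models the failure mode.
class SketchRecord {
    private final List<String> names;   // null when no column metadata was supplied
    private final List<Object> values;

    // Built without table metadata: the name list is never initialized.
    SketchRecord() {
        this.names = null;
        this.values = new ArrayList<>();
    }

    // Built with the column names of the output table.
    SketchRecord(String... columnNames) {
        this.names = Arrays.asList(columnNames);
        this.values = new ArrayList<>(Collections.<Object>nCopies(columnNames.length, null));
    }

    // Setting by name is only possible once the names are known.
    void set(String name, Object value) throws IOException {
        if (names == null) {
            throw new IOException("Cannot set record by name if names not initialized");
        }
        int index = names.indexOf(name);
        if (index < 0) {
            throw new IOException("Unknown column: " + name);
        }
        values.set(index, value);
    }

    // Returns the error message if set() fails, or null on success.
    static String trySet(SketchRecord record, String name, Object value) {
        try {
            record.set(name, value);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }
}
```

In this sketch, `SketchRecord.trySet(new SketchRecord(), "a", 1)` returns the same message as my stack trace, while a record built with `new SketchRecord("a", "b", "c")` accepts the value. If the real record behaves similarly, the question becomes where the column names are supposed to come from in a map-only job.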

Here is my configuration in run():

  Job job = new Job(conf, "vertica hadoop");
  conf = job.getConfiguration();
  conf.set("mapreduce.job.tracker", "local");

  //job.setInputFormatClass(VerticaInputFormat.class);
  // Set MapOutputKeyClass and MapOutputValueClass explicitly;
  // by default they are the same as the Reducer's
  // output key and value classes.
  job.setMapOutputKeyClass(Text.class);
  job.setMapOutputValueClass(VerticaRecord.class);

  /*************Settings for Vertica output************************/
  // Set the output key and value classes.
  // The job emits VerticaRecords that will be stored in the database.
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(VerticaRecord.class);

  // Tell Hadoop to send its output to Vertica
  job.setOutputFormatClass(VerticaOutputFormat.class);
  /****************************************************************/

  job.setJarByClass(VerticaWordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  FileInputFormat.addInputPath(job, new Path("/user/tmp/input"));

  /******************************************************************/
  // Define the output table:
  // VerticaOutputFormat.setOutput(jobObject, tableName, [truncate, ["columnName1 dataType1" [, "columnNameN dataTypeN" ...]]]);
  VerticaOutputFormat.setOutput(job, "target", true, "a int", "b varchar", "c varchar");