
Which class parses Pig and Hive commands into MapReduce jobs, and what is the algorithm behind this parsing?


1 Answer


Both Pig and Hive use ANTLR to build the compilers that parse their scripts. If you are not familiar with compiler theory, I suggest reading up on it first.
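To make the lexer, parser, and abstract syntax tree idea concrete, here is a minimal hand-written sketch in Java. It is not ANTLR output and it is not Pig's or Hive's grammar: the class `ToyParser`, its one-statement grammar, and the node labels are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy, not part of Pig or Hive.
public class ToyParser {
    /** A tree node: a label plus zero or more children. */
    public static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        Node add(Node child) { children.add(child); return this; }

        /** LISP-style rendering, e.g. (PARENT child1 child2). */
        public String toStringTree() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    /** Lexer step: cut the text into identifier, '=' and ';' tokens.
     *  Anything else (whitespace, other symbols) is silently skipped. */
    public static List<String> tokenize(String script) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[=;]").matcher(script);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    /** Parser step for the toy grammar:
     *  statement := alias '=' OPERATOR alias ('BY' field)? ';' */
    public static Node parse(List<String> t) {
        int n = t.size();
        boolean hasBy = n == 7 && t.get(4).equalsIgnoreCase("BY");
        if (!(n == 5 || hasBy) || !t.get(1).equals("=") || !t.get(n - 1).equals(";")) {
            throw new IllegalArgumentException("expected: alias = OP input [BY field];");
        }
        Node op = new Node(t.get(2).toUpperCase(Locale.ROOT)).add(new Node(t.get(3)));
        if (hasBy) op.add(new Node(t.get(5)));
        return new Node("STATEMENT").add(new Node(t.get(0))).add(op);
    }

    public static void main(String[] args) {
        Node ast = parse(tokenize("B = GROUP A BY user;"));
        System.out.println(ast.toStringTree());
    }
}
```

Running `main` prints `(STATEMENT B (GROUP A user))`. A generated ANTLR lexer and parser do the same two jobs, tokens first and then a tree, but from a grammar file instead of hand-written code.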

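For Pig, the ANTLR grammar sources are src/org/apache/pig/parser/QueryLexer.g and src/org/apache/pig/parser/QueryParser.g. They are compiled into org.apache.pig.parser.QueryLexer and org.apache.pig.parser.QueryParser. These two classes, however, only compile a Pig script into an abstract syntax tree. The tree is then converted to an org.apache.pig.newplan.logical.relational.LogicalPlan, and after that the LogicalPlan is converted to an org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan. Here are some related classes and methods: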

org.apache.pig.newplan.logical.relational.LogicalPlan
org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.MROperPlan
org.apache.pig.parser.QueryParserDriver.parse(String)
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(LogicalPlan, Properties)
org.apache.pig.PigServer.launchPlan(PhysicalPlan, String)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(PhysicalPlan, PigContext)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(MROperPlan, MapReduceOper, Configuration, PigContext)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(MROperPlan, String)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(PhysicalPlan, String, PigContext)
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(List<Result>, List<Result>, Tuple)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce.Map.collect(Context, Tuple)
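The list above includes an MROperPlan and compile methods that take a PhysicalPlan, so the last step is turning a plan of operators into MapReduce jobs. The sketch below is not Pig's compiler. It is a toy under one stated assumption: a MapReduce job has a single shuffle, so a linear plan is cut into a new job each time a second shuffle-requiring operator appears. The class `ToyPlanCompiler`, the `Job` type, and the choice of which operators need a shuffle are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy, not Pig's actual plan compiler.
public class ToyPlanCompiler {
    // Assumption: these operators need all rows with the same key in one place.
    static final Set<String> SHUFFLE_OPS =
            new HashSet<>(Arrays.asList("GROUP", "JOIN", "ORDER", "DISTINCT"));

    /** One MapReduce job: operators run before and after the shuffle. */
    public static final class Job {
        public final List<String> map = new ArrayList<>();
        public final List<String> reduce = new ArrayList<>();
        boolean hasShuffle = false;

        @Override
        public String toString() { return "map=" + map + " reduce=" + reduce; }
    }

    /** Walk a linear plan and assign each operator to a job and a phase. */
    public static List<Job> compile(List<String> ops) {
        List<Job> jobs = new ArrayList<>();
        Job current = new Job();
        jobs.add(current);
        for (String op : ops) {
            if (SHUFFLE_OPS.contains(op)) {
                if (current.hasShuffle) {
                    // The current job already used its one shuffle: start a new job.
                    current = new Job();
                    jobs.add(current);
                }
                current.hasShuffle = true;
                current.reduce.add(op);
            } else if (current.hasShuffle) {
                current.reduce.add(op);   // runs after the shuffle
            } else {
                current.map.add(op);      // runs before the shuffle
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<String> plan = Arrays.asList("LOAD", "FILTER", "GROUP", "FOREACH", "ORDER", "STORE");
        for (Job job : compile(plan)) System.out.println(job);
    }
}
```

For the plan in `main`, the toy produces two jobs: the first filters in the map phase and groups in the reduce phase, the second exists only because ORDER needs another shuffle. A real compiler also has to handle plans that branch and merge, which this linear walk ignores.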

For Hive, the ANTLR grammar source is ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g. It is compiled into org.apache.hadoop.hive.ql.parse.HiveLexer and org.apache.hadoop.hive.ql.parse.HiveParser. These two classes compile a Hive script into an abstract syntax tree, which is then converted to an org.apache.hadoop.hive.ql.QueryPlan. The mapper and reducer in Hive are ExecMapper and ExecReducer.

Here are some related classes and methods:

org.apache.hadoop.hive.cli.CliDriver
org.apache.hadoop.hive.ql.Driver
org.apache.hadoop.hive.ql.Driver.run(String)
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(String, Context)
org.apache.hadoop.hive.ql.parse.ASTNode
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer
org.apache.hadoop.hive.ql.QueryPlan
org.apache.hadoop.hive.ql.Driver.compile(String, boolean)
org.apache.hadoop.hive.ql.exec.TaskRunner
org.apache.hadoop.hive.ql.Driver.execute()
org.apache.hadoop.hive.ql.exec.ExecDriver
org.apache.hadoop.hive.ql.exec.ExecMapper
org.apache.hadoop.hive.ql.exec.ExecReducer
org.apache.hadoop.hive.ql.exec.MapOperator
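The list ends with a mapper class and a MapOperator, which suggests a generic mapper that feeds each input row into a chain of query operators. The following is a rough sketch of that push-style pattern, assuming each operator processes a row and forwards the result to its child. It does not use Hive's classes: `ToyOperatorChain`, `Operator`, `Filter`, `Select`, and `Collect` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical toy, not Hive's operator classes.
public class ToyOperatorChain {
    /** An operator receives a row and may forward a row to its child. */
    public abstract static class Operator {
        private Operator child;

        /** Attach the next operator and return it, so calls can be chained. */
        public Operator then(Operator next) { this.child = next; return next; }

        public abstract void process(String row);

        protected void forward(String row) {
            if (child != null) child.process(row);
        }
    }

    /** Forwards only the rows that pass the predicate. */
    public static final class Filter extends Operator {
        private final Predicate<String> keep;
        public Filter(Predicate<String> keep) { this.keep = keep; }
        @Override public void process(String row) { if (keep.test(row)) forward(row); }
    }

    /** Transforms each row and forwards the result. */
    public static final class Select extends Operator {
        private final Function<String, String> fn;
        public Select(Function<String, String> fn) { this.fn = fn; }
        @Override public void process(String row) { forward(fn.apply(row)); }
    }

    /** Terminal operator: stores whatever reaches it. */
    public static final class Collect extends Operator {
        public final List<String> rows = new ArrayList<>();
        @Override public void process(String row) { rows.add(row); }
    }

    /** Stand-in for a mapper's per-record callback: push every row into the root. */
    public static List<String> runMap(List<String> input) {
        Filter root = new Filter(r -> !r.isEmpty());
        Collect sink = new Collect();
        root.then(new Select(r -> r.toUpperCase())).then(sink);
        for (String row : input) root.process(row);
        return sink.rows;
    }

    public static void main(String[] args) {
        System.out.println(runMap(Arrays.asList("a", "", "b")));
    }
}
```

The point of the design is that the mapper itself stays generic: the query-specific work lives in the operator chain, which is built from the compiled plan rather than hard-coded as it is here.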

Finally, I suggest downloading their source code and browsing it in Eclipse to find the answers to anything else you want to know.

answered 2013-06-06T13:08:17.210