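I have a specific filtering problem (described here: Pig - How to manipulate and compare dates?), so, as I was advised, I decided to write my own filter UDF. Here is the code: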

import java.io.IOException;

import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;

import org.joda.time.*;
import org.joda.time.format.*;

public class DateCloseEnough extends FilterFunc {


int nbmois;

/*
 * @param nbMois: if the number of months between two dates is inferior to this variable, then we consider that these two dates are close
 */
public DateCloseEnough(String nbmois_) {
    nbmois = Integer.valueOf(nbmois_);
}

public Boolean exec(Tuple input) throws IOException {

    // We're getting the date
    String date1 = (String)input.get(0);

    // We convert it into date
    final DateTimeFormatter dtf = DateTimeFormat.forPattern("MM yyyy");
    LocalDate d1 = new LocalDate();
    d1 = LocalDate.parse(date1, dtf);
    d1 = d1.withDayOfMonth(1);

    // We're getting today's date
    DateTime today = new DateTime();
    int mois = today.getMonthOfYear();
    String real_mois;
    if(mois >= 1 && mois <= 9) real_mois = "0" + mois;
    else real_mois = "" + mois;

    LocalDate d2 = new LocalDate();
    d2 = LocalDate.parse(real_mois + " " + today.getYear(), dtf);
    d2 = d2.withDayOfMonth(1);

    // Number of months between these two dates
    String nb_months_between = "" + Months.monthsBetween(d1,d2);

    return (Integer.parseInt(nb_months_between) <= nbmois);

}



}

I built a jar file from this code in Eclipse.

I am filtering my data with these lines of Pig Latin:

REGISTER Desktop/myUDFs.jar
DEFINE DateCloseEnough DateCloseEnough('12');

experiences1 = LOAD '/home/training/Desktop/BDD/experience.txt' USING PigStorage(',') AS (id_cv:int, id_experience:int, date_deb:chararray, date_fin:chararray, duree:int, contenu_experience:chararray);

experiences = FILTER experiences1 BY DateCloseEnough(date_fin);

I launch my program with this Linux command:

pig -x local "myScript.pig"

I get this error:

2013-06-19 07:27:17,253 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/training/pig_1371652037252.log
2013-06-19 07:27:17,933 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/joda/time/ReadablePartial Details at logfile: /home/training/pig_1371652037252.log

I checked the log file and found this:

Pig Stack Trace

ERROR 2998: Unhandled internal error. org/joda/time/ReadablePartial

java.lang.NoClassDefFoundError: org/joda/time/ReadablePartial
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:441)
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:471)
at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:544)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:4834)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.PUnaryCond(QueryParser.java:1949)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.PAndCond(QueryParser.java:1790)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.POrCond(QueryParser.java:1734)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.PCond(QueryParser.java:1700)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FilterClause(QueryParser.java:1548)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1276)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:893)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:682)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981)
at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:320)
Caused by: java.lang.ClassNotFoundException: org.joda.time.ReadablePartial
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
... 24 more

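I tried to modify my PIG_CLASSPATH variable, but I found that this variable does not exist at all (even though some other Pig scripts work).

Do you have any idea how to solve this problem?

Thanks.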

2 Answers

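First, you need to tell Pig which jar you are using. See this answer: how to include external jar file using PIG. Configuring the build path in Eclipse to add the jar is not enough; Eclipse will not help you generate the correct jar.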

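Second, String nb_months_between = "" + Months.monthsBetween(d1,d2); is wrong. You can use int nb_months_between = Months.monthsBetween(d1,d2).getMonths(); instead. If you read Months.toString, you will see that it returns "P" + String.valueOf(getValue()) + "M";. So the resulting string is not a plain number, and Integer.parseInt cannot convert it to an int.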
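
To illustrate the second point without needing Joda-Time on the classpath, here is a minimal sketch that swaps in the JDK's own java.time classes (Java 8+); it is not the Joda-Time API used in the question. The class and method names (MonthsBetweenSketch, monthsBetween, isPlainInteger) are made up for this example. It shows that period-style text such as "P12M" is rejected by Integer.parseInt, while asking the library for the month count as a number avoids the string round trip entirely.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration class; uses java.time, not Joda-Time.
final class MonthsBetweenSketch {

    // Same "MM yyyy" layout as the dates in the question, e.g. "06 2012".
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("MM yyyy");

    /** Whole months from one "MM yyyy" value to another, as a plain number. */
    static long monthsBetween(String from, String to) {
        YearMonth start = YearMonth.parse(from, FORMAT);
        YearMonth end = YearMonth.parse(to, FORMAT);
        return ChronoUnit.MONTHS.between(start, end);
    }

    /** True when Integer.parseInt accepts the text. */
    static boolean isPlainInteger(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "P12M" mimics the format the answer attributes to Months.toString().
        System.out.println(isPlainInteger("P12M"));
        System.out.println(monthsBetween("06 2012", "06 2013"));
    }
}
```

In the UDF itself, the equivalent change is the one-line getMonths() call given above; the comparison against nbmois can then use the int directly.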

answered 2013-06-20T08:47:12.787

You are missing this class: org/joda/time/ReadablePartial

It can be found here: jarfinder. Download joda-time-1.5.jar and add it to your project; that should solve it.
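
One way to confirm whether a jar is actually visible at run time is a small check like the sketch below. The class name ClasspathCheck and the method isOnClasspath are invented for this example. It only attempts to load a class by name, which is the same kind of lookup that failed in the stack trace from the question, and it assumes you run it with the same classpath that Pig would use.

```java
// Hypothetical helper; reports whether a class can be loaded by name.
final class ClasspathCheck {

    /** Returns true when the named class can be loaded (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Same family of failures as the NoClassDefFoundError in the log.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error message.
        String name = args.length > 0 ? args[0] : "org.joda.time.ReadablePartial";
        System.out.println(name + " available: " + isOnClasspath(name));
    }
}
```

If this reports false for org.joda.time.ReadablePartial, the Joda-Time jar is not on that classpath, regardless of how the Eclipse build path is configured.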

answered 2013-06-19T15:14:43.193