
I am using Pig to access a column family in Cassandra that has a counter column. When I try to dump the data, I get the following error:

cqlsh:pollkan> CREATE TABLE votes_count_period_1 (
           ...   period int,
           ...   poll text,
           ...   votes counter,
           ...   PRIMARY KEY (period, poll)
           ... );

cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';

cqlsh:pollkan> select * from votes_count_period_1;

 period   | poll                                 | votes
----------+--------------------------------------+-------
 20130830 | 605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a |     5
 20130831 | 405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a |     2
 20130831 | 505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a |     3


root@batch:/usr/share/cassandra# pig -x local
2013-08-31 23:02:06,135 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459164) compiled Mar 21 2013, 06:14:38
2013-08-31 23:02:06,136 [main] INFO  org.apache.pig.Main - Logging error messages to: /usr/share/cassandra/pig_1377982926133.log
2013-08-31 23:02:06,154 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2013-08-31 23:02:06,252 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> register /usr/share/cassandra/apache-cassandra-1.2.9.jar
grunt> register /usr/share/cassandra/apache-cassandra-thrift-1.2.9.jar
grunt> register /usr/share/cassandra/lib/libthrift-0.7.0.jar
grunt> A = LOAD 'cql://pollkan/votes_count_period_1' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> DUMP A;

Causes:

2013-08-31 23:01:35,397 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed ColumnFamilySplit((-69569900416187863, '-54603788994328078] @[cassandra001, cassandra002, cassandra003])
2013-08-31 23:01:35,417 [pool-4-thread-1] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-08-31 23:01:35,418 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[2,4] C:  R:
2013-08-31 23:01:35,424 [Thread-10] INFO  org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-08-31 23:01:35,428 [Thread-10] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local712790083_0002
java.lang.Exception: java.lang.IndexOutOfBoundsException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Buffer.java:538)
        at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:410)
        at org.apache.cassandra.db.context.CounterContext.total(CounterContext.java:477)
        at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:34)
        at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:25)
        at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.columnToTuple(AbstractCassandraStorage.java:137)
        at org.apache.cassandra.hadoop.pig.CqlStorage.getNext(CqlStorage.java:110)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

I read that https://issues.apache.org/jira/browse/CASSANDRA-5234 resolved the problem with CQL3 tables and counter columns, but I still have the issue.

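By the way, I tried recreating the table with old-style COMPACT STORAGE and got a little further: a plain DUMP now works, but grouping and summing fails with a new error: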

cqlsh:pollkan> CREATE TABLE votes_count_period_2 (
           ...   period int,
           ...   poll text,
           ...   votes counter,
           ...   PRIMARY KEY (period, poll)
           ... ) WITH COMPACT STORAGE;
cqlsh:pollkan>
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan>
cqlsh:pollkan> select * from votes_count_period_2;

 period   | poll                                 | votes
----------+--------------------------------------+-------
 20130830 | 605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a |     5
 20130831 | 405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a |     2
 20130831 | 505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a |     3

root@batch:/usr/share/cassandra# pig -x local
2013-08-31 23:02:06,135 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459164) compiled Mar 21 2013, 06:14:38
2013-08-31 23:02:06,136 [main] INFO  org.apache.pig.Main - Logging error messages to: /usr/share/cassandra/pig_1377982926133.log
2013-08-31 23:02:06,154 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2013-08-31 23:02:06,252 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> register /usr/share/cassandra/apache-cassandra-1.2.9.jar
grunt> register /usr/share/cassandra/apache-cassandra-thrift-1.2.9.jar
grunt> register /usr/share/cassandra/lib/libthrift-0.7.0.jar
grunt> A = LOAD 'cql://pollkan/votes_count_period_2' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> DUMP A;
2013-08-31 23:05:59,454 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-08-31 23:05:59,458 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-08-31 23:05:59,465 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-08-31 23:05:59,466 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
((period,20130830),(poll,605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a),(votes,5))
((period,20130831),(poll,405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a),(votes,2))
((period,20130831),(poll,505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a),(votes,3))

grunt> A = LOAD 'cql://pollkan/votes_count_period_2' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> B = FOREACH A GENERATE poll, votes;
grunt> describe B;
B: {poll: chararray,votes: long}
grunt> C = GROUP B BY poll;
grunt> describe C;
C: {group: chararray,B: {(poll: chararray,votes: long)}}
grunt> D = FOREACH C GENERATE group AS pollgroup, SUM(B.votes);
grunt> describe D;
D: {pollgroup: chararray,long}
grunt> dump D;

2013-08-31 23:53:32,577 [pool-33-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[13,4],B[14,4],D[18,4],C[17,4] C: D[18,4],C[17,4] R: D[18,4]
2013-08-31 23:53:32,586 [pool-33-thread-1] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-08-31 23:53:32,589 [Thread-65] INFO  org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-08-31 23:53:32,591 [Thread-65] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local814297309_0018
java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String
        at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:76)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:112)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
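
A note on this second error, read only from the outputs above: `DUMP A` prints every field as a (name, value) pair, while `describe B` reports flat types (`poll: chararray`, `votes: long`). That mismatch would be consistent with the `BinSedesTuple cannot be cast to java.lang.String` message, since the group key is declared as a string but arrives as a tuple. The sketch below is plain Python, not Pig or Cassandra code; the rows are copied from the DUMP output, and the function names `group_key_as_declared` and `sum_votes_by_poll` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Rows exactly as DUMP A printed them: every field is a (name, value) pair.
rows = [
    (("period", 20130830), ("poll", "605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a"), ("votes", 5)),
    (("period", 20130831), ("poll", "405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a"), ("votes", 2)),
    (("period", 20130831), ("poll", "505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a"), ("votes", 3)),
]


def group_key_as_declared(row):
    """Use field 1 as a plain string, as the schema `poll: chararray` promises."""
    key = row[1]
    if not isinstance(key, str):
        # Rough analogue of the ClassCastException in the trace above.
        raise TypeError(f"{type(key).__name__} cannot be used as str")
    return key


def sum_votes_by_poll(rows):
    """Unwrap the value half of each pair, then group by poll and sum votes."""
    totals = defaultdict(int)
    for _period, (_, poll), (_, votes) in rows:
        totals[poll] += votes
    return dict(totals)


try:
    group_key_as_declared(rows[0])
except TypeError as exc:
    print("schema/data mismatch:", exc)

print(sum_votes_by_poll(rows))
```

The second function shows the result the `GROUP ... SUM` script is aiming for once the values are unwrapped. Whether an equivalent unwrapping can be written in Pig 0.11.1 against this loader is not something the outputs above confirm.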

My versions are Pig 0.11.1 and Cassandra 1.2.9.

Any help?

Thanks


1 Answer


I ran into the same problem earlier today while testing the latest Pig CQL3 integration with a similar data structure.

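The JIRA issue you mention, https://issues.apache.org/jira/browse/CASSANDRA-5234, does contain a patch that one of the commenters has verified as working. However, a quick look at the Cassandra git repository shows that it has not yet been applied to the 1.2 branch or to trunk. I have added a comment on the JIRA issue.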

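Until the patch is committed and a new stable release is published, the workaround is to apply the patch to a fresh checkout of 1.2.9, recompile, and deploy the result to your Hadoop nodes (if you are willing to do that).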

answered 2013-09-05T18:44:12.103