I'm using Pig to access a column family in Cassandra that has a counter column. When I try to dump the data, I get an error. Here is what I did:
cqlsh:pollkan> CREATE TABLE votes_count_period_1 (
... period int,
... poll text,
... votes counter,
... PRIMARY KEY (period, poll)
... );
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> select * from votes_count_period_1;
period | poll | votes
----------+--------------------------------------+-------
20130830 | 605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a | 5
20130831 | 405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a | 2
20130831 | 505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a | 3
root@batch:/usr/share/cassandra# pig -x local
2013-08-31 23:02:06,135 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459164) compiled Mar 21 2013, 06:14:38
2013-08-31 23:02:06,136 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/share/cassandra/pig_1377982926133.log
2013-08-31 23:02:06,154 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2013-08-31 23:02:06,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> register /usr/share/cassandra/apache-cassandra-1.2.9.jar
grunt> register /usr/share/cassandra/apache-cassandra-thrift-1.2.9.jar
grunt> register /usr/share/cassandra/lib/libthrift-0.7.0.jar
grunt> A = LOAD 'cql://pollkan/votes_count_period_1' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> DUMP A;
This causes:
2013-08-31 23:01:35,397 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed ColumnFamilySplit((-69569900416187863, '-54603788994328078] @[cassandra001, cassandra002, cassandra003])
2013-08-31 23:01:35,417 [pool-4-thread-1] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-08-31 23:01:35,418 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[2,4] C: R:
2013-08-31 23:01:35,424 [Thread-10] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-08-31 23:01:35,428 [Thread-10] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local712790083_0002
java.lang.Exception: java.lang.IndexOutOfBoundsException
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:538)
at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:410)
at org.apache.cassandra.db.context.CounterContext.total(CounterContext.java:477)
at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:34)
at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:25)
at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.columnToTuple(AbstractCassandraStorage.java:137)
at org.apache.cassandra.hadoop.pig.CqlStorage.getNext(CqlStorage.java:110)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
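I read that https://issues.apache.org/jira/browse/CASSANDRA-5234 fixed the problem with CQL3 tables and counter columns, but I still hit the error.
By the way, I tried recreating the table with old-style COMPACT STORAGE. That got me a little further: a plain DUMP now works, but grouping and summing fails with a new error, shown below: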
cqlsh:pollkan> CREATE TABLE votes_count_period_2 (
... period int,
... poll text,
... votes counter,
... PRIMARY KEY (period, poll)
... ) WITH COMPACT STORAGE;
cqlsh:pollkan>
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan>
cqlsh:pollkan> select * from votes_count_period_2;
period | poll | votes
----------+--------------------------------------+-------
20130830 | 605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a | 5
20130831 | 405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a | 2
20130831 | 505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a | 3
root@batch:/usr/share/cassandra# pig -x local
2013-08-31 23:02:06,135 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459164) compiled Mar 21 2013, 06:14:38
2013-08-31 23:02:06,136 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/share/cassandra/pig_1377982926133.log
2013-08-31 23:02:06,154 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2013-08-31 23:02:06,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> register /usr/share/cassandra/apache-cassandra-1.2.9.jar
grunt> register /usr/share/cassandra/apache-cassandra-thrift-1.2.9.jar
grunt> register /usr/share/cassandra/lib/libthrift-0.7.0.jar
grunt> A = LOAD 'cql://pollkan/votes_count_period_2' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> DUMP A;
2013-08-31 23:05:59,454 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-08-31 23:05:59,458 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-08-31 23:05:59,465 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-08-31 23:05:59,466 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
((period,20130830),(poll,605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a),(votes,5))
((period,20130831),(poll,405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a),(votes,2))
((period,20130831),(poll,505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a),(votes,3))
grunt> A = LOAD 'cql://pollkan/votes_count_period_2' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> B = FOREACH A GENERATE poll, votes;
grunt> describe B;
B: {poll: chararray,votes: long}
grunt> C = GROUP B BY poll;
grunt> describe C;
C: {group: chararray,B: {(poll: chararray,votes: long)}}
grunt> D = FOREACH C GENERATE group AS pollgroup, SUM(B.votes);
grunt> describe D;
D: {pollgroup: chararray,long}
grunt> dump D;
2013-08-31 23:53:32,577 [pool-33-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[13,4],B[14,4],D[18,4],C[17,4] C: D[18,4],C[17,4] R: D[18,4]
2013-08-31 23:53:32,586 [pool-33-thread-1] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-08-31 23:53:32,589 [Thread-65] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-08-31 23:53:32,591 [Thread-65] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local814297309_0018
java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String
at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:76)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
My versions are Pig 0.11.1 and Cassandra 1.2.9.
Any help would be appreciated.
Thanks