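I set up cassandra with the default configuration on a clean AWS instance and inserted 10,000 columns into a single row, each column holding 1 MB of data. I use this Ruby (version 1.9.3) script: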

10000.times do
    key = rand(36**8).to_s(36)
    value = rand(36**1024).to_s(36) * 1024
    cas_client.insert(TestColumnFamily, TestRow, {key => value})
end

Every time I run this script, it crashes:

/usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/socket.rb:109:in `read': CassandraThrift::Cassandra::Client::TransportException
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/base_transport.rb:87:in `read_all'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:104:in `read_frame'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:69:in `read_into_buffer'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `read_message_begin'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `receive_message'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:251:in `recv_batch_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:243:in `batch_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:150:in `handled_proxy'
    from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:60:in `batch_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/protocol.rb:7:in `_mutate'
    from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/cassandra.rb:463:in `insert'
    from a.rb:6:in `block in <main>'
    from a.rb:3:in `times'
    from a.rb:3:in `<main>'

However, cassandra itself keeps running normally. I then run another Ruby script to count how many columns were inserted:

p cas_client.count_columns(TestColumnFamily,TestRow)

This script crashes too, with the same error message, and the cassandra process stays at 100% CPU usage.

AWS m1.xlarge instance (15 GB memory, 800 GB hard disk, 4 CPU cores)
cassandra-1.1.2
ruby-1.9.3-p194
jdk-7u6-linux-x64
ruby-gems:
    cassandra (0.15.0)
    thrift (0.8.0)
    thrift_client (0.8.1)

What is the problem?

1 Answer

10,000 columns at 1 MB each is about 10 GB of data.
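
To check that figure, here is a rough sketch that builds one value exactly as the question's script does and multiplies its size by the column count. The names `COLUMNS`, `value_bytes` and `total_bytes` are only illustrative, and the printed numbers vary slightly because the value is random.

```ruby
# Number of columns the question inserts into the single row.
COLUMNS = 10_000

# Same expression as in the question: a random base-36 string of up to
# 1024 characters, repeated 1024 times (so at most 1 MiB per value).
value = rand(36**1024).to_s(36) * 1024

value_bytes = value.bytesize
total_bytes = COLUMNS * value_bytes

puts "one value: #{value_bytes} bytes"
puts format("whole row: %.2f GiB", total_bytes / 1024.0**3)
```

This counts only the column values; column names and Thrift framing overhead come on top.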

Cassandra's RPC layer uses Thrift, which requires the entire return value of an RPC call to fit in memory. Reading all the columns at once would therefore mean loading a 10 GB Thrift object into memory, which is not practical, especially in Ruby.
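
One way around the memory limit is to read the row in small slices rather than in a single call. The following is a minimal sketch of that idea, not the gem's own API: `each_column_paged` and `fetch_slice` are hypothetical names, and `fetch_slice` is assumed to return up to `limit` `[name, value]` pairs in column order, starting at `start` inclusive (or at the beginning of the row when `start` is nil).

```ruby
# Walk a wide row in fixed-size slices instead of fetching it whole.
# fetch_slice is a hypothetical callable: fetch_slice.call(start, limit)
# returns an array of [name, value] pairs, with start inclusive.
def each_column_paged(fetch_slice, page_size = 10)
  return enum_for(:each_column_paged, fetch_slice, page_size) unless block_given?
  # With an inclusive start, a page of size 1 would never advance.
  raise ArgumentError, "page_size must be at least 2" if page_size < 2

  start = nil
  loop do
    page = fetch_slice.call(start, page_size)
    # After the first page, the first column repeats the previous page's
    # last column, so skip it.
    page = page.drop(1) if start
    break if page.empty?

    page.each { |name, value| yield name, value }
    start = page.last[0]
  end
end
```

With the cassandra gem, `fetch_slice` could presumably wrap a call such as `cas_client.get(TestColumnFamily, TestRow, :start => start, :count => limit)`. The `:start` and `:count` options are an assumption here and should be checked against the gem version in use. With 1 MB values, a page size of 10 keeps each response at roughly 10 MB.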

answered 2012-08-21T04:37:14.973