
I'm trying to load my DataFrame into AstraDB, but it takes forever. Is there a faster way to do it from Python?

```python
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import pandas as pd

cloud_config = {
    'secure_connect_bundle': 'secure-connect-capstone-project.zip'
}
# `pass` is a reserved word in Python, so the credentials need other variable names
auth_provider = PlainTextAuthProvider(username, password)
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
# connect to the keyspace
session = cluster.connect('iac689')

query = """insert into data_2 (truck_id, active, reading_id, start_mileage, start_time, truck_name, type)
values (%s,%s,%s,%s,%s,%s,%s)"""
# df is the DataFrame being loaded; each synchronous execute() waits for its
# response before the next row is sent
for i in df.values:
    session.execute(query, [i[0], i[1], i[2], i[3], i[4], i[5], i[6]])
```

1 Answer


If you really need to do this from Python, you can speed up the code by:

  • Using prepared statements: call session.prepare on your query string once, and pass the resulting prepared statement to session.execute. Note that prepared statements use ? placeholders instead of %s.
  • Using the asynchronous API (execute_async) instead of the synchronous one (execute). You then need to track how many queries are in flight and cap that number to avoid getting errors.
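A minimal sketch combining both suggestions, assuming `session` is the connected `Session` from the question and that each row's values are already in the INSERT's column order. The function name `load_rows_async` and the `max_in_flight` default of 64 are illustrative, not tuned values. A semaphore caps the number of outstanding requests, which is the in-flight bookkeeping mentioned above.

```python
import threading


def load_rows_async(session, query, rows, max_in_flight=64):
    """Insert rows with one prepared statement and execute_async.

    At most `max_in_flight` requests are outstanding at once. Returns the
    exceptions of the rows that failed (an empty list means all succeeded).
    """
    # Prepare once: the server parses the CQL a single time, and each
    # request then only carries the bound values.
    prepared = session.prepare(query)

    slots = threading.Semaphore(max_in_flight)  # one permit per in-flight request
    errors = []
    errors_lock = threading.Lock()  # callbacks may run on the driver's I/O thread

    def on_success(_result):
        slots.release()

    def on_error(exc):
        with errors_lock:
            errors.append(exc)
        slots.release()

    for row in rows:
        slots.acquire()  # blocks while max_in_flight requests are pending
        future = session.execute_async(prepared, row)
        future.add_callbacks(on_success, on_error)

    # Drain: once every permit can be taken back, nothing is in flight any more.
    for _ in range(max_in_flight):
        slots.acquire()
    return errors


# Hypothetical usage with the question's session and DataFrame:
# query = ("INSERT INTO data_2 (truck_id, active, reading_id, start_mileage, "
#          "start_time, truck_name, type) VALUES (?, ?, ?, ?, ?, ?, ?)")
# errors = load_rows_async(session, query, df.itertuples(index=False, name=None))
```

The driver also provides `cassandra.concurrent.execute_concurrent_with_args(session, statement, parameters, concurrency=...)`, which does the same in-flight bookkeeping internally and may be the simpler choice.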

Really, I would recommend not reinventing the wheel: dump the data to a CSV or JSON file and use DSBulk to load it into Cassandra/Astra. This tool is heavily optimized for loading and unloading data from Cassandra/Astra.
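A minimal sketch of that route driven from Python, assuming the `dsbulk` executable is installed and on `PATH`, and that the DataFrame's column names match the columns of `data_2`. The helper names are hypothetical. The flags (`-url`, `-k`, `-t`, `-b` for the secure connect bundle, `-u`, `-p`, `-header`) follow the DSBulk documentation, but check them against your installed version.

```python
import subprocess


def export_csv(df, csv_path):
    """Dump a pandas DataFrame to CSV with a header row and no index column."""
    # The header row lets DSBulk map CSV columns to table columns by name.
    df.to_csv(csv_path, index=False)


def build_dsbulk_command(csv_path, keyspace, table, bundle, username, password):
    """Build a `dsbulk load` invocation for an Astra database (hypothetical helper)."""
    return [
        "dsbulk", "load",
        "-url", csv_path,   # file (or directory) to load
        "-k", keyspace,
        "-t", table,
        "-b", bundle,       # Astra secure connect bundle
        "-u", username,
        "-p", password,
        "-header", "true",  # first CSV line holds the column names
    ]


def load_with_dsbulk(df, csv_path, keyspace, table, bundle, username, password):
    export_csv(df, csv_path)
    cmd = build_dsbulk_command(csv_path, keyspace, table, bundle, username, password)
    # Raises subprocess.CalledProcessError if dsbulk exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

With the question's names, a call might look like `load_with_dsbulk(df, "data_2.csv", "iac689", "data_2", "secure-connect-capstone-project.zip", username, password)`. Depending on how pandas writes `start_time`, DSBulk may need an explicit timestamp format; see its codec settings.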

answered 2022-01-26T07:24:10.047