python - How to use pandas on a Spark notebook in Python (data on dashDB)

Question

Hello, I am using IBM Bluemix. I have an Apache Spark notebook that loads data from dashDB. When I try to visualize the data, it shows only the columns, not the rows.

def get_file_content(credentials):

    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)

    props = {}
    props['user'] = credentials['username']
    props['password'] = credentials['password']

    # fill in table name
    table = credentials['username'] + "." + "BATTLES"

    data_df = sqlContext.read.jdbc(credentials['jdbcurl'], table, properties=props)
    data_df.printSchema()

    return StringIO.StringIO(data_df)

When I run this command:

data_df.take(5)

I get the first 5 rows of data, with both columns and values. But when I do this:

content_string = get_file_content(credentials)
BATTLES_df = pd.read_table(content_string)

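I get this error:

ValueError: No columns to parse from file

Then, when I try to view .head() or .tail(), only the column names are displayed.

Does anyone see a possible problem here? I know very little about Python. Thank you for your help.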

score 1 · Accepted Answer

This is the solution that worked for me. I replaced

BATTLES_df = pd.read_table(content_string)

with

BATTLES_df = data_df.toPandas()

Thanks.
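For context, the original attempt probably failed because, in Python 2, StringIO.StringIO(data_df) stores str(data_df), which is a one-line description of the schema rather than the table rows, so pd.read_table has no data rows to parse. Below is a minimal sketch of the toPandas() approach wrapped in a function. The function name load_table_as_pandas and the parameters sql_context, table_name, and max_rows are hypothetical, introduced only for illustration; the credentials keys are the ones used in the question.

```python
def load_table_as_pandas(sql_context, credentials, table_name, max_rows=None):
    """Read a dashDB table over JDBC and return it as a pandas DataFrame.

    Hypothetical helper: sql_context is assumed to be a pyspark SQLContext,
    and credentials a dict with 'username', 'password' and 'jdbcurl' keys,
    as in the question.
    """
    props = {
        'user': credentials['username'],
        'password': credentials['password'],
    }
    # Qualify the table with the schema (the user name), as in the question.
    table = credentials['username'] + "." + table_name

    spark_df = sql_context.read.jdbc(credentials['jdbcurl'], table, properties=props)

    # toPandas() collects every row to the driver, so optionally cap the size.
    if max_rows is not None:
        spark_df = spark_df.limit(max_rows)

    # Convert the Spark DataFrame directly; no StringIO round trip is needed.
    return spark_df.toPandas()
```

In the notebook this might be called as BATTLES_df = load_table_as_pandas(sqlContext, credentials, "BATTLES"), after which BATTLES_df.head() should show rows as well as column names. Because toPandas() pulls the whole result into driver memory, passing max_rows is worth considering for large tables.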

score 0 · Accepted Answer

export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

Then go to your Spark directory:

cd ~/spark-1.6.1-bin-hadoop2.6/

./bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_scalaversion:spark_version-M1

You can then write the following code in the notebook.

import pandas as pd
