3

我从 HDFS 系统中读取了 parquet 文件:

path<-"hdfs://part_2015"
AppDF <- parquetFile(sqlContext, path)
printSchema(AppDF)

root
 |-- app: binary (nullable = true)
 |-- category: binary (nullable = true)
 |-- date: binary (nullable = true)
 |-- user: binary (nullable = true)

class(AppDF)

[1] "DataFrame"
attr(,"package")
[1] "SparkR"

collect(AppDF)
.....error:
arguments imply differing number of rows: 46021, 39175, 62744, 27137

head(AppDF)
.....error:
arguments imply differing number of rows: 36, 30, 48

我读过一些关于这个问题的帖子。但这不是我的情况。事实上,我只是从 parquet 文件中读取了一个表,head()或者collect()它。我的拼花桌是这样的:

app   category  date        user
aaa   test      20150101    123
aaa   test      20150102    345
aaa   test      20150103    678
aaaa  testA     20150104    123
aaaa  testA     20150105    234
aaaa  testA     20150106    4345
bbbb  testB     20150101    5435

我正在使用 spark-1.4.0-bin-hadoop2.6 我通过使用在集群上运行它

./sparkR --master yarn--client

我在本地也试过了,同样的问题。

showDF(AppDF)

+-----------+-----------+-----------+-----------+
|        app|   category|       date|       user|
+-----------+-----------+-----------+-----------+
|[B@217fa749|[B@43bfbacd|[B@60810b7a|[B@3818a815|
|[B@5ac31778|[B@3e39f5d5|[B@4f3a92dd| [B@e8013ce|
|[B@7a9440d1|[B@1b2b9836|[B@4b160f29|[B@153d7342|
|[B@7559fcf2|[B@66edb00e|[B@7ec19bec|[B@58e3e3f7|
|[B@598b9ab8|[B@5c5ad3f5|[B@4f11a931|[B@107af885|
|[B@7951ec36|[B@716b0b73|[B@2abce531|[B@576b09e2|
|[B@34560144|[B@7a6d3233|[B@16faf110|[B@34e85d39|
| [B@3406452|[B@787a4528|[B@235282e3|[B@7e0f1732|
|[B@10bc1446|[B@2bd7083f|[B@325e7695|[B@57bb4a08|
|[B@48f98037|[B@7450c04e|[B@61817c8a|[B@7c177a08|
|[B@694ce2dd|[B@36c2512d| [B@f5f7d71|[B@46248d99|
|[B@479dee25|[B@517de3de|[B@1ffb2d9e|[B@236ff079|
|[B@52ac196f|[B@20b9f0d0| [B@f70f879|[B@41c8d7da|
|[B@68d34af3| [B@7ddcd49|[B@72d077a7|[B@545fafd4|
|[B@5610b292|[B@623bbb62|[B@3f8b5150|[B@53877bc7|
|[B@63cf70a8|[B@47ed58c9|[B@2f601903|[B@4e0a2c41|
|[B@7ddf876d|[B@5e3445aa|[B@39c9cc37|[B@6f7e4c84|
|[B@4cd1a74b|[B@583e5453|[B@64124267|[B@6ac5ab84|
|[B@577f9ddf|[B@7b55c859|[B@3cd48a51|[B@25c4eb0a|
|[B@2322f0e5|[B@4af55c68|[B@3285d64a|[B@70b7ae2f|
+-----------+-----------+-----------+-----------+

我还尝试在 Scala 中读取这个 parquet 文件。并执行 collect() 操作。似乎一切正常。所以这应该是 SparkR 特有的问题

4

0 回答 0