hostA has MySQL (port 3306), Hive (port 10000) and the Hive Metastore (port 9083) installed and running. hostB has Presto installed and running.
The goal is for Presto on hostB to query tables registered in the Hive metastore on hostA.
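I get the error below. /home/ec2-user/warehouse/contact does exist on hostA's local filesystem (not HDFS or S3), and the table is partitioned, but that path does not exist on hostB. Why does Presto look for the Hive partition on the host where Presto itself runs (hostB) instead of on hostA, where the Hive metastore runs? The metastore connection itself works, because Presto is able to list the tables in the metastore.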
presto-cli --debug --catalog hive --schema default
presto:default> show tables;
Table
----------------------------
account
contact
(2 rows)
Query 20171102_122934_00012_x6ppj, FINISHED, 2 nodes
http://localhost:8080/query.html?20171102_122934_00012_x6ppj
Splits: 18 total, 18 done (100.00%)
CPU Time: 0.0s total, 615 rows/s, 18.8KB/s, 5% active
Per Node: 0.0 parallelism, 8 rows/s, 280B/s
Parallelism: 0.0
0:00 [8 rows, 250B] [17 rows/s, 560B/s]
presto:default> select * from contact;
Query 20171102_122943_00013_x6ppj failed: Partition location does not exist: file:/home/ec2-user/warehouse/contact
com.facebook.presto.spi.PrestoException: Partition location does not exist: file:/home/ec2-user/warehouse/contact
at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:102)
at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:41)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:243)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:92)
at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:195)
at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$1(ResumableTasks.java:33)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
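The failing path is the location string stored in the metastore. The stack trace shows that Presto's own split loader (BackgroundHiveSplitLoader / HiveFileIterator) lists that location directly. The minimal sketch below uses only the Python standard library. It pulls the location out of the error message and splits it into scheme, host and path. The output shows that a `file:` URI has no host component, so it can only be resolved against the local disk of whichever machine opens it. The `hdfs://hostA:8020/...` URI used for contrast is hypothetical: it assumes an HDFS NameNode on hostA at port 8020.

```python
from urllib.parse import urlparse

# Error message copied from the Presto CLI output above.
ERROR = ("Partition location does not exist: "
         "file:/home/ec2-user/warehouse/contact")


def partition_location(message: str) -> str:
    """Return the URI that follows the 'does not exist: ' marker."""
    marker = "does not exist: "
    return message.split(marker, 1)[1].strip()


def describe(uri: str) -> dict:
    """Split a Hive location URI into its scheme, host (netloc) and path."""
    parsed = urlparse(uri)
    return {"scheme": parsed.scheme, "host": parsed.netloc, "path": parsed.path}


if __name__ == "__main__":
    loc = partition_location(ERROR)
    # A 'file:' URI has an empty host: it names a path on the local disk.
    print(describe(loc))
    # Hypothetical HDFS location for contrast: the host is part of the URI.
    print(describe("hdfs://hostA:8020/warehouse/contact"))
```

Only the metadata travels over the thrift connection to hostA. The data path is opened by Presto on its own host.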
cat config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
# discovery.uri=http://example.net:8080
discovery.uri=http://hostB:8080
cat hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hostA:9083
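The two files above are plain `key=value` property files. The hypothetical helper below parses them and checks what the Hive catalog actually configures. It is a minimal sketch, not Presto's own parser, and it ignores Java `.properties` features such as line continuations and escapes. The check shows that the catalog sets only the connector name and the metastore URI. It does not set `hive.config.resources`, which is the Hive connector property for pointing Presto at Hadoop configuration files, and the startup log below reports that property as null.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and '#' comments.

    Minimal sketch: no support for line continuations, ':' separators
    or escape sequences from the full Java .properties format.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Contents copied from the hive.properties shown above.
HIVE_PROPERTIES = """\
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hostA:9083
"""

# Contents copied from the config.properties shown above.
CONFIG_PROPERTIES = """\
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
# discovery.uri=http://example.net:8080
discovery.uri=http://hostB:8080
"""

if __name__ == "__main__":
    hive = parse_properties(HIVE_PROPERTIES)
    print(hive)
    print("hive.config.resources set:", "hive.config.resources" in hive)
```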
2017-11-02T06:52:30.585Z INFO main com.facebook.presto.metadata.StaticCatalogStore -- Loading catalog etc/catalog/hive.properties --
2017-11-02T06:52:31.307Z INFO main Bootstrap PROPERTY DEFAULT RUNTIME DESCRIPTION
2017-11-02T06:52:31.307Z INFO main Bootstrap hive.allow-corrupt-writes-for-testing false false Allow Hive connector to write data even when data will likely be corrupt
2017-11-02T06:52:31.307Z INFO main Bootstrap hive.assume-canonical-partition-keys false false
2017-11-02T06:52:31.307Z INFO main Bootstrap hive.bucket-execution true true Enable bucket-aware execution: only use a single worker per bucket
2017-11-02T06:52:31.307Z INFO main Bootstrap hive.bucket-writing true true Enable writing to bucketed tables
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.dfs.connect.max-retries 5 5
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.dfs.connect.timeout 500.00ms 500.00ms
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.dfs-timeout 60.00s 60.00s
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.domain-compaction-threshold 100 100 Maximum ranges to allow in a tuple domain without compacting it
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.dfs.domain-socket-path null null
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.fs.cache.max-size 1000 1000 Hadoop FileSystem cache size
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.force-local-scheduling false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.hdfs.authentication.type NONE NONE HDFS authentication type
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.hdfs.impersonation.enabled false false Should Presto user be impersonated when communicating with HDFS
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.compression-codec GZIP GZIP
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore.authentication.type NONE NONE Hive Metastore authentication type
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.storage-format RCBINARY RCBINARY
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.immutable-partitions false false Can new data be inserted into existing partitions or existing unpartitioned tables
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.dfs.ipc-ping-interval 10.00s 10.00s
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-concurrent-file-renames 20 20
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-initial-split-size 32MB 32MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-initial-splits 200 200
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore-refresh-max-threads 100 100
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-outstanding-splits 1000 1000
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore.partition-batch-size.max 100 100
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-partitions-per-scan 100000 100000 Maximum allowed partitions for a single table scan
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-partitions-per-writers 100 100 Maximum number of partitions per writer
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-split-iterator-threads 1000 1000
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.max-split-size 64MB 64MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore-cache-maximum-size 10000 10000
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore-cache-ttl 0.00s 0.00s
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore-refresh-interval 0.00s 0.00s
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore.thrift.client.socks-proxy null null
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore-timeout 10.00s 10.00s
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.metastore.partition-batch-size.min 10 10
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.bloom-filters.enabled false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.default-bloom-filter-fpp 0.05 0.05 ORC Bloom filter false positive probability
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.max-buffer-size 8MB 8MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.max-merge-distance 1MB 1MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.max-read-block-size 16MB 16MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.optimized-writer.enabled false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.orc.stream-buffer-size 8MB 8MB
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.parquet-optimized-reader.enabled false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.parquet-predicate-pushdown.enabled false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.per-transaction-metastore-cache-maximum-size 1000 1000
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.rcfile-optimized-writer.enabled true true
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.rcfile.writer.validate false false Validate RCFile after write by re-reading the whole file
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.recursive-directories false false
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.config.resources null null
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.respect-table-format true true Should new partitions be written using the existing table format or the default Presto format
2017-11-02T06:52:31.310Z INFO main Bootstrap hive.skip-deletion-for-alter false false Skip deletion of old partition data when a partition is deleted and then inserted in the same transaction
2017-11-02T06:52:31.310Z INFO main Bootstrap hive.table-statistics-enabled true true Enable use of table statistics
2017-11-02T06:52:31.310Z INFO main Bootstrap hive.time-zone Zulu Zulu
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.orc.use-column-names false false Access ORC columns using names from the file
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.parquet.use-column-names false false Access Parquet columns using names from the file
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.dfs.verify-checksum true true
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.write-validation-threads 16 16 Number of threads used for verifying data after a write
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.non-managed-table-writes-enabled false false Enable writes to non-managed (external) tables
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.pin-client-to-current-region false false Should the S3 client be pinned to the current EC2 region
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.aws-access-key null null
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.aws-secret-key [REDACTED] [REDACTED]
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.connect-timeout 5.00s 5.00s
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.encryption-materials-provider null null Use a custom encryption materials provider for S3 data encryption
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.endpoint null null
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.kms-key-id null null Use an AWS KMS key for S3 data encryption
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.max-backoff-time 10.00m 10.00m
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.max-client-retries 5 5
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.max-connections 500 500
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.max-error-retries 10 10
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.max-retry-time 10.00m 10.00m
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.multipart.min-file-size 16MB 16MB Minimum file size for an S3 multipart upload
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.multipart.min-part-size 5MB 5MB Minimum part size for an S3 multipart upload
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.signer-type null null
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.socket-timeout 5.00s 5.00s
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.sse.enabled false false Enable S3 server side encryption
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.sse.kms-key-id null null KMS Key ID to use for S3 server-side encryption with KMS-managed key
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.sse.type S3 S3 Key management type for S3 server-side encryption (S3 or KMS)
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.ssl.enabled true true
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.staging-directory /tmp /tmp Temporary directory for staging files before uploading to S3
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.use-instance-credentials true true
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.s3.user-agent-prefix The user agent prefix to use for S3 calls
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.metastore.uri null [thrift://hostA:9083] Hive metastore URIs (comma separated)
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.metastore thrift thrift
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.allow-add-column false false Allow Hive connector to add column
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.allow-drop-column false false Allow Hive connector to drop column
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.allow-drop-table false false Allow Hive connector to drop table
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.allow-rename-column false false Allow Hive connector to rename column
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.allow-rename-table false false Allow Hive connector to rename table
2017-11-02T06:52:31.312Z INFO main Bootstrap hive.security legacy legacy
2017-11-02T06:52:31.312Z INFO main Bootstrap
2017-11-02T06:52:32.663Z INFO main com.facebook.presto.metadata.StaticCatalogStore -- Added catalog hive using connector hive-hadoop2 --
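The Bootstrap lines list each Hive connector setting with its default value and its runtime value. The hypothetical helper below looks up one setting in that log. It assumes the whitespace-separated layout seen here, `<timestamp> INFO main Bootstrap <name> <default> <runtime> [description]`. Entries whose values are empty, such as `hive.s3.user-agent-prefix`, cannot be parsed reliably this way. In the log above, `hive.metastore.uri` is the only setting whose runtime value differs from its default, and `hive.config.resources` stays null.

```python
import re
from typing import Optional, Tuple

# A few lines copied verbatim from the startup log above.
LOG = """\
2017-11-02T06:52:31.309Z INFO main Bootstrap hive.config.resources null null
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.metastore.uri null [thrift://hostA:9083] Hive metastore URIs (comma separated)
2017-11-02T06:52:31.311Z INFO main Bootstrap hive.metastore thrift thrift
"""


def bootstrap_value(log: str, key: str) -> Optional[Tuple[str, str]]:
    """Return (default, runtime) for `key`, or None if it is not logged.

    The key must be followed by whitespace, so 'hive.metastore' does not
    accidentally match the 'hive.metastore.uri' line.
    """
    pattern = re.compile(r"\sBootstrap\s+" + re.escape(key) + r"\s+(\S+)\s+(\S+)")
    for line in log.splitlines():
        match = pattern.search(line)
        if match:
            return match.group(1), match.group(2)
    return None


if __name__ == "__main__":
    for name in ("hive.metastore.uri", "hive.config.resources"):
        print(name, bootstrap_value(LOG, name))
```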