Hive connector query fails

Dear Community,

Trino version: 364
We are facing an error while running a query. The error message is given below.

Query 20220111_104442_00007_z8uk4, FAILED, 7 nodes
Splits: 2,864 total, 13 done (0.45%)
8:16 [2.39M rows, 966MB] [4.82K rows/s, 1.95MB/s]

Query 20220111_104442_00007_z8uk4 failed: Error reading tail from hdfs://hacluster/user/hive/warehouse/psn.db/detail_uf/dt=1640628000/no=0/154d6b916cc5b128-8ea2b6547d0d1bbb_2066729209_data.0.parq with length 16384
io.trino.spi.TrinoException: Error reading tail from hdfs://hacluster/user/hive/warehouse/psn.db/detail_uf/dt=1640628000/no=0/154d6b916cc5b128-8ea2b6547d0d1bbb_2066729209_data.0.parq with length 16384
	at io.trino.plugin.hive.parquet.HdfsParquetDataSource.readTail(HdfsParquetDataSource.java:113)
	at io.trino.parquet.reader.MetadataReader.readFooter(MetadataReader.java:94)
	at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:213)
	at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:164)
	at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:286)
	at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:175)
	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:49)
	at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:68)
	at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:268)
	at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:196)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:151)
	at io.trino.operator.Driver.processInternal(Driver.java:388)
	at io.trino.operator.Driver.lambda$processFor$9(Driver.java:292)
	at io.trino.operator.Driver.tryWithLock(Driver.java:685)
	at io.trino.operator.Driver.processFor(Driver.java:285)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1078)
	at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.trino.$gen.Trino_364____20220111_103516_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-952478870-192.168.212.2-1451608027649:blk_33418479072_36961430745 file=/user/hive/warehouse/ps.db/detail_uf/dt=1640628000/no=0/154d6b916cc5b128-8ea2b6547d0d1bbb_2066729209_data.0.parq
	at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:879)
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:862)
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:841)
	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:567)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
	at io.trino.plugin.hive.util.FSDataInputStreamTail.readTail(FSDataInputStreamTail.java:59)
	at io.trino.plugin.hive.parquet.HdfsParquetDataSource.readTail(HdfsParquetDataSource.java:109)
	... 33 more

The same query works fine in beeline.

It seems to be a decryption problem.
Which process wrote the Parquet file initially?

Can you provide some more info to help us reproduce the issue in a testing environment?

We have fixed this problem.
There was a connectivity issue with one datanode.
After we fixed the connectivity with that datanode, the query works fine.
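For anyone hitting the same BlockMissingException: a quick way to confirm whether HDFS itself can serve the block (before suspecting Trino) is to check the file's block health and datanode status from a cluster host. This is a sketch using standard HDFS CLI commands, with the path taken from the error message above:

```shell
# Check whether the file's blocks are healthy and where their replicas live
hdfs fsck /user/hive/warehouse/psn.db/detail_uf/dt=1640628000/no=0/154d6b916cc5b128-8ea2b6547d0d1bbb_2066729209_data.0.parq \
  -files -blocks -locations

# List datanodes and their state (look for dead or unreachable nodes)
hdfs dfsadmin -report

# Verify the file is actually readable end to end
hdfs dfs -cat /user/hive/warehouse/psn.db/detail_uf/dt=1640628000/no=0/154d6b916cc5b128-8ea2b6547d0d1bbb_2066729209_data.0.parq > /dev/null
```

If `fsck` reports missing or corrupt blocks, or `dfsadmin -report` shows a dead datanode holding the replicas, the problem is on the HDFS side rather than in the Hive connector. Note that beeline may still succeed if the Hive execution engine happens to read from datanodes that are reachable from its hosts, while Trino workers cannot reach the one holding the needed replica.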
