CDAP / CDAP-5676

SDK parquet queries fail if there are a lot of null values

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: Explore
    • Release Notes:
      Upgraded the Hive version used by the SDK to Hive 1.2.1 to pick up a fix for Parquet tables.

      Description

      To reproduce:

      1. Create a Hydrator pipeline that reads from a stream and writes to a TPFSParquet sink. Use a schema with many columns (the one I reproduced this with had 280) and make sure that most of the column values are null.

      2. Run an Explore query; a select * from the table should be enough. An exception like the following appears in the CDAP log:

      2016-04-19 18:40:09,824 - ERROR [netty-executor-4:c.c.c.c.HttpExceptionHandler@49] - Unexpected error: request=POST /v3/data/explore/queries/aa5aa052-b251-4beb-b447-0fc409f46809/preview user=<null>:
      java.lang.RuntimeException: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.NullPointerException
      	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-13.0.1.jar:na]
      	at co.cask.cdap.explore.service.hive.BaseHiveExploreService.fetchNextResults(BaseHiveExploreService.java:841) ~[co.cask.cdap.cdap-explore-3.4.0-SNAPSHOT.jar:na]
      	at co.cask.cdap.explore.service.hive.BaseHiveExploreService.previewResults(BaseHiveExploreService.java:879) ~[co.cask.cdap.cdap-explore-3.4.0-SNAPSHOT.jar:na]
      	at co.cask.cdap.explore.executor.QueryExecutorHttpHandler.getQueryResultPreview(QueryExecutorHttpHandler.java:168) ~[co.cask.cdap.cdap-explore-3.4.0-SNAPSHOT.jar:na]
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_77]
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_77]
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_77]
      	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_77]
      	at co.cask.http.HttpMethodInfo.invoke(HttpMethodInfo.java:80) ~[co.cask.http.netty-http-0.14.0.jar:na]
      	at co.cask.http.HttpDispatcher.messageReceived(HttpDispatcher.java:38) [co.cask.http.netty-http-0.14.0.jar:na]
      	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43) [io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67) [io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:314) [io.netty.netty-3.6.6.Final.jar:na]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
      	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
      Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.NullPointerException
      	at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:343) ~[org.apache.hive.hive-service-1.1.0.jar:1.1.0]
      	at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:250) ~[org.apache.hive.hive-service-1.1.0.jar:1.1.0]
      	at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:656) ~[org.apache.hive.hive-service-1.1.0.jar:1.1.0]
      	at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451) ~[org.apache.hive.hive-service-1.1.0.jar:1.1.0]
      	at co.cask.cdap.explore.service.hive.Hive14ExploreService.doFetchNextResults(Hive14ExploreService.java:71) ~[co.cask.cdap.cdap-explore-3.4.0-SNAPSHOT.jar:na]
      	at co.cask.cdap.explore.service.hive.BaseHiveExploreService.fetchNextResults(BaseHiveExploreService.java:836) ~[co.cask.cdap.cdap-explore-3.4.0-SNAPSHOT.jar:na]
      	... 17 common frames omitted
      Caused by: java.io.IOException: java.lang.NullPointerException
      	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:338) ~[org.apache.hive.hive-service-1.1.0.jar:1.1.0]
      	... 22 common frames omitted
      Caused by: java.lang.NullPointerException: null
      	at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249) ~[com.twitter.parquet-hadoop-bundle-1.6.0rc3.jar:na]
      	at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543) ~[com.twitter.parquet-hadoop-bundle-1.6.0rc3.jar:na]
      	at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520) ~[com.twitter.parquet-hadoop-bundle-1.6.0rc3.jar:na]
      	at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426) ~[com.twitter.parquet-hadoop-bundle-1.6.0rc3.jar:na]
      	at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:381) ~[com.twitter.parquet-hadoop-bundle-1.6.0rc3.jar:na]
      	at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:367) ~[com.twitter.parquet-hadoop-bundle-1.6.0rc3.jar:na]
      	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:228) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:84) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:71) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:667) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445) ~[org.apache.hive.hive-exec-1.1.0.jar:1.1.0]
      	... 26 common frames omitted
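
      The two reproduction steps can be sketched in Python. This is only an illustration under stated assumptions, not the pipeline actually used: the column names (col0 ... col279), the record name, the base URL (localhost:10000 is assumed for the SDK router) and the query-submission endpoint are assumptions; only the /preview path is taken from the log above. The helpers build the schema and the HTTP requests without sending anything.

```python
import json
import urllib.request

# Assumption: address of the SDK router; adjust to your installation.
BASE_URL = "http://localhost:10000/v3"


def wide_nullable_schema(num_columns=280):
    """Build an Avro-style record schema whose columns are all nullable strings."""
    fields = [
        {"name": "col%d" % i, "type": ["string", "null"]}  # hypothetical names
        for i in range(num_columns)
    ]
    return {"type": "record", "name": "wideRecord", "fields": fields}


def mostly_null_record(schema, populated=("col0",)):
    """Return one record in which only the `populated` columns carry a value."""
    return {
        f["name"]: ("x" if f["name"] in populated else None)
        for f in schema["fields"]
    }


def submit_query_request(namespace, sql):
    """Prepare (but do not send) a request that submits an Explore query.

    Assumption: queries are submitted per namespace with a JSON body of the
    form {"query": "<sql>"}.
    """
    url = "%s/namespaces/%s/data/explore/queries" % (BASE_URL, namespace)
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def preview_request(handle):
    """Prepare the preview request (the path seen in the log) for a query handle."""
    url = "%s/data/explore/queries/%s/preview" % (BASE_URL, handle)
    return urllib.request.Request(url, data=b"", method="POST")
```

      Against a running SDK, each prepared request could be sent with urllib.request.urlopen(req). The submission response is expected to carry the query handle that the preview call needs. The table name in the query is whatever Explore table the sink registered.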
      

              People

              • Assignee: Albert Shau (ashau)
              • Reporter: Albert Shau (ashau)