CDAP-3888

ETL Batch fails if only a single record is malformed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: ETL
    • Labels:
      None

      Description

      ETL batch from a database to TPFSAvro fails because one record is wrong:

      2015-09-29 17:54:33,411 - ERROR [LocalJobRunner Map Task Executor #0:c.c.c.e.b.ETLMapReduce$ETLMapper@380] - Exception thrown in BatchDriver Mapper : {}
      org.apache.avro.AvroRuntimeException: Field zip type:INT pos:6 does not accept null values
      	at org.apache.avro.data.RecordBuilderBase.validate(RecordBuilderBase.java:89) ~[org.apache.avro.avro-1.6.2.jar:1.6.2]
      	at org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:135) ~[org.apache.avro.avro-1.6.2.jar:1.6.2]
      	at org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:114) ~[org.apache.avro.avro-1.6.2.jar:1.6.2]
      	at org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:104) ~[org.apache.avro.avro-1.6.2.jar:1.6.2]
      	at co.cask.cdap.etl.common.StructuredToAvroTransformer.transform(StructuredToAvroTransformer.java:60) ~[na:na]
      	at co.cask.cdap.etl.batch.sink.TimePartitionedFileSetDatasetAvroSink.transform(TimePartitionedFileSetDatasetAvroSink.java:96) ~[na:na]
      	at co.cask.cdap.etl.batch.sink.TimePartitionedFileSetDatasetAvroSink.transform(TimePartitionedFileSetDatasetAvroSink.java:46) ~[na:na]
      	at co.cask.cdap.etl.batch.ETLMapReduce$MultiOutputSink.write(ETLMapReduce.java:468) ~[ETLWorkflow.d6cf374e-45e3-41a1-8083-d6c41aedd90d/:na]
      	at co.cask.cdap.etl.batch.ETLMapReduce$ETLMapper.map(ETLMapReduce.java:361) ~[ETLWorkflow.d6cf374e-45e3-41a1-8083-d6c41aedd90d/:na]
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) [org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
      	at co.cask.cdap.internal.app.runtime.batch.MapperWrapper.run(MapperWrapper.java:102) [co.cask.cdap.cdap-app-fabric-3.2.0.jar:na]
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) [org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) [org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
      	at org.apache.hadoop.mapred.LocalJobRunnerWithFix$Job$MapTaskRunnable.run(LocalJobRunnerWithFix.java:243) [co.cask.cdap.cdap-app-fabric-3.2.0.jar:na]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_60]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
      	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
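
      The root cause is visible in the first frame: the sink's Avro schema declares `zip` as a plain `int`, so Avro's record builder rejects a null value. Assuming the sink schema is user-editable (the surrounding record fields are omitted here for brevity), declaring the field as a `["null", "int"]` union with a null default would let records with a missing zip through:

      ```json
      {
        "name": "zip",
        "type": ["null", "int"],
        "default": null
      }
      ```

      This only papers over the one field seen in this trace; any other non-nullable field fed a null would fail the same way.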
      

      A single piece of bad data should not fail the entire job; malformed records should be skipped (and logged) instead.
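
      A minimal sketch of the tolerant behavior being requested (this is not CDAP's actual API; `transform` below is a hypothetical stand-in for the sink transform that throws in the stack trace above): catch per-record failures, count and drop the bad record, and keep the task running.

      ```java
      import java.util.Arrays;
      import java.util.List;

      public class SkipBadRecords {

          // Hypothetical stand-in for the sink transform: throws on a null
          // "zip", mirroring the AvroRuntimeException in the trace above.
          static int transform(String zip) {
              if (zip == null) {
                  throw new RuntimeException("Field zip type:INT does not accept null values");
              }
              return Integer.parseInt(zip);
          }

          // Process every record; count failures instead of propagating them,
          // so one malformed row cannot fail the whole map task.
          static int[] process(List<String> zips) {
              int written = 0;
              int dropped = 0;
              for (String zip : zips) {
                  try {
                      transform(zip);
                      written++;
                  } catch (RuntimeException e) {
                      dropped++;  // in a real pipeline: log and/or route to an error dataset
                  }
              }
              return new int[] {written, dropped};
          }

          public static void main(String[] args) {
              int[] counts = process(Arrays.asList("94105", null, "10001"));
              System.out.println("written=" + counts[0] + " dropped=" + counts[1]);
          }
      }
      ```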

            People

            • Assignee: Albert Shau (ashau)
            • Reporter: Andreas Neumann (andreas)
            • Votes: 0
            • Watchers: 2
