CDAP-6505

A MapReduce job that reads the last 2 minutes of data from a stream fails (running on SDK standalone).

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: CDAP Examples
    • Labels: None

      Description

      Reproduced by running the StreamConversion example (on sdk-standalone) with an added reducer and the following configuration:

      • the MapReduce is scheduled every 2 minutes;
      • data is ingested into the stream continuously (2 events every second, script is in/send-events.sh);
      • the frequency with which the MapReduce runs is every 2 minutes.
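The configuration above can be put into numbers with a small sketch. It assumes each scheduled run reads the window that ends at the run's logical start time and reaches back 2 minutes; the report only says the job reads the last 2 minutes, so the exact boundaries are an assumption. `StreamWindowSketch`, `windowFor`, and `expectedEvents` are hypothetical names and are not part of CDAP.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not CDAP code: models the time range one scheduled run reads. */
final class StreamWindowSketch {
  // Window length taken from the report: each run reads the last 2 minutes.
  static final long WINDOW_MILLIS = TimeUnit.MINUTES.toMillis(2);
  // Ingest rate taken from the report: 2 events every second.
  static final int EVENTS_PER_SECOND = 2;

  /** Returns {startInclusive, endExclusive} in milliseconds (boundaries are an assumption). */
  static long[] windowFor(long logicalStartMillis) {
    return new long[] {logicalStartMillis - WINDOW_MILLIS, logicalStartMillis};
  }

  /** Approximate number of events a run should see at a steady ingest rate. */
  static long expectedEvents(long[] window) {
    return TimeUnit.MILLISECONDS.toSeconds(window[1] - window[0]) * EVENTS_PER_SECOND;
  }

  public static void main(String[] args) {
    long[] window = windowFor(System.currentTimeMillis());
    System.out.println("read [" + window[0] + ", " + window[1] + "), about "
        + expectedEvents(window) + " events");
  }
}
```

Under these assumptions every run reads about 240 events, and the end of each window falls on data that is still being written while the job runs.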

      The following reducer class was used:
      public static class StreamConfReducer extends Reducer<Text, Text, NullWritable, NullWritable> {
        @Override
        public void reduce(Text timestamp, Iterable<Text> streamEvents, Context context)
          throws IOException, InterruptedException {
          // Log every event grouped under this timestamp; nothing is written to the output.
          for (Text streamEvent : streamEvents) {
            LOG.info("Logger reducer {}", streamEvent.toString());
          }
        }
      }

      The failure occurs with the following error:

      2016-07-15 17:08:02,095 - WARN  [Thread-463:o.a.h.m.LocalJobRunnerWithFix$Job@562] - Error cleaning up job: job_local1336610459_0012
      java.lang.Exception: java.io.IOException: Cannot seek after EOF
              at org.apache.hadoop.mapred.LocalJobRunnerWithFix$Job.runTasks(LocalJobRunnerWithFix.java:465) ~[co.cask.cdap.cdap-app-fabric-3.4.3.jar:na]
              at org.apache.hadoop.mapred.LocalJobRunnerWithFix$Job.run(LocalJobRunnerWithFix.java:524) ~[co.cask.cdap.cdap-app-fabric-3.4.3.jar:na]
      java.io.IOException: Cannot seek after EOF
              at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:321) ~[org.apache.hadoop.hadoop-common-2.3.0.jar:na]
              at co.cask.cdap.common.io.DFSSeekableInputStream.seek(DFSSeekableInputStream.java:56) ~[co.cask.cdap.cdap-common-3.4.3.jar:na]
              at co.cask.cdap.data.stream.StreamDataFileReader.skipUntil(StreamDataFileReader.java:417) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
              at co.cask.cdap.data.stream.StreamDataFileReader.initByTime(StreamDataFileReader.java:388) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
              at co.cask.cdap.data.stream.StreamDataFileReader.readDataBlock(StreamDataFileReader.java:502) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
              at co.cask.cdap.data.stream.StreamDataFileReader.nextStreamEvent(StreamDataFileReader.java:523) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
              at co.cask.cdap.data.stream.StreamDataFileReader.read(StreamDataFileReader.java:189) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
              at co.cask.cdap.data.stream.StreamRecordReader.nextKeyValue(StreamRecordReader.java:69) ~[co.cask.cdap.cdap-data-fabric-3.4.3.jar:na]
              at co.cask.cdap.internal.app.runtime.batch.dataset.input.DelegatingRecordReader.nextKeyValue(DelegatingRecordReader.java:84) ~[co.cask.cdap.cdap-app-fabric-3.4.3.jar:na]
              at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
              at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
              at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
              at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
              at co.cask.cdap.internal.app.runtime.batch.MapperWrapper$1.nextKeyValue(MapperWrapper.java:159) ~[co.cask.cdap.cdap-app-fabric-3.4.3.jar:na]
              at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
              at co.cask.cdap.internal.app.runtime.batch.MapperWrapper.run(MapperWrapper.java:117) ~[co.cask.cdap.cdap-app-fabric-3.4.3.jar:na]
              at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.3.0.jar:na]
              at org.apache.hadoop.mapred.LocalJobRunnerWithFix$Job$MapTaskRunnable.run(LocalJobRunnerWithFix.java:243) ~[co.cask.cdap.cdap-app-fabric-3.4.3.jar:na]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_79]
              at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_79]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_79]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_79]
              at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_79]
      2016-07-15 17:08:02,832 - INFO  [MapReduc
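The top frames of the trace show the exception being raised by a length-bounded `seek` call that `StreamDataFileReader.skipUntil` makes while positioning the reader by time. The report does not state a root cause, so the following is only an illustration of the failure mode: a seek target beyond the length the stream currently reports is rejected. `BoundedSeekSketch` is a hypothetical stand-in and is neither Hadoop nor CDAP code; `trySeek` is an assumed defensive variant, not the fix that was applied.

```java
import java.io.IOException;

/** Illustrative stand-in for a length-bounded seekable stream; not Hadoop or CDAP code. */
final class BoundedSeekSketch {
  private final byte[] data;
  private int pos;

  BoundedSeekSketch(byte[] data) {
    this.data = data;
  }

  /** Moves to target, rejecting offsets outside [0, length], like a bounded stream would. */
  void seek(long target) throws IOException {
    if (target < 0) {
      throw new IOException("Cannot seek to a negative offset");
    }
    if (target > data.length) {
      throw new IOException("Cannot seek after EOF");
    }
    pos = (int) target;
  }

  /** Defensive variant (an assumption): reports failure instead of throwing. */
  boolean trySeek(long target) {
    if (target < 0 || target > data.length) {
      return false;
    }
    pos = (int) target;
    return true;
  }

  long position() {
    return pos;
  }

  public static void main(String[] args) {
    BoundedSeekSketch stream = new BoundedSeekSketch(new byte[16]);
    try {
      stream.seek(16); // seeking exactly to the end is allowed
      stream.seek(32); // an offset past the visible length is rejected
    } catch (IOException e) {
      System.out.println("seek failed: " + e.getMessage());
    }
  }
}
```

A reader that computes its seek offset from information that is newer than the length the stream reports could hit this check; whether that is what happens in this issue is not established by the report.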
      
      

              People

              • Assignee: Terence Yim
              • Reporter: Iraida
              • Votes: 0
              • Watchers: 2
