CDAP / CDAP-7248

FileBatchSource not working with Azure Blob Storage

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.1
    • Fix Version/s: 4.0.0, 3.6.0, 3.5.1
    • Component/s: CDAP, Pipelines
    • Labels:
      None
    • Release Notes:
      Fixed a problem with the FileBatchSource not working with Azure Blob Storage.

      Description

      FileBatchSource, used in some MapReduce jobs and in the Hydrator HDFS file source, cannot connect to Azure Blob Storage; it fails with this error:

      org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Container v2base10container in account v2base10storage.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
      	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:942) ~[hadoop-azure-2.7.1.2.4.2.4-5.jar:na]
      	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:439) ~[hadoop-azure-2.7.1.2.4.2.4-5.jar:na]
      	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1160) ~[hadoop-azure-2.7.1.2.4.2.4-5.jar:na]
      	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653) ~[spark-assembly.jar:1.6.1.2.4.2.4-5]
      	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92) ~[spark-assembly.jar:1.6.1.2.4.2.4-5]
      	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687) ~[spark-assembly.jar:1.6.1.2.4.2.4-5]
      	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669) ~[spark-assembly.jar:1.6.1.2.4.2.4-5]
      	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371) ~[spark-assembly.jar:1.6.1.2.4.2.4-5]
      	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) ~[spark-assembly.jar:1.6.1.2.4.2.4-5]
      	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:500) ~[spark-assembly.jar:1.6.1.2.4.2.4-5]
      	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:469) ~[spark-assembly.jar:1.6.1.2.4.2.4-5]
      	at co.cask.hydrator.plugin.batch.CopybookSource.prepareRun(CopybookSource.java:109) ~[1473468153391-0/:na]
      	at co.cask.hydrator.plugin.batch.CopybookSource.prepareRun(CopybookSource.java:64) ~[1473468153391-0/:na]
      	at co.cask.cdap.etl.batch.LoggedBatchConfigurable$1.call(LoggedBatchConfigurable.java:44) ~[cdap-etl-core-3.5.0.jar:na]
      	at co.cask.cdap.etl.batch.LoggedBatchConfigurable$1.call(LoggedBatchConfigurable.java:41) ~[cdap-etl-core-3.5.0.jar:na]
      	at co.cask.cdap.etl.log.LogContext.run(LogContext.java:59) ~[cdap-etl-core-3.5.0.jar:na]
      	at co.cask.cdap.etl.batch.LoggedBatchConfigurable.prepareRun(LoggedBatchConfigurable.java:41) ~[cdap-etl-core-3.5.0.jar:na]
      	at co.cask.cdap.etl.batch.mapreduce.ETLMapReduce.initialize(ETLMapReduce.java:183) ~[cdap-etl-batch-3.5.0.jar:na]
      	at co.cask.cdap.api.mapreduce.AbstractMapReduce.initialize(AbstractMapReduce.java:171) ~[co.cask.cdap.cdap-api-3.5.0.jar:na]
      	at co.cask.cdap.api.mapreduce.AbstractMapReduce.initialize(AbstractMapReduce.java:33) ~[co.cask.cdap.cdap-api-3.5.0.jar:na]
      	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2.call(MapReduceRuntimeService.java:511) ~[co.cask.cdap.cdap-app-fabric-3.5.0.jar:na]
      	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2.call(MapReduceRuntimeService.java:504) ~[co.cask.cdap.cdap-app-fabric-3.5.0.jar:na]
      	at co.cask.cdap.data2.transaction.Transactions.execute(Transactions.java:174) ~[co.cask.cdap.cdap-data-fabric-3.5.0.jar:na]
      	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.beforeSubmit(MapReduceRuntimeService.java:504) ~[co.cask.cdap.cdap-app-fabric-3.5.0.jar:na]
      	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService.startUp(MapReduceRuntimeService.java:215) ~[co.cask.cdap.cdap-app-fabric-3.5.0.jar:na]
      	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
      	at co.cask.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1$1.run(MapReduceRuntimeService.java:422) [co.cask.cdap.cdap-app-fabric-3.5.0.jar:na]
      	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
      Caused by: org.apache.hadoop.fs.azure.AzureException: Container v2base10container in account v2base10storage.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
      	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:733) ~[hadoop-azure-2.7.1.2.4.2.4-5.jar:na]
      	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:937) ~[hadoop-azure-2.7.1.2.4.2.4-5.jar:na]
      	... 27 common frames omitted
      

      Initial investigation suggests that CDAP filters entries out of core-site.xml when launching containers, and that the Azure Blob Storage credentials needed for this connection are among the entries being filtered out.
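
      The error above indicates that the Hadoop wasb driver found no account key in its Configuration and fell back to anonymous access. For reference, Hadoop's Azure support reads the credential from a fs.azure.account.key.&lt;account&gt; property, normally set in core-site.xml. A minimal sketch for the storage account named in the stack trace (the key value is a placeholder, not a real credential):

      ```xml
      <!-- core-site.xml fragment. The property name must match the storage
           account in the wasb:// URI; the value below is a placeholder. -->
      <property>
        <name>fs.azure.account.key.v2base10storage.blob.core.windows.net</name>
        <value>YOUR_STORAGE_ACCOUNT_KEY</value>
      </property>
      ```

      If CDAP strips this property when assembling the container's configuration, NativeAzureFileSystem cannot authenticate and fails as shown in the stack trace.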

            People

            • Assignee: Sagar Kapare
            • Reporter: Derek Wood
            • Votes: 0
            • Watchers: 4
