Monday, March 14, 2016

Spark saveAsTextFile() results in Mkdirs failed to create for half of the directory

I am currently running a Java Spark Application in tomcat and receiving the following exception:

Caused by: java.io.IOException: Mkdirs failed to create file:/opt/folder/tmp/file.json/_temporary/0/_temporary/attempt_201603031703_0001_m_000000_5 

on the line

text.saveAsTextFile("/opt/folder/tmp/file.json"); // where text is a JavaRDD<String>

The issue is that /opt/folder/tmp/ already exists, and Spark successfully creates the directories up to /opt/folder/tmp/file.json/_temporary/0/. It then fails on the remaining part of the path, _temporary/attempt_201603031703_0001_m_000000_5, with what looks like a permission issue. However, I gave the tomcat user permissions on the tmp/ directory (chown -R tomcat:tomcat tmp/ and chmod -R 755 tmp/). Does anyone know what could be happening?

Thanks

Edit for @javadba:

[root@ip tmp]# ls -lrta
total 12
drwxr-xr-x 4 tomcat tomcat 4096 Mar  3 16:44 ..
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 file.json
drwxrwxrwx 3 tomcat tomcat 4096 Mar  7 20:01 .
[root@ip tmp]# cd file.json/
[root@ip file.json]# ls -lrta
total 12
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 _temporary
drwxrwxrwx 3 tomcat tomcat 4096 Mar  7 20:01 ..
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 .
[root@ip file.json]# cd _temporary/
[root@ip _temporary]# ls -lrta
total 12
drwxr-xr-x 2 tomcat tomcat 4096 Mar  7 20:01 0
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 ..
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 .
[root@ip _temporary]# cd 0/
[root@ip 0]# ls -lrta
total 8
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 ..
drwxr-xr-x 2 tomcat tomcat 4096 Mar  7 20:01 .

The exception in catalina.out

Caused by: java.io.IOException: Mkdirs failed to create file:/opt/folder/tmp/file.json/_temporary/0/_temporary/attempt_201603072001_0001_m_000000_5
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more

3 Answers

Answers 1

I suggest temporarily changing the permissions to 777 to see whether it works at that point. There have been bugs and issues with permissions on the local file system. If that still does not work, let us know whether anything changed or whether the result is precisely the same.

Answers 2

Could SELinux or AppArmor be playing a trick on you? Check with ls -Z and the system logs.

Answers 3

saveAsTextFile is actually executed by the Spark executors. Depending on your Spark setup, the executors may run as a different user than your Spark application driver. My guess is that the driver prepares the directory for the job without problems, but the executors, running as a different user, have no rights to write to that directory.

Changing to 777 won't help, because permissions are not inherited by child directories, so you'd get 755 anyway.
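The point about inheritance can be checked with a small standalone program. This is only a sketch, assuming a POSIX file system; the class name PermissionInheritanceDemo and the temporary directory prefix are made up for illustration and have nothing to do with Spark's own code. It sets a parent directory to 777, creates a child named _temporary inside it, and reports both modes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical demo class, not part of Spark or Hadoop.
public class PermissionInheritanceDemo {

    /** Returns {parentMode, childMode} as rwx strings. POSIX file systems only. */
    static String[] parentAndChildModes() throws IOException {
        // Scratch directory standing in for /opt/folder/tmp/file.json
        Path parent = Files.createTempDirectory("perm-demo");
        // Equivalent of chmod 777 on the parent
        Files.setPosixFilePermissions(parent, PosixFilePermissions.fromString("rwxrwxrwx"));
        // New directory created without explicit attributes: its mode is
        // derived from the process umask, not copied from the parent
        Path child = Files.createDirectory(parent.resolve("_temporary"));
        return new String[] {
            PosixFilePermissions.toString(Files.getPosixFilePermissions(parent)),
            PosixFilePermissions.toString(Files.getPosixFilePermissions(child))
        };
    }

    public static void main(String[] args) throws IOException {
        String[] modes = parentAndChildModes();
        System.out.println("parent: " + modes[0]);
        System.out.println("child:  " + modes[1]);
    }
}
```

With the common umask of 022 the child would be expected to come out as rwxr-xr-x even though the parent is rwxrwxrwx. Default ACLs or an unusual umask on the machine could change that, so treat the output as a diagnostic rather than a guarantee.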

Try running your Spark application as the same user that runs Spark itself (that is, the executors).
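To see whether a user mismatch is plausible, one option is to run a small check both as the tomcat user and as whichever user the executors run as. The sketch below uses only the Java standard library; the class name WriteAccessCheck and its helper methods are hypothetical, and the default path is simply the one from the question. It prints the effective user, finds the deepest directory on the path that already exists, and reports whether the current user could create children under it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic class, not part of Spark or Hadoop.
public class WriteAccessCheck {

    /** Returns the deepest ancestor of target (or target itself) that already exists. */
    static Path deepestExisting(Path target) {
        Path p = target.toAbsolutePath().normalize();
        while (p != null && !Files.exists(p)) {
            p = p.getParent();
        }
        return p;
    }

    /** True if the current user appears able to create entries under the deepest existing ancestor. */
    static boolean canCreateUnder(Path target) {
        Path existing = deepestExisting(target);
        return existing != null
                && Files.isDirectory(existing)
                && Files.isWritable(existing)
                && Files.isExecutable(existing);
    }

    public static void main(String[] args) {
        // Default path taken from the question; pass another path as the first argument
        Path target = Paths.get(args.length > 0
                ? args[0]
                : "/opt/folder/tmp/file.json/_temporary/0/_temporary");
        System.out.println("user.name           = " + System.getProperty("user.name"));
        System.out.println("deepest existing    = " + deepestExisting(target));
        System.out.println("can create under it = " + canCreateUnder(target));
    }
}
```

If the two users print different results for the same path, that supports the explanation above. If both report that they can create directories, something outside plain file permissions (for example the SELinux/AppArmor suggestion in Answer 2) is the more likely culprit.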
