Tuesday, May 15, 2018

Spark error with google/guava library: java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite


I have a simple Spark project in which the pom.xml dependencies are only the basic Scala, ScalaTest/JUnit, and Spark artifacts:

    <dependencies>
        <dependency>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-compiler</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest_${scala.binary.version}</artifactId>
            <version>3.0.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

When attempting to run a basic Spark program, the SparkSession initialization fails on this line:

    SparkSession.builder.master(master).appName("sparkApp").getOrCreate

Here is the output / error:

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    18/04/07 18:06:15 INFO SparkContext: Running Spark version 2.2.1
    Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite(JLjava/util/concurrent/TimeUnit;)Lcom/google/common/cache/CacheBuilder;
        at org.apache.hadoop.security.Groups.<init>(Groups.java:96)
        at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
        at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2424)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)

I have run Spark locally many dozens of times on other projects, so what might be wrong with this simple one? Is there a dependency on the $HADOOP_HOME environment variable or similar?

Update: By downgrading the Spark version to 2.0.1 I was able to compile. That does not fix the problem (we need a newer version), but it helps point out the source of the problem.

Another update: In a different project the hack of downgrading to 2.0.1 does help, i.e. execution proceeds further, but a similar exception then occurs when writing out to Parquet:

    18/05/07 11:26:11 ERROR Executor: Exception in task 0.0 in stage 2741.0 (TID 2618)
    java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
        at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
        at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
        at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.<init>(CodecFactory.java:92)
        at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:169)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
        at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
        at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
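A useful first check here (not shown in the original post) is which Guava versions the build pulls in and which one Maven actually resolves onto the classpath; the dependency plugin's tree goal, filtered to Guava's coordinates, shows exactly that:

    mvn dependency:tree -Dincludes=com.google.guava:guava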

1 Answer

Answer 1

This error occurs because of a version mismatch between the Guava library on the classpath and the Guava version Hadoop was compiled against: the Hadoop classes in both stack traces call CacheBuilder methods that an older Guava does not provide. Spark shades its own copy of Guava, but many other libraries pull in unshaded Guava. You can try shading the Guava dependency in your own build, as described in this post: Apache-Spark-User-List
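A minimal sketch of that approach with the maven-shade-plugin is below; the plugin version and the relocated package prefix shadedguava are illustrative choices, not taken from the post, so adapt them to your project:

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.1.0</version>
                <executions>
                    <execution>
                        <!-- shade at package time so the final jar carries
                             the relocated Guava classes -->
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <relocations>
                                <relocation>
                                    <!-- move Guava into a private package so your
                                         copy cannot clash with the version that
                                         Hadoop and Spark expect to find -->
                                    <pattern>com.google.common</pattern>
                                    <shadedPattern>shadedguava.com.google.common</shadedPattern>
                                </relocation>
                            </relocations>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

Alternatively, pinning a single Hadoop-compatible Guava version in dependencyManagement can resolve the clash without shading.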
