I have a simple Spark project in which the pom.xml dependencies are only the basic Scala, ScalaTest/JUnit, and Spark ones:
    <dependencies>
        <dependency>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-compiler</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest_${scala.binary.version}</artifactId>
            <version>3.0.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>
When attempting to run a basic Spark program, the SparkSession init fails on this line:
    SparkSession.builder.master(master).appName("sparkApp").getOrCreate
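For context, that line sits in a minimal driver along these lines (a sketch; the object name and the master value, e.g. local[*] for a local run, are assumptions not shown in the question):

    import org.apache.spark.sql.SparkSession

    object SparkApp {
      def main(args: Array[String]): Unit = {
        val master = "local[*]" // assumed; the question only shows a master variable
        // this call throws before the session is ever usable
        val spark = SparkSession.builder.master(master).appName("sparkApp").getOrCreate
        spark.stop()
      }
    }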
Here is the output / error:
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    18/04/07 18:06:15 INFO SparkContext: Running Spark version 2.2.1
    Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite(JLjava/util/concurrent/TimeUnit;)Lcom/google/common/cache/CacheBuilder;
        at org.apache.hadoop.security.Groups.<init>(Groups.java:96)
        at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
        at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2424)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
I have run Spark locally many dozens of times on other projects, so what might be wrong with this simple one? Is there a dependency on the $HADOOP_HOME environment variable or similar?
Update: By downgrading the Spark version to 2.0.1 I was able to compile. That does not fix the problem (we need the newer version), but it helps point to the source of the problem.
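(For reference, that downgrade amounts to changing the version property that the ${spark.version} placeholders in the pom above resolve to; the Scala values below are illustrative assumptions, since the question does not show them:)

    <properties>
        <!-- referenced by the spark-core / spark-mllib dependencies above -->
        <spark.version>2.0.1</spark.version>
        <!-- assumed values, for illustration only -->
        <scala.version>2.11.8</scala.version>
        <scala.binary.version>2.11</scala.binary.version>
    </properties>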
Another update: In a different project the hack of downgrading to 2.0.1 does help, i.e. execution proceeds further; but then, when writing out to Parquet, a similar exception does happen:
    18/05/07 11:26:11 ERROR Executor: Exception in task 0.0 in stage 2741.0 (TID 2618)
    java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
        at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
        at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
        at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.<init>(CodecFactory.java:92)
        at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:169)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
        at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
        at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
1 Answer
Answer 1
This error occurs due to a version mismatch between Google's Guava library and the version Spark/Hadoop expect: a NoSuchMethodError at runtime means the Guava classes actually loaded are from a different version than the one the calling code (here Hadoop) was compiled against. Spark shades Guava, but many libraries still use an unshaded Guava, so an incompatible copy can win on the classpath. You can try shading the Guava dependencies yourself, as per this post: Apache-Spark-User-List
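A minimal sketch of that shading approach using the maven-shade-plugin, assuming the conflicting Guava is dragged in by one of the project's own dependencies (the plugin version and the shaded package prefix are placeholders to adapt):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <relocations>
                        <!-- rewrite Guava's packages inside the built jar so they
                             cannot clash with the Guava version Hadoop expects -->
                        <relocation>
                            <pattern>com.google.common</pattern>
                            <shadedPattern>shaded.com.google.common</shadedPattern>
                        </relocation>
                    </relocations>
                </configuration>
            </execution>
        </executions>
    </plugin>

To find which artifact pulls in the conflicting Guava in the first place, running mvn dependency:tree -Dincludes=com.google.guava is a quick check.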