Using Spark under EMR, I'm running into a curious error when trying to get pyspark (or spark-shell) to start up. I can load my jars onto HDFS and get the Spark application master started on YARN, but it crashes shortly after startup while trying to contact the YARN resource manager. The resulting error is strange:
WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (owner= �����*, renewer=, realUser=, issueDate=0, maxDate=0, sequenceNumber=0, masterKeyId=0) can't be found in cache
ERROR yarn.ApplicationMaster: Uncaught exception: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (owner= �����*, renewer=, realUser=, issueDate=0, maxDate=0, sequenceNumber=0, masterKeyId=0) can't be found in cache
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    ...
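For anyone poking at the same thing: a quick way to see which tokens the driver has actually picked up is to dump the current user's credentials from spark-shell. This is just a diagnostic sketch using the standard Hadoop UserGroupInformation/Credentials APIs; the token with the mangled owner should show up in the output.

    import scala.collection.JavaConverters._
    import org.apache.hadoop.security.UserGroupInformation

    // Dump every token the current process is carrying, so the one with the
    // garbled owner can be identified by kind and service.
    val creds = UserGroupInformation.getCurrentUser.getCredentials
    creds.getAllTokens.asScala.foreach { tok =>
      println(s"kind=${tok.getKind} service=${tok.getService}")
      // decodeIdentifier may return null if the token kind isn't on the classpath
      println(s"identifier=${tok.decodeIdentifier}")
    }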
This is despite trying to disable YARN authentication in every way I can think of (e.g.):
security.authorization=false
security.authentication=simple
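(For reference, the full Hadoop property names are hadoop.security.authorization and hadoop.security.authentication; assuming those are the settings meant above, one way to force them through to a Spark-on-YARN job is Spark's spark.hadoop.* passthrough. A minimal sketch, with a made-up app name:)

    import org.apache.spark.{SparkConf, SparkContext}

    // Push the Hadoop security settings into the job's Hadoop Configuration
    // via the spark.hadoop.* prefix, which Spark copies into hadoopConf.
    val conf = new SparkConf()
      .setAppName("token-debug")
      .set("spark.hadoop.hadoop.security.authentication", "simple")
      .set("spark.hadoop.hadoop.security.authorization", "false")
    val sc = new SparkContext(conf)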
It appears that the Hadoop RPC proxy is pulling some garbage token data from the environment, but I'm at a loss as to where it's coming from. Any ideas on how to definitively disable authentication, or how to track down the source of the garbage data?
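One avenue I'm aware of (sketch only, path is a placeholder): YARN serializes the tokens it hands to each container into the file named by HADOOP_TOKEN_FILE_LOCATION in the container's launch environment, and that file can be read back with the standard Credentials API to see exactly what the application master was given.

    import scala.collection.JavaConverters._
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.security.Credentials

    // Placeholder path: substitute the failing container's
    // HADOOP_TOKEN_FILE_LOCATION (a container_tokens file under the
    // NodeManager's local dirs).
    val tokenFile = new Path("/path/to/container_tokens")
    val creds = Credentials.readTokenStorageFile(tokenFile, new Configuration())
    creds.getAllTokens.asScala.foreach { tok =>
      println(s"kind=${tok.getKind} service=${tok.getService} idBytes=${tok.getIdentifier.length}")
    }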