Tuesday, April 11, 2017

Why does my pyspark just hang as ACCEPTED in yarn when I launch it?


I just spun up a new Linux instance on AWS and installed pyspark on it. It has Spark 1.6.

I'm running pyspark with yarn. When I run the pyspark command in the terminal, it launches initially, but then I get this message:

dd/mm/YY HH:MM:SS INFO yarn.Client: Application report for application_XXXXXXXXXXX_XXXX (state: ACCEPTED) 

...and this just repeats forever.
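For context, launching the shell against yarn explicitly would look something like this (a sketch; the exact flags depend on the setup):

pyspark --master yarn --deploy-mode client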

So, I checked yarn to see if anything else was running:

yarn application -list 

And it shows ONLY my application, still stuck in ACCEPTED. How do I open the pyspark shell and get my application to actually run rather than just sitting in ACCEPTED?

3 Answers

Answer 1

Can you try running spark-shell and see whether it goes into the RUNNING state or not?
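That is, something along the lines of:

spark-shell --master yarn

If the Scala shell gets stuck in ACCEPTED too, the problem is with yarn resources rather than with pyspark itself.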

This happens when yarn doesn't have the resources you requested available.

Example: let's say yarn has 5 GB of free memory available and you are requesting 10 GB. Your job will be stuck in the ACCEPTED phase until it gets the requested memory.
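One way to see this in practice is to shrink the request so it fits into what yarn actually has free. The numbers below are purely illustrative:

pyspark --master yarn --driver-memory 1g --executor-memory 1g --num-executors 2

You can check how much memory is actually free in the ResourceManager web UI (port 8088 by default).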

Answer 2

Adding to Grover's answer, you can set spark.dynamicAllocation.enabled and yarn.scheduler.fair.preemption to true to get your job started as soon as possible.
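A minimal sketch of where those settings live. Note that dynamic allocation on yarn also requires the external shuffle service, so that line is an extra assumption about your setup:

In spark-defaults.conf:

spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true

In yarn-site.xml:

<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>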

Answer 3

This problem is caused either by a lack of resources or by problems with the queue.

Please set all of the following options in yarn-site.xml so that your cluster has enough resources available: yarn.scheduler.maximum-allocation-mb, yarn.scheduler.maximum-allocation-vcores, yarn.nodemanager.resource.memory-mb, yarn.nodemanager.resource.cpu-vcores.
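For example, in yarn-site.xml (the values below are illustrative; size them to your instance):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>

Restart yarn after changing these.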

You might also be hitting a bug/problem with queues. If you use the fair scheduler, this can be resolved by setting queueMaxAMShareDefault to -1.0 in fair-scheduler.xml (on the node running the Resource Manager) and then restarting the Resource Manager.
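A minimal sketch of that fair-scheduler.xml change:

<allocations>
  <queueMaxAMShareDefault>-1.0</queueMaxAMShareDefault>
</allocations>

Setting the max AM share to -1.0 disables the check that limits how much of a queue can be consumed by application masters; on a small cluster that check can prevent the application master from ever being scheduled.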
