I just spun up a new AWS instance running Linux and installed PySpark on it; it has Spark 1.6. I'm running pyspark with YARN. When I run the pyspark command in the terminal, it launches initially, but then I get the message:
dd/mm/YY HH:MM:SS INFO yarn.Client: Application report for application_XXXXXXXXXXX_XXXX (state: ACCEPTED)
...and this just repeats forever.
So I checked YARN to see if anything else was running:

yarn application -list

It shows ONLY my application. How do I open the pyspark shell and get my application to actually start running rather than just sitting in ACCEPTED?
3 Answers
Answer 1
Can you try running spark-shell and see whether it goes into the RUNNING state or not?

This happens when YARN doesn't have the resources you requested from it.

Example: let's say YARN has 5 GB of free memory available and you are requesting 10 GB. Your job will be stuck in the ACCEPTED phase until it gets the requested memory.
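As a rough sketch of the workaround (the flag values below are placeholders, not tuned numbers): check how much memory is free in the ResourceManager web UI (port 8088 by default), then launch pyspark with a smaller resource request:

# Launch the pyspark shell on YARN with a deliberately small footprint;
# adjust these placeholder values to what your cluster actually has free.
pyspark --master yarn \
  --driver-memory 1g \
  --executor-memory 1g \
  --num-executors 2 \
  --executor-cores 1

If the application now goes to RUNNING, the original request was simply larger than what YARN could grant.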
Answer 2
Adding to Grover's answer, you can set spark.dynamicAllocation.enabled and yarn.scheduler.fair.preemption to true to get your job started as soon as possible.
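A sketch of where those two settings live, assuming the default file locations; note that Spark's dynamic allocation also requires the external shuffle service to be enabled:

# conf/spark-defaults.conf
spark.dynamicAllocation.enabled  true
spark.shuffle.service.enabled    true

And in yarn-site.xml (this one only applies if you use the fair scheduler):

<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>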
Answer 3
This problem is caused either by a lack of resources or by problems with the queue.

Please set all of these options in yarn-site.xml so that your cluster has enough resources: yarn.scheduler.maximum-allocation-mb, yarn.scheduler.maximum-allocation-vcores, yarn.nodemanager.resource.memory-mb, yarn.nodemanager.resource.cpu-vcores.

You might also be hitting a bug/problem with queues, which can be resolved by setting queueMaxAMShareDefault to -1.0 in fair-scheduler.xml (on the node with the ResourceManager) if you use the fair scheduler, and then restarting the ResourceManager. Both files are sketched below.
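As an illustration, the yarn-site.xml entries might look like this (the memory and vcore values are placeholders; size them to what your nodes actually have):

<!-- total resources each NodeManager may hand out -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
<!-- the largest single container YARN will grant -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>

And in fair-scheduler.xml, where -1.0 disables the max-AM-share check so Application Masters are never blocked from starting by that limit:

<allocations>
  <queueMaxAMShareDefault>-1.0</queueMaxAMShareDefault>
</allocations>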