I just spun up a new AWS Linux instance and installed PySpark on it. It has Spark 1.6.
I'm running PySpark on YARN. When I run the pyspark command in the terminal, it launches initially, but then I get the message:

    dd/mm/YY HH:MM:SS INFO yarn.Client: Application report for application_XXXXXXXXXXX_XXXX (state: ACCEPTED)

...and this just repeats forever.
So I checked YARN to see if anything else was running:

    yarn application -list

and it shows ONLY my application. How do I open the pyspark shell and get my application to actually start running rather than just sitting in ACCEPTED?
3 Answers
Answer 1
Can you try running spark-shell and see whether it goes into the RUNNING state or not?

This happens when YARN doesn't have the resources your application is requesting.

Example: let's say YARN has 5 GB of free memory and you are requesting 10 GB. Your job will be stuck in the ACCEPTED phase until the requested memory becomes available.
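If you suspect a resource shortfall, one way to check is to query the ResourceManager's metrics endpoint and then launch pyspark with a request small enough to fit. This is just a sketch: <resourcemanager-host> is a placeholder for your host, and the container sizes below are illustrative, not recommendations.

    # Check the cluster's free memory/vcores via the ResourceManager web UI
    # (port 8088 is the default; replace <resourcemanager-host> with yours)
    curl http://<resourcemanager-host>:8088/ws/v1/cluster/metrics

    # Ask for less than the free capacity so the request can be granted
    pyspark --master yarn \
      --driver-memory 1g \
      --executor-memory 1g \
      --num-executors 2 \
      --executor-cores 1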
Answer 2
Adding to Grover's answer: you can set spark.dynamicAllocation.enabled and yarn.scheduler.fair.preemption to true to get your job started as soon as possible.
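For reference, here is one way to apply both settings. Note that on YARN, dynamic allocation also requires the external shuffle service to be enabled; the launch flags below are one option, putting the same keys in spark-defaults.conf is another.

    # Spark side: pass at launch (or set in spark-defaults.conf)
    pyspark --master yarn \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true

    <!-- YARN side: in yarn-site.xml, then restart the ResourceManager -->
    <property>
      <name>yarn.scheduler.fair.preemption</name>
      <value>true</value>
    </property>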
Answer 3
This problem is caused either by insufficient resources or by a problem with the queue.

To make sure your cluster advertises enough resources, set these options in yarn-site.xml: yarn.scheduler.maximum-allocation-mb, yarn.scheduler.maximum-allocation-vcores, yarn.nodemanager.resource.memory-mb, yarn.nodemanager.resource.cpu-vcores
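A sketch of those four properties in yarn-site.xml; the values below assume a single node with roughly 16 GB of RAM and 8 cores, and are illustrative only. Tune them to your hardware.

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>12288</value>  <!-- memory each NodeManager offers for containers -->
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>6</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>12288</value>  <!-- largest single container the scheduler will grant -->
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-vcores</name>
      <value>6</value>
    </property>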
You might also be hitting a bug/problem with queues. If you use the Fair Scheduler, this can be resolved by setting queueMaxAMShareDefault to -1.0 in fair-scheduler.xml (on the ResourceManager node) and then restarting the ResourceManager. This lifts the default cap on the share of a queue's resources that ApplicationMasters may use.
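A minimal sketch of that change; the element sits at the top level of the fair-scheduler.xml allocations file:

    <allocations>
      <!-- -1.0 disables the ApplicationMaster share check entirely -->
      <queueMaxAMShareDefault>-1.0</queueMaxAMShareDefault>
    </allocations>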