Friday, September 16, 2016

Custom source/sink configurations not getting recognized

I've written a custom metrics source/sink for my Spark Streaming app and am trying to initialize it from metrics.properties, but that doesn't work on the executors. I don't have control over the machines in the Spark cluster, so I can't copy the properties file into $SPARK_HOME/conf/ on each node. I do have it in the fat jar where my app lives, but by the time the fat jar is downloaded to the worker nodes, the executors have already started and their metrics system is already initialized, so my file with the custom source configuration is never picked up.
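For reference, a minimal custom source might look like the sketch below. Note that Spark's Source trait is private[spark], so the class has to live inside an org.apache.spark package to compile; the class and metric names here are made up for illustration.

    // The Source trait is private[spark], so a custom source has to be
    // declared under an org.apache.spark package to compile.
    package org.apache.spark.metrics.source

    import com.codahale.metrics.MetricRegistry

    // Hypothetical source; "mySource" must match the name referenced
    // in metrics.properties.
    class MySource extends Source {
      override val sourceName: String = "mySource"
      override val metricRegistry: MetricRegistry = new MetricRegistry()

      // Example counter, incremented from application code.
      val myCounter = metricRegistry.counter(MetricRegistry.name("myCounter"))
    }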

Following this post, I've specified 'spark.files=metrics.properties' and 'spark.metrics.conf=metrics.properties', but by the time 'metrics.properties' is shipped to the executors, their metrics system is already initialized.
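For context, here is roughly what that configuration looks like when set programmatically (the app name and jar name are placeholders); the equivalent spark-submit flags are noted in the comment:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Equivalent command-line form (placeholder jar name):
    //   spark-submit --files metrics.properties \
    //     --conf spark.metrics.conf=metrics.properties ... my-app.jar
    val conf = new SparkConf()
      .setAppName("MyStreamingApp") // placeholder
      .set("spark.files", "metrics.properties")
      .set("spark.metrics.conf", "metrics.properties")

    val ssc = new StreamingContext(conf, Seconds(10))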

If I initialize my own metrics system, it picks up my file, but then I'm missing the master/executor-level metrics and properties (e.g. executor.sink.mySink.propName=myProp: 'mySink' can't read 'propName'), since those are initialized by Spark's metrics system.

Is there a (programmatic) way to have 'metrics.properties' shipped before executors initialize their metrics system?

Update 1: I am trying this on a standalone Spark 2.0.0 cluster.

2 Answers

Answer 1

See Spark metrics on wordcount example. Basically, I believe you need to pass --files so that metrics.properties is shipped to all workers.
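As a quick sanity check that the file actually reaches the workers, something like the sketch below could be run inside a task (it assumes a SparkContext named sc, as in the shell); SparkFiles.get resolves the local path of a file shipped with --files:

    import org.apache.spark.SparkFiles

    // Throwaway job: print where the shipped metrics.properties landed
    // on each executor. The foreach runs remotely, so the println goes
    // to the executor logs.
    sc.parallelize(1 to sc.defaultParallelism).foreach { _ =>
      println("metrics.properties on executor: " +
        SparkFiles.get("metrics.properties"))
    }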

Answer 2

SparkConf only loads local system properties if they start with the prefix spark.; have you tried loading your properties with a spark. prefix added?
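A related workaround, if your Spark version supports it: Spark's MetricsConfig also reads entries prefixed with spark.metrics.conf. from the SparkConf itself, which avoids shipping a file at all. A sketch, reusing the sink and property names from the question (the sink class name is hypothetical):

    import org.apache.spark.SparkConf

    // Assumption: MetricsConfig picks up "spark.metrics.conf."-prefixed
    // entries from the SparkConf, so metrics.properties can be bypassed.
    val conf = new SparkConf()
      .set("spark.metrics.conf.executor.sink.mySink.class",
           "org.apache.spark.metrics.sink.MySink") // hypothetical sink
      .set("spark.metrics.conf.executor.sink.mySink.propName", "myProp")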
