I have a scenario where I want to implement a variant of the Cake Pattern, but one that adds implicit functionality to a class (a Spark DataFrame).
So, basically, I want to be able to run code like the following:
```scala
trait Transformer { this: ColumnAdder =>
  def transform(input: DataFrame): DataFrame = {
    input.addColumn("newCol")
  }
}

val input = sqlContext.range(0, 5)
val transformer = new Transformer with StringColumnAdder
val output = transformer.transform(input)
output.show
```
And get a result like the following:
```
+---+------+
| id|newCol|
+---+------+
|  0|newCol|
|  1|newCol|
|  2|newCol|
|  3|newCol|
|  4|newCol|
+---+------+
```
My first idea was to define the implicit classes only in the base traits:
```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

trait ColumnAdder {
  protected def _addColumn(df: DataFrame, colName: String): DataFrame

  implicit class ColumnAdderRichDataFrame(df: DataFrame) {
    def addColumn(colName: String): DataFrame = _addColumn(df, colName)
  }
}

trait StringColumnAdder extends ColumnAdder {
  protected def _addColumn(df: DataFrame, colName: String): DataFrame = {
    df.withColumn(colName, lit(colName))
  }
}
```
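As an aside, the cake wiring itself can be exercised without a SparkSession. The sketch below substitutes a hypothetical `Frame` case class for `DataFrame` (`Frame` and its `withColumn` are stand-ins for illustration only, not Spark API), and shows the first approach resolving the enrichment through the self-type:

```scala
// Spark-free stand-in for DataFrame, for illustration only.
case class Frame(columns: Map[String, String]) {
  def withColumn(name: String, value: String): Frame =
    Frame(columns + (name -> value))
}

trait ColumnAdder {
  protected def _addColumn(df: Frame, colName: String): Frame

  // The enrichment lives once in the base trait; implementations
  // only supply _addColumn.
  implicit class ColumnAdderRichFrame(df: Frame) {
    def addColumn(colName: String): Frame = _addColumn(df, colName)
  }
}

trait StringColumnAdder extends ColumnAdder {
  protected def _addColumn(df: Frame, colName: String): Frame =
    df.withColumn(colName, colName)
}

trait Transformer { this: ColumnAdder =>
  def transform(input: Frame): Frame = input.addColumn("newCol")
}

val transformer = new Transformer with StringColumnAdder
val output = transformer.transform(Frame(Map("id" -> "0")))
```

The usage mirrors the Spark version: `output.columns` ends up containing the new `"newCol" -> "newCol"` entry alongside `"id"`.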
And it works, but I was not entirely happy with this approach, because of the duplicated function signatures. So I thought of another approach, using the (deprecated?) `implicit def` strategy:
```scala
trait ColumnAdder {
  protected implicit def columnAdderImplicits(df: DataFrame): ColumnAdderDataFrame

  abstract class ColumnAdderDataFrame(df: DataFrame) {
    def addColumn(colName: String): DataFrame
  }
}

trait StringColumnAdder extends ColumnAdder {
  protected implicit def columnAdderImplicits(df: DataFrame): ColumnAdderDataFrame =
    new StringColumnAdderDataFrame(df)

  class StringColumnAdderDataFrame(df: DataFrame) extends ColumnAdderDataFrame(df) {
    def addColumn(colName: String): DataFrame = {
      df.withColumn(colName, lit(colName))
    }
  }
}
```
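One note on the "deprecated?" question: `implicit def` conversions are not deprecated, but they are feature-gated, so the compiler emits a warning unless `scala.language.implicitConversions` is imported (or the equivalent `-language:implicitConversions` flag is passed). Below is a Spark-free sketch of this second approach using the same hypothetical `Frame` stand-in for `DataFrame` (an assumption for illustration, not Spark API), showing where the import goes:

```scala
// Silences the feature warning for implicit conversions.
import scala.language.implicitConversions

// Spark-free stand-in for DataFrame, for illustration only.
case class Frame(columns: Map[String, String]) {
  def withColumn(name: String, value: String): Frame =
    Frame(columns + (name -> value))
}

trait ColumnAdder {
  // The conversion is abstract here; modules supply the concrete wrapper.
  protected implicit def columnAdderImplicits(df: Frame): ColumnAdderFrame

  abstract class ColumnAdderFrame(df: Frame) {
    def addColumn(colName: String): Frame
  }
}

trait StringColumnAdder extends ColumnAdder {
  protected implicit def columnAdderImplicits(df: Frame): ColumnAdderFrame =
    new StringColumnAdderFrame(df)

  class StringColumnAdderFrame(df: Frame) extends ColumnAdderFrame(df) {
    def addColumn(colName: String): Frame = df.withColumn(colName, colName)
  }
}

trait Transformer { this: ColumnAdder =>
  def transform(input: Frame): Frame = input.addColumn("newCol")
}

val transformer = new Transformer with StringColumnAdder
val output = transformer.transform(Frame(Map("id" -> "0")))
```

Note that `addColumn` is declared once, in the abstract `ColumnAdderFrame`, so the signature duplication moves from the rich-wrapper methods to the conversion itself.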
(The full reproducible code, including an extra trait-module can be found here)
So, I wanted to ask which approach is best, and whether there may be another, better way to achieve what I want.